Vibe Coding a Harvey Competitor: Fun Demo, Expensive Reality
Weekend-built legal AI tools look good on LinkedIn. Here's why production software is harder than it looks (and when you should build your own anyway).
Every week, I see a new LinkedIn post about how someone “built a Harvey competitor,” “replicated Gavel Exec” or “copied Spellbook” in an afternoon using a custom GPT or a little vibe coding. The demos honestly look pretty slick. The comments go wild with thousands of likes, and a few comments ask for access.
What’s striking though is who is missing from these conversations. The vendors whose products are being “replaced” are rarely part of the discussion or even respond in the thread.
That’s why it’s worth slowing down and why I’m entering the chat.
Here are 5 reasons why you shouldn’t vibe code your own legal tech, and when it actually is a viable and good choice.
TL;DR: Weekend-built “Harvey competitors” look good in LinkedIn demos, but the hard part isn’t getting AI to write text. It’s making it work reliably inside lawyers’ tools like Microsoft Word. Vibe-coded DIY legal tech is good for narrow, internal tools. But for core legal workflows, dependable software wins over those viral demos.
(Note: As I was writing this, Richard Tromans of Artificial Lawyer interviewed Jamie Tso, one of these viral vibe coders, on this very topic. It’s one of the more thoughtful discussions because it’s honest about where internal tools work well and where they start to break down.)
1. If it’s not inside Microsoft Word, you haven’t cracked the code.
Most vibe-coded legal tech tools that get attention live in a chat box or a lightweight editor. As I’ve been explaining to our team at Gavel for 5 years, real lawyers live in Microsoft Word (unfortunately for our engineers!).
Getting something to work well inside Word is not a matter of “adding a plugin.” Why? Because, in order for an agent to actually do work for you, you need true real-time interaction with the document content inside Microsoft Word. This means:
Being able to interact with a Word document and all the features that Word supports. Under the hood, .docx files (what Microsoft Word uses) are packaged Open Office XML files, which describe all of the document structure, revisions and styling. Anything beyond simple text replacement, like inserting clauses in the “right” spot - while preserving numbering, headings, cross-references, tables, tracked changes, comments, and defined terms - gets hairy (honestly, for reasons way beyond my pay grade).
Similarly, numbering in Word is notoriously fragile because it’s driven by styles and numbering definitions that can be reused, overridden, or corrupted across documents. Comments and ranges can be anchored to specific spans that shift as you insert or delete text. Tables, fields, and cross-references can break if you don’t update them correctly. And if your “AI suggestion” round-trips through HTML/Markdown or a simplified editor, you can lose formatting or, worse, produce a document that looks fine until Word starts behaving strangely.
Being aware of the order in which edits need to be applied. Applying revisions and comments in the wrong order can lead to a truly unintelligible document, however smart the AI is. To handle this, Microsoft Word’s APIs provide an asynchronous, batch-based programming model, which adds complexity
Your AI agents need to keep updated on the real state of the document as you keep on editing it. Beyond the initial snapshot, or continuously copy-pasting your updates to your agent, this means building true infrastructure that can keep the agent’s representation in sync with what you currently have on your laptop, securely.
This is why “it wrote a clause” is the easy part. APIs exist, and they’re powerful, but building something stable on top of them is real engineering work. The Artificial Lawyer interview with Jamie alludes to this indirectly when he talks about hitting the limits of no-code tools and wanting to go deeper. Integrations make that job much more complex.
2. The small-document demo breaks on real contracts.
The LinkedIn demo is almost always a short, clean document. Real contracts are long, messy, and inconsistent. They’re full of defined terms, exhibits, tables, numbering quirks, and tracked changes layered on top of each other.
A production system has to handle very large documents without slowing to a crawl or crashing. It has to chunk and manage context so the model stays accurate. It has to insert language in the right place, preserve formatting, and apply tracked changes and comments in a way lawyers trust.
Working with documents at scale was hard long before AI showed up. AI just adds more ways to be wrong.
This is why many of the tools that are vibe-coded work best in tightly scoped contexts. Once you try to generalize them across document types, practice groups, or jurisdictions, the complexity grows nonlinearly.
3. Training data is not optional because market standards are important for redlining, but especially for free-form drafting.
There’s another issue that usually gets glossed over in these demos: training data. Legal AI is only as good as the examples it’s learned from. Market-standard redlines, playbooks, and clause positions don’t emerge automatically without giving the model access to it.
For example, at Gavel, we’ve worked directly with practicing attorneys to build dozens of playbooks based on real deals, real negotiations, and gold‑standard documents and templates.
That’s how you teach a system what “market” actually means in practice. Without that grounding, a tool might sound plausible, but it won’t reliably reflect how lawyers actually negotiate or draft in the real world, especially in areas that lawyer doesn’t have source documents for.
4. Security is not optional, and vibe‑coded tools are a soft target.
Legal work is incredibly sensitive, and vibe‑coded tools are especially exposed because they rely heavily on prompts and untrusted inputs. The Artificial Lawyer interview notes that security tooling is improving, and that’s true in a narrow sense. Better infrastructure can reduce obvious failures like credential leaks, unsafe deployments, or sloppy file handling. But there are larger risks.
Prompt injection is a good example. Contracts and uploaded documents are all treated as text that the model must reason over, which makes prompt injection a structural risk. The UK’s National Cyber Security Centre has warned that it’s not like SQL injection and can’t be patched away with a clever prompt. OWASP lists it as the top risk category for LLM applications, and studies of user‑built GPTs regularly find instruction leakage and unintended exposure of attached files or data sources.
5. Maintenance is the hidden cost no one posts about.
Even if you build something decent, you’re signing up for ongoing work. With the pace of development in the field, models and providers change often and new failure modes appear. Meanwhile, the companies you’re trying to replicate have teams shipping improvements every week, and the gap widens fast.
Tso is explicit about this tradeoff. Once someone forks a repo and builds on it, maintenance is on them. That’s fine for experimentation, but it is a very different proposition when the tool becomes something people rely on day after day.
Moreover, reliability doesn’t come from a clever prompt. It comes from evaluation frameworks, test sets, regression testing, and feedback loops. It also comes from UX that lets users audit changes, see sources, and easily accept or reject suggestions. Some of these early experiments and viral LinkedIn tools didn’t go far, not because the ideas were bad, but because production-grade accuracy takes sustained effort, tooling, and iteration.
At Gavel, as we have agents work in real time within Microsoft Word, we are running into the performance limits of Office itself, which was designed for inputs at human speed, not AI speed. For example, for Gavel Exec, this has led us to build our own APIs on top of the OOXML standard instead of using Microsoft’s.
So when should you build your own?
Build when the problem is narrow, internal, and clearly bounded, and when nothing off-the-shelf quite does what you want it to.
Buy or partner when the workflow is constantly used, mission-critical, user-facing, lives in Word, touches sensitive data, or needs to be dependable across many matters and many users.
The viral demo is fun. The real work is making software boring, dependable, and safe. That’s the part that doesn’t go viral, and the part that’s actually worth paying for.
What other topics would you like my unfiltered perspective on? Reply or leave a comment. I read them all.
About this newsletter
I’m Dorna Moini, a former Big Law attorney who practiced for nearly a decade, last at Sidley Austin. I started Gavel after building automation tools to support parts of my own practice, starting with a consumer app that helped domestic violence survivors navigate the legal system. That work showed me how much of legal work is rules-based and how much better it could be with the right technology.
Today, I’m the founder and CEO of Gavel, an AI and automation platform that law firms and in-house teams use to automate document drafting and contract review through our two products: Gavel Exec and Gavel Workflows.
In this newsletter, I share practical notes on legal tech and how lawyers can use automation and AI to build better, more scalable practices.
Track your time (even if you’re not billing)
As a lawyer-turned-founder, I still track my time, even though I’m not billing clients. I bucket it simply with tags like: marketing, sales, internal meetings, management, and Slack/emails.
At the end of each week, I review where my time actually went. Then I drop it into ChatGPT and ask: What’s over-indexed? What’s under-invested? What should I delegate or stop doing? It’s one of the fastest ways to spot problems, and it matters for founders, law firm owners, and especially in-house teams.
For a free tool that gets the job done, I recommend Toggl.
What I’m reading
AI is now filling prescriptions in Utah. I’ve always been fascinated by how law and medicine are transformed by technology and the parallels between the two.
Discovery data dumps. A news org wins the fight to access 20M ChatGPT logs and wants more.
Is it possible for your GC to be too good? This piece dives into what happens in M&A when you get everything you wanted in a negotiation, but then comes the merger.
Thanks for reading.
Connect with me on LinkedIn.

