Legal was one of the earliest industries with serious AI pilots and one of the last to reach serious production deployments. The gap comes down to review: legal teams are professionally skeptical, and correctly so, and any AI system that outputs claims without a verifiable trail to authority gets quietly shelved. This post covers what works in real legal deployments, from contract review products to internal legal-ops tools.

Review-safe architecture

Every claim cites a span. Every suggestion links to authority. No span means no claim. A lawyer reviews every edit, and every action leaves an audit trail.
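As a rough illustration of the audit-trail part, here is a minimal sketch of an append-only log of reviewer actions. The names (`ReviewAction`, `AuditLog`, the `accept`/`reject` decisions, the suggestion ids) are all hypothetical, and a real deployment would persist entries to durable storage rather than hold them in memory.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Tuple


@dataclass(frozen=True)
class ReviewAction:
    suggestion_id: str   # hypothetical id of the AI suggestion being reviewed
    reviewer: str        # who made the decision
    decision: str        # "accept" or "reject"
    timestamp: datetime  # when the decision was recorded (UTC)


class AuditLog:
    """In-memory, append-only record of reviewer decisions (illustrative only)."""

    def __init__(self) -> None:
        self._entries: List[ReviewAction] = []

    def record(self, suggestion_id: str, reviewer: str, decision: str) -> ReviewAction:
        if decision not in {"accept", "reject"}:
            raise ValueError(f"unknown decision: {decision!r}")
        action = ReviewAction(suggestion_id, reviewer, decision, datetime.now(timezone.utc))
        self._entries.append(action)
        return action

    def entries(self) -> Tuple[ReviewAction, ...]:
        # Return a tuple so callers cannot mutate the log.
        return tuple(self._entries)

    def unreviewed(self, suggestion_ids: List[str]) -> List[str]:
        # Suggestions that no lawyer has acted on yet.
        reviewed = {entry.suggestion_id for entry in self._entries}
        return [sid for sid in suggestion_ids if sid not in reviewed]


log = AuditLog()
log.record("s1", "associate_a", "accept")
log.record("s2", "associate_a", "reject")
pending = log.unreviewed(["s1", "s2", "s3"])
```

The `unreviewed` check is the useful part: nothing ships while a suggestion is still pending.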

The patterns that pass review

Clause extraction with span attribution

Given a contract, extract specific clauses (termination, indemnity, governing law, payment terms, confidentiality). The non-negotiable design rule: every extracted claim is linked to the exact character span in the original document, and the UI lets the reviewer click through to the source text. No hallucinated summaries float disconnected from the document; every claim can be traced to its origin.
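One way to enforce "no span = no claim" is to check every extracted clause against the source text before it reaches the reviewer. The sketch below assumes the extractor returns character offsets alongside the quoted text; `ExtractedClause` and `verify_span` are hypothetical names, and the contract is made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ExtractedClause:
    clause_type: str  # e.g. "termination", "governing_law"
    text: str         # the text the extractor claims to have found
    start: int        # character offset into the source document (inclusive)
    end: int          # character offset into the source document (exclusive)


def verify_span(document: str, clause: ExtractedClause) -> bool:
    """True only if the span is in bounds and matches the claimed text exactly."""
    if not (0 <= clause.start < clause.end <= len(document)):
        return False
    return document[clause.start:clause.end] == clause.text


def keep_verifiable(document: str, clauses: List[ExtractedClause]) -> List[ExtractedClause]:
    # Drop anything that cannot be traced back to the source text.
    return [c for c in clauses if verify_span(document, c)]


contract = "1. Term. This Agreement lasts two years. 2. Governing Law. New York law governs."
quote = "New York law governs."
offset = contract.index(quote)

good = ExtractedClause("governing_law", quote, offset, offset + len(quote))
bad = ExtractedClause("termination", "Either party may terminate at will.", 0, 35)

kept = keep_verifiable(contract, [good, bad])
```

Here `bad` is dropped because its text does not appear at the offsets it claims, which is the failure mode of a hallucinated clause.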

Risk flagging against a playbook

Firms have internal positions on specific clause structures. A 'low-risk termination clause' at one firm means a specific wording, and deviations from it need review. The AI pattern: compare extracted clauses against the playbook, flag deviations, suggest the standard language, and link to the approved precedent in the playbook library. This is where AI saves the most time: the 40-page NDA review that used to take a senior associate four hours becomes a 45-minute review of flagged sections.
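A minimal sketch of the compare-and-flag step, assuming the playbook maps each clause type to one approved wording and a precedent id. It uses plain string similarity from Python's standard `difflib` as a stand-in for whatever comparison a real product uses; the `0.9` threshold, the `PB-TERM-001` id, and the type names are all hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Dict, Optional


@dataclass(frozen=True)
class PlaybookEntry:
    clause_type: str
    approved_text: str
    precedent_id: str  # hypothetical id into the playbook library


@dataclass(frozen=True)
class Flag:
    clause_type: str
    similarity: float
    suggested_text: str  # the firm's standard language
    precedent_id: str    # link target for the reviewer


def normalize(text: str) -> str:
    # Ignore case and whitespace differences before comparing.
    return " ".join(text.lower().split())


def flag_deviation(
    clause_type: str,
    clause_text: str,
    playbook: Dict[str, PlaybookEntry],
    threshold: float = 0.9,  # assumed cutoff; would need tuning against lawyer review
) -> Optional[Flag]:
    entry = playbook.get(clause_type)
    if entry is None:
        return None  # no firm position recorded for this clause type
    similarity = SequenceMatcher(
        None, normalize(clause_text), normalize(entry.approved_text)
    ).ratio()
    if similarity >= threshold:
        return None  # close enough to the approved wording
    return Flag(clause_type, similarity, entry.approved_text, entry.precedent_id)


playbook = {
    "termination": PlaybookEntry(
        "termination",
        "Either party may terminate this Agreement on thirty (30) days' written notice.",
        "PB-TERM-001",
    )
}

same = flag_deviation(
    "termination",
    "Either party may terminate this  agreement on thirty (30) days' written notice.",
    playbook,
)
deviating = flag_deviation(
    "termination",
    "Vendor may terminate this Agreement immediately and without notice.",
    playbook,
)
```

The flag carries the suggested language and the precedent id together, so the reviewer sees a comparison to the playbook rather than a bare opinion.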

Due-diligence document set review

M&A due diligence involves hundreds of documents. AI triage ranks documents by likely relevance to specific issue lists (change-of-control, IP assignments, material contracts, litigation). The associate reads the top 20% manually and spot-checks the rest. Time reduction: 60-70% in the workflows we've deployed.
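The rank-then-split step can be sketched as follows. Scoring here is simple term counting against an issue list, which is only a placeholder for a real relevance model; the term list, file names, and the `triage` function are assumptions for illustration.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical issue list; a real one would come from the deal team.
CHANGE_OF_CONTROL_TERMS = ["change of control", "assignment", "merger"]


def score(text: str, terms: List[str]) -> int:
    # Count occurrences of each issue term (case-insensitive).
    lowered = text.lower()
    return sum(lowered.count(term) for term in terms)


def triage(
    docs: Dict[str, str], terms: List[str], top_fraction: float = 0.2
) -> Tuple[List[str], List[str]]:
    """Return (read_first, spot_check) document names, ranked by score."""
    ranked = sorted(docs, key=lambda name: score(docs[name], terms), reverse=True)
    cutoff = max(1, math.ceil(len(ranked) * top_fraction))
    return ranked[:cutoff], ranked[cutoff:]


docs = {
    "lease.txt": "Tenant shall pay rent monthly.",
    "credit_agreement.txt": (
        "A change of control of Borrower is an event of default. "
        "No assignment without consent after a merger."
    ),
    "nda.txt": "Recipient shall keep information confidential.",
    "supply.txt": "No assignment of this agreement without consent.",
    "handbook.txt": "Employees accrue vacation.",
}

read_first, spot_check = triage(docs, CHANGE_OF_CONTROL_TERMS)
```

The spot-check list stays ranked, so sampling from its head first catches the near misses.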

Research synthesis with citations

This is legal research with RAG (retrieval-augmented generation) over case law and statutes. The critical design rule: every answer cites specific cases with paragraph-level references. If the AI can't cite, it doesn't claim. A small number of senior lawyers will still want to read the primary sources; most junior lawyers will get an answer of the same quality 5x faster.
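A sketch of the "can't cite, doesn't claim" gate, assuming the generator emits each claim with structured citations and the retriever reports which case paragraphs it actually returned. The case ids (`CASE-A`, `CASE-Z`) are placeholders, not real authorities, and the type names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Citation:
    case_id: str    # placeholder identifier, not a real case
    paragraph: int  # paragraph-level reference within the case


@dataclass(frozen=True)
class Claim:
    text: str
    citations: Tuple[Citation, ...]


def supported(claim: Claim, retrieved: Set[Citation]) -> bool:
    # A claim needs at least one citation, and every citation must point
    # at a paragraph that was actually retrieved.
    return bool(claim.citations) and all(c in retrieved for c in claim.citations)


def gate(claims: List[Claim], retrieved: Set[Citation]) -> Tuple[List[Claim], List[Claim]]:
    """Split claims into (shown to the user, withheld)."""
    shown = [c for c in claims if supported(c, retrieved)]
    withheld = [c for c in claims if not supported(c, retrieved)]
    return shown, withheld


retrieved = {Citation("CASE-A", 12), Citation("CASE-B", 4)}
claims = [
    Claim("The notice period runs from receipt.", (Citation("CASE-A", 12),)),
    Claim("Courts generally disfavor such clauses.", ()),
    Claim("The limitation is unenforceable.", (Citation("CASE-Z", 1),)),
]

shown, withheld = gate(claims, retrieved)
```

Both the uncited claim and the claim citing a paragraph that was never retrieved are withheld; the second case is the one that catches fabricated citations.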

What fails review

Summarization without spans: a '10-page memo on the key terms' with no way to drill into which clause supports which claim. Reviewers can't verify it, so they re-read the whole document anyway.

Legal judgment suggestions: 'this clause is unreasonable' stated as an opinion rather than framed as a comparison to the playbook. Lawyers reserve judgment calls for themselves; the AI's role is to flag, not to opine.

General legal advice: 'you should consider suing.' This carries malpractice risk and obvious liability.

The unifying principle: AI suggests, lawyer decides. The moment the UI reads like the AI is exercising judgment rather than gathering facts, legal teams lose trust. Keep the output grounded in observations and linked to authority.

The ops details that matter

Client confidentiality: contracts often contain privileged information. Deploy with data residency guarantees, no training on client data, and audit logs. If you're hosting, firm policy often requires agreements equivalent to a BAA (the business associate agreement used under HIPAA), not just for healthcare-adjacent work but for any serious client matter.

PII and financial-data redaction: see our PII redaction post. Legal documents regularly contain SSNs, account numbers, and personal identifiers. Strip them on the way in, and restore them in the user-facing response only when the user is authorized.
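A minimal sketch of strip-then-restore, covering only US SSNs in the `NNN-NN-NNNN` form. This is an illustration, not the approach from the PII redaction post; the placeholder format and function names are assumptions, and a real pipeline would need many more identifier types and a proper authorization check.

```python
import re
from typing import Dict, Tuple

# Matches SSNs written as 123-45-6789 only; other formats are out of scope here.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact(text: str) -> Tuple[str, Dict[str, str]]:
    """Replace each SSN with a placeholder and return the placeholder -> value map."""
    mapping: Dict[str, str] = {}

    def _substitute(match: "re.Match[str]") -> str:
        value = match.group(0)
        # Reuse the same placeholder if the value has been seen before.
        for placeholder, seen in mapping.items():
            if seen == value:
                return placeholder
        placeholder = f"[SSN_{len(mapping) + 1}]"
        mapping[placeholder] = value
        return placeholder

    return SSN_PATTERN.sub(_substitute, text), mapping


def restore(text: str, mapping: Dict[str, str], authorized: bool) -> str:
    # Unauthorized users only ever see placeholders.
    if not authorized:
        return text
    for placeholder, value in mapping.items():
        text = text.replace(placeholder, value)
    return text


original = "Guarantor SSN 123-45-6789; co-signer SSN 987-65-4321."
redacted, mapping = redact(original)  # only `redacted` would be sent to the model
for_partner = restore(redacted, mapping, authorized=True)
for_paralegal = restore(redacted, mapping, authorized=False)
```

The mapping never leaves the trusted side of the pipeline, so the model only ever sees placeholders.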

Evaluation against lawyer judgment: your eval set should contain examples where senior lawyers have scored the AI's output. The question is not just 'is the extraction correct' but 'is the risk flag calibrated to what a senior lawyer would flag.' This is the metric that matters for product quality.
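One simple way to put a number on flag calibration is to compare the clauses the AI flagged with the clauses a senior lawyer flagged on the same document. The sketch below assumes both are available as sets of clause ids; the ids and the `flag_agreement` function are hypothetical, and precision and recall are just one reasonable choice of metric.

```python
from typing import Dict, Set


def flag_agreement(ai_flags: Set[str], lawyer_flags: Set[str]) -> Dict[str, float]:
    """Precision and recall of AI risk flags, treating the lawyer's flags as ground truth."""
    agreed = len(ai_flags & lawyer_flags)
    precision = agreed / len(ai_flags) if ai_flags else 0.0
    recall = agreed / len(lawyer_flags) if lawyer_flags else 0.0
    return {"precision": precision, "recall": recall}


# Clause ids flagged on one contract (made-up data for illustration).
ai_flags = {"c1", "c2", "c3", "c4"}
lawyer_flags = {"c2", "c3", "c5"}

metrics = flag_agreement(ai_flags, lawyer_flags)
```

Low precision means the tool over-flags and wastes reviewer time; low recall means it misses what a senior would catch, which is the more dangerous failure.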
