HyDE (Hypothetical Document Embeddings) is one of the most effective RAG retrieval techniques published in recent years. The idea is simple: instead of embedding the user's question, have an LLM write a hypothetical answer, then embed that. Documents match documents better than questions match documents. The technique often improves retrieval quality by 10-20% at minor engineering cost. This post covers the mechanics, the variants, and when HyDE is the right tool.

HyDE mechanics

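Traditional retrieval: embed(query), then search the vector DB with that vector. HyDE: an LLM first generates a hypothetical answer, then embed(hypothetical) is what searches the vector DB. The hypothetical answer aligns better with the embedding space of the documents.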

The core idea

Embedding models are trained so that similar texts land close together. A user question and a document can be semantically similar but stylistically very different: a question has interrogative structure, a document has declarative structure. Same topic, different linguistic form.

HyDE exploits this gap. An LLM generates a plausible answer document from the query alone. That hypothetical answer has the declarative form of a real document, so its embedding lands closer to real documents in embedding space.

Result: retrieval brings back more relevant real documents. The hypothetical answer is often wrong in its details (the LLM is answering without any retrieved context, so it hallucinates), but that's fine: we never show it to the user. We only use its embedding to retrieve real, correct documents.
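To make the core idea concrete, here is a toy sketch. It swaps the dense embedding model for a bag-of-words count vector (an assumption for illustration only; real HyDE uses a neural embedder), so it captures word overlap rather than style. Even so, it shows the effect: the question shares no words with the document, while a plausible, partly wrong hypothetical answer shares many.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding': lowercase word counts (stand-in for a dense model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


document = "Enterprise customers can request a full refund within 30 days of purchase."
question = "how do i get my money back?"
# A plausible LLM-written answer: the details may be wrong, but it reads like a document.
hypothetical = "Customers can request a refund within a limited number of days after purchase."

print("question vs document:    ", round(cosine(toy_embed(question), toy_embed(document)), 3))
print("hypothetical vs document:", round(cosine(toy_embed(hypothetical), toy_embed(document)), 3))
```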

Implementation

Implementation is simple. Make one LLM call with a prompt like: 'Given the question, write a short paragraph that would be a reasonable answer. Don't worry about perfect accuracy; give your best guess.' Use a fast, small model.
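Here is a minimal sketch of the generation step. It assumes `llm` is any function that takes a prompt string and returns a completion; in practice you would wrap whichever fast, small model you use. The template adapts the prompt wording above, and the `Question:`/`Answer:` framing is an assumption.

```python
from typing import Callable

# Wording adapted from the post's example prompt; the framing lines are an assumption.
HYDE_PROMPT = (
    "Given the question, write a short paragraph that would be a reasonable answer. "
    "Don't worry about perfect accuracy; give your best guess.\n\n"
    "Question: {question}\n"
    "Answer:"
)


def generate_hypothetical(question: str, llm: Callable[[str], str]) -> str:
    """Ask an LLM for a hypothetical answer document.

    `llm` is a hypothetical prompt -> completion wrapper around your model client.
    """
    prompt = HYDE_PROMPT.format(question=question.strip())
    return llm(prompt).strip()


# Usage with a stub in place of a real model call.
def fake_llm(prompt: str) -> str:
    return "  Enterprise customers can request a refund within 30 days.  "


print(generate_hypothetical("How do enterprise refunds work?", fake_llm))
```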

Embed the output and use that embedding for vector search. The rest of the retrieval pipeline stays unchanged.
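The retrieval step can be sketched as follows. The sketch assumes pluggable `generate` and `embed` functions. It also replaces the vector DB with a brute-force cosine scan over an in-memory dict (`index`, a hypothetical doc-id to embedding map) so the example stays self-contained. The only difference from traditional retrieval is which text gets embedded.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def hyde_search(
    query: str,
    generate: Callable[[str], str],   # query -> hypothetical answer (the LLM call)
    embed: Callable[[str], Vector],   # text -> embedding (same model used for the docs)
    index: dict[str, Vector],         # doc_id -> precomputed document embedding
    k: int = 5,
) -> list[tuple[str, float]]:
    """Return the top-k (doc_id, score) pairs using the hypothetical's embedding."""
    hypothetical = generate(query)
    qvec = embed(hypothetical)        # the only change vs. embed(query)
    scored = [(doc_id, cosine(qvec, dvec)) for doc_id, dvec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

In a real system the brute-force scan would be your vector DB query. The one hard requirement is that documents and hypotheticals are embedded with the same model.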

Latency: the LLM call adds roughly 100-300ms. Cost: typically around $0.0001 per query. That is a very small price for the retrieval-quality improvement.

HyDE variants

Multi-HyDE. Generate several hypothetical answers with different angles or phrasings; embed each; query with each; merge results. Recall increases at the cost of more LLM calls. Useful for questions with multiple valid angles.
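Here is a sketch of Multi-HyDE merging. It assumes two hypothetical helpers: `generate_many(query, n)` returns n LLM answers, and `search(text, k)` embeds a text and returns `(doc_id, score)` pairs. Every search uses the same embedding model and index, so the scores are comparable. On that basis, the sketch merges by keeping each document's best score; other merge rules are equally reasonable.

```python
from typing import Callable


def multi_hyde_search(
    query: str,
    generate_many: Callable[[str, int], list[str]],         # hypothetical: n hypothetical answers
    search: Callable[[str, int], list[tuple[str, float]]],  # hypothetical: text -> [(doc_id, score)]
    n: int = 3,
    k: int = 5,
) -> list[tuple[str, float]]:
    """Query with each hypothetical, then merge by each document's best score."""
    best: dict[str, float] = {}
    for hypothetical in generate_many(query, n):
        for doc_id, score in search(hypothetical, k):
            if score > best.get(doc_id, float("-inf")):
                best[doc_id] = score
    merged = sorted(best.items(), key=lambda pair: pair[1], reverse=True)
    return merged[:k]
```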

HyDE + expansion. Generate one hypothetical answer; embed it; also run expansion-based retrieval on the original query. Merge results. Combines strengths of both techniques.
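When you merge HyDE results with expansion-based retrieval, the scores come from different methods and may not be comparable. One common choice is reciprocal rank fusion, which uses only ranks. It is swapped in here for illustration; the post doesn't prescribe a merge rule. In this minimal sketch, `k=60` is a commonly used smoothing constant.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked doc-id lists: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


# Usage: ranked ids from HyDE retrieval and from expansion-based retrieval.
hyde_ids = ["d1", "d2", "d3"]
expansion_ids = ["d2", "d4", "d1"]
fused = reciprocal_rank_fusion([hyde_ids, expansion_ids])
print([doc_id for doc_id, _ in fused])
```

Documents that rank well in both lists (here `d2` and `d1`) rise to the top, which is the "combines strengths" behavior you want.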

HyDE with structure. For domains where documents have specific structure (code, legal filings, medical records), prompt the LLM to match that structure in the hypothetical. The embedding lands even closer to real docs.
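Structure-aware prompting can be as simple as choosing a template per domain. The templates and domain keys below are illustrative assumptions. Tune the wording so the hypothetical mirrors the actual format of your corpus.

```python
from typing import Optional

# Hypothetical per-domain templates; wording and keys are illustrative only.
STRUCTURED_PROMPTS = {
    "code": (
        "Write a short Python function with a docstring that would answer this question. "
        "Accuracy is not critical.\n\nQuestion: {question}\n"
    ),
    "legal": (
        "Write a short clause, in the style of a legal filing, that would address this question."
        "\n\nQuestion: {question}\n"
    ),
    "medical": (
        "Write a brief clinical note excerpt (findings, assessment, plan) relevant to this question."
        "\n\nQuestion: {question}\n"
    ),
}
DEFAULT_PROMPT = "Write a short paragraph that would be a reasonable answer.\n\nQuestion: {question}\n"


def build_hyde_prompt(question: str, domain: Optional[str] = None) -> str:
    """Pick a structure-matching template for the domain, else the generic one."""
    template = STRUCTURED_PROMPTS.get(domain or "", DEFAULT_PROMPT)
    return template.format(question=question)
```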

Conditional HyDE. Use HyDE only for queries the router thinks will benefit. Skip for queries the system can already handle well. Saves LLM calls.
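A router can start as a simple heuristic before you train anything. In this sketch, the rules and thresholds are assumptions, not a recommended classifier. It sends terse or question-form queries to HyDE and skips queries that already read like document text, mirroring the cases discussed below.

```python
import re

# Assumed list of words that usually open a question.
QUESTION_WORDS = {"how", "why", "what", "when", "where", "which", "who", "can", "does", "is"}


def should_use_hyde(query: str, max_terse_words: int = 2) -> bool:
    """Heuristic router: True if the query is likely to benefit from HyDE."""
    words = re.findall(r"[a-z0-9']+", query.lower())
    if not words:
        return False
    if len(words) <= max_terse_words:
        return True   # terse query: its own embedding carries little signal
    if query.strip().endswith("?") or words[0] in QUESTION_WORDS:
        return True   # interrogative form: large question-to-document gap
    return False      # already document-like: skip the extra LLM call
```

A learned router can replace this later. The heuristic version is still useful for logging which queries would have been routed, before you commit to anything.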

When HyDE helps most

Question-to-document semantic gap is large. Short user questions vs long technical documents is the classic case. HyDE bridges the gap.

Domain-specific vocabulary. If documents use domain jargon that users don't (medical codes, legal terminology), the LLM's hypothetical often includes that jargon, improving retrieval.

Terse query patterns. Users who enter one- or two-word queries benefit most from HyDE, because the original query's embedding has very little signal to work with.

When HyDE doesn't help

When the query is already document-like. 'Refund policy for enterprise customers' matches documents well; HyDE adds latency without improvement.

When retrieval quality is already high. If recall@5 is above 0.95, HyDE has little room to help; optimize elsewhere.

When users prefer deterministic retrieval. The hypothetical answer introduces LLM variance into retrieval. For use cases requiring reproducibility (legal, compliance), HyDE can complicate audit trails.

Pitfalls

Not caching the hypothetical. The same query should produce the same hypothetical (or at least the same embedding). Cache the hypothetical, keyed by the normalized query. See the caching patterns post.
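Here is a minimal in-memory sketch of caching by normalized query. The normalization rules (Unicode NFKC, lowercasing, whitespace collapsing) are assumptions. A production system would likely use a shared store with TTLs instead of a dict. A side effect is that retrieval becomes reproducible for a given query, which helps with the audit concern above.

```python
import re
import unicodedata
from typing import Callable


def normalize_query(query: str) -> str:
    """Assumed normalization for cache keys: NFKC, lowercase, collapse whitespace."""
    text = unicodedata.normalize("NFKC", query).lower()
    return re.sub(r"\s+", " ", text).strip()


class HypotheticalCache:
    """Memoizes hypotheticals so equivalent queries reuse one LLM output."""

    def __init__(self, generate: Callable[[str], str]):
        self._generate = generate          # hypothetical query -> hypothetical answer
        self._store: dict[str, str] = {}   # normalized query -> hypothetical

    def get(self, query: str) -> str:
        key = normalize_query(query)
        if key not in self._store:
            self._store[key] = self._generate(query)  # LLM call only on a cache miss
        return self._store[key]
```

Caching the embedding alongside the hypothetical, under the same key, also saves the embedding call on repeat queries.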

Relying on correctness. Never show the hypothetical to users. It's LLM hallucination by design. It's only an embedding-space landmark.

Eval rigor. Measure retrieval with and without HyDE on your eval set before rolling out broadly. Most teams see improvement; some see regressions in specific domains. Don't assume. See the eval post.
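Here is a sketch of the with/without comparison. It assumes an eval set of queries with known relevant document ids, plus two retrieval functions that return ranked ids. In this sketch, `recall_at_k` is the fraction of relevant documents found in the top k; some teams use hit rate instead. Look at per-domain slices too, because a positive average can hide a regression in one domain.

```python
from typing import Callable


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant doc ids that appear in the top-k retrieved ids."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def compare_hyde(
    eval_set: list[tuple[str, set[str]]],    # (query, relevant doc ids)
    baseline: Callable[[str], list[str]],    # query -> ranked ids, without HyDE
    with_hyde: Callable[[str], list[str]],   # query -> ranked ids, with HyDE
    k: int = 5,
) -> dict[str, float]:
    """Mean recall@k for both pipelines and the difference (positive favors HyDE)."""
    if not eval_set:
        raise ValueError("eval_set is empty")
    n = len(eval_set)
    base = sum(recall_at_k(baseline(q), rel, k) for q, rel in eval_set) / n
    hyde = sum(recall_at_k(with_hyde(q), rel, k) for q, rel in eval_set) / n
    return {"baseline": base, "hyde": hyde, "delta": hyde - base}
```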

Related patterns

Query rewriting in general. HyDE is one member of the query-rewriting family; see the query rewriting post. Use them together: rewrite the query, then apply HyDE, then retrieve. The benefits compound.

HyDE and query expansion: hypothetical documents for retrieval

Continue the thread.

Query rewriting: the retrieval upgrade most teams skip

Six RAG patterns that actually work in production

Embedding models compared: OpenAI vs Cohere vs Jina vs BGE vs Nomic