eazyware
Engineering·July 15, 2024·10 min read

Query rewriting: the retrieval upgrade most teams skip

User queries are often under-specified. Rewriting expands, clarifies, and decomposes queries before retrieval. Simple technique, big impact on quality.

Kushal R.
Engineering lead

Query rewriting is one of the cheapest and most effective RAG improvements available. Users write questions that don't match how documents are written. An LLM rewrite pass bridges that gap, often improving retrieval quality by 10-30% with minimal infrastructure change. This post covers the specific rewriting techniques we deploy and how to pick among them.

Rewriting techniques
Query rewriting patterns
Expansion: "pricing" → "pricing plans cost", "tiers fees subscription"
Clarification: "when do i pay?" → "billing cycle timing", "payment due dates"
Decomposition: "compare X and Y" → "what is X", "what is Y"
Impact on retrieval quality
· Vocabulary mismatch: users use different words than documents; rewriting bridges the gap
· Under-specification: short queries (< 5 words) retrieve poorly; expansion helps
· Compound queries: a single retrieval cannot answer them well; decomposition breaks them down
· Typical uplift: 10-30% on recall@k for production RAG systems
Expansion, decomposition, step-back, hypothetical document. Each serves different query types; a router picks the appropriate rewrite.

Why rewriting helps

User queries are often short, ambiguous, and colloquial. 'refund policy' is a two-word query; the refund policy document is a 2000-word corporate policy. The semantic similarity between the two is lower than you'd hope.

Rewriting transforms queries into forms that match document language. Rewritten as 'What is our refund policy for enterprise customers?', the query becomes a richer embedding target, and retrieval precision improves.

Bonus: rewritten queries are more stable. 'refund', 'refunds', and 'refunding' tend to rewrite to the same expanded form, so retrieval becomes more consistent.

Query expansion

LLM rewrites the query to include synonyms and related terms. 'refund policy' becomes 'refund policy, return policy, money-back guarantee, cancellation terms.' Multi-query retrieval: run each variant, merge results, deduplicate.

Improves recall. Useful when user queries are terse. Cheap: one LLM call upfront, then multiple retrievals (often faster than a single retrieval in a bigger index).
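The merge-and-deduplicate step can be sketched as below. This is a minimal illustration, not our production code: the `search` callable and its (doc_id, score) return shape are assumptions, and keeping the highest score per document is one simple merge rule among several.

```python
from typing import Callable, Dict, List, Tuple

# Assumed shape: search(query, k) returns (doc_id, score) pairs, best first.
SearchFn = Callable[[str, int], List[Tuple[str, float]]]


def multi_query_retrieve(
    variants: List[str], search: SearchFn, k: int = 5
) -> List[Tuple[str, float]]:
    """Run every query variant, merge the hits, and deduplicate by doc_id."""
    best: Dict[str, float] = {}
    for query in variants:
        for doc_id, score in search(query, k):
            # Keep the highest score seen for a document across all variants.
            if doc_id not in best or score > best[doc_id]:
                best[doc_id] = score
    merged = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return merged[:k]
```

Taking the maximum score assumes all variants hit the same retriever, so scores are comparable. If they are not, a rank-based merge is the safer choice.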

Query decomposition

Complex queries broken into simpler sub-queries. 'Tell me about Apple's AI strategy vs Microsoft's' becomes ['Apple AI strategy 2024', 'Microsoft AI strategy 2024']. Retrieve separately, combine in context.

Critical for multi-hop questions: decomposition is the first step in most multi-hop pipelines. See the multi-hop retrieval post.
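A rough sketch of decomposition followed by per-sub-query retrieval. The `llm` and `search` callables, the prompt wording, and the JSON-array reply format are all assumptions made for illustration; the fallback to the original query is there because malformed model output is common enough to plan for.

```python
import json
from typing import Callable, Dict, List


def decompose(query: str, llm: Callable[[str], str]) -> List[str]:
    """Ask an LLM for sub-queries; fall back to the original query on bad output."""
    prompt = (
        "Split the question into independent search queries. "
        "Reply with a JSON array of strings only.\n"
        f"Question: {query}"
    )
    try:
        parsed = json.loads(llm(prompt))
    except json.JSONDecodeError:
        return [query]
    if not isinstance(parsed, list):
        return [query]
    subs = [s.strip() for s in parsed if isinstance(s, str) and s.strip()]
    return subs or [query]


def retrieve_per_subquery(
    query: str,
    llm: Callable[[str], str],
    search: Callable[[str], List[str]],
) -> Dict[str, List[str]]:
    """Retrieve separately for each sub-query so the context can be grouped."""
    return {sub: search(sub) for sub in decompose(query, llm)}
```

Keeping results keyed by sub-query lets the final prompt label which passages answer which part of the question.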

Step-back prompting

LLM rewrites the specific query to a higher-level question. 'Why did my React hook fire twice?' becomes 'How do React hooks work?' Retrieval on the general question returns foundational context; the specific answer emerges from LLM reasoning over that context.

Helps when the specific question has too narrow a semantic footprint to retrieve well. Also helps when the underlying principle is in docs but the specific scenario isn't.
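Step-back retrieval might look like the sketch below. The prompt text and the `llm` and `search` callables are hypothetical. Retrieving on the original question as well as the general one is an added assumption, so that specific hits are not lost when they do exist.

```python
from typing import Callable, List

# Illustrative prompt wording, not a tuned template.
STEP_BACK_PROMPT = (
    "Rewrite the question as a more general question about the underlying "
    "concept. Reply with the rewritten question only.\n"
    "Question: {question}"
)


def step_back_retrieve(
    question: str,
    llm: Callable[[str], str],
    search: Callable[[str], List[str]],
) -> List[str]:
    """Retrieve on the general question, then the original, without duplicates."""
    general = llm(STEP_BACK_PROMPT.format(question=question)).strip()
    results: List[str] = []
    for q in (general, question):
        if not q:
            continue  # Skip an empty rewrite rather than searching on nothing.
        for doc in search(q):
            if doc not in results:
                results.append(doc)
    return results
```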

HyDE — hypothetical document embeddings

LLM generates a hypothetical answer document based on the query. Embed the hypothetical answer (not the query). Retrieve based on the hypothetical answer's embedding. See HyDE post for deeper dive.

Powerful because documents match documents better than they match questions. The hypothetical answer is closer in embedding space to the real answer than the raw question.
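To make the flow concrete, here is a small sketch of HyDE-style retrieval over an in-memory index. The `generate` and `embed` callables stand in for an LLM and an embedding model, and the brute-force cosine ranking is an assumption chosen for readability; a real system would query a vector store instead.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hyde_retrieve(
    query: str,
    generate: Callable[[str], str],
    embed: Callable[[str], Vector],
    index: List[Tuple[str, Vector]],
    k: int = 3,
) -> List[str]:
    """Embed a hypothetical answer instead of the query, then rank the index."""
    hypothetical = generate(query)
    target = embed(hypothetical)
    ranked = sorted(index, key=lambda item: cosine(target, item[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]
```

The only difference from plain dense retrieval is the text passed to `embed`: the generated answer rather than the raw query.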

When to use which

Start with expansion — cheapest, broadly helpful. Add decomposition if you have identifiable multi-part questions. Add step-back for domain-knowledge queries where users ask specifics but docs are general. HyDE is most useful when user queries are very different in style from documents.

Query router: LLM classifier picks the rewrite strategy. Small model, low cost. Questionable benefit vs always applying expansion — start simple.
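A router can be as small as a dispatch table. In this sketch the `classify` callable stands in for the small LLM classifier, and the strategy labels are hypothetical. Any unrecognized label falls back to expansion, in keeping with the start-simple advice.

```python
from typing import Callable, Dict

Rewriter = Callable[[str], str]


def route_rewrite(
    query: str,
    classify: Callable[[str], str],
    rewriters: Dict[str, Rewriter],
    default: str = "expansion",
) -> str:
    """Apply the rewriter named by the classifier, falling back to the default."""
    label = classify(query).strip().lower()
    # Unknown or malformed labels use the default strategy instead of failing.
    rewriter = rewriters.get(label, rewriters[default])
    return rewriter(query)
```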

Cost and latency

Rewriting adds one LLM call per query. On a slow model, this can dominate end-to-end latency. Use fast models (small Claude, GPT-4o-mini, Gemini Flash) for the rewrite — rewriting is a simple task that doesn't need frontier capability.

Cost of a rewrite is typically 100-500 input tokens, 50-200 output tokens. On a fast model, roughly $0.0001 per rewrite. Negligible compared to the final answer generation.
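The arithmetic behind that estimate is easy to check. The per-million-token prices below are illustrative assumptions, not quotes from any provider; substitute your model's current rates.

```python
def rewrite_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Cost of one rewrite call given per-million-token prices in USD."""
    return (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    ) / 1_000_000


# Assumed prices for illustration: $0.15 per 1M input, $0.60 per 1M output.
low = rewrite_cost_usd(100, 50, 0.15, 0.60)    # small rewrite
high = rewrite_cost_usd(500, 200, 0.15, 0.60)  # large rewrite
```

Under those assumed prices the range works out to roughly $0.00005 to $0.0002 per rewrite, which brackets the $0.0001 figure.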

Pitfalls

Over-expansion. 'Refund' expanded to include 20 variants dilutes the signal. Keep expansion to 3-5 terms. Prompt carefully.

Wrong decomposition. Complex question decomposed incorrectly produces wrong sub-queries and misleading retrievals. Eval sets should catch this.

Caching. Rewrites are cacheable: the same query gets the same rewrite. Use an exact-match or semantic cache keyed on the query input, with the rewrite as the cached response. See the caching patterns post.
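An exact-match rewrite cache can be sketched in a few lines. The class name, the in-memory dict, and the normalization rule (lowercase, collapsed whitespace) are assumptions for illustration; a shared store and an eviction policy would be needed in practice.

```python
from typing import Callable, Dict


class RewriteCache:
    """Exact-match cache keyed on a normalized query string."""

    def __init__(self, rewrite: Callable[[str], str]) -> None:
        self._rewrite = rewrite
        self._store: Dict[str, str] = {}
        self.misses = 0

    @staticmethod
    def _key(query: str) -> str:
        # Collapse whitespace and case so trivial variations share an entry.
        return " ".join(query.lower().split())

    def get(self, query: str) -> str:
        key = self._key(query)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._rewrite(query)
        return self._store[key]
```

A semantic cache would replace `_key` with a nearest-neighbor lookup over query embeddings, at the cost of occasional false hits.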

Measuring impact

A/B test: compare retrieval quality (NDCG, recall@k) with and without rewriting. Most clients see 15-25% improvement. Downstream answer quality often improves more than retrieval quality alone would suggest, because the LLM gets better context to work with.
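For reference, the two metrics can be computed as below. This is a sketch with binary relevance labels and assumes each retrieved list is already deduplicated; graded relevance would need a gain term in the NDCG numerator.

```python
import math
from typing import List, Set


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / len(relevant)


def ndcg_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """NDCG with binary relevance: DCG of the ranking over the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, doc in enumerate(retrieved[:k])
        if doc in relevant
    )
    ideal = sum(1.0 / math.log2(rank + 2) for rank in range(min(k, len(relevant))))
    return dcg / ideal if ideal else 0.0
```

Run both functions over the same labeled query set twice, once with raw queries and once with rewritten ones, and compare the averages.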

Read next
Multi-hop retrieval: questions that span documents
HyDE and query expansion: hypothetical documents for retrieval
Six RAG patterns that actually work in production
Tags
query rewriting, retrieval, RAG