eazyware
Engineering·September 12, 2025·10 min read

Hybrid search: why pure vector search isn't enough

Semantic search misses exact matches. Keyword search misses intent. Hybrid beats both — when you tune it right.

Kushal R.
Engineering lead

Vector search is not enough. That's the one-sentence version. Pure semantic search misses exact matches in ways that users notice constantly — product codes, legal citations, function names, customer IDs. Keyword search misses intent. Hybrid beats both, reliably, when you tune it right. This post is how we tune hybrid search in production systems, with the specific numbers that make the tradeoffs concrete.

Figure: Hybrid search, BM25 + vector via reciprocal rank fusion. BM25 (exact matches: product codes, function names, citations) and vector search (semantic matches: paraphrases, concepts, intent) feed RRF; the merged results go to a reranker. Average +20% precision@5 over pure vector across our deployments.
Reciprocal rank fusion merges ranked lists from both retrievers. Documents ranked high in either get surfaced; documents high in both dominate.

Why pure vector search fails

Embeddings compress text into dense vectors that capture semantic meaning. They're excellent at finding passages that mean the same thing even in different words. They're bad at finding exact lexical matches, which seems paradoxical until you see the mechanism. Embeddings project everything into a continuous space, so exact tokens become fuzzy neighbors of nearby tokens. 'E-5071' and 'E-5072' embed to very similar vectors, and retrieval can't reliably distinguish them.

Real cases where this hurts:

  • Product catalogs: specific SKUs or model numbers.
  • Legal and regulatory: citation codes, statute references.
  • Technical docs: function names, API method names, error codes.
  • Customer support: ticket IDs, order numbers.
  • Code bases: identifiers, specific syntax patterns.
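To make the contrast concrete, here is a minimal sketch of how a sparse keyword scorer treats two near-identical identifiers. It is a self-contained, illustrative BM25 (Okapi form, with the commonly used k1 = 1.5 and b = 0.75), not the implementation inside any particular search engine. The tokenizer regex, the `bm25_scores` helper, and the sample documents are all assumptions made up for this example.

```python
import math
import re
from collections import Counter

# Assumed tokenizer: keeps codes like "E-5071" or "parse_config" as one token.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+(?:[-_.][A-Za-z0-9]+)*")

def tokenize(text):
    return [t.lower() for t in TOKEN_RE.findall(text)]

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # a term that is absent contributes nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical catalog entries that differ only in the model number.
docs = [
    "Replacement filter for model E-5071, ships in 2 days.",
    "Replacement filter for model E-5072, ships in 5 days.",
]
scores = bm25_scores("E-5071", docs)
```

Because the tokenizer keeps 'E-5071' whole, only the first document gets a non-zero score; the second contains no matching term at all. A tokenizer that split on the hyphen would weaken this, so it is worth checking how your keyword index tokenizes identifiers.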

What hybrid search is

Hybrid search runs two searches in parallel — a dense vector search and a sparse keyword search (BM25 is the standard) — and combines the results using a fusion algorithm. The union of both retrievals catches both exact and semantic matches. Reciprocal Rank Fusion (RRF) is the most common combination method: each document gets scored by its rank position in each list, and the scores sum.
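As a rough sketch of the "two searches in parallel" step, the snippet below runs two retrievers concurrently and collects their ranked lists. The `hybrid_candidates` function and both `fake_*` retrievers are hypothetical stand-ins; in a real system they would wrap your BM25 index and your vector index. The fusion scoring is deliberately left out here, so the last line only shows the plain union of candidates.

```python
from concurrent.futures import ThreadPoolExecutor

def hybrid_candidates(query, keyword_search, vector_search, top_n=50):
    """Run both retrievers concurrently and return their ranked ID lists."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        kw_future = pool.submit(keyword_search, query, top_n)
        vec_future = pool.submit(vector_search, query, top_n)
        return kw_future.result(), vec_future.result()

# Hypothetical stand-ins for a BM25 index and a vector index.
def fake_keyword_search(query, top_n):
    return ["doc-7", "doc-2", "doc-9"][:top_n]

def fake_vector_search(query, top_n):
    return ["doc-2", "doc-4", "doc-7"][:top_n]

keyword_hits, vector_hits = hybrid_candidates(
    "reset code E-5071", fake_keyword_search, fake_vector_search
)
# Order-preserving union of both candidate lists, before any fusion scoring.
union = list(dict.fromkeys(keyword_hits + vector_hits))
```

Threads are a reasonable fit when both retrievers are network calls; if your clients are async, `asyncio.gather` would play the same role.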

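RRF formula (simplified): for each document, score = sum over retrievers of 1/(k + rank_in_retriever), where rank starts at 1 and k is typically 60. Sort by score and take the top N. With k = 60 the curve is fairly flat: rank 1 in a single list contributes 1/61 ≈ 0.0164, while rank 25 in both lists sums to 2/85 ≈ 0.0235. So a document that both retrievers agree on, even at middling ranks, outscores one that only a single retriever ranks first. Among documents found by both retrievers, the ones ranked higher win.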
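The formula translates almost directly into code. This is a minimal sketch assuming each retriever returns a list of document IDs ordered best first, with ranks counted from 1; the function name and the sample IDs are illustrative.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60, top_n=10):
    """Fuse ranked lists of document IDs; ranks are 1-based."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ordered[:top_n]

# Hypothetical rankings from the two retrievers.
bm25_ranking = ["E-5071-manual", "E-5072-manual", "filter-faq"]
vector_ranking = ["filter-faq", "E-5071-manual", "maintenance-guide"]
fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
```

In this example the two documents that appear in both lists ('E-5071-manual' and 'filter-faq') end up ahead of the two that appear in only one. Note that RRF uses only rank positions, so the raw BM25 and cosine scores never need to be normalized against each other.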

Tuning hybrid search

The parameters that matter:

  • Weight of each retriever. Start 50/50, adjust based on eval results. BM25 typically does better on corpora dense with specialized terminology or identifiers; vectors do better on natural-language corpora.
  • k in RRF. Default 60 is usually fine; smaller k weights top hits more aggressively.
  • Number of candidates from each retriever. Start with 50 from each; the union is at most 100 (fewer when the lists overlap), then rerank down to 5-10. Going past 50 per retriever is slower and rarely improves quality.
  • Reranking model on top. Reranking after fusion almost always helps — see our RAG patterns post.
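The first three knobs can be seen together in a small sketch. Plain RRF has no per-retriever weights, so the `weights` argument below is an assumption: a common weighted variant that scales each retriever's contribution. The candidate count is simply the length of each list passed in. All names and sample IDs are hypothetical.

```python
from collections import defaultdict

def weighted_rrf(rankings, weights, k=60, top_n=10):
    """RRF where each retriever's contribution is scaled by a weight.

    rankings: dict of retriever name -> ranked list of doc IDs (best first)
    weights:  dict of retriever name -> non-negative weight
    """
    scores = defaultdict(float)
    for name, ranking in rankings.items():
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weights[name] / (k + rank)
    # Highest score first; ties broken by document ID for determinism.
    ordered = sorted(scores, key=lambda d: (-scores[d], d))
    return ordered[:top_n]

rankings = {
    "bm25": ["sku-page", "faq", "blog"],
    "vector": ["howto", "blog", "sku-page"],
}
balanced = weighted_rrf(rankings, {"bm25": 0.5, "vector": 0.5})
keyword_heavy = weighted_rrf(rankings, {"bm25": 0.8, "vector": 0.2})
```

With equal weights the vector-only hit 'howto' ranks above the BM25-only hit 'faq'; shifting the weight toward BM25 swaps them. Running a sweep like this over your eval set, for a handful of weights and k values, is a cheap way to find a starting configuration before adding the reranker.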

Implementations

Good options in 2026:

  • Qdrant: native hybrid support, fast, self-hostable. Our default for new deployments.
  • Weaviate: strong hybrid implementation with BM25 built in.
  • Elasticsearch 8+: good keyword search plus k-NN on top. Works well if you already run Elasticsearch.
  • OpenSearch: similar to Elasticsearch, open source.
  • Pinecone: recently added hybrid support; solid if you already use it.

Benchmarks from our deployments

Across six client deployments, hybrid search vs pure vector search (precision@5 on held-out evals):

  • Technical documentation corpus: +28% precision with hybrid.
  • Legal contracts corpus: +31% precision.
  • Customer support KB: +18% precision.
  • Product catalog: +41% precision (lots of SKUs, BM25 catches them).
  • Natural-language research corpus: +6% precision (hybrid helps less when exact matches are rare).
  • Code documentation: +22% precision.

The takeaway: hybrid won on every corpus in our test sets, but by the widest margin where exact-match signal is meaningful. Plan tuning effort accordingly.
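For readers who want to run the same kind of comparison, here is a minimal sketch of precision@5 over a labeled eval set. The eval data below is invented purely to exercise the functions and is not from our deployments. The sketch reports both the absolute and the relative difference, since the two are easy to confuse when quoting gains.

```python
def precision_at_k(retrieved_ids, relevant_ids, k=5):
    """Fraction of the top-k retrieved documents that are relevant."""
    hits = sum(1 for doc_id in retrieved_ids[:k] if doc_id in relevant_ids)
    return hits / k

def mean_precision_at_k(runs, k=5):
    """Average precision@k over (retrieved_ids, relevant_ids) pairs."""
    return sum(precision_at_k(r, rel, k) for r, rel in runs) / len(runs)

# Hypothetical eval queries: ranked results plus hand-labeled relevant IDs.
vector_runs = [
    (["a", "b", "c", "d", "e"], {"a", "c"}),
    (["f", "g", "h", "i", "j"], {"g", "x"}),
]
hybrid_runs = [
    (["a", "c", "b", "d", "e"], {"a", "c"}),
    (["x", "g", "f", "h", "i"], {"g", "x"}),
]
vector_p5 = mean_precision_at_k(vector_runs)
hybrid_p5 = mean_precision_at_k(hybrid_runs)
absolute_gain = hybrid_p5 - vector_p5
relative_gain = absolute_gain / vector_p5
```

Keep the eval queries held out from whatever data you use to tune weights and k, otherwise the measured gain will be optimistic.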

Downsides of hybrid

  • Two retrievals means ~2x retrieval cost (usually still small vs LLM cost).
  • Some latency added (100-300ms), though BM25 is fast.
  • Needs BM25 index alongside vector index, which adds operational complexity.
  • Parameter tuning adds a learning curve.

None of these are show-stoppers, but all are real. Budget 2-3 days for initial tuning and 1 day of ongoing maintenance per quarter.

Closing

Hybrid search is almost never wrong for production RAG. If you're shipping a new RAG system, default to hybrid from day one, because retrofitting it later is painful. If you have an existing pure-vector system showing quality issues on exact matches, adding BM25 and fusion is typically a 1-2 week effort, with precision@5 gains of roughly 20-30% on most of the corpora we measured above. Pair with a reranker for the full stack.
