Sparse (BM25, TF-IDF) and dense (embedding) retrieval have different strengths. Sparse excels at exact matching: names, codes, rare terms. Dense excels at semantic matching: rephrased queries, synonyms, conceptual similarity. The best RAG systems use both. This post is a practical guide to where each shines, how to combine them, and how to tune the mix for your workload.

Sparse vs dense tradeoffs

Sparse: exact matches, rare terms, product codes. Dense: semantic similarity, paraphrases, conceptual queries. Hybrid combines strengths.

Sparse retrieval (BM25)

BM25 is the workhorse of classical information retrieval. It ranks documents by term frequency and inverse document frequency, normalized for document length. It is well-understood, fast, deterministic, and explainable.

Strengths: exact term matching. Product codes (SKU-4837), names (Priya Raman), specific technical terms, numeric identifiers. Dense embeddings often miss these because the tokens aren't semantically meaningful — BM25 treats them as the specific tokens they are.

Weaknesses: no semantic understanding. 'Car' and 'automobile' are unrelated to BM25. A query that shares no terms with a relevant document gets zero signal from it. Synonyms, paraphrases, and conceptual queries suffer.
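Both properties show up in a few lines of code. The sketch below is a minimal BM25 scorer, assuming whitespace tokenization and the Lucene-style IDF variant; the three documents and the helper names (`tokenize`, `bm25_scores`) are made up for illustration, not taken from any library.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase and split on whitespace; a real analyzer would do more.
    return text.lower().split()

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document against the query with BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # terms absent from the document contribute nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical mini-corpus.
docs = [
    "replacement filter for sku-4837 ships in two days",
    "automobile maintenance schedule and service intervals",
    "how to reset the thermostat after a power outage",
]
sku_scores = bm25_scores("sku-4837", docs)    # exact token match
car_scores = bm25_scores("car repair", docs)  # no shared terms anywhere
```

The query `sku-4837` scores only the first document. The query `car repair` scores every document at zero, including the one about automobile maintenance, because no token overlaps.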

Dense retrieval (embeddings)

Neural embedding models produce vector representations. Similar meanings map to similar vectors. Retrieval is nearest-neighbor search in embedding space.
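As a rough sketch of that nearest-neighbor step: the vectors below are hand-made 3-dimensional stand-ins for real model output (actual embeddings have hundreds of dimensions and come from an embedding model), and `nearest` is a brute-force search, not the approximate index a vector database would use.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query_vec, doc_vecs, top_k=2):
    """Brute-force nearest-neighbor search by cosine similarity."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Hypothetical embeddings: made-up numbers, not output of a real model.
doc_vecs = {
    "automobile-maintenance": [0.9, 0.1, 0.0],
    "thermostat-reset": [0.0, 0.2, 0.9],
    "warranty-terms": [0.1, 0.9, 0.1],
}
query_vec = [0.8, 0.2, 0.1]  # pretend embedding of "car repair"
hits = nearest(query_vec, doc_vecs)
```

With these toy vectors the query lands closest to the automobile maintenance document even though the two share no words.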

Strengths: semantic similarity. 'Car repair' retrieves documents mentioning 'automobile maintenance.' Paraphrases work. Conceptual queries find topically related docs.

Weaknesses: exact match is unreliable. A document mentioning 'SKU-4837' once among 2000 words might not be retrievable with 'SKU-4837' as the query, because a single embedding summarizes the whole doc and one mention gets diluted. Proper nouns and rare terms suffer.

Hybrid retrieval

Run both, merge results. Reciprocal Rank Fusion (RRF) is the standard merge technique: it scores each document by summing 1/(k + rank) across retrievers, so it needs only rank positions and never has to reconcile BM25 scores with cosine similarities. See hybrid search post.
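A minimal sketch of RRF, assuming each retriever returns an ordered list of document IDs and using the commonly cited constant k = 60; the IDs are placeholders.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists: each list adds 1 / (k + rank) to a document's score."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=lambda d: scores[d], reverse=True)

sparse = ["d3", "d1", "d7"]  # hypothetical BM25 ranking
dense = ["d1", "d5", "d3"]   # hypothetical embedding ranking
fused = reciprocal_rank_fusion([sparse, dense])
```

Documents that appear in both lists (`d1`, `d3`) rise above documents that appear in only one, without any score normalization.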

Most production RAG systems use hybrid. The typical quality improvement over either retriever alone is 10-25%, depending on workload and metric. The cost is two retrievals instead of one, but both are fast and can run in parallel.
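Running the two retrievers concurrently might look like the sketch below. `sparse_search` and `dense_search` are hypothetical stubs standing in for calls to a BM25 index and a vector index; threads are a reasonable fit here because real retrievers mostly wait on network I/O.

```python
from concurrent.futures import ThreadPoolExecutor

def sparse_search(query):
    # Hypothetical stub; a real one would query a BM25 index.
    return ["d3", "d1", "d7"]

def dense_search(query):
    # Hypothetical stub; a real one would embed the query and hit a vector index.
    return ["d1", "d5", "d3"]

def retrieve_both(query):
    """Issue both retrievals at once and wait for both results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        sparse_future = pool.submit(sparse_search, query)
        dense_future = pool.submit(dense_search, query)
        return sparse_future.result(), dense_future.result()

sparse_hits, dense_hits = retrieve_both("warranty for sku-4837")
```

Latency is then roughly that of the slower retriever rather than the sum of the two.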

Tuning the mix

Weight the two retrievers. Pure hybrid gives them equal weight; domain-specific tuning can improve quality. For technical documentation with many product codes, weight sparse higher. For conceptual Q&A, weight dense higher.
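One way to express the weighting is to scale each retriever's RRF contribution. This is a sketch of one option, not the only scheme; the weights and document IDs are illustrative.

```python
from collections import defaultdict

def weighted_rrf(rankings, weights, k=60):
    """RRF where each retriever's contribution is scaled by its weight."""
    scores = defaultdict(float)
    for name, ranking in rankings.items():
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weights[name] / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)

rankings = {
    "sparse": ["d3", "d1", "d7"],  # hypothetical BM25 ranking
    "dense": ["d1", "d5", "d3"],   # hypothetical embedding ranking
}
sparse_heavy = weighted_rrf(rankings, {"sparse": 2.0, "dense": 1.0})
dense_heavy = weighted_rrf(rankings, {"sparse": 1.0, "dense": 2.0})
```

With the same two input rankings, the sparse-heavy mix puts BM25's top hit first and the dense-heavy mix puts the embedding retriever's top hit first. Choose the weights by measuring on labeled queries from your own domain.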

Query-adaptive weighting. A classifier or LLM decides per-query which retriever to emphasize. 'Tell me about the warranty' is conceptual → weight dense. 'Find doc with serial 4837-AB' is exact → weight sparse.
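The simplest stand-in for such a classifier is a heuristic. The sketch below assumes that a run of three or more digits signals an identifier-style query; the rule and the 0.7/0.3 weights are illustrative guesses, and a trained classifier or LLM call would replace `choose_weights` in practice.

```python
import re

# Assumption: a run of 3+ digits suggests a code, serial, or SKU.
HAS_IDENTIFIER = re.compile(r"\d{3,}")

def choose_weights(query):
    """Return per-retriever weights based on a crude query-type guess."""
    if HAS_IDENTIFIER.search(query):
        return {"sparse": 0.7, "dense": 0.3}  # exact-match query
    return {"sparse": 0.3, "dense": 0.7}      # conceptual query

exact = choose_weights("Find doc with serial 4837-AB")
conceptual = choose_weights("Tell me about the warranty")
```

The returned weights can feed directly into whatever weighted fusion step you use.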

Rank fusion variants beyond RRF. CombSUM, CombMNZ, and learned fusion all have niche advantages. RRF is a strong default; don't optimize the fusion method until you've tuned the more impactful things.
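For comparison, here is a sketch of the two score-based variants. Unlike RRF they use raw scores, so each retriever's scores are min-max normalized first; that normalization choice is an assumption of this sketch, and the scores are made up. CombMNZ here multiplies by the number of result lists that returned the document.

```python
def min_max(scores):
    # Rescale one retriever's scores to the range [0, 1].
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 1.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}

def comb_sum(score_lists):
    """CombSUM: add up normalized scores across retrievers."""
    fused = {}
    for scores in score_lists:
        for doc_id, s in min_max(scores).items():
            fused[doc_id] = fused.get(doc_id, 0.0) + s
    return fused

def comb_mnz(score_lists):
    """CombMNZ: CombSUM times the number of lists that returned the doc."""
    sums = comb_sum(score_lists)
    hits = {d: sum(d in scores for scores in score_lists) for d in sums}
    return {d: sums[d] * hits[d] for d in sums}

bm25 = {"d1": 12.0, "d3": 9.0, "d7": 3.0}    # hypothetical BM25 scores
dense = {"d1": 0.82, "d5": 0.80, "d3": 0.60}  # hypothetical cosine scores
summed = comb_sum([bm25, dense])
mnz = comb_mnz([bm25, dense])
```

On this toy data the two disagree about `d3` and `d5`: CombSUM prefers `d5`, which one retriever scored highly, while CombMNZ prefers `d3`, which both retrievers returned.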

Learned sparse retrieval

SPLADE and similar models produce sparse representations learned by neural networks. Each document/query is represented as a sparse vector over the vocabulary with learned weights.
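To make the representation concrete, the sketch below scores a query against a document as a dot product over shared vocabulary terms. The term weights are invented for illustration and are not output from SPLADE; the point is that a learned model can assign weight to terms such as 'car' that never appear in the document text.

```python
def sparse_dot(query_vec, doc_vec):
    """Dot product of two sparse vectors stored as {term: weight} dicts."""
    # Iterate over the smaller dict; only shared terms contribute.
    if len(query_vec) > len(doc_vec):
        query_vec, doc_vec = doc_vec, query_vec
    return sum(w * doc_vec[t] for t, w in query_vec.items() if t in doc_vec)

# Made-up weights for a document whose text says "automobile maintenance".
# "car" and "repair" stand for expansion terms a learned model might add.
doc_vec = {"automobile": 1.8, "maintenance": 1.5, "car": 1.1, "repair": 0.9}
query_vec = {"car": 1.4, "repair": 1.2}
score = sparse_dot(query_vec, doc_vec)
```

Because the vectors are sparse over a fixed vocabulary, the same scoring can run on an inverted index.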

Advantages: combines neural-network-learned semantics with sparse retrieval infrastructure (Elasticsearch, fast inverted indexes). The index structure stays the same as for BM25; only the term weights are learned.

Still worth comparing against hybrid BM25 + dense. SPLADE shines in some domains and not others. Benchmark on your specific task.
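A benchmark can start as small as the sketch below: recall@k averaged over a set of labeled queries. The query IDs, relevance judgments, and the two runs are hypothetical; in practice `qrels` comes from your own labeled data and each run from one retrieval setup.

```python
def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)

def mean_recall_at_k(run, qrels, k=10):
    """Average recall@k over all judged queries."""
    total = sum(recall_at_k(run.get(q, []), rel, k) for q, rel in qrels.items())
    return total / len(qrels)

# Hypothetical relevance judgments and two hypothetical system outputs.
qrels = {"q1": {"d1", "d3"}, "q2": {"d5"}}
run_a = {"q1": ["d1", "d7", "d3"], "q2": ["d2", "d5"]}
run_b = {"q1": ["d3", "d1"], "q2": ["d9", "d2"]}
score_a = mean_recall_at_k(run_a, qrels, k=2)
score_b = mean_recall_at_k(run_b, qrels, k=2)
```

Running the same harness over SPLADE, BM25, dense, and hybrid outputs gives a like-for-like comparison on your own queries.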

Infrastructure patterns

Unified store: Elasticsearch/OpenSearch with vector support can serve both. Single infrastructure, single index, one query that does both. Popular at mid-scale.
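As a hedged illustration of the single-query pattern, the function below builds a search request body that pairs a `match` query with a top-level `knn` clause, in the shape of the Elasticsearch 8.x search API. The field names `body` and `body_vector` and the 0.5/0.5 boosts are assumptions for this sketch; check the request format against the version you run, since hybrid query syntax has changed across releases.

```python
def hybrid_search_body(query_text, query_vector, size=10):
    """Build one request body that runs lexical and vector search together."""
    return {
        "size": size,
        # Lexical (BM25) side; "body" is a hypothetical text field.
        "query": {"match": {"body": {"query": query_text, "boost": 0.5}}},
        # Vector side; "body_vector" is a hypothetical dense_vector field.
        "knn": {
            "field": "body_vector",
            "query_vector": query_vector,
            "k": size,
            "num_candidates": 10 * size,
            "boost": 0.5,
        },
    }

body = hybrid_search_body("warranty for sku-4837", [0.1, 0.2, 0.3])
```

The boosts play the role of the retriever weights from the tuning section.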

Separate stores: Elasticsearch for sparse, dedicated vector DB for dense. More infrastructure complexity; better performance at scale; specialized features per index.

Meilisearch, Typesense, Weaviate all support hybrid out of the box. See vector databases post.

Common pitfalls

Assuming dense is always better. False. For exact-match-heavy workloads, pure BM25 outperforms pure dense. The modernity of embeddings doesn't automatically beat the specificity of term matching.

Skipping hybrid. Teams deploy dense-only because it feels more 'AI.' Hybrid is almost always better and costs little additional complexity.

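Ignoring BM25 tuning. The k1 and b parameters in BM25 affect retrieval quality: k1 controls how quickly repeated occurrences of a term stop adding score, and b controls how strongly long documents are penalized. Defaults (commonly k1 = 1.2, b = 0.75) work for most cases; domain-specific tuning can extract additional quality.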
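To get a feel for the two knobs before tuning them, the sketch below isolates the per-term part of the BM25 formula (without the IDF factor). The function name and the sample lengths are illustrative.

```python
def bm25_term_weight(tf, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Term-frequency component of BM25 for one term in one document."""
    # b scales how much document length, relative to the average, matters.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    # k1 sets how fast extra occurrences saturate; the ceiling is k1 + 1.
    return tf * (k1 + 1) / (tf + norm)

# Saturation: ten occurrences are worth less than ten times one occurrence.
one = bm25_term_weight(tf=1, doc_len=100, avg_doc_len=100)
ten = bm25_term_weight(tf=10, doc_len=100, avg_doc_len=100)

# Length normalization: with b=0.75 a long document is penalized,
# with b=0 document length is ignored entirely.
long_doc = bm25_term_weight(tf=1, doc_len=400, avg_doc_len=100)
long_doc_b0 = bm25_term_weight(tf=1, doc_len=400, avg_doc_len=100, b=0.0)
short_doc_b0 = bm25_term_weight(tf=1, doc_len=50, avg_doc_len=100, b=0.0)
```

Raising k1 lets repeated terms keep adding score for longer; lowering b helps when long documents are long for good reasons, such as reference manuals.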

Continue the thread.

Hybrid search: why pure vector search isn't enough

Embedding models compared: OpenAI vs Cohere vs Jina vs BGE vs Nomic

Six RAG patterns that actually work in production
