Modern search is AI-powered: hybrid retrieval, neural reranking, and LLM answer synthesis. The traditional keyword search stack has been augmented (not replaced) with semantic understanding. This post covers the architecture patterns we use for production search systems in 2026 and the specific techniques that drive quality.
Query understanding
Intent classification. Is the query navigational (find specific thing), informational (learn about topic), or transactional (do something)? Drives ranking strategy.
Entity extraction. Named entities (people, places, products) for targeted search. Disambiguation is important: the same name can refer to different entities.
Query rewriting. Expand acronyms, fix typos, add synonyms. AI-powered query rewriting improves recall.
Personalization. User context, history, preferences. Delicate balance between relevance and filter bubble.
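To make the query rewriting step concrete, here is a minimal sketch in Python using only the standard library. The lookup tables (`ACRONYMS`, `SYNONYMS`, `VOCABULARY`) and the `rewrite_query` helper are hypothetical names invented for illustration; a production system would more likely learn these from query logs or delegate the rewrite to a model.

```python
import difflib

# Hypothetical lookup tables for illustration; a real system would curate
# or learn these from its own corpus and query logs.
ACRONYMS = {"k8s": "kubernetes", "ml": "machine learning"}
SYNONYMS = {"laptop": ["notebook"], "error": ["failure"]}
VOCABULARY = ["kubernetes", "laptop", "error", "deploy", "battery",
              "machine", "learning"]


def rewrite_query(query: str) -> list[str]:
    """Return search terms after typo correction, acronym and synonym expansion."""
    terms: list[str] = []

    def add(term: str) -> None:
        if term not in terms:  # keep first-seen order, drop duplicates
            terms.append(term)

    for token in query.lower().split():
        if token in ACRONYMS:
            expanded = ACRONYMS[token].split()
        elif token in VOCABULARY:
            expanded = [token]
        else:
            # Snap a likely typo to the closest known word, if one is close enough.
            close = difflib.get_close_matches(token, VOCABULARY, n=1, cutoff=0.8)
            expanded = close if close else [token]
        for term in expanded:
            add(term)
            for synonym in SYNONYMS.get(term, []):
                add(synonym)
    return terms
```

Calling `rewrite_query("laptpo battery")` corrects the typo and adds the synonym before the terms reach retrieval. Expansion trades precision for recall, so the added terms are often given a lower weight than the original ones.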
Retrieval
Lexical (BM25). Keyword-based matching. Fast, interpretable. Good recall on term overlap.
Vector retrieval. Embedding-based. Semantic matching beyond keyword overlap. Good for paraphrased queries.
Hybrid fusion. Both approaches combined; results merged via RRF (Reciprocal Rank Fusion) or similar. Outperforms either alone in most benchmarks.
Tunable balance. Plain RRF treats every retriever equally; weighted variants let you adjust each retriever's contribution per query type. Navigational favors lexical; conversational favors vector.
Metadata filters. Date ranges, content types, permissions. Applied during retrieval (pre-filtering) or after it (post-filtering), depending on the system; post-filtering can leave fewer results than requested.
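The hybrid fusion step can be sketched as follows. This is a minimal, assumption-laden version of Reciprocal Rank Fusion: the function name `rrf_fuse`, the retriever labels, and the per-retriever `weights` argument are illustrative, and the weighting is a common extension rather than part of plain RRF. The constant `k = 60` is the value commonly used in the RRF literature.

```python
from collections import defaultdict
from typing import Optional


def rrf_fuse(
    rankings: dict[str, list[str]],
    weights: Optional[dict[str, float]] = None,
    k: int = 60,
) -> list[str]:
    """Merge ranked lists of document IDs with (optionally weighted) RRF.

    rankings maps a retriever name (e.g. "lexical", "vector") to its ranked
    document IDs, best first. Each document scores weight / (k + rank) per
    list it appears in; scores are summed across lists.
    """
    scores: dict[str, float] = defaultdict(float)
    for retriever, ranked_ids in rankings.items():
        weight = 1.0 if weights is None else weights.get(retriever, 1.0)
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += weight / (k + rank)
    # Highest fused score first; break ties by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


rankings = {
    "lexical": ["a", "b", "c"],
    "vector": ["c", "a", "d"],
}
balanced = rrf_fuse(rankings)
vector_heavy = rrf_fuse(rankings, weights={"lexical": 1.0, "vector": 3.0})
```

Because RRF only looks at ranks, it sidesteps the problem that BM25 scores and cosine similarities live on different scales. The per-query-type tuning described above would amount to choosing a different `weights` dictionary once the query intent is known.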
Reranking
Cross-encoder rerank. A cross-encoder reads the query and each candidate document together and scores the pair jointly. Better precision than retrieval alone.
LLM rerank. LLM scores candidates. Expensive but powerful; reserved for top-K where K is modest.
Learning-to-rank. Models trained on click-throughs, explicit ratings. Produces ranking specific to your data and users.
Latency-quality tradeoff. Reranking adds latency; don't rerank too many candidates. Typical: retrieve 100, rerank top 20.
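The "retrieve many, rerank few" pattern can be sketched like this. The scorer is passed in as a plain function so the example stays self-contained; `overlap_score` is a toy stand-in invented for illustration and is not a cross-encoder. In practice a cross-encoder or LLM scoring call would take its place.

```python
from typing import Callable, Sequence

# A scorer takes (query, document text) and returns a relevance score.
Scorer = Callable[[str, str], float]


def rerank_top(
    query: str,
    candidates: Sequence[str],
    score: Scorer,
    rerank_depth: int = 20,
) -> list[str]:
    """Rerank only the first rerank_depth candidates; keep the rest in order."""
    head = list(candidates[:rerank_depth])
    tail = list(candidates[rerank_depth:])
    # The expensive scorer runs once per document in the head only.
    # sorted() is stable, so ties keep their first-stage retrieval order.
    reranked = sorted(head, key=lambda doc: score(query, doc), reverse=True)
    return reranked + tail


def overlap_score(query: str, doc: str) -> float:
    """Toy scorer: fraction of query terms that appear in the document."""
    query_terms = set(query.lower().split())
    doc_terms = set(doc.lower().split())
    return len(query_terms & doc_terms) / len(query_terms) if query_terms else 0.0


retrieved = [
    "vector databases overview",
    "hybrid search with rrf",
    "rrf hybrid search tuning guide",
    "cooking pasta",
]
final = rerank_top("hybrid search tuning", retrieved, overlap_score, rerank_depth=3)
```

Latency grows roughly linearly with `rerank_depth`, since each candidate in the head costs one scorer call. That is why the depth stays small relative to the retrieval size.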
Result presentation
Diversification. Avoid near-duplicate results. MMR (maximal marginal relevance) or a similar diversification technique.
Answer synthesis. For informational queries, LLM synthesizes answer from top results. Perplexity-style.
Citations. Answers must cite sources. Critical for trust, fact-checking.
Faceting and navigation. Filters, facets, categories. Helps users refine without typing new queries.
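The diversification step mentioned above can be sketched with a small MMR implementation. It assumes each document already has an embedding vector; the names `mmr`, `cosine`, and `lambda_` are illustrative, and the two-dimensional vectors are toy data.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr(
    query_vec: list[float],
    doc_vecs: dict[str, list[float]],
    k: int,
    lambda_: float = 0.5,
) -> list[str]:
    """Greedily pick k documents, trading relevance against redundancy.

    lambda_ = 1.0 ranks purely by relevance; lower values penalize documents
    that resemble ones already selected.
    """
    selected: list[str] = []
    remaining = dict(doc_vecs)

    def mmr_score(doc_id: str) -> float:
        relevance = cosine(query_vec, remaining[doc_id])
        redundancy = max(
            (cosine(remaining[doc_id], doc_vecs[s]) for s in selected),
            default=0.0,
        )
        return lambda_ * relevance - (1.0 - lambda_) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=mmr_score)
        selected.append(best)
        del remaining[best]
    return selected


# Toy embeddings: "b" is a near-duplicate of "a"; "c" is less relevant but distinct.
docs = {"a": [1.0, 0.1], "b": [1.0, 0.12], "c": [0.6, 0.8]}
query = [1.0, 0.0]
```

With `lambda_=1.0` the near-duplicate "b" follows "a". With a lower value such as `0.3`, the distinct document "c" is preferred for the second slot.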
Quality patterns
Hybrid retrieval beats either pure approach. BM25 alone misses semantic matches; vector alone misses exact matches. Hybrid wins.
Cross-encoder rerank can add 10-20 points of MRR (mean reciprocal rank) at a modest latency cost; the exact gain depends on the corpus and the first-stage retriever.
LLM rerank for hardest cases. Most expensive; use when quality matters enormously (enterprise search, regulated domains).
Learning-to-rank from click data. Valuable where enough data is available. Products with high user engagement benefit most.
Evaluation
Offline eval. Labeled datasets; NDCG, MRR, recall@K. Benchmarking changes before shipping.
Online eval. A/B testing changes with real users. Click-through rate, session success, satisfaction.
User feedback. Thumbs up/down on results. Explicit feedback loops.
Human relevance judging. Periodic sampling; expert graders. Ground truth for offline eval.
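For reference, the offline metrics named above can be computed in a few lines. This is a sketch under stated assumptions: function names are illustrative, relevance labels are sets of document IDs (or graded gains for NDCG), and NDCG uses the linear-gain form; some toolkits use an exponential gain instead.

```python
import math


def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is returned."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: list[tuple[list[str], set[str]]]) -> float:
    """Average reciprocal rank over (ranked results, relevant set) pairs."""
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs) if runs else 0.0


def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def ndcg_at_k(ranked: list[str], gains: dict[str, float], k: int) -> float:
    """NDCG@k with linear gain; gains maps doc ID to graded relevance."""
    dcg = sum(
        gains.get(doc_id, 0.0) / math.log2(position + 1)
        for position, doc_id in enumerate(ranked[:k], start=1)
    )
    ideal_gains = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(
        gain / math.log2(position + 1)
        for position, gain in enumerate(ideal_gains, start=1)
    )
    return dcg / idcg if idcg > 0 else 0.0


ranked = ["d3", "d1", "d7", "d2"]
rr = reciprocal_rank(ranked, {"d1", "d2"})        # first relevant hit at rank 2
recall = recall_at_k(ranked, {"d1", "d2"}, k=3)   # only d1 is in the top 3
ndcg = ndcg_at_k(ranked, {"d1": 3.0, "d2": 1.0}, k=4)
```

Running the same functions before and after a change, on the same labeled set, gives the benchmark comparison described under offline eval.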
Tools and infrastructure
Elasticsearch, OpenSearch, Solr. Mature lexical search. Often host vector search alongside.
Pinecone, Weaviate, Qdrant, Chroma. Dedicated vector databases. Purpose-built for semantic search.
Sentence-transformers, OpenAI embeddings, Cohere. Embedding models for vector search.
Cohere Rerank, Voyage AI. Specialized reranking services.
Enterprise search
Document permissions. Search must respect access control. Complex in large enterprises.
Source diversity. SharePoint, Google Drive, Slack, wikis, databases. Unified search across sources.
Glean, Hebbia, Elastic, Microsoft. An active space; enterprise search is a category in itself.
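The document-permissions requirement above can be illustrated with a small post-retrieval filter. The `Doc` type, its fields, and the group names are hypothetical; real connectors expose access control in source-specific ways, and many systems push this check into the index query rather than filtering afterwards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    source: str                      # e.g. "sharepoint", "slack" (illustrative)
    allowed_groups: frozenset[str]   # groups permitted to read this document


def filter_by_permission(results: list[Doc], user_groups: set[str]) -> list[Doc]:
    """Keep only documents the user can read, preserving the ranked order."""
    return [doc for doc in results if doc.allowed_groups & user_groups]


results = [
    Doc("roadmap", "sharepoint", frozenset({"product", "exec"})),
    Doc("oncall-notes", "slack", frozenset({"eng"})),
    Doc("handbook", "wiki", frozenset({"eng", "product", "sales"})),
]
visible = filter_by_permission(results, user_groups={"eng"})
```

Filtering after retrieval is the simplest option but can return fewer results than the page size when many hits are hidden. Answer synthesis should also run only on the filtered set, so restricted content does not leak into generated answers.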
See the RAG architectures post for related patterns.