eazyware
Engineering·October 28, 2024·11 min read

Vector index tuning: HNSW, IVF, and the parameters that matter

Recall, latency, memory, build time — the four axes that matter. HNSW ef_construction, M, ef_search; IVF nlist and nprobe. What to actually tune.

Kushal R.
Engineering lead

Vector index tuning is one of those topics every RAG team hits a year or two into production, usually when recall starts to matter more or latency starts to hurt. The right parameters for HNSW or IVF can differ by an order of magnitude depending on whether you're optimizing for recall, latency, memory, or build time. This post is the practical tuning guide: what to tune, in what order, and how to iterate toward a working configuration.

[Figure: HNSW vs IVF parameter tradeoffs]

HNSW: the default for most cases
M (16-48): graph degree; higher M raises quality and memory
ef_construction (100-500): build accuracy
ef_search (32-256): query recall dial
Fast queries, slow build, high memory; best under 100M vectors when memory allows

IVF: at extreme scale
nlist (√N): number of centroids, a rule of thumb
nprobe (1-256): query recall dial
Combine with PQ for massive compression
Cheap memory, moderate queries; best above 100M vectors

Tuning process
1. Define target recall@k (often 0.95) and latency budget (often p95 < 50ms)
2. Start with library defaults (HNSW M=32, efc=200; IVF nlist=√N)
3. Bisect the ef_search / nprobe knob until you hit target recall
4. If latency exceeds budget, drop M / nlist; rebuild; retest
5. Re-tune when data shape shifts 20%+; indexes are not set-and-forget
HNSW is the default for most cases under 100M vectors. IVF combines with PQ for extreme scale. Tuning follows a bisection process on the recall dial.

HNSW — the default

Hierarchical Navigable Small World graphs are the default or a first-class index in most vector databases (Qdrant, Weaviate, Pinecone, pgvector). They offer fast queries with high recall, paid for in memory: the graph links are stored alongside the raw vectors. Three parameters matter: M, ef_construction, and ef_search.

M (library defaults are typically 16-32) controls graph degree, the number of links kept per node. Higher M means better recall but more memory and slower builds. ef_construction (typically 100-200) controls how hard the index works during construction to find good neighbors. ef_search (typically 64-128) controls how many candidates the query keeps while it walks the graph; it is the primary recall/latency dial you'll tune.
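To make the memory side of M concrete, here is a back-of-envelope sketch. It assumes float32 vectors and roughly 2*M four-byte neighbor ids per vector on the base layer, which is a common approximation rather than the layout of any particular database. The function name and the 10M x 768 example are made up for illustration.

```python
def hnsw_memory_bytes(num_vectors: int, dim: int, m: int, bytes_per_float: int = 4) -> int:
    """Rough HNSW footprint: raw vectors plus base-layer links.

    Assumption: about 2*M neighbor ids of 4 bytes each per vector.
    Upper layers, metadata, and allocator overhead are ignored.
    """
    vector_bytes = dim * bytes_per_float
    link_bytes = 2 * m * 4
    return num_vectors * (vector_bytes + link_bytes)


# Hypothetical workload: 10M vectors of dimension 768.
for m in (16, 32, 48):
    gib = hnsw_memory_bytes(10_000_000, 768, m) / 2**30
    print(f"M={m}: ~{gib:.1f} GiB")
```

At high dimensions the raw vectors dominate, so raising M costs relatively little memory. At low dimensions the links can be a large share of the total.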

IVF — at extreme scale

Inverted File indexes partition the vector space into clusters; queries search only the most relevant clusters. nlist controls the number of clusters (rule of thumb: sqrt(N) where N is total vectors). nprobe controls how many clusters to search per query — the recall dial.
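A toy version of the idea, in pure Python, may help show why nprobe acts as the recall dial. This is a sketch for intuition only: it uses a few rounds of plain k-means and brute-force scans inside each cluster, and every function name here is hypothetical rather than taken from FAISS or any other library.

```python
import math
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_centroids(vectors, nlist, iters=5, seed=0):
    """A few rounds of plain k-means (Lloyd's algorithm)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, nlist)]
    for _ in range(iters):
        buckets = [[] for _ in range(nlist)]
        for v in vectors:
            nearest = min(range(nlist), key=lambda c: sq_dist(v, centroids[c]))
            buckets[nearest].append(v)
        for c, members in enumerate(buckets):
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def build_ivf(vectors, centroids):
    """Assign every vector id to the inverted list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for idx, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
        lists[nearest].append(idx)
    return lists


def ivf_search(query, vectors, centroids, lists, k, nprobe):
    """Scan only the nprobe clusters whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))
    candidates = [i for c in order[:nprobe] for i in lists[c]]
    return sorted(candidates, key=lambda i: sq_dist(query, vectors[i]))[:k]


rng = random.Random(42)
data = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(400)]
nlist = round(math.sqrt(len(data)))  # sqrt(N) rule of thumb
centroids = train_centroids(data, nlist)
lists = build_ivf(data, centroids)

query = [rng.gauss(0, 1) for _ in range(8)]
exact = sorted(range(len(data)), key=lambda i: sq_dist(query, data[i]))[:10]
for nprobe in (1, 4, nlist):
    approx = ivf_search(query, data, centroids, lists, 10, nprobe)
    print(f"nprobe={nprobe}: recall@10 = {len(set(approx) & set(exact)) / 10}")
```

With nprobe equal to nlist every cluster is scanned, so the result matches brute force. Smaller values scan fewer vectors and may miss neighbors that sit just across a cluster boundary.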

IVF alone is rarely best; it's almost always combined with PQ (IVF-PQ) for massive compression. IVF-HNSW is another powerful combination: IVF for partitioning, HNSW within each cluster. These combinations are where FAISS shines.

The tuning process

Step 1: define target recall@k and latency budget. Typical targets: recall@10 = 0.95, p95 latency under 50ms. Without these targets, tuning is aimless.
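For reference, recall@k can be computed as below. This is a minimal sketch that assumes you already have exact top-k ids per query (for example from a brute-force scan) to serve as ground truth; the function names are illustrative.

```python
def recall_at_k(retrieved, ground_truth, k):
    """Fraction of the true top-k neighbors that appear in the retrieved top-k."""
    truth = set(ground_truth[:k])
    if not truth:
        return 0.0
    return len(truth & set(retrieved[:k])) / len(truth)


def mean_recall_at_k(all_retrieved, all_truth, k):
    """Average recall@k over a query set."""
    scores = [recall_at_k(r, t, k) for r, t in zip(all_retrieved, all_truth)]
    return sum(scores) / len(scores)


# Three of the four true neighbors were found.
print(recall_at_k([1, 2, 3, 9], [1, 2, 3, 4], k=4))  # 0.75
```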

Step 2: start with library defaults: M=32 and ef_construction=200 for HNSW, nlist=√N for IVF. Measure baseline recall and latency on a representative query set.
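One way to capture the latency half of that baseline is sketched below. It assumes a `search_fn` callable that wraps your client's query call (a hypothetical name), and it uses a nearest-rank percentile, which is one of several common definitions.

```python
import math
import time


def percentile(samples, pct):
    """Nearest-rank percentile; pct is in (0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def measure_latency_ms(search_fn, queries):
    """Time each query individually and summarize in milliseconds."""
    latencies = []
    for q in queries:
        start = time.perf_counter()
        search_fn(q)
        latencies.append((time.perf_counter() - start) * 1000)
    return {
        "p50": percentile(latencies, 50),
        "p95": percentile(latencies, 95),
        "p99": percentile(latencies, 99),
    }


# Stand-in search function; replace with a call into your index.
baseline = measure_latency_ms(lambda q: sum(q), [[0.1] * 128 for _ in range(200)])
print(baseline)
```

Run it against a warmed-up index and a query sample large enough for p99 to mean something; a few hundred queries is a reasonable floor.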

Step 3: bisect the query-time recall dial (ef_search for HNSW, nprobe for IVF). Binary-search for the smallest value that hits target recall; this works because recall generally rises as the dial goes up. This is usually 80% of the tuning work.
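The bisection in Step 3 can be sketched as follows. It assumes recall does not decrease as the parameter grows, and that `measure_recall` is a function you supply that runs your eval set at a given setting. The `fake_recall` curve is invented purely so the example runs.

```python
def smallest_passing_value(measure_recall, lo, hi, target):
    """Binary search for the smallest integer in [lo, hi] that meets the target.

    Assumes recall is non-decreasing in the parameter.
    Returns None if even `hi` misses the target.
    """
    if measure_recall(hi) < target:
        return None
    while lo < hi:
        mid = (lo + hi) // 2
        if measure_recall(mid) >= target:
            hi = mid      # mid is good enough; try smaller values
        else:
            lo = mid + 1  # mid is too low; the answer is above it
    return lo


def fake_recall(ef_search):
    """Made-up saturating curve standing in for a real benchmark run."""
    return 1.0 - 1.0 / (1.0 + 0.2 * ef_search)


best = smallest_passing_value(fake_recall, 16, 512, target=0.95)
print(f"smallest ef_search meeting target: {best}")
```

Each probe is a full eval run, so a range of 16 to 512 costs about ten runs instead of hundreds. Real recall measurements are noisy, so it is worth confirming the final value with a second run.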

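Step 4: if the latency budget is blown, reduce build-time parameters (M for HNSW, nlist for IVF) and rebuild the index. Then re-tune the query-time dial, because the value found in Step 3 no longer applies. A lower M gives a smaller graph with fewer distance computations per hop. For IVF, nlist changes how many vectors each probed cluster holds, so judge the change by measured latency at target recall rather than at a fixed nprobe.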

Step 5: if recall is still below target, increase M or ef_construction and rebuild. This is slow (long rebuilds) so leave it for last.

Filtered queries — the hidden cost

Metadata filters (tenant_id, tags, date ranges) interact non-trivially with vector indexes. Post-filtering (search then filter) can miss relevant results: if most of the top candidates fail the filter, you get back fewer than k. Pre-filtering (filter then search) can be slow if the filtered subset is large. Integrated filtering (most modern DBs support it) is best but has edge cases.
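A tiny brute-force example shows the post-filtering failure mode. The data, the `tenant` field, and the function names are all made up for illustration; real databases apply the same logic on top of an approximate index.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def post_filter_search(query, items, k, predicate):
    """Take the top-k by distance first, then drop items that fail the filter."""
    top = sorted(items, key=lambda it: sq_dist(query, it["vec"]))[:k]
    return [it["id"] for it in top if predicate(it)]


def pre_filter_search(query, items, k, predicate):
    """Apply the filter first, then rank only the surviving items."""
    allowed = [it for it in items if predicate(it)]
    ranked = sorted(allowed, key=lambda it: sq_dist(query, it["vec"]))
    return [it["id"] for it in ranked[:k]]


def only_tenant_b(item):
    return item["tenant"] == "b"


items = [
    {"id": 0, "vec": [0.0, 0.1], "tenant": "a"},
    {"id": 1, "vec": [0.1, 0.0], "tenant": "a"},
    {"id": 2, "vec": [0.2, 0.2], "tenant": "a"},
    {"id": 3, "vec": [0.9, 0.9], "tenant": "b"},
    {"id": 4, "vec": [1.0, 1.1], "tenant": "b"},
]
query = [0.0, 0.0]

print(post_filter_search(query, items, 2, only_tenant_b))  # []
print(pre_filter_search(query, items, 2, only_tenant_b))   # [3, 4]
```

The two nearest vectors both belong to tenant "a", so post-filtering returns nothing for tenant "b" even though two valid matches exist.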

Benchmark your actual queries with your actual filters. Unfiltered benchmarks lie about production performance. See the hybrid search post for related patterns.

Rebuild cadence and fragmentation

HNSW indexes degrade with heavy deletes — fragmented graphs, residual nodes that skew search. Plan for periodic rebuilds (monthly for high-churn indexes, quarterly for stable ones).

IVF indexes lose recall when data distribution shifts more than 20% from the training set used to compute centroids. Monitor centroid quality; retrain when drift exceeds threshold.
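One possible way to monitor centroid quality is to track quantization error: the mean squared distance from each vector to its nearest centroid. The sketch below compares that error on recent data against the value recorded at training time. Reading the 20% threshold as a relative increase in this error is an assumption made for the example, not a definition from the post, and the function names are hypothetical.

```python
def mean_quantization_error(vectors, centroids):
    """Mean squared distance from each vector to its nearest centroid."""
    total = 0.0
    for v in vectors:
        total += min(sum((x - c) ** 2 for x, c in zip(v, cen)) for cen in centroids)
    return total / len(vectors)


def needs_retrain(baseline_error, current_error, threshold=0.20):
    """True when the error has grown by more than `threshold` (relative)."""
    return (current_error - baseline_error) / baseline_error > threshold


centroids = [[0.0, 0.0], [10.0, 10.0]]
training_sample = [[0.0, 1.0], [10.0, 9.0]]
recent_sample = [[0.0, 2.0], [10.0, 8.0]]  # points have moved away from the centroids

baseline_error = mean_quantization_error(training_sample, centroids)
current_error = mean_quantization_error(recent_sample, centroids)
print(baseline_error, current_error, needs_retrain(baseline_error, current_error))
# 1.0 4.0 True
```

In practice you would compute this on a sample of recently inserted vectors, since scanning the full index for every check is rarely worth it.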

What to monitor in production

Track four things: recall@k on a periodic eval set (sampled queries with known-good ground truth), query latency percentiles (p50, p95, p99), index size, and rebuild time. Any of these regressing by more than 15% from baseline is an alert worth investigating. See the observability post.
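The 15% rule can be expressed as a small check. This sketch assumes metrics arrive as plain dictionaries with made-up keys, and it treats recall as the one metric where a drop, rather than a rise, counts as a regression.

```python
# Metrics where a decrease is the regression; everything else regresses upward.
HIGHER_IS_BETTER = {"recall_at_10"}


def regressions(baseline, current, tolerance=0.15):
    """Return the metrics that moved more than `tolerance` in the bad direction."""
    alerts = {}
    for name, base in baseline.items():
        change = (current[name] - base) / base
        worse = -change if name in HIGHER_IS_BETTER else change
        if worse > tolerance:
            alerts[name] = round(change, 3)
    return alerts


# Hypothetical numbers for illustration.
baseline = {"recall_at_10": 0.95, "p95_ms": 40.0, "index_gib": 30.0, "rebuild_min": 90.0}
current = {"recall_at_10": 0.94, "p95_ms": 52.0, "index_gib": 31.0, "rebuild_min": 95.0}

print(regressions(baseline, current))  # {'p95_ms': 0.3}
```

A relative threshold is lenient for recall, where a drop from 0.95 to 0.85 is still under 15%, so a tighter absolute floor for recall may be worth adding.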
