Picking an embedding model in 2026 is harder than it should be. The leaderboards shuffle monthly. The vendors shout different benchmark numbers. The self-hosted options are better than they were a year ago but still require ops work. We rebuild our embedding-selection grid every six months across our client deployments; this is the current state.
The contenders
OpenAI text-embedding-3-large and 3-small
Still the default for most production RAG. 3-small costs $0.02/M tokens and performs within 2-3 points of 3-large on most retrieval tasks. 3-large costs $0.13/M tokens and edges ahead on longer documents and cross-lingual retrieval. API reliability is excellent, and the batch endpoints help with bulk indexing. The real downsides: no self-hosted option and no way to fine-tune.
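To put the price gap in concrete terms, here is a minimal sketch of the one-off indexing cost at the per-token prices quoted above. The function name `indexing_cost_usd`, the corpus size, and the average of 500 tokens per document are illustrative assumptions, not measurements from our deployments.

```python
# Prices per million tokens (USD), as quoted in this post.
PRICE_PER_M_TOKENS = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
}


def indexing_cost_usd(model: str, n_docs: int, avg_tokens_per_doc: int) -> float:
    """Cost of embedding a corpus once; ignores query traffic and re-indexing."""
    total_tokens = n_docs * avg_tokens_per_doc
    return total_tokens / 1_000_000 * PRICE_PER_M_TOKENS[model]


if __name__ == "__main__":
    # Hypothetical corpus: 10M documents averaging 500 tokens each.
    for model in PRICE_PER_M_TOKENS:
        cost = indexing_cost_usd(model, n_docs=10_000_000, avg_tokens_per_doc=500)
        print(f"{model}: ${cost:,.2f}")
```

Under those assumptions, the full index costs roughly $100 with 3-small and $650 with 3-large, so the bill rarely decides between the two; retrieval quality on your own data does.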
Cohere embed-v3
Cohere's embedding models pair unusually well with their Rerank model: you can build a tuned retrieve-then-rerank stack without leaving one vendor. The embed endpoint takes an `input_type` of `search_query` or `search_document`, which materially helps when documents and queries have different surface forms. In our side-by-side testing on legal and customer-support corpora, Cohere v3 + Rerank 3 has consistently outperformed OpenAI embeddings + any reranker. Costs about $0.10/M tokens.
Jina Embeddings v3
A strong middle option: open weights, 8K context, priced at $0.02/M tokens via their API or free if you host (the weights were released under CC BY-NC 4.0, so check the license before commercial use). We use Jina in pipelines where on-premise is required but engineering capacity to run open-weight inference is limited; Jina's managed API is genuinely decent.
BGE-M3 and BGE-large
The strongest fully open-weights option in 2026. BGE-M3 handles multilingual retrieval and long-context inputs; BGE-large-en is our default when the deployment is English-only and must be self-hosted. Both need a GPU for anything beyond small indexes, which pushes total cost of ownership up. See our GPU hosting post for when that math works.
Nomic Embed v2
The surprise entrant of 2025. Open weights, better retrieval quality than we expected, and designed to run efficiently on CPU for small deployments. For internal-tool RAG at sub-million-document scale, Nomic has become our default because the ops overhead drops to almost nothing.
How to actually decide
Leaderboard scores are a floor filter. Once a model is on the frontier, the decision comes down to task-specific retesting and operational constraints. Our standard process: pick three candidates that cover the Pareto curve, build a retrieval eval set of 50-100 real queries with known-relevant documents, measure precision@5 and recall@10 for each model, and weigh the results against cost and ops burden. This takes about a day and is worth every hour.
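The measurement step can be sketched in a few lines of Python. This is a minimal illustration, not our production harness: the `search` callable stands in for whichever embedding model and index you are testing, and the toy queries and document IDs are made up.

```python
from typing import Callable, Dict, List, Set, Tuple


def precision_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k results that are known-relevant."""
    return sum(1 for doc_id in retrieved[:k] if doc_id in relevant) / k


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the known-relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def evaluate(
    search: Callable[[str], List[str]],
    eval_set: Dict[str, Set[str]],
) -> Tuple[float, float]:
    """Mean precision@5 and mean recall@10 over every query in the eval set."""
    if not eval_set:
        raise ValueError("eval_set must contain at least one query")
    precisions, recalls = [], []
    for query, relevant in eval_set.items():
        retrieved = search(query)  # ranked document IDs, best first
        precisions.append(precision_at_k(retrieved, relevant, 5))
        recalls.append(recall_at_k(retrieved, relevant, 10))
    n = len(eval_set)
    return sum(precisions) / n, sum(recalls) / n


if __name__ == "__main__":
    # Toy data: query -> set of known-relevant document IDs (hypothetical).
    toy_eval_set = {
        "refund policy": {"doc1", "doc2"},
        "reset password": {"doc9"},
    }
    # Stand-in for a real retriever: canned rankings per query.
    toy_rankings = {
        "refund policy": ["doc1", "doc7", "doc2", "doc5", "doc3"],
        "reset password": ["doc4", "doc5", "doc6"],
    }
    p5, r10 = evaluate(lambda q: toy_rankings[q], toy_eval_set)
    print(f"precision@5={p5:.2f} recall@10={r10:.2f}")
```

Run `evaluate` once per candidate model against the same eval set, then compare the pairs of numbers alongside cost and ops burden.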
Common mistake: assuming the best MTEB score translates to your domain. We've watched this fail repeatedly. Legal, medical, technical, and other highly domain-specific corpora regularly reorder the ranking. A 2-point MTEB gap often disappears, or inverts, on domain-specific retrieval.
What we recommend
Default: OpenAI text-embedding-3-small for any project <10M documents where the data can leave your network. Fast to integrate, reliable, cheap enough that you won't care about the bill.
If you're doing serious retrieval work: Cohere embed-v3 + Cohere Rerank 3. The quality lift is real, especially on ambiguous queries.
If you need self-hosted (privacy, data residency, long-term cost control): BGE-M3 for multilingual or long-context needs, Nomic Embed v2 for simpler English-only deployments. Pair either with a Cohere-style reranker or your own cross-encoder so the quality holds up.
Don't over-optimize the embedding model before you've nailed chunking and hybrid search. Those two steps typically move retrieval quality 2-3x more than switching between embedding models on the Pareto frontier.
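For readers who haven't built hybrid search yet, one common way to combine a keyword ranking with an embedding ranking is reciprocal rank fusion (RRF). The sketch below is an illustration of that general technique, not a description of our stack; the constant `k = 60` is the conventional default, and the document IDs are invented.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Ties are broken by document ID for determinism.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    keyword_hits = ["d1", "d2", "d3"]  # e.g. from BM25 (hypothetical results)
    dense_hits = ["d2", "d4", "d1"]    # e.g. from an embedding index (hypothetical)
    print(reciprocal_rank_fusion([keyword_hits, dense_hits]))
```

Documents that both retrievers rank highly rise to the top, which is why a fused list tends to be more robust than either ranking alone.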