The open-vs-closed debate has been loud for three years and mostly unhelpful. Ideological takes on both sides. Vendor marketing dressed as analysis. Benchmark numbers without context. What operators actually need is a short list of decision criteria: five dimensions where the honest answer for your use case points one way or the other. This post is that list, stripped of the noise.
The five dimensions that actually decide
Privacy and data residency
If your data cannot leave your network boundary (for regulatory, contractual, or policy reasons), then the hosted APIs from OpenAI, Google, and most Anthropic tiers are out. Azure OpenAI addresses part of this for US customers, as does Vertex AI for Google Cloud customers, but neither runs inside your own network. For true on-premise or air-gapped deployments, open-weights is the only path.
This dimension is binary. Either the data can leave or it can't. If it can't, the decision is made; other dimensions don't matter.
Inference volume
Below 10M tokens/day: closed APIs are almost always cheaper in total cost of ownership. The per-token cost difference is small; the ops overhead of self-hosting dominates.
Above 100M tokens/day sustained: self-hosted open models start winning clearly. In our client deployments, Llama 3.3 70B on reserved GPUs comes out 40-60% cheaper than an equivalent-quality closed API at this volume.
Between 10M and 100M: it's a judgment call depending on other factors. See our GPU hosting post.
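The thresholds above come down to a variable cost (API) against a mostly fixed cost (GPUs plus ops). Here is a minimal Python sketch of that comparison, assuming a flat per-token API price and a fixed monthly ops cost for self-hosting. The field names and every dollar figure in the usage note are hypothetical placeholders, not numbers from our deployments, so plug in your own quotes.

```python
import math
from dataclasses import dataclass


@dataclass
class CostInputs:
    """All fields are illustrative assumptions; replace with your own figures."""
    tokens_per_day: float              # sustained daily token volume
    api_usd_per_million_tokens: float  # blended input/output API price
    node_usd_per_month: float          # reserved GPU node cost
    node_tokens_per_day: float         # throughput one node can sustain
    ops_usd_per_month: float           # infra engineering and on-call


def monthly_api_cost(c: CostInputs) -> float:
    # API spend scales linearly with volume (30-day month assumed).
    return c.tokens_per_day * 30 / 1_000_000 * c.api_usd_per_million_tokens


def monthly_self_hosted_cost(c: CostInputs) -> float:
    # Self-hosting is a step function: whole nodes plus a fixed ops cost.
    nodes = max(1, math.ceil(c.tokens_per_day / c.node_tokens_per_day))
    return nodes * c.node_usd_per_month + c.ops_usd_per_month


def cheaper_option(c: CostInputs) -> str:
    if monthly_self_hosted_cost(c) < monthly_api_cost(c):
        return "self-hosted"
    return "api"
```

With placeholder inputs such as `CostInputs(10_000_000, 10.0, 8_000.0, 100_000_000, 7_000.0)`, the fixed ops cost dominates and `cheaper_option` returns `"api"`. Raising `tokens_per_day` to `100_000_000` with the same placeholders flips it to `"self-hosted"`. The crossover point depends entirely on the figures you supply.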
Quality bar
For most production tasks — classification, extraction, summarization, standard RAG — the quality gap between Llama 3.3 70B and GPT-4o is within the margin of per-task tuning. Either works. Pick on other criteria.
For frontier tasks — complex reasoning, advanced agent orchestration, hard coding, multi-step planning — closed models still lead. If your product genuinely needs frontier capability (not just 'it would be nice to have'), closed is the right choice today.
Ask honestly: does our use case need frontier, or does it need reliable-enough? Most products need the latter, and open models serve that just fine.
Customization needs
Heavy fine-tuning, domain adaptation, custom weights: open is better. You can fully fine-tune on proprietary data, distill into smaller models, and maintain full control of the artifact. Most closed providers offer some tuning but with constraints on export and reuse.
Prompt-only customization: closed is fine. Both paths support instruction-following prompts; no structural reason to prefer open here.
Operational capacity
Do you have a team that can run GPU inference reliably? Not 'has experimented with vLLM' — actually has 24/7 on-call for inference infrastructure. If yes, open is viable. If no, closed APIs are the right answer until you build that capability.
This is where most open-source migrations fall apart. Companies decide to self-host for cost reasons, underestimate the ops burden, burn six months of engineering time on infrastructure, and end up with a system that works but costs more in total than staying on a closed API would have.
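The five dimensions can be read as a checklist. The sketch below encodes one possible ordering in Python: privacy first because it is binary, then the two conditions that rule out self-hosting, then the factors that favor it. The ordering, the `Workload` fields, and the `recommend` function are our illustration and an assumption, not a formal rule. The 10M and 100M thresholds are the ones from the volume section.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical summary of one use case across the five dimensions."""
    data_must_stay_on_network: bool
    tokens_per_day: float
    needs_frontier_capability: bool
    needs_heavy_fine_tuning: bool
    has_inference_on_call: bool  # a real 24/7 rotation, not an experiment


def recommend(w: Workload) -> str:
    # Privacy is binary: if data cannot leave, nothing else matters.
    if w.data_must_stay_on_network:
        return "open"
    # Genuine frontier requirements still point to closed models.
    if w.needs_frontier_capability:
        return "closed"
    # Without ops capacity, self-hosting is not yet viable.
    if not w.has_inference_on_call:
        return "closed"
    # From here on, open is viable; look for a reason to prefer it.
    if w.needs_heavy_fine_tuning:
        return "open"
    if w.tokens_per_day > 100_000_000:
        return "open"
    if w.tokens_per_day < 10_000_000:
        return "closed"
    return "judgment call"
```

A mid-volume workload with an on-call team and no fine-tuning needs lands on `"judgment call"`, which matches the 10M to 100M band above.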
The mistakes both sides make
Closed-model advocates undercount the long-term cost: per-token pricing feels small until volume scales. A company spending $5M/year on OpenAI is paying for someone else's margin and can't easily escape — vendor lock-in is real.
Open-model advocates undercount the ops cost: running your own inference infrastructure is genuine work. The per-token price is lower on paper, but the total is often equal or higher once you count infrastructure engineering, on-call, hardware purchases, and retesting after every model update.
Both sides underweight the optionality question. The right answer today is not the right answer forever. Design systems with enough abstraction (a model-provider abstraction layer, a standardized eval suite) that you can switch within a quarter when the math changes.
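As a rough illustration of what that abstraction can look like, here is a minimal Python sketch: one interface that every backend implements, plus an eval function that scores any backend against the same cases. The names `TextModel`, `StubClosedModel`, `StubOpenModel`, and `pass_rate` are hypothetical, and the two stub classes stand in for real clients (a hosted API on one side, a self-hosted server on the other) so the example runs without network access.

```python
from typing import Callable, Protocol


class TextModel(Protocol):
    """The only surface application code is allowed to depend on."""
    def complete(self, prompt: str) -> str: ...


class StubClosedModel:
    # Stand-in for a hosted API client; a real one would make a network call.
    def complete(self, prompt: str) -> str:
        return "positive" if "great" in prompt else "negative"


class StubOpenModel:
    # Stand-in for a self-hosted model client.
    def complete(self, prompt: str) -> str:
        hit = "great" in prompt or "good" in prompt
        return "positive" if hit else "negative"


# An eval case is a prompt plus a check on the model's output.
EvalCase = tuple[str, Callable[[str], bool]]


def pass_rate(model: TextModel, cases: list[EvalCase]) -> float:
    # Same cases, same checks, any backend: this is what makes a switch cheap.
    passed = sum(1 for prompt, check in cases if check(model.complete(prompt)))
    return passed / len(cases)


CASES: list[EvalCase] = [
    ("Review: great product", lambda out: out == "positive"),
    ("Review: good value", lambda out: out == "positive"),
    ("Review: broke in a week", lambda out: out == "negative"),
]
```

Running `pass_rate(StubClosedModel(), CASES)` and `pass_rate(StubOpenModel(), CASES)` gives two directly comparable scores. In a real system the cases would come from production traffic and the checks would be task-specific, but the shape stays the same: application code talks to `TextModel`, and a migration is justified when the candidate backend clears the suite.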
Our default recommendation
Start closed (API) for speed. Build your eval suite and abstraction layer early. Revisit the decision every 6-12 months — if inference volume grows, if privacy needs change, if open-model quality catches up to your use case, migrate. Most of our clients on closed APIs in 2024 are still on closed APIs in 2026; a handful have migrated specific workloads to open. That's the healthy pattern.