Decision Guide · May 2025 · 6 min read

Build vs. Buy: Open Source LLMs or Commercial APIs?

The question every AI team hits eventually. Here's the honest breakdown — costs, tradeoffs, and a simple framework to make the call.

Urjit Patel
Head of Commercial Data Science · S&P Global

A startup founder called me last year. Smart team. They'd been going back and forth for weeks: should we build on LLaMA or just call the OpenAI API? They'd convinced themselves they needed to figure this out before writing a single line of product code.

I've had this same conversation a dozen times in the last year. The answer isn't obvious, and it's different for everyone. But the way you think through it is pretty consistent. Let me share how I approach it.

First — neither answer is wrong. This isn't open source vs. closed source as an ideology. It's a practical question about cost, control, speed, and risk for your specific situation.

Open Source
Self-Host a Model
LLaMA 3, Mistral, Qwen, DeepSeek...
Pros:
  • Full data privacy: nothing leaves your infra
  • Lower cost at high volume (you pay for compute, not tokens)
  • Customize deeply: fine-tune on your data
  • No rate limits, no vendor dependency
Cons:
  • You own the infra: GPUs, scaling, uptime
  • Needs an ML team to run well
  • Behind frontier models by 6–12 months
  • Slower to get started

Commercial API
Call a Provider API
OpenAI, Anthropic, Google, Cohere...
Pros:
  • Best model quality available today
  • Live in minutes, not days
  • No infra to manage
  • Constant improvements for free
Cons:
  • Cost scales with usage and can get expensive
  • Data leaves your environment
  • Rate limits and availability risks
  • Less control over the model's behavior

What does it actually cost?

This is where most teams get surprised. Here's a rough picture. Prices change often, so treat these as ballpark figures — the relative differences matter more than the exact numbers.

| Model | Type | Approx. cost per 1M tokens | Best for | Notes |
|---|---|---|---|---|
| GPT-4o | API | ~$2.50–10 | Complex reasoning, best quality | Input ~$2.50, output ~$10 (launched at ~$5 / ~$15) |
| Claude 3.5 Sonnet | API | ~$3–15 | Long context, careful tasks | Input ~$3, output ~$15; strong at nuance and instruction-following |
| Gemini 1.5 Flash | API | ~$0.075–0.30 | High volume, simpler tasks | Input ~$0.075, output ~$0.30 for prompts up to 128K tokens; Google's cheapest capable option |
| LLaMA 3 70B (hosted) | API | ~$0.50–1 | Good quality, lower cost | Via Together AI, Groq, or Replicate |
| LLaMA 3 70B (self-hosted) | OSS | ~$1–5 at small scale | Privacy-critical workloads | Drops to <$0.50 at high utilization |
| Mistral 7B (self-hosted) | OSS | ~$0.10–0.30 | Simple tasks at scale | Cheapest serious option if you have the infra |
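The table mixes two kinds of pricing: API providers bill per token (usually with separate input and output rates), while self-hosting turns a fixed GPU bill into a per-token cost that depends on how busy the hardware is. The sketch below shows both conversions. Every input is a hypothetical placeholder: the $2.50 / $10 API prices, the $8/hour GPU node, and the 5,000 tokens/second throughput are illustrative assumptions rather than measurements, and the function names are made up for this example.

```python
def blended_api_price(input_price: float, output_price: float, input_share: float) -> float:
    """Blended $ per 1M tokens when input and output tokens are priced differently.

    input_share is the fraction of all tokens that are input (prompt) tokens.
    """
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")
    return input_share * input_price + (1.0 - input_share) * output_price


def self_hosted_price(gpu_cost_per_hour: float, tokens_per_second: float,
                      utilization: float) -> float:
    """Effective $ per 1M tokens for a self-hosted model.

    The GPU bill accrues every hour, but tokens are only produced for the
    utilized fraction of that hour, so idle time raises the per-token cost.
    """
    if tokens_per_second <= 0 or not 0.0 < utilization <= 1.0:
        raise ValueError("need tokens_per_second > 0 and 0 < utilization <= 1")
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return gpu_cost_per_hour / tokens_per_hour * 1_000_000


# Hypothetical API pricing with an 80/20 input/output token mix.
print(f"API blended: ${blended_api_price(2.50, 10.00, 0.8):.2f} per 1M")

# Hypothetical $8/hour GPU node serving 5,000 tokens/second at full load.
for util in (0.10, 0.90):
    cost = self_hosted_price(8.00, 5_000, util)
    print(f"Self-hosted @ {util:.0%}: ${cost:.2f} per 1M")
```

With these placeholder inputs the blended API price comes out at $4.00 per 1M tokens, and the same self-hosted hardware costs about $4.44 per 1M at 10% utilization versus about $0.49 at 90%. That is the mechanism behind the "drops at high utilization" note: the hourly bill is fixed, so idle GPUs are what make small-scale self-hosting expensive.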
// the_math_reality

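At low volume (under ~1M tokens/day, or roughly 30M/month), commercial APIs almost always win on total cost once you factor in the engineering time it takes to manage infrastructure. The crossover point is usually around 50–100M tokens/month, though it shifts with the API tier you'd be replacing and the fixed infra and staffing costs you count. That's when self-hosting starts making financial sense, assuming you have the team to run it.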
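The crossover exists because self-hosting is mostly a fixed monthly cost (GPUs you keep running whether or not they're busy, plus the people who operate them), while API spend grows linearly with tokens. Here is a minimal break-even sketch, assuming a simple model of fixed monthly cost plus a small per-token variable cost for self-hosting. The $2,000/month fixed cost, the $0.50 per 1M variable cost, and the three API prices are hypothetical placeholders to swap for your own quotes.

```python
from typing import Optional


def monthly_api_cost(tokens_millions: float, api_price_per_million: float) -> float:
    """API spend scales linearly with volume."""
    return tokens_millions * api_price_per_million


def monthly_self_host_cost(tokens_millions: float, fixed_monthly: float,
                           variable_per_million: float) -> float:
    """Self-hosting: a fixed bill (GPUs, ops time) plus a small per-token cost."""
    return fixed_monthly + tokens_millions * variable_per_million


def break_even_millions(api_price_per_million: float, fixed_monthly: float,
                        variable_per_million: float) -> Optional[float]:
    """Monthly volume (millions of tokens) at which both options cost the same.

    Returns None when the API is never more expensive per token than the
    self-hosted variable cost, i.e. there is no crossover at all.
    """
    margin = api_price_per_million - variable_per_million
    if margin <= 0:
        return None
    return fixed_monthly / margin


# Hypothetical inputs: one self-hosted setup, three different API price points.
for api_price in (0.75, 4.00, 15.00):
    be = break_even_millions(api_price, fixed_monthly=2_000, variable_per_million=0.50)
    print(f"API at ${api_price:.2f}/1M -> break-even ~{be:,.0f}M tokens/month")
```

The loop is there to show sensitivity: with identical infrastructure, the break-even volume ranges from about 8,000M down to about 138M tokens/month depending on which API price you compare against, so it's worth plugging in the API tier you'd actually be replacing and your real fixed costs.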

The decision in 4 questions

Stop debating and answer these in order. The first question that applies to you usually settles it.

// decision_guide.flow
① Does your data contain anything sensitive that cannot leave your environment?
   Yes: Self-host open source. Full stop.
   No: Continue ↓
② Do you need the absolute best output quality available today?
   Yes: Use GPT-4o or Claude; open source isn't there yet.
   No: Continue ↓
③ Are you processing more than ~100M tokens per month?
   Yes: Model the cost. Open source probably wins at this scale.
   No: Continue ↓
④ Do you have an ML infrastructure team (or plan to build one)?
   Yes: Open source is viable; start when cost justifies it.
   No: Start with an API. Optimize when costs actually hurt.
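Because the questions are ordered and each one can end the walk, the guide reduces to a chain of early returns. Here it is as a small Python sketch; the `Situation` fields, the `recommend` function, and the default 100M-token threshold (taken from question ③) are names and parameters invented for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Situation:
    sensitive_data_must_stay_internal: bool
    needs_frontier_quality: bool
    monthly_tokens_millions: float
    has_ml_infra_team: bool


def recommend(s: Situation, volume_threshold_millions: float = 100.0) -> str:
    # ① Privacy is a hard constraint and overrides everything else.
    if s.sensitive_data_must_stay_internal:
        return "self-host open source"
    # ② If you need the best quality available today, use a frontier API.
    if s.needs_frontier_quality:
        return "commercial API (frontier model)"
    # ③ At high volume, cost modeling decides, and open source probably wins.
    if s.monthly_tokens_millions > volume_threshold_millions:
        return "model the cost; open source probably wins"
    # ④ Otherwise it comes down to whether you can run the infra.
    if s.has_ml_infra_team:
        return "open source is viable; switch when cost justifies it"
    return "start with an API; optimize when costs actually hurt"


print(recommend(Situation(
    sensitive_data_must_stay_internal=False,
    needs_frontier_quality=False,
    monthly_tokens_millions=20,
    has_ml_infra_team=False,
)))
```

Order is the design choice that matters: a privacy constraint short-circuits everything, even a need for frontier quality, which mirrors the "Full stop" in question ①. The example call falls through all four checks and prints the default advice to start with an API.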
Pick the API when
Speed and quality matter more than cost
You're prototyping. You need the best model. Your volume is low-to-medium. You don't have ML infra. Data privacy isn't a blocker. This describes most startups for their first 12–18 months.
Self-host when
Cost, control, or privacy are non-negotiable
You're processing at serious scale. Data can't leave your environment. You have the engineering team to manage it. You're willing to trade model quality for control. This describes most mature enterprise teams.

My honest take

Most teams I talk to should just start with the API. Spend your early energy on the product, not on GPU fleet management. The model quality from OpenAI and Anthropic is genuinely hard to match right now, and the faster you're shipping and learning, the better.

When your API bill starts to actually hurt, and you can clearly see that you're at scale with a stable workload, that's when you revisit. By then you'll know your token patterns, your quality requirements, and whether you actually need fine-tuning. You'll make a much better decision than you would today.
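Knowing your token patterns mostly comes down to aggregating the usage counts that most provider APIs report with each response. A minimal sketch, assuming you can export daily input and output token totals (the record format and the `summarize_usage` name are hypothetical): it projects monthly volume, measures the input/output mix that drives your blended per-token price, and uses the coefficient of variation of daily volume as one rough, assumed proxy for how stable the workload is.

```python
from statistics import mean, pstdev


def summarize_usage(daily_records: list[tuple[int, int]],
                    days_per_month: int = 30) -> dict[str, float]:
    """Summarize daily (input_tokens, output_tokens) usage records."""
    if not daily_records:
        raise ValueError("need at least one day of usage")
    daily_totals = [inp + out for inp, out in daily_records]
    total_input = sum(inp for inp, _ in daily_records)
    total_all = sum(daily_totals)
    avg_daily = mean(daily_totals)
    return {
        # Projected monthly volume, in millions of tokens.
        "monthly_tokens_millions": avg_daily * days_per_month / 1_000_000,
        # Share of tokens that are input; feeds a blended per-token price.
        "input_share": total_input / total_all if total_all else 0.0,
        # Coefficient of variation of daily volume; lower means steadier load.
        "daily_variation": pstdev(daily_totals) / avg_daily if avg_daily else 0.0,
    }


# Hypothetical three days of exported usage.
usage = [(2_400_000, 600_000), (2_000_000, 500_000), (2_800_000, 700_000)]
summary = summarize_usage(usage)
print(f"~{summary['monthly_tokens_millions']:.0f}M tokens/month, "
      f"{summary['input_share']:.0%} input, "
      f"daily variation {summary['daily_variation']:.2f}")
```

For this made-up data the summary is roughly 90M tokens/month, 80% input, and a daily variation of about 0.14. Those three numbers map onto the decision above: monthly volume for the cost crossover, input share for what you really pay per token, and variation for whether the workload is steady enough to size hardware around.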

The startup founder I mentioned? They went with the API. Shipped in three weeks. Now six months in, they're re-evaluating with real usage data. That's the right order to do it in.

Still unsure? Run the numbers yourself.

© 2025 Urjit Patel · urjitlabs.com