Choosing an LLM for real work: Claude, GPT, and open models
“Which AI should we use?” is one of the most common questions we get, and people usually expect a single name in response. They’re often surprised that the honest answer is “it depends on the job, and you should be willing to change your mind.” The model landscape (Claude, GPT, and Gemini from the big labs; Llama, Mistral, and other open-weight models) moves fast, and picking by brand loyalty is how you end up locked into the wrong tool. Here’s how we choose.
There is no single best model
The instinct is to find the best model and standardize on it. But “best” isn’t a property of a model; it’s a property of a model on a specific task under specific constraints. One model might write the most natural prose, another might be the strongest at following complex instructions exactly, another might be cheapest at the volume you need, another might be the only one you can legally run on-premises with sensitive data. There’s no winner of that race because they’re not all running the same race.
So the question is never “which model is best?” It’s “which model is best for this task, at this budget, under these privacy constraints?” That reframing does most of the work.
The dimensions that decide it
When we pick a model for a project, we’re weighing a handful of practical factors:
- Capability on the actual task. Not benchmark scores, but the real task. We test candidate models on representative examples of the work and look at the output. Some tasks need the most capable (and expensive) frontier model; many are handled well by a cheaper, faster one.
- Cost at your volume. A slightly higher per-request price is irrelevant at ten requests a day and decisive at ten thousand. The right choice flips with scale.
- Speed. For something a user waits on, latency matters as much as quality. For an overnight batch job, it barely matters at all.
- Privacy and control. If the data is sensitive or regulated, the deciding factor might be whether you can run the model yourself, on infrastructure you control. That’s where open models like Llama and Mistral earn their place, not because they’re “better,” but because they can run where the frontier APIs can’t.
- Reliability and instruction-following. For anything wired into a workflow, a model that follows structured instructions consistently is worth more than one that’s occasionally more eloquent but unpredictable.
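The first factor, testing candidates on representative examples rather than trusting benchmark scores, can be sketched as a small harness that records how often each model’s output passes a task-specific check and how long each call takes. This is a minimal sketch under stated assumptions: each candidate is a plain prompt-to-text Python callable (in practice a thin wrapper around a provider’s API or a self-hosted endpoint, not shown), the `check` function and the example data are hypothetical placeholders, and the stub “models” exist only so the code runs.

```python
import time
from dataclasses import dataclass
from typing import Callable

# Hypothetical: a "model" here is any function from prompt to text.
ModelFn = Callable[[str], str]


@dataclass
class EvalResult:
    name: str
    pass_rate: float      # fraction of examples whose output passed the check
    avg_latency_s: float  # mean wall-clock seconds per call


def evaluate(name: str, model: ModelFn, examples: list[tuple[str, str]],
             check: Callable[[str, str], bool]) -> EvalResult:
    """Run one candidate over representative (prompt, expected) pairs."""
    if not examples:
        raise ValueError("need at least one representative example")
    passed = 0
    total_time = 0.0
    for prompt, expected in examples:
        start = time.perf_counter()
        output = model(prompt)
        total_time += time.perf_counter() - start
        if check(output, expected):
            passed += 1
    n = len(examples)
    return EvalResult(name, passed / n, total_time / n)


# Placeholder candidates and data, for illustration only.
def stub_model_a(prompt: str) -> str:
    return prompt.upper()


def stub_model_b(prompt: str) -> str:
    return prompt


def exact_match(output: str, expected: str) -> bool:
    return output.strip() == expected


examples = [("refund", "REFUND"), ("invoice", "INVOICE")]

for name, fn in [("model-a", stub_model_a), ("model-b", stub_model_b)]:
    r = evaluate(name, fn, examples, exact_match)
    print(f"{r.name}: pass_rate={r.pass_rate:.2f}, "
          f"avg_latency={r.avg_latency_s:.6f}s")
```

The point of the design is that the check and the examples come from your own work, so the same harness can compare a frontier model against a cheaper one on exactly the task you care about.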
No single model wins on every axis. The choice is a fit, not a ranking.
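To make “a fit, not a ranking” concrete, here is a minimal sketch: treat privacy, latency, and a minimum quality bar as hard constraints, then pick the cheapest remaining model at your expected volume. Every model name and number below is hypothetical, chosen only to show how the answer moves as the constraints change. `pass_rate` is assumed to come from your own task evaluation, and monthly cost is simply per-request cost × requests per day × 30.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class Candidate:
    name: str
    pass_rate: float         # from your own task evaluation, not a benchmark
    p95_latency_s: float     # latency users actually experience
    cost_per_request: float  # averaged over typical prompts
    self_hostable: bool      # can it run on infrastructure you control?


@dataclass
class Requirements:
    min_pass_rate: float
    max_latency_s: float
    requests_per_day: int
    data_must_stay_in_house: bool


def monthly_cost(c: Candidate, req: Requirements, days: int = 30) -> float:
    """Per-request cost times volume: trivial at low volume, decisive at high."""
    return c.cost_per_request * req.requests_per_day * days


def choose(cands: list[Candidate], req: Requirements) -> Optional[Candidate]:
    """Drop candidates that break a hard constraint, then take the cheapest."""
    fits = [
        c for c in cands
        if c.pass_rate >= req.min_pass_rate
        and c.p95_latency_s <= req.max_latency_s
        and (c.self_hostable or not req.data_must_stay_in_house)
    ]
    return min(fits, key=lambda c: monthly_cost(c, req), default=None)


# Hypothetical candidates and figures, for illustration only.
frontier = Candidate("frontier-api", 0.97, 2.5, 0.02, False)
small = Candidate("small-api", 0.93, 0.8, 0.002, False)
open_weight = Candidate("open-weight-self-hosted", 0.91, 1.2, 0.004, True)
cands = [frontier, small, open_weight]

base = Requirements(min_pass_rate=0.90, max_latency_s=3.0,
                    requests_per_day=10_000, data_must_stay_in_house=False)

for label, req in [("baseline", base),
                   ("sensitive data", replace(base, data_must_stay_in_house=True)),
                   ("higher quality bar", replace(base, min_pass_rate=0.95))]:
    pick = choose(cands, req)
    print(label, "->", pick.name if pick else "nothing fits")
```

With these made-up figures, the small hosted model wins the baseline on cost (600 vs. 6,000 a month at 10,000 requests a day). Requiring data to stay in-house hands the job to the self-hosted open model, and raising the quality bar to 0.95 leaves only the frontier model. Three reasonable sets of requirements, three different answers.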
Why brand loyalty is a liability
Teams that pick a model based on which lab they trust, or which name they’ve heard most, tend to make two mistakes: they overpay (using a frontier model for tasks a cheaper one handles fine) and they get stuck (architecting everything around one provider, so switching later is painful).
The better posture is provider-agnostic by design. Build so the model is a component you can swap, not a foundation you’ve poured. Then the question “which model?” becomes a low-stakes, revisitable decision instead of a one-way door, which matters enormously in a field that changes month to month.
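One common way to keep the model swappable is to have application code depend on a narrow interface, with one adapter per provider and a config value selecting the adapter. The sketch below assumes a single `complete(prompt) -> str` method is enough for your use; the adapter classes, registry keys, and `summarize_ticket` function are hypothetical, and the adapters are stubs rather than real SDK calls so the example runs on its own.

```python
from typing import Callable, Protocol


class TextModel(Protocol):
    """The only surface application code is allowed to depend on."""

    def complete(self, prompt: str) -> str: ...


# Hypothetical adapters. In practice each would wrap one provider's SDK
# or a self-hosted endpoint; here they are stubs so the sketch runs.
class HostedModelAdapter:
    def __init__(self, model_name: str) -> None:
        self.model_name = model_name

    def complete(self, prompt: str) -> str:
        return f"[{self.model_name}] {prompt}"


class LocalModelAdapter:
    def complete(self, prompt: str) -> str:
        return f"[local] {prompt}"


# Maps a config value to a factory; adding a provider means adding one entry.
REGISTRY: dict[str, Callable[[], TextModel]] = {
    "hosted-large": lambda: HostedModelAdapter("large"),
    "hosted-small": lambda: HostedModelAdapter("small"),
    "local": LocalModelAdapter,
}


def build_model(config_name: str) -> TextModel:
    factory = REGISTRY.get(config_name)
    if factory is None:
        raise ValueError(f"unknown model config: {config_name!r}")
    return factory()


def summarize_ticket(model: TextModel, ticket: str) -> str:
    # Business logic sees only TextModel, never a vendor SDK.
    return model.complete(f"Summarize this support ticket: {ticket}")


# Switching providers is a config change, not a rewrite.
for name in ("hosted-large", "local"):
    print(summarize_ticket(build_model(name), "Printer on fire"))
```

Only `build_model` knows which concrete adapters exist; everything else takes a `TextModel`. Revisiting the model decision then touches one config value and, at most, one new adapter.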
The part nobody mentions: the model is the easy bit
Here’s the thing that gets lost in model-comparison debates: which model you pick is usually not the hard part of an AI project. The hard part is everything around it: getting the right context to the model, handling the cases where it gets things wrong, wiring it reliably into your actual tools, and measuring whether it’s genuinely helping. A great model badly integrated is worse than a good model integrated well.
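One slice of that surrounding work, handling the cases where the model gets things wrong, might look like the minimal sketch below: ask for structured output, validate it, retry a bounded number of times, and hand anything that still fails to a person rather than guessing. The two-field JSON schema, the prompt wording, and `call_model` (any prompt-to-text callable) are all assumptions for illustration. A production version would likely also log failures and measure how often the fallback fires.

```python
import json
from typing import Callable, Optional

REQUIRED_KEYS = {"category", "priority"}  # hypothetical schema for illustration


def parse_and_validate(raw: str) -> Optional[dict]:
    """Return the parsed object if it has the fields we need, else None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not REQUIRED_KEYS.issubset(data):
        return None
    return data


def classify_with_guardrails(call_model: Callable[[str], str], text: str,
                             max_attempts: int = 3) -> Optional[dict]:
    prompt = ("Classify this request. Reply with JSON containing only "
              f"'category' and 'priority'.\n\n{text}")
    for _ in range(max_attempts):
        result = parse_and_validate(call_model(prompt))
        if result is not None:
            return result
    return None  # caller routes this to a human queue instead of guessing


def make_flaky_model() -> Callable[[str], str]:
    """Stub standing in for a real model: chatty once, then well-behaved."""
    replies = iter(["Sure! Here's the JSON you asked for:",
                    '{"category": "billing", "priority": "high"}'])
    return lambda prompt: next(replies)


print(classify_with_guardrails(make_flaky_model(), "I was charged twice"))
```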
So while we choose models carefully, we don’t agonize over them, and we don’t let the choice become permanent. We pick the right tool for the job today, build so we can change it tomorrow, and spend our real attention on the integration, because that’s what determines whether the thing works.
That’s the approach across our AI work: evaluate models for your use case, stay provider-agnostic, and put the effort where it counts. If you’re trying to figure out what to build on, we’re happy to think it through with you.
Thanks for reading.