The Model Ladder: When to Use Local, When to Use Claude, When to Use GPT

Running every task through a frontier model is like hiring a senior architect to update a spreadsheet. The work gets done, but the economics are absurd, and you're burning your most valuable resource on tasks that don't need it.

Different AI tasks have wildly different complexity profiles. A model that costs $15 per million tokens is overkill for reformatting a JSON blob. A model that runs free on your laptop is dangerously underpowered for designing a system architecture. The solo builders who understand this distinction spend less, move faster, and get better results than the ones who default to the most expensive option for everything.

The Three Tiers

Think of model selection as a ladder with three rungs. Each rung covers a category of work defined not by the tool's marketing copy but by the actual cognitive demand of the task.

  • Local models (7B-32B parameters): Classification, extraction, reformatting, summarization of structured data. These run on your own hardware at zero marginal cost.
  • Mid-tier cloud models (Sonnet, Haiku, GPT-4o-mini): Drafting, routine code generation, document summarization, conversational interactions. Pennies per task, fast responses, good enough for 70% of daily work.
  • Frontier models (Opus, o1, GPT-4.5): Architecture decisions, complex multi-step reasoning, novel problem-solving, strategic analysis. The tasks where getting it wrong costs more than the model charges to get it right.

The ladder matters because most solo builders either run everything through the top rung and wonder why their API bill is $400/month, or they try to do everything locally and wonder why their outputs are mediocre. Neither extreme works. The discipline is in the routing.

The Local Tier: Free and Surprisingly Capable

A 7B parameter model running on a MacBook Pro with 16GB of RAM handles a specific category of work with near-zero latency and exactly zero cost. The tasks that fit this tier share a common trait: the correct output is highly constrained.

Classification is the clearest example. Given a customer support email, categorize it as billing, technical, or general inquiry. A 7B model gets this right 90-95% of the time. A frontier model gets it right 97-98% of the time. For routing emails, that difference does not justify a 100x cost increase.

The same logic applies to extraction. Pull the date, amount, and vendor name from an invoice. Reformat a CSV into JSON. Convert markdown to HTML. These are tasks with deterministic correct answers, and small models handle them reliably.

Real numbers from a production pipeline: Qwen 2.5 32B running locally processes classification tasks at ~1.5 seconds each. That is roughly 2,400 classifications per hour at $0. The equivalent on Claude Haiku would cost about $0.30 per thousand. Small numbers in isolation, but a solo builder processing thousands of items per week sees the difference compound.

Local models also solve a problem that has nothing to do with cost: data sovereignty. Client data that never leaves your machine is client data that never appears in someone else's training set. For builders working with sensitive documents, legal records, or financial data, local processing is not an optimization. It is a requirement.

The Mid-Tier: The Workhorse

The mid-tier handles the bulk of daily AI work for most solo builders. Sonnet, Haiku, GPT-4o-mini, Gemini Flash. These models are fast, cheap, and good enough for tasks where "good enough" is the right standard.

Drafting is the canonical mid-tier task. First drafts of blog posts, client emails, project proposals, documentation. The output needs human review and editing regardless of which model generates it. Paying 10x more for a marginally better first draft is wasted spend when you are editing it anyway.

Routine code generation falls here too. Writing a CRUD endpoint, generating test boilerplate, converting a function from one language to another. These are well-understood patterns with abundant training data. Sonnet writes a competent Express route handler for $0.003. Opus writes a slightly more elegant one for $0.03. Both require the same review process before deployment.

Summarization of long documents is another mid-tier sweet spot. Condense a 40-page contract into key terms. Summarize a week of client communications. Extract action items from meeting notes. The task requires reading comprehension and concision, not deep reasoning. Mid-tier models handle this well, and they handle it fast.

The cost profile tells the story. Claude Sonnet runs about $3 per million input tokens and $15 per million output tokens. GPT-4o-mini is even cheaper. For a solo builder processing 50 drafting and summarization tasks per day, the monthly cost stays under $30. Push all those tasks to Opus and the bill climbs past $200 without a meaningful quality improvement in the final output.

The Frontier Tier: When the Stakes Justify the Price

Frontier models earn their cost on tasks where the gap between a good answer and a great answer has real consequences. These tasks share a pattern: the solution space is large, the constraints are ambiguous, and domain expertise is required to evaluate the output.

System architecture is the clearest example. Designing how components interact, choosing between consistency and availability tradeoffs, mapping data flows across services. A mid-tier model produces something that looks right. A frontier model produces something that accounts for edge cases you had not considered. When the architecture decision affects six months of development work, the $0.50 you spent on the Opus call is irrelevant.

Complex debugging is another frontier task. Not "why does this function return null" but "why does this system work correctly in development, fail intermittently in staging, and produce corrupted data in production under load." Multi-step reasoning across layers of abstraction is exactly what frontier models are trained for, and exactly what smaller models fumble.

Novel problem-solving belongs here too. Tasks without established patterns. Designing a pricing model for a new service. Evaluating whether a technical approach is feasible before investing weeks of build time. Synthesizing information from five different domains into a coherent strategy. These are the tasks where you need the model to think, not just pattern-match.

The cost math flips at this tier. Claude Opus runs about $15 per million input tokens and $75 per million output tokens. A complex architecture discussion might consume 50K input tokens and 10K output tokens, costing roughly $1.50. If that conversation saves you from a design mistake that would have taken two weeks to unwind, it is the highest-ROI dollar you will spend all month.

The Routing Decision

The practical question is not which model is best. It is which model is best for this specific task, right now. A simple decision framework:

  • Is the correct output highly constrained? Classification, extraction, reformatting. Use local.
  • Will you edit the output regardless? Drafts, boilerplate, summaries. Use mid-tier.
  • Does the quality of reasoning directly affect downstream decisions? Architecture, debugging, strategy. Use frontier.
  • Does the data need to stay on your machine? Use local, regardless of task complexity.

Two additional factors that most guides ignore. First: latency matters. Local models respond in 1-3 seconds. Cloud API calls take 5-30 seconds depending on complexity. For tasks embedded in tight feedback loops, like iterating on a regex or testing classification prompts, the speed advantage of local models outweighs any quality difference.

Second: context windows matter more than people think. If your task requires ingesting 100K tokens of context, your local 7B model is either incapable or so slow it is effectively useless. Context-heavy tasks get pushed up the ladder automatically. A long codebase review that needs full-project context is a frontier task not because the reasoning is hard but because the input is large.

A Month in Practice

Here is what model routing looks like for a solo builder running a consulting practice with 15-20 active clients.

  • Local (Ollama, ~200 tasks/day): Email classification, document tagging, data reformatting, embedding generation. Cost: $0/month. Hardware: MacBook Pro M-series already on the desk.
  • Mid-tier (Sonnet/Haiku, ~50 tasks/day): Client email drafts, meeting summaries, code generation, documentation. Cost: ~$25-40/month.
  • Frontier (Opus, ~5-10 tasks/day): Architecture reviews, complex debugging sessions, strategic analysis, novel problem-solving. Cost: ~$30-60/month on API, or $200/month flat on Claude Max.

Total monthly AI spend with routing: $55-100 on API, or ~$225 with a Max subscription covering frontier use. Total without routing, running everything through Opus: $400-800/month. The routing discipline cuts costs by 60-75% while improving results on tasks where local models are actually faster and more appropriate.

The Catch

Model routing adds a decision to every task. That overhead is real. In the first week, you will second-guess yourself constantly. Is this task really mid-tier, or should I use Opus? Will the local model handle this edge case, or will I waste twenty minutes debugging a bad output?

The answer is to start conservative. Default to one tier higher than you think you need. If the output is good, try one tier lower next time. Within two weeks, you develop intuition for the boundaries. Classification is always local. First drafts are always mid-tier. Anything where you catch yourself thinking "I need this to be really good" is a frontier task.

The other catch is model churn. The tiers shift every few months as new releases push capabilities downward. Tasks that required Opus a year ago run fine on Sonnet today. Tasks that required Sonnet run locally. The ladder is not fixed. Revisiting your routing assumptions every quarter keeps your costs dropping as capabilities improve.

Why This Matters

For teams, model costs are a line item in a larger budget. For solo builders, every dollar of overhead comes directly out of margin. A $500/month AI bill on $10K/month revenue is a 5% drag. Routing that bill down to $100/month puts $400 back in your pocket, twelve times a year.

But the cost argument is actually the less important one. The real advantage is speed and appropriateness. Local models respond instantly. Mid-tier models respond in seconds. Frontier models can take 30-60 seconds on complex tasks. Running classification through Opus does not just cost more. It is slower. Your pipeline moves faster when each task hits the right model for its complexity.

The builders who figure out model routing early build an operational advantage that compounds over time. Their costs stay flat while their throughput scales. Their systems respond faster because they are not bottlenecked on API calls that did not need to be API calls. Their sensitive data stays local because the routing framework made that the default, not an afterthought.

The model ladder is not about finding the best model. It is about matching the right tool to the right task, every time, automatically. The semi truck stays in the garage until you actually need to haul something heavy. The rest of the time, you take the car.