
Compound Engineering: 5x More Iterations by Matching Model Speed to Thinking Speed

@kieranklaassen

Article · 4 Mar 2026 · 5 min read

Compound AI systems that match model speed to task type can multiply iteration throughput up to 5x, making model selection strategy as important as model capability for leaders building AI-powered operations.


By Kieran Klaassen

The Insight Nobody Talks About

Everyone debates which AI model is best. The more useful question is: which model is right for this moment in your workflow?

I've been running a compound engineering system for Coras — 27 agents, 21 commands, and 14 skills working in concert to handle everything from triaging GitHub issues to planning features to iterating on UI. When GPT-5.3-Codex-Spark became available, I put it through its paces inside that system. What I found reframed how I think about model selection entirely.

Speed as a Creative Multiplier

Spark isn't the most powerful model I use. But in tasks built around brainstorming and rapid iteration — UI design cycles, feature exploration, quick planning loops — it's the right tool. The reason is simple: iteration speed compounds.

During a recent design sprint on the Coras UI, I ran approximately 10 design iterations in the time a heavier model would have completed 2–3, roughly three to five times as many. That's not a marginal improvement. That's a fundamentally different creative process. When feedback loops tighten, you think differently. You take more swings. You find better answers.

The heavier models still have their place: deeper reasoning, complex architecture decisions, nuanced code generation. But forcing them into every step of a workflow is like using a heavy lorry for every trip because it can haul the most. Most journeys don't need that capacity, and all of them take longer. You're paying a cost in time and momentum that rarely shows up on a benchmark but shows up every day in your output.

Upgrading the Production Stack

Separately, I upgraded Coras's email classification and summarization pipeline from Gemini 2.0 Flash to Gemini 2.5 Flash. The results were clear across three dimensions:

Classification accuracy improved — emails routed more reliably to the right categories

Summaries got cleaner — less noise, more signal in the output

Reliability under high demand held up — no degradation when volume spiked

This is live for all Coras users now. It's a reminder that incremental model upgrades within existing pipelines often deliver outsized value with low implementation risk, especially when you're upgrading within a model family where the API stays stable.
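One way to make "measure the delta" concrete for an in-family upgrade is to run the old and new model over the same hand-labeled sample and compare accuracy. The sketch below is a minimal illustration, not Coras's actual pipeline. It assumes classification is exposed as a plain function you can swap. The names `LabeledEmail`, `accuracy`, `upgrade_delta`, `old_stub`, and `new_stub` are hypothetical. The two stubs stand in for real model calls and say nothing about how either Gemini version actually behaves.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# A classifier maps email text to a category label. In a real pipeline this
# would wrap a model API call; here it is any callable (hypothetical interface).
Classifier = Callable[[str], str]


@dataclass(frozen=True)
class LabeledEmail:
    text: str
    category: str  # ground-truth label from a hand-checked sample


def accuracy(classify: Classifier, sample: Sequence[LabeledEmail]) -> float:
    """Fraction of emails whose predicted category matches the label."""
    if not sample:
        raise ValueError("sample must not be empty")
    correct = sum(classify(e.text) == e.category for e in sample)
    return correct / len(sample)


def upgrade_delta(old: Classifier, new: Classifier,
                  sample: Sequence[LabeledEmail]) -> float:
    """Accuracy change from swapping `old` for `new` on the same sample."""
    return accuracy(new, sample) - accuracy(old, sample)


# Tiny illustrative sample (hypothetical data).
sample = [
    LabeledEmail("Invoice #123 attached", "billing"),
    LabeledEmail("Team offsite next Friday", "calendar"),
    LabeledEmail("Your password was reset", "security"),
    LabeledEmail("Lunch?", "personal"),
]


# Hypothetical stand-ins for the "old" and "new" model slots.
def old_stub(text: str) -> str:
    return "billing" if "Invoice" in text else "personal"


def new_stub(text: str) -> str:
    if "Invoice" in text:
        return "billing"
    if "offsite" in text:
        return "calendar"
    if "password" in text:
        return "security"
    return "personal"


print(accuracy(old_stub, sample))                # 0.5
print(upgrade_delta(old_stub, new_stub, sample))  # 0.5
```

Because both versions see the identical sample, the delta isolates the model swap from any change in the data. In practice you would want a much larger sample and similar checks for summary quality and latency.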

What This Means for Your AI Architecture

If you're building or scaling AI workflows, the compound engineering model — multiple specialised agents, commands, and skills working together — gives you a critical advantage: you can route tasks to the right model for the right moment.

That means:

Fast models for high-frequency, creative, or exploratory tasks — brainstorming, iteration, triage, drafting

Powerful models for low-frequency, high-stakes decisions — architecture, complex reasoning, final review

Regular model upgrades within each role — as new versions release, swap them into existing slots and measure the delta
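The routing split above can be sketched as a lookup from task type to a model tier, with each tier acting as a "slot" that holds one model ID. This is a minimal sketch under stated assumptions, not the author's system. The task names, the `Tier` enum, `TASK_TIERS`, `MODEL_SLOTS`, and `route` are all hypothetical. The model IDs are placeholders, not real product identifiers.

```python
from enum import Enum
from typing import Mapping


class Tier(Enum):
    FAST = "fast"  # high-frequency, creative, exploratory work
    DEEP = "deep"  # low-frequency, high-stakes decisions


# Hypothetical task-type -> tier mapping following the split described above.
TASK_TIERS: Mapping[str, Tier] = {
    "brainstorm": Tier.FAST,
    "iterate_ui": Tier.FAST,
    "triage": Tier.FAST,
    "draft": Tier.FAST,
    "architecture": Tier.DEEP,
    "complex_reasoning": Tier.DEEP,
    "final_review": Tier.DEEP,
}

# One slot per tier. Upgrading a role means changing one string here.
# These IDs are placeholders, not real model names.
MODEL_SLOTS: Mapping[Tier, str] = {
    Tier.FAST: "fast-model-v1",
    Tier.DEEP: "deep-model-v1",
}


def route(task_type: str, slots: Mapping[Tier, str] = MODEL_SLOTS) -> str:
    """Return the model ID for a task type.

    Unknown task types fall back to the deep tier, trading speed for safety.
    """
    tier = TASK_TIERS.get(task_type, Tier.DEEP)
    return slots[tier]


print(route("brainstorm"))    # fast-model-v1
print(route("architecture"))  # deep-model-v1

# Swapping in a new version touches only the slot, not the callers.
upgraded = {**MODEL_SLOTS, Tier.FAST: "fast-model-v2"}
print(route("triage", upgraded))  # fast-model-v2
```

Keeping task classification and model choice in two separate tables means a model upgrade never changes which tasks count as fast or deep. The fallback to the deep tier is one possible design choice; a system that prizes speed might default the other way.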

The goal isn't to find the one best model. It's to build a system intelligent enough to use the right tool at the right time, and fast enough to keep pace with the way you think.

The Compound Effect

Twenty-seven agents. Twenty-one commands. Fourteen skills. None of that complexity matters if the system is slow enough to break your thinking rhythm. Matching model speed to thinking speed isn't a technical detail — it's the difference between AI that accelerates you and AI that you wait for.
