Or: How we learned to stop constraining and love the hill-climb
Companion to Why AI Keeps Surprising You
Chapter 1
When you first work with LLMs, specialization feels like wisdom. The models are unreliable. They hallucinate. They go off-script. The obvious solution: constrain them.
Build a tight loop. Wrap the model in guardrails. Define exactly what it can and can't do. Parse its outputs. Retry when it fails. This is the specialized agent pattern: the model as a component, managed by deterministic code.
And it works! For a while. Your coding agent stops inventing APIs. Your research agent stays on topic. Your workflow becomes predictable. You've traded capability ceiling for reliability floor.
"Don't let the model think too much. Define the workflow. Constrain the outputs. Make it predictable."
This philosophy emerges from real pain. The demo was magic, but production is chaos.
But it comes with a hidden cost.
The specialized agent encodes a known workflow. The model is a component; an orchestrator manages state. It optimizes for predictability on tasks we already understand.
The super agent keeps the model in the driver's seat. The agent manages its own memory, selects its own tools, and iterates until done. It optimizes for capability on tasks we don't yet know how to solve.
Super agents can still be domain-specific. Claude Code only does coding. But the model decides what to do next, not an orchestrator. The distinction is about control, not scope.
Chapter 2
The problem with specialization isn't that it fails. It's that it succeeds—at the wrong thing.
Think of problem-solving as navigating a landscape. The height at any point represents solution quality. Valleys are bad solutions; peaks are good ones. Your goal is to find the highest peak.
A specialized agent is like a hiker who can only go uphill and can only see ten feet ahead. They'll find a peak quickly. But it's probably not the highest one—just the nearest one to where they started. They're stuck on a local maximum.
[Interactive: watch how different agent architectures explore a solution space. The specialized agent gets stuck; the super agent keeps searching.]
The specialized agent finds a solution and stops. No mechanism for asking "is there something better over that ridge?" It optimized. It's done.
The super agent keeps exploring. It can backtrack, try different approaches, maintain a broader view. It might take longer to converge, but it's more likely to find something genuinely good.
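If the metaphor feels abstract, here is a toy sketch. The 1-D landscape and both search strategies are invented purely for illustration, and random restarts are only a crude stand-in for the backtracking and memory a real agent brings to bear.

```python
import math
import random

def quality(x: float) -> float:
    """Toy 1-D solution landscape with several peaks (invented for illustration)."""
    return math.sin(3 * x) + 0.5 * math.sin(7 * x) - 0.05 * (x - 4) ** 2

def greedy_climb(x: float, step: float = 0.05, max_steps: int = 1000) -> float:
    """Specialized-agent style: only move uphill, stop at the first peak reached."""
    for _ in range(max_steps):
        best = max((x - step, x, x + step), key=quality)
        if best == x:          # no uphill neighbour left: stuck on a local maximum
            return x
        x = best
    return x

def wider_search(n_starts: int = 25, seed: int = 0) -> float:
    """Crude stand-in for broader exploration: climb from many starting points, keep the best."""
    rng = random.Random(seed)
    peaks = [greedy_climb(rng.uniform(0.0, 8.0)) for _ in range(n_starts)]
    return max(peaks, key=quality)

local = greedy_climb(0.5)
best = wider_search()
print(f"greedy climber: x={local:.2f}, quality={quality(local):.2f}")
print(f"wider search:   x={best:.2f}, quality={quality(best):.2f}")
```

The greedy climber reports whichever peak happens to be nearest its starting point; the wider search usually reports a higher one.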
Specialized agents optimize for current practice. Super agents can discover the next one.
Chapter 3
Here's the objection: "I don't care about finding the global maximum. I care about not falling off a cliff."
The mental shift here is from raising the floor to raising the ceiling.
Teams are optimizing for predictable, shippable results that they can measure. They'd rather land on a known local peak every time than risk the agent wandering forever or returning nonsense.
The fear isn't "stuck on a local max." It's the agent wandering forever, falling off a cliff with a confidently wrong answer, burning unpredictable amounts of time, or producing an output nobody can explain.
These are real concerns. Specialized agents address them by constraining the search space. If you can only go uphill on a known path, you'll reach a peak fast.
The trade is ceiling for floor: give up the possibility of breakthroughs in exchange for the guarantee of acceptable results. No negative surprises, even if it means no positive surprises either.
This makes sense for production systems with SLAs. User clicks button, needs response in 2 seconds, needs it defensible. "The agent is still exploring" doesn't fly.
But here's what's missed: the floor is rising.
Your constraints are calibrated to today's models. They encode beliefs about what the model will mess up. As models improve, those failure modes become less common. Your constraints start preventing successes more than failures.
The super agent approach handles this differently. Instead of constraining the search, you equip the agent to search well:
| Concern | Specialized Solution | Super Agent Solution |
|---|---|---|
| Wandering forever | Fixed workflow, limited steps | Task management, explicit goals, timeout budgets |
| Falling off cliffs | Output validation, constrained actions | Self-verification skills, confidence calibration |
| Unpredictable runtime | Deterministic pipelines | Complexity estimation, progressive refinement |
| Unexplainable outputs | Fixed reasoning templates | Explicit reasoning traces, scratchpad logs |
The difference: specialized agents solve these problems by removing capability. Super agents solve them by adding skills. One approach gets worse as models improve. The other gets better.
Don't lower the ceiling to raise the floor. Raise the floor by making the agent better at knowing when it's done, when it's wrong, and when to ask for help.
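To make the right-hand column of that table a little more concrete, here is a minimal sketch of guards living inside the agent loop rather than around it. Everything here is illustrative: `call_model` and `verify` are hypothetical stand-ins, not a real API, and the stopping criteria are toys.

```python
import time
from dataclasses import dataclass, field

@dataclass
class Budget:
    """Explicit limits the agent works within, instead of a pipeline that hard-codes the steps."""
    max_seconds: float = 120.0
    max_steps: int = 20
    started: float = field(default_factory=time.monotonic)

    def exhausted(self, step: int) -> bool:
        return step >= self.max_steps or time.monotonic() - self.started > self.max_seconds

def call_model(goal: str, notes: list[str]) -> str:
    """Hypothetical LLM call; a real agent would pass the notes back in as context."""
    revised = any("still a draft" in n for n in notes)
    return ("final" if revised else "draft") + f" answer for: {goal}"

def verify(answer: str) -> tuple[bool, str]:
    """Hypothetical self-verification skill: the agent checks its own work before stopping."""
    ok = answer.startswith("final")
    return ok, "passes checks" if ok else "still a draft, needs another pass"

def run_agent(goal: str, budget: Budget | None = None) -> str:
    budget = budget or Budget()
    notes = [f"goal: {goal}"]                      # explicit goal, kept in the reasoning trace
    answer = ""
    for step in range(budget.max_steps):
        if budget.exhausted(step):                 # timeout budget the agent itself respects
            notes.append("budget exhausted; returning best attempt so far")
            break
        answer = call_model(goal, notes)
        ok, reason = verify(answer)                # self-verification instead of output parsing
        notes.append(f"step {step}: {reason}")
        if ok:
            break
    return answer

print(run_agent("summarize the incident timeline"))
```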
Chapter 4
When specialized agents work together, information dies at the boundaries.
Consider a typical specialized architecture: an orchestrator routes tasks to domain-specific agents. One agent handles research. Another handles synthesis. Another handles output formatting. Clean separation of concerns.
But watch what happens to context as it flows through this system:
[Interactive: step through a workflow and watch as context degrades at each handoff.]
The orchestrator can only pass what it thinks is relevant. But the research agent might discover something that reframes the entire task. That observation gets compressed into a result summary. The insight dies.
A super agent doesn't have this problem. Everything stays in context—or gets written to the file system where it can be retrieved. The agent that notices something unexpected is the same agent that can act on it.
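A toy illustration of the difference; the field names and the summarisation step are invented:

```python
# What the research step actually produced (invented example data).
research = {
    "findings": ["metric X dropped 40% in March"],
    "aside": "the drop coincides with a schema migration, which may reframe the whole task",
}

# Specialized pipeline: the orchestrator compresses the result before the handoff.
def handoff(result: dict) -> dict:
    return {"summary": result["findings"]}         # the aside never reaches the next agent

synthesis_input = handoff(research)                # the reframing insight is gone

# Super agent: nothing is summarised away; the full trail stays in context (or on disk).
context: list[dict] = [{"step": "research", **research}]
```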
Breakthroughs happen at the boundaries—when you notice that a pattern in one area echoes something in another, or that an anomaly implies a deeper issue upstream. Specialized agents structurally prevent these connections.
Chapter 5
A super agent isn't just an unconstrained model. It's a model with specific capabilities that enable sustained reasoning.
The difference between "let the model ramble" and "let the model think deeply" comes down to four mechanisms:
| Mechanism | Super Agent | Specialized Agent |
|---|---|---|
| State Persistence | File system as extended memory. Writes notes, logs failures, tracks hypotheses across context windows. | State lives in orchestrator. Passed as compressed payloads between calls. |
| Tool Selection | Model decides which tools to use based on the problem. Can discover new approaches. | Router decides. Model gets the tool it's given, whether or not it's the right one. |
| Iteration | Can try something, evaluate, backtrack, try something else. Maintains goal across attempts. | One shot per handoff. Failure means retry or escalate, not pivot. |
| Self-Direction | Recitation, task lists, explicit "current state" tracking. Uses language to extend attention. | Direction comes from code. Model follows the script. |
These aren't luxuries. They're the mechanisms that enable hill-climbing on hard problems. An agent that can write "I've ruled out X and Y, now investigating Z" is using language itself as a cognitive tool—exactly what makes humans effective at complex tasks.
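Here's a rough sketch of what state persistence and recitation can look like. The file name and note format are made up, and a real agent would do this through its file tools rather than Python it runs itself:

```python
from pathlib import Path

NOTES = Path("agent_notes.md")      # file system as extended memory, survives a context window

def log_note(text: str) -> None:
    """Append an observation the agent can re-read in a later turn."""
    with NOTES.open("a") as f:
        f.write(f"- {text}\n")

def recite_state(goal: str, ruled_out: list[str], current: str) -> str:
    """Recitation: restate the goal and progress so it stays near the front of attention."""
    return (
        f"Goal: {goal}\n"
        f"Ruled out: {', '.join(ruled_out) or 'nothing yet'}\n"
        f"Now investigating: {current}\n"
    )

log_note("hypothesis X ruled out: the regression predates the config change")
print(recite_state(
    goal="find the source of the latency regression",
    ruled_out=["hypothesis X", "hypothesis Y"],
    current="hypothesis Z: connection pool exhaustion",
))
```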
A super agent can write code to solve subproblems, execute it, and use the results. This means it can effectively create new tools on the fly—not just use the tools you gave it. A specialized agent is limited to its predefined capabilities.
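A minimal sketch of that idea, with the obvious caveat that real deployments run generated code in a sandbox rather than a bare `exec`:

```python
# Code the model wrote for a subproblem it had no ready-made tool for.
generated = """
def median(values):
    values = sorted(values)
    mid = len(values) // 2
    return values[mid] if len(values) % 2 else (values[mid - 1] + values[mid]) / 2

result = median([12, 7, 3, 9, 41])
"""

namespace: dict = {}
exec(generated, namespace)      # the agent executes its own code...
print(namespace["result"])      # ...and feeds the result (9) back into its reasoning
```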
Chapter 6
People building specialized agents often say: "But my job is to bring context to the model." This misunderstands what super agents need.
Yes, models need context. Yes, your domain expertise matters. But providing context is not what distinguishes specialized agents from super agents. Both architectures need rich context. That's a separate concern entirely.
The question isn't whether to give the model context. It's who decides what to do with it.
In a specialized agent, you provide context, you define the workflow, and you decide when to call the model and what to do with its output. The model is a component in your system.
In a super agent, you provide context and the model decides what to do with it. The model selects tools, manages state, and iterates on approaches. The model is the driver of the system.
Your domain expertise—the schemas, the edge cases, the quality checks, the institutional knowledge—is more valuable in a super agent architecture, not less. But it shows up differently:
| Your Expertise | In a Specialized Agent | In a Super Agent |
|---|---|---|
| Domain knowledge | Hardcoded into prompts and workflows | Provided as reference docs the agent can consult |
| Quality checks | Deterministic validation code | Skills the agent invokes when appropriate |
| Edge cases | Branching logic you wrote | Examples and guidance the agent learns from |
| Best practices | Baked into the orchestration | Techniques the agent can choose to apply |
The work doesn't disappear. It transforms. Instead of writing code that controls the model, you're creating resources that inform the model. Instead of deciding the workflow, you're enriching the agent's judgment.
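One way that transformation can look in code; the skill registry and its fields here are hypothetical rather than any particular framework's API. The same quality check exists in both worlds, but in one it's a stage you hard-coded into a pipeline, and in the other it's a described capability the model can choose to invoke.

```python
from typing import Callable

def check_row_counts(table: str) -> str:
    """Domain quality check (toy logic): flag suspiciously empty or exploded tables."""
    return f"row-count sanity check queued for {table}"

# Specialized agent: the check is a mandatory stage in a pipeline you wrote.
def pipeline(table: str, model_output: str) -> str:
    check_row_counts(table)                  # always runs, relevant or not
    return model_output

# Super agent: the same expertise is registered as a described skill;
# the model reads the description and decides when it applies.
SKILLS: dict[str, tuple[str, Callable[[str], str]]] = {
    "row_count_check": (
        "Run after any query that aggregates or joins large tables.",
        check_row_counts,
    ),
}
```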
It's not "should we give the model context?" (Yes, obviously.) It's "should we let the model decide what to do next, or should we?" The super agent answer is: let the model drive, but give it everything it needs to drive well.
Chapter 7
Every architectural choice is a bet about the future. What are you betting on?
Specialized agents bet that models won't get much better, so they optimize current practice. Super agents bet that models will keep improving, so they aim to capture future capability.
Specialization makes sense if you think model capabilities are roughly fixed. You're accepting current limitations and engineering around them. The guardrails and constraints you build encode today's understanding of what models can and can't do.
But if models improve—if next year's model can do things this year's model can't—then those guardrails become ceilings. Every constraint you added to work around GPT-4's limitations will prevent you from benefiting from GPT-5's capabilities.
We've seen this movie before. Organizations that built elaborate retrieval systems to compensate for small context windows are now rearchitecting for million-token models. Organizations that built complex chain-of-thought orchestration are finding that newer models reason better when you let them think freely.
If you believe in scaling laws—if you think compute translates predictably to capability—then super agents are the rational architecture. You're building for the models you'll have, not just the models you have. See: Why AI Keeps Surprising You →
This doesn't mean specialized agents are useless. They have their place: hard latency budgets, outputs that must be defensible on demand, workflows you run the same way thousands of times.
But these are deployment choices, not discovery choices. If your goal is to find novel insights, uncover non-obvious patterns, or solve problems you don't yet know how to solve—that work happens in the super agent, not in the recipe library.
Conclusion
Don't build specialized agents. Build skills for super agents.
The work that currently goes into specialized agents—encoding domain expertise, refining workflows, handling edge cases—shouldn't live in separate systems. It should feed into your super agent as skills and techniques it can deploy when relevant.
A specialized SQL agent becomes a set of query patterns and data quality checks the super agent can draw on. A specialized summarization agent becomes guidance on compression and salience that the super agent internalizes. The expertise isn't lost—it's absorbed into a system that can combine it with everything else it knows.
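As a sketch of what "absorbed" can mean in practice (the file name, its contents, and the toy threshold are all invented), the retired SQL agent might leave behind a reference doc plus a check the super agent can call when it decides one is needed:

```python
from pathlib import Path

# Reference doc the super agent consults when a task touches SQL;
# the patterns themselves come from the retired specialized agent.
Path("skills").mkdir(exist_ok=True)
Path("skills/sql.md").write_text(
    "# SQL skills\n"
    "- Prefer explicit column lists over SELECT *\n"
    "- Always bound date ranges on large fact tables\n"
    "- After any join, sanity-check row counts against the inputs\n"
)

def check_join_explosion(rows_before: int, rows_after: int) -> bool:
    """Data quality check carried over from the specialized agent (toy threshold)."""
    return rows_after <= rows_before * 10
```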
This is harder than building isolated agents. It requires thinking about how capabilities compose, how context flows, how skills get triggered. But it's the architecture that scales with model capability rather than against it.
Specialized agents are a transitional form. The endgame is super agents with deep skills.