
Stop Chaining Agents: The Controller Pattern for AI Pipelines

Agent-chaining looks elegant until it breaks. Here's why a deterministic controller with stateless stage workers is a better foundation for any multi-step AI pipeline.

Agent-chaining looks elegant on a whiteboard. Research agent finishes, triggers the draft agent. Draft agent finishes, triggers review. Review triggers visuals, visuals trigger publish. One handoff, then the next, and the whole thing flows.

The problem is what happens when something goes wrong.

Why does agent-chaining feel so natural?

The appeal is real. If you've read anything about ReAct agents or multi-agent systems in the last couple of years, the handoff model is everywhere. It mirrors how humans delegate, and it feels like the AI is "thinking forward" instead of waiting for instructions. And for simple two-step pipelines, it actually holds up.

We started building our content pipeline this way. Research agent passes a brief to a write agent. Write agent produces a draft and hands it to a review agent. The code was clean. The prompts were clear. The first few runs looked great.

Then we started hitting the edges.

What actually breaks when agents chain themselves?

The first failure mode is control flow hiding inside prompts. When agent B is triggered by agent A, the logic for "should B run right now?" lives in A's output. Nobody owns the decision except the agent that just finished. If the prior stage produced garbage, B doesn't know, and neither do you until B produces worse garbage downstream.

The second is context bloat. Each agent in a chain tends to carry forward more than it needs: prior outputs, reasoning from earlier stages, fragments of the brief it didn't use. Token counts creep up with every hop. By the time you reach a review or publish step, you're paying for the whole conversation history whether it's relevant or not.

The third failure mode, and the worst, is retries. If the facts stage fails halfway through a run, who handles that? In a chained system, the answer is usually "the agent that triggered it" or "you, manually." Neither is great. The first requires writing retry logic into every agent prompt. The second defeats the point of automation.

The failure mode we hit hardest was artifact drift. Our style report, facts report, and publish gate were each reading different versions of "the post." No one had told them to stay in sync because nothing was in charge of keeping them there. The chain had no memory of what changed between stages.

What does the controller pattern look like?

One deterministic controller owns the pipeline. Agents act as stage workers that produce structured outputs.

```mermaid
graph TD
    C[Controller] --> R[Research Worker]
    R -->|brief.md + brief.json| C
    C --> W[Write Worker]
    W -->|draft.md| C
    C --> CL[Cleanup]
    CL -->|patched draft.md| C
    C --> ST[Style Worker]
    ST -->|style-report.json| C
    C --> FA[Facts Worker]
    FA -->|facts-report.json| C
    C --> SE[SEO Worker]
    SE -->|seo-report.json| C
    C --> V[Visuals Worker]
    V -->|visual-manifest.json| C
    C --> FR[Final Review]
    FR -->|review-report.json| C
    C -->|approval gate| P[Publish]
    P -->|publish-result.json| C

    style C fill:#1a1a2e,color:#fff
    style P fill:#16213e,color:#fff
```

The controller reads run state, invokes one worker for one stage, validates that the required artifacts exist, and records the result before advancing. Workers never trigger the next stage.
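A minimal sketch of that loop helps make the contract concrete. Everything here is illustrative: the stage list is abbreviated, and the file layout and function names are assumptions, not our actual pipeline code.

```python
# Hypothetical controller advance loop. Stage list is abbreviated and the
# file layout is an assumption; the point is the shape of the loop.
import json
from pathlib import Path

STAGES = ["research", "write", "cleanup", "style", "facts",
          "seo", "visuals", "final_review", "publish"]

def advance(run_dir: Path, invoke_worker) -> str:
    """Run stages in order; stop at the first stage that isn't completed."""
    for stage in STAGES:
        result_path = run_dir / stage / "stage-result.json"
        if result_path.exists():
            result = json.loads(result_path.read_text())
            if result.get("status") == "completed":
                continue  # stage already done, move on
            return f"blocked at {stage}"
        # Invoke exactly one worker for exactly one stage.
        result = invoke_worker(stage, run_dir)
        result_path.parent.mkdir(parents=True, exist_ok=True)
        result_path.write_text(json.dumps(result))
        if result.get("status") != "completed":
            return f"blocked at {stage}"
    return "completed"
```

The worker never sees this loop. It gets invoked, it produces files, and the controller decides what happens next.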

Our current pipeline runs eleven stages in order: research, write, cleanup, baseline review, style, facts, seo, visuals, final review, publish, live QA. Every stage produces a stage-result.json with status, the model used, input and output artifact hashes, and whether any blocking issues were found. If the status isn't completed, nothing moves forward.
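The gate check itself can be tiny. This is a sketch of the idea, with field names assumed from the description above rather than copied from our actual schema:

```python
# Assumed field names for the stage-result contract described above.
REQUIRED_FIELDS = {"status", "model", "input_hashes", "output_hashes", "blocking_issues"}

def stage_passed(result: dict) -> bool:
    """A stage advances only if the result is complete and has no blockers."""
    if not REQUIRED_FIELDS <= result.keys():
        return False  # malformed result: treat as a failure, not a pass
    return result["status"] == "completed" and not result["blocking_issues"]
```

A malformed or partial result file fails the check the same way a blocked stage does.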

How do workers stay stateless?

Each worker reads only the artifacts it needs for its stage. The style worker reads draft.md and the style guide. The facts worker reads draft.md and the claims list. Neither of them needs to know what happened in the research stage, what the SEO score is, or whether visuals are planned.

That constraint is the feature. Stateless workers are cheap to run. They don't accumulate context from prior stages. They can be retried in isolation without rerunning anything upstream. And when they produce a bad output, the failure is contained to that stage.

Workers produce two things: the stage artifact (a report, a patched draft, a manifest) and a structured result JSON. The result JSON is what the controller trusts. Not the prose summary at the end of an agent response. Not whether the agent "said" it succeeded. The file either exists and passes validation, or the stage is blocked.
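A worker wrapper that honors this contract might look like the following. The `call_model` callable and file names are hypothetical stand-ins; the real interface to the model is out of scope here.

```python
# Hypothetical stateless worker wrapper: read only this stage's inputs,
# emit the stage artifact plus the structured result the controller trusts.
import hashlib
import json
from pathlib import Path

def run_style_worker(draft_path, style_guide_path, out_dir, call_model):
    draft = Path(draft_path).read_text()
    guide = Path(style_guide_path).read_text()
    report = call_model(draft, guide)  # assumed to return a dict of findings
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # The stage artifact.
    (out_dir / "style-report.json").write_text(json.dumps(report))
    # The structured result: this file, not the agent's prose, is the contract.
    result = {
        "status": "completed",
        "input_hashes": {"draft.md": hashlib.sha256(draft.encode()).hexdigest()},
        "blocking_issues": report.get("blocking", []),
    }
    (out_dir / "stage-result.json").write_text(json.dumps(result))
    return result
```

Note what's absent: no run state, no knowledge of other stages, no "trigger the next agent."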

What does the controller actually own?

The controller owns stage order, retries, cooldowns, blocking, the approval gate, and the final publish decision. It applies structured patches from review workers rather than letting agents rewrite freely. Deterministic validators run before and after each stage to confirm the draft improved.
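Retries belong in that list because they live in code, not prompts. A bare-bones version, with attempt counts and cooldowns as illustrative defaults:

```python
# Sketch of controller-owned retry with cooldown. The worker never
# retries itself; the controller decides when to give up and block.
import time

def run_with_retries(run_stage, max_attempts=2, cooldown_s=0.0):
    last = None
    for attempt in range(1, max_attempts + 1):
        last = run_stage(attempt)
        if last.get("status") == "completed":
            return last
        if attempt < max_attempts and cooldown_s:
            time.sleep(cooldown_s)  # back off before the next attempt
    last["status"] = "blocked"  # retries exhausted: block, don't advance
    return last
```

Because the worker is stateless, retrying it is just calling it again. Nothing upstream reruns.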

What it doesn't do: trust agent output. An agent that says "the draft looks solid" is not a passing gate. (Ask me how many times we shipped on vibes before wiring that check.) The controller reads the structured artifacts and confirms they exist, are valid JSON, and contain no blocking issues.

The publish step is mechanical. After every upstream stage has passed, the controller runs a final validation suite, then calls the publish endpoint. If any gate fails, it blocks. Pushing past a blocked gate requires an explicit --override "reason", and that reason gets logged into the run state.
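The decision function behind that gate is deliberately boring. This sketch assumes the CLI layer has already parsed the override flag; field names in the logged record are illustrative:

```python
# Mechanical publish decision: all gates pass, or an explicit, logged
# override. (Sketch; real flag parsing lives in the CLI layer.)
import datetime

def decide_publish(gates, override_reason, state):
    failed = [name for name, ok in gates.items() if not ok]
    if not failed:
        return True
    if override_reason:
        # Overrides are allowed but never silent: log into run state.
        state.setdefault("overrides", []).append({
            "reason": override_reason,
            "failed_gates": failed,
            "at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        })
        return True
    return False
```

The override path exists because sometimes a gate really is a false positive, but the audit trail makes "push past it" a decision someone owns.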

Does this actually cost less to run?

Yes, though the savings compound over time more than they show up in a single run.

The biggest win is that review workers return structured patches rather than full rewrites. A style worker that returns {"old": "...", "new": "..."} pairs runs at a fraction of a "please rewrite this entire post with better rhythm" pass. The controller applies the patches deterministically. No second full-context pass to decide whether the post is ready.
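Applying those patches is a few lines of deterministic code. This is a simplified version of the idea; the real applier has more guardrails, but the shape is the same:

```python
# Deterministically apply {"old": ..., "new": ...} pairs from a review
# worker. A patch whose "old" text isn't found is skipped and reported
# back, never guessed at.
def apply_patches(draft, patches):
    skipped = []
    for patch in patches:
        if patch["old"] in draft:
            draft = draft.replace(patch["old"], patch["new"], 1)
        else:
            skipped.append(patch)  # stale patch: surface it, don't improvise
    return draft, skipped
```

Skipped patches usually mean the draft changed out from under the review, which is itself a useful signal for the controller.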

Stateless workers also mean you can use lighter models for most stages. Facts extraction and SEO validation don't need the same horsepower as the write stage. We run most review stages on Sonnet and escalate only when a stage blocks and needs a harder call to resolve.
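Routing stages to model tiers is a lookup, not a judgment call. Tier names here are placeholders and the stage-to-tier mapping is an assumption, not our exact config:

```python
# Illustrative stage-to-model routing. "light"/"heavy" are placeholder
# tier names; the controller escalates only when a stage blocks.
DEFAULT_TIER = "light"
HEAVY_STAGES = {"write"}

def model_tier(stage, escalated=False):
    if escalated or stage in HEAVY_STAGES:
        return "heavy"
    return DEFAULT_TIER
```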

The final review is worth calling out. We deliberately avoided a "main agent rereads the whole post and decides if it's ready" step. That pattern is expensive and adds one more full-context pass where an agent can hallucinate a verdict. Final review in our pipeline is a deterministic merge of the stage reports, not another agent opinion.
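"Deterministic merge" means something like this sketch: fold the structured stage results into one verdict, with no model call anywhere in the path. Field names match the assumed stage-result contract from earlier, not our exact schema:

```python
# Final review as a deterministic merge of stage results: no extra
# agent pass, no hallucinated verdict.
def final_review(stage_results):
    blockers = {stage: r["blocking_issues"]
                for stage, r in stage_results.items()
                if r.get("blocking_issues")}
    incomplete = [stage for stage, r in stage_results.items()
                  if r.get("status") != "completed"]
    return {
        "ready": not blockers and not incomplete,
        "blockers": blockers,
        "incomplete": incomplete,
    }
```

If `ready` is false, the report says exactly which stage to look at, which is more than a "looks good to me" from an agent ever told us.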

When does agent-chaining still make sense?

Two places. Short pipelines with two or three steps where a failure means you restart everything anyway. And tasks where each step genuinely requires the full live context of the previous step rather than a structured artifact.

For anything with more than three stages, explicit approval gates, or a publish step you can't easily roll back, the controller pattern is worth the upfront wiring.

Frequently asked questions

What is the controller pattern for AI pipelines? A deterministic controller script owns stage order, retries, and approval gates. AI agents act as stateless workers, invoked by the controller, producing structured artifacts. The controller validates outputs before advancing to the next stage. Workers never trigger the next stage themselves.

What's wrong with chaining AI agents? Control flow hides inside prompts. Retries require agents to handle their own failure. Context accumulates with each hop, increasing token costs. Artifact drift occurs when downstream stages read stale or inconsistent versions of the same content.

How does this compare to tools like Prefect or Airflow? The concept is the same: a scheduler owns task execution, workers stay stateless, and the pipeline is explicit. Prefect, Airflow, and AWS Step Functions solve this for data pipelines. The controller pattern applies the same principle to AI agent pipelines where the "tasks" are LLM calls rather than data transforms.

Do workers need to know about other stages? No, and they shouldn't. A worker reads its stage inputs and produces its outputs. The controller holds the run state. Keeping workers ignorant of other stages is what makes them cheap to run and easy to retry.


We're still building this out. The controller in our pipeline started as a Python script with a few stage checks and is now a proper state machine with retry logic, cooldowns, and a two-pass remediation cycle before it gives up and blocks. None of that logic lives in a prompt. It lives in code, where it belongs.

If you're building something multi-stage with AI agents and you keep hitting the same debugging sessions trying to figure out which agent handed what to which, it's probably time to pull the control flow out of the prompts.

Tested with Claude Code v2.1.78
