You're Not Building an Agent — You're Building Infrastructure

The orchestration layer your team keeps re-writing is infrastructure in disguise


Every team building agents ends up building orchestration. Few realize they're building infrastructure.

It starts small. You wire up a tool call loop. You add retry logic. You build a way to persist conversation state. You handle the case where the model asks for two tools at once and one of them fails. You add streaming so the UI doesn't hang. Before long, you have a bespoke coordination layer — one that looks suspiciously like the coordination layers every other team is also building from scratch.

This is the pattern. And it's worth naming, because the teams that recognize it early make better architectural decisions than the teams that don't.

The accidental infrastructure problem

When you write your first agent, the interesting part feels like the model call. Pick a model, write a system prompt, wire up a tool or two. The "hello world" works in an afternoon.

The second week is different. You need sessions that survive page reloads. You need to handle tool execution on your server (where your database and auth live), not on the provider's infrastructure. You need streaming events granular enough to build a real UI — not just a token firehose, but lifecycle events, tool call progress, error classification. You need to know why a generation stopped: did the model finish? Did it hit a content filter? Is it waiting for a tool result?

None of this is the model call. All of it is orchestration. And all of it is infrastructure — the kind of work where getting it 90% right means the other 10% bites you in production.

Infrastructure has a definition

What makes something infrastructure rather than application code? A few properties:

It's horizontal. The same session management, streaming, and tool execution patterns apply whether you're building a support chatbot, a research assistant, or a code review agent. If you're solving the same coordination problem in every project, that's infrastructure waiting to be extracted.

It's invisible when it works. Nobody thinks about DNS until it breaks. Good orchestration is the same: sessions restore seamlessly, tools execute and the model continues, errors surface with enough context to act on. The moment you notice orchestration, something has gone wrong.

It rewards convention over configuration. The teams that move fastest aren't the ones with the most flexible agent frameworks. They're the ones with clear contracts: a protocol defines what the agent does, tool handlers define how capabilities execute, and the orchestration layer connects them without anyone writing glue code for every new agent.

Where the hard problems actually live

If you map out the work involved in shipping a production agent, the model call is a small slice of the effort, perhaps 10% as a rough estimate. The rest breaks down roughly like this:

Session lifecycle. Creating sessions, restoring them when users return, handling expiration, persisting conversation history to your own database so you're not locked into a provider's storage. This is state management — a solved problem in web development, but one that agent frameworks mostly punt on.

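// Check whether the session is still active; restore it if it has expired.
// `storedMessages` is the conversation history you previously persisted
// to your own database (loaded elsewhere, not shown here).
const result = await client.agentSessions.getMessages(sessionId);

if (result.status === 'expired' && storedMessages.length > 0) {
  // Rebuild the expired session from your stored history.
  const restored = await client.agentSessions.restore(
    sessionId,
    storedMessages,
    { COMPANY_NAME: 'Acme Corp' },
  );
}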
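
The decision behind that snippet can be separated from the SDK call and tested on its own. Below is a minimal sketch of that decision, not any particular SDK's API: `StoredMessage`, `ResumePlan`, and `planResume` are hypothetical names, and the 'active' status is an assumption (only 'expired' appears in the snippet above).

```typescript
// Hypothetical types for illustration; not part of any specific SDK.
type Role = "user" | "assistant";

interface StoredMessage {
  role: Role;
  content: string;
}

// 'expired' comes from the snippet above; 'active' is an assumed counterpart.
type SessionStatus = "active" | "expired";

type ResumePlan =
  | { action: "continue" }
  | { action: "restore"; messages: StoredMessage[] }
  | { action: "start-new" };

// Decide what to do when a user returns, given the platform's view of the
// session and the history persisted in your own database.
function planResume(status: SessionStatus, stored: StoredMessage[]): ResumePlan {
  if (status === "active") return { action: "continue" };
  if (stored.length > 0) return { action: "restore", messages: stored };
  // Expired and nothing persisted: there is nothing to restore from.
  return { action: "start-new" };
}
```

Keeping the decision pure means the only code that touches the network is the restore call itself, which makes the expired-but-empty case easy to cover in a unit test.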

Tool execution boundaries. Tools should run on your infrastructure — your server, your auth context, your database connections. When a model calls get-user-account, that handler needs access to your ORM, your API keys, your permission checks. Shipping credentials to a third-party execution environment is a non-starter for most teams. The orchestration layer's job is to manage the continuation loop: the model requests a tool, your server executes it, the result feeds back, the model continues.

const session = client.agentSessions.attach(sessionId, {
  tools: {
    'get-user-account': async (args) => {
      // Runs on YOUR server, with YOUR database
      return await db.users.findById(args.userId);
    },
  },
});
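
For a sense of what the continuation loop involves when you write it yourself, here is a rough sketch under stated assumptions. Every name in it (`ModelStep`, `ToolCall`, `runContinuationLoop`, the `maxRounds` cap) is hypothetical, and the model is reduced to a function that takes the previous tool results and returns either more tool calls or a final answer.

```typescript
// All names here are hypothetical; a real model client would stand behind `ModelStep`.
type ToolCall = { id: string; name: string; args: Record<string, unknown> };
type ToolResult = { id: string; ok: boolean; output: unknown };
type ModelTurn =
  | { kind: "tool-calls"; calls: ToolCall[] }
  | { kind: "final"; text: string };
type ModelStep = (results: ToolResult[]) => Promise<ModelTurn>;
type ToolHandler = (args: Record<string, unknown>) => Promise<unknown>;

async function runContinuationLoop(
  step: ModelStep,
  tools: Record<string, ToolHandler>,
  maxRounds = 8, // assumed safety cap so a confused model cannot loop forever
): Promise<string> {
  let results: ToolResult[] = [];
  for (let round = 0; round < maxRounds; round++) {
    const turn = await step(results);
    if (turn.kind === "final") return turn.text;

    // Run every requested tool on this server. A failing or unknown tool
    // becomes a failed result for the model instead of aborting the loop.
    results = await Promise.all(
      turn.calls.map(async (call): Promise<ToolResult> => {
        const handler = tools[call.name];
        if (!handler) {
          return { id: call.id, ok: false, output: `unknown tool: ${call.name}` };
        }
        try {
          return { id: call.id, ok: true, output: await handler(call.args) };
        } catch (err) {
          return { id: call.id, ok: false, output: String(err) };
        }
      }),
    );
  }
  throw new Error(`no final answer after ${maxRounds} rounds`);
}
```

Even this small version has to take a position on parallel calls, partial failure, and runaway loops, which is the point: none of those decisions depend on what the agent is for.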

Streaming granularity. A raw token stream isn't enough. You need structured events — text deltas for the response, tool input streaming for showing progress, lifecycle events for knowing when execution starts and stops, error events with enough metadata to decide whether to retry or surface the error to the user.

// Not just tokens — structured lifecycle events
{ type: 'block-start', blockName: 'Respond to user', display: 'stream' }
{ type: 'text-delta', delta: 'Let me look that up' }
{ type: 'tool-input-start', toolName: 'get-user-account', title: 'Looking up account' }
{ type: 'tool-output-available', output: { name: 'Demo User' } }
{ type: 'text-delta', delta: 'I found your account...' }
{ type: 'finish', finishReason: 'stop' }
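
On the consuming side, structured events like these map naturally onto a discriminated union and a reducer. The sketch below mirrors the event shapes in the sample stream; the `UiState` shape and the `reduce` function are hypothetical, and a real event set would likely include more variants, such as error events.

```typescript
// Event shapes mirror the sample stream above; UiState is a hypothetical UI model.
type StreamEvent =
  | { type: "block-start"; blockName: string; display: string }
  | { type: "text-delta"; delta: string }
  | { type: "tool-input-start"; toolName: string; title: string }
  | { type: "tool-output-available"; output: unknown }
  | { type: "finish"; finishReason: string };

interface UiState {
  text: string; // assistant text accumulated so far
  activeTool: string | null; // title to show while a tool is running
  done: boolean;
}

const initialState: UiState = { text: "", activeTool: null, done: false };

// Fold one event into the UI state. The switch is exhaustive over StreamEvent,
// so adding a new event type becomes a compile error until it is handled.
function reduce(state: UiState, event: StreamEvent): UiState {
  switch (event.type) {
    case "block-start":
      return state;
    case "text-delta":
      return { ...state, text: state.text + event.delta };
    case "tool-input-start":
      return { ...state, activeTool: event.title };
    case "tool-output-available":
      return { ...state, activeTool: null };
    case "finish":
      return { ...state, activeTool: null, done: true };
  }
}
```

With a raw token stream, only the `text-delta` case would exist, and the UI would have no way to show "Looking up account" while the tool runs.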

Error classification. Not all errors are equal. A rate limit error is retryable after a delay. A content filter error means the input needs to change. A tool error is localized — it shouldn't kill the whole stream. A provider outage might mean falling back to a different model. The orchestration layer needs to classify errors and give the application enough information to respond appropriately.
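
One way to make that concrete is a function from error kind to recovery action. This is a sketch, not a definitive taxonomy: the four kinds come from the paragraph above, while the action names, the `retryAfterMs` field, and the one-second default delay are assumptions made for illustration.

```typescript
// Error kinds follow the paragraph above; everything else is assumed.
type AgentErrorKind = "rate-limit" | "content-filter" | "tool-error" | "provider-outage";

interface AgentError {
  kind: AgentErrorKind;
  retryAfterMs?: number; // hypothetical hint, e.g. derived from a provider response
}

type Recovery =
  | { action: "retry"; delayMs: number }
  | { action: "ask-user-to-rephrase" }
  | { action: "report-to-model" }
  | { action: "fallback-model" };

function classify(error: AgentError): Recovery {
  switch (error.kind) {
    case "rate-limit":
      // Retryable after a delay; use the provider's hint when present.
      return { action: "retry", delayMs: error.retryAfterMs ?? 1000 };
    case "content-filter":
      // Retrying the same input will not help; the input has to change.
      return { action: "ask-user-to-rephrase" };
    case "tool-error":
      // Localized failure: hand it back to the model and keep the stream alive.
      return { action: "report-to-model" };
    case "provider-outage":
      return { action: "fallback-model" };
  }
}
```

The application then only has to branch on `action`, which keeps provider-specific error formats out of UI code.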

Each of these is a coordination problem, not a model problem. And each one has the same characteristic: it's the same problem regardless of what your agent actually does.

The Terraform analogy

There's a useful parallel to infrastructure-as-code. Before tools like Terraform, many teams hand-rolled their provisioning scripts. Every team's scripts were different, but they all solved the same problems: create resources, track state, handle dependencies, roll back on failure.

Terraform's insight was that infrastructure provisioning is a declarative problem. You describe what you want, the engine figures out how to get there. The separation between "what" and "how" is what makes it composable and predictable.

Agent orchestration is the same kind of problem. An agent protocol should declare: here are my inputs, here are my tools, here's my system prompt, here's what happens when a user sends a message. The orchestration platform handles session state, the tool execution loop, streaming, error recovery. You don't hand-roll those per agent any more than you hand-roll EC2 provisioning per service.

input:
  COMPANY_NAME: { type: string }
  USER_ID: { type: string, optional: true }

tools:
  get-user-account:
    description: Looking up your account
    parameters:
      userId: { type: string }

agent:
  model: anthropic/claude-sonnet-4-5
  system: system
  tools: [get-user-account]
  agentic: true

handlers:
  user-message:
    Respond to user:
      block: next-message

This is the entire orchestration spec for a conversational agent. Session management, streaming, tool execution, error handling — all handled by the platform. The developer defines what the agent does. The platform handles how it runs.
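
One practical payoff of a declarative spec is that the platform can check a session's inputs before any model call happens. Here is a minimal sketch of such a check, assuming the `input` section above has been parsed into a plain object. `InputSpec` and `validateInput` are hypothetical names, and the `number` and `boolean` types are assumptions, since the spec above only shows `string`.

```typescript
// Hypothetical parsed form of the `input:` section of a protocol spec.
type InputType = "string" | "number" | "boolean"; // only "string" appears in the spec above
type InputSpec = Record<string, { type: InputType; optional?: boolean }>;

// Return a list of human-readable problems; an empty list means the inputs are valid.
function validateInput(spec: InputSpec, values: Record<string, unknown>): string[] {
  const problems: string[] = [];
  for (const [name, rule] of Object.entries(spec)) {
    const value = values[name];
    if (value === undefined) {
      if (!rule.optional) problems.push(`missing required input: ${name}`);
    } else if (typeof value !== rule.type) {
      problems.push(`input ${name} should be ${rule.type}, got ${typeof value}`);
    }
  }
  // Flag inputs the spec never declared, which are usually typos.
  for (const name of Object.keys(values)) {
    if (!(name in spec)) problems.push(`unknown input: ${name}`);
  }
  return problems;
}

// The `input` section from the spec above, written as a TypeScript value.
const supportAgentInput: InputSpec = {
  COMPANY_NAME: { type: "string" },
  USER_ID: { type: "string", optional: true },
};
```

Because the check is driven entirely by the declaration, a new agent gets it without anyone writing validation code for it.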

Recognizing when you're building infrastructure

Here are some signals that your team has crossed the line from "building an agent" to "building agent infrastructure":

  • You're copying session management code between projects
  • You've built a custom streaming protocol more than once
  • Your tool execution loop has retry logic, timeout handling, and error classification
  • You have a "base agent" class that every agent inherits from
  • You're maintaining a shared library of agent utilities across your org

None of these are bad. They're signs that you've correctly identified the coordination patterns your agents need. The question is whether you want to maintain that infrastructure or use infrastructure that someone else maintains.

The teams I've seen move fastest are the ones that draw a clear line: agent logic is ours, orchestration is platform. They spend their time on system prompts, tool implementations, and the domain-specific behavior that makes their agents useful. They don't spend time on streaming protocols, session persistence, or tool continuation loops.

The consolidation ahead

The agent ecosystem right now looks a lot like microservices circa 2015. Everyone agrees on the general direction. Nobody agrees on the specifics. Every team has a slightly different approach to the same set of problems.

That's fine — it's how every infrastructure category matures. Service meshes, container orchestration, CI/CD — they all went through a phase of fragmentation before consolidating around clear abstractions.

The abstractions for agent orchestration are starting to emerge: sessions as the unit of state, protocols as the unit of definition, tools as the unit of capability, streaming events as the unit of communication. The teams that adopt these abstractions early — whether through a platform or by building their own — will have an easier time when the consolidation happens.

The teams still hand-rolling orchestration in application code will have the same experience as the teams that were still writing custom service discovery when Kubernetes showed up.

The takeaway

If you're building agents, you're building orchestration. If you're building orchestration, you're building infrastructure. The sooner you treat it as infrastructure — with the same expectations around reliability, composability, and separation of concerns — the faster you'll ship agents that actually work in production.

The model call is the easy part. The infrastructure around it is where the engineering lives.