Agents Don't Eliminate Developer Toil — They Redistribute It

The supervision tax is real. The fix is infrastructure, not better models.

I love building with and sharing about AI.

There's a running joke on developer Twitter right now: coding agents are burning out senior engineers by 11 AM. Not because the agents are bad — but because supervising them is its own kind of exhausting.

You stop writing code. You start reviewing code. You stop thinking about architecture. You start babysitting four parallel agents, each drifting slightly off-task in different ways. The toil didn't disappear. It moved.

This isn't a complaint about agents. It's an observation about where the hard problems actually live. And if you look closely, the pattern is familiar — it's an orchestration problem wearing a new hat.

The Supervision Tax

When you run a single autonomous agent in a notebook, the experience feels magical. The model reasons, calls tools, adjusts, and delivers. But scale that to a real workflow — multiple agents, real infrastructure, production traffic — and you start paying what I'd call the supervision tax.

The supervision tax looks like this:

  • Context babysitting. You're manually feeding the right context to each agent invocation because there's no persistent state between runs.
  • Tool call anxiety. Did it call the right tool? Did it pass the right arguments? You're watching the logs like a hawk because tool execution is a black box.
  • Coordination overhead. Agent A produced output that agent B needs, but there's no structured way to hand it off. So you wire it together with glue code. Again.
  • Failure opacity. Something went wrong three tool calls deep, but the error surfaced as a vague model response instead of a stack trace.

This is developer toil. It's just wearing a different costume than the toil agents were supposed to eliminate.

Why This Happens

Most agent setups today are imperative. You write code that says: create a prompt, call the model, check if it wants a tool, run the tool, feed the result back, loop. You're hand-rolling the orchestration every time.

This is the equivalent of writing raw HTTP request handling instead of using a web framework. It works. It's also a maintenance nightmare at scale. And it forces the developer to be the orchestrator — the human in the loop isn't making decisions about the task, they're making decisions about the execution machinery.
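To make the hand-rolled loop concrete, here is a minimal sketch of it. Every name in it (ChatMessage, ModelReply, runAgentLoop, scriptedModel) is hypothetical and not taken from any SDK, and the model and tools are synchronous stubs so the example stays self-contained; a real loop would await both.

```typescript
// Hypothetical shapes for illustration only; not taken from any real SDK.
type ChatMessage = { role: 'user' | 'assistant' | 'tool'; content: string };
type ToolCall = { name: string; args: Record<string, unknown> };
type ModelReply = { text: string; toolCall?: ToolCall };

// Synchronous stand-ins for a model client and tool implementations.
type Model = (messages: ChatMessage[]) => ModelReply;
type Tool = (args: Record<string, unknown>) => unknown;

function runAgentLoop(
  model: Model,
  tools: Record<string, Tool | undefined>,
  userMessage: string,
  maxSteps = 5,
): string {
  // Create the prompt.
  const messages: ChatMessage[] = [{ role: 'user', content: userMessage }];

  for (let step = 0; step < maxSteps; step++) {
    // Call the model.
    const reply = model(messages);
    messages.push({ role: 'assistant', content: reply.text });

    // No tool requested: the model has finished.
    if (!reply.toolCall) return reply.text;

    // Run the tool, feed the result back, and go around again.
    const tool = tools[reply.toolCall.name];
    const result = tool
      ? tool(reply.toolCall.args)
      : { error: `Unknown tool: ${reply.toolCall.name}` };
    messages.push({ role: 'tool', content: JSON.stringify(result) });
  }

  throw new Error(`Agent did not finish within ${maxSteps} steps`);
}

// Scripted model: requests one tool call, then answers from the tool result.
const scriptedModel: Model = (messages) => {
  const last = messages[messages.length - 1];
  if (last.role === 'tool') return { text: `Account data: ${last.content}` };
  return { text: '', toolCall: { name: 'get-user-account', args: { userId: 'user-123' } } };
};

const answer = runAgentLoop(
  scriptedModel,
  { 'get-user-account': () => ({ plan: 'pro' }) },
  "What's my account status?",
);
console.log(answer); // prints: Account data: {"plan":"pro"}
```

Every branch in that function (the step limit, the unknown-tool case, the message bookkeeping) is execution machinery you now own and have to watch.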

The root causes map cleanly:

| Supervision problem | Root cause |
| --- | --- |
| Context babysitting | No stateful sessions: every call starts from scratch |
| Tool call anxiety | Tool execution is opaque and unstructured |
| Coordination overhead | No formal protocol for agent-to-agent handoff |
| Failure opacity | Errors aren't surfaced through the tool execution layer |

These aren't model problems. They're infrastructure problems.

Declarative Protocols Reduce the Surface Area

One pattern that directly attacks supervision overhead is declarative agent definition. Instead of writing imperative code that itself performs the orchestration, you declare what the agent should do and let the platform handle how.

Here's what this looks like in practice with an Octavus protocol:

triggers:
  user-message:
    input:
      USER_MESSAGE: { type: string }

tools:
  get-user-account:
    description: Looking up your account
    parameters:
      userId: { type: string }
  create-support-ticket:
    description: Creating a support ticket
    parameters:
      summary: { type: string }
      priority: { type: string }

agent:
  model: anthropic/claude-sonnet-4-5
  system: system
  tools: [get-user-account, create-support-ticket]
  agentic: true
  thinking: medium

handlers:
  user-message:
    Add user message:
      block: add-message
      role: user
      prompt: user-message
      input: [USER_MESSAGE]

    Respond to user:
      block: next-message

No tool execution loop. No manual context threading. No glue code for "what happens after the model responds." The protocol declares the agent's behavior, tools, and execution flow. The platform handles the rest — including the tool call cycle, state management, and event streaming.

The supervision surface area shrinks because you're not responsible for the orchestration machinery anymore. You're responsible for the agent's logic: its prompt, its tools, and its protocol.

Stateful Sessions Kill Context Babysitting

The most underrated piece of agent infrastructure is the session. Not chat history — execution context.

A stateful session tracks conversation history, resources, and variables across interactions. When a user comes back, the agent picks up where it left off. When a tool call populates a variable, that variable is available to the next handler. When a session expires, it can be restored from stored state.

// Create a session with initial context
const sessionId = await client.agentSessions.create('support-chat', {
  COMPANY_NAME: 'Acme Corp',
  USER_ID: 'user-123',
});

// Later — attach and execute. The session remembers everything.
const session = client.agentSessions.attach(sessionId, {
  tools: {
    'get-user-account': async (args) => {
      return await db.users.findById(args.userId as string);
    },
  },
});

const events = session.execute({
  type: 'trigger',
  triggerName: 'user-message',
  input: { USER_MESSAGE: "What's my account status?" },
});

You're not rebuilding context on every call. You're not manually passing conversation history. The session is the execution context, and it persists across interactions with a 24-hour TTL; after it expires, it can be restored from stored state.

This eliminates an entire category of supervision: the "did I pass the right context?" anxiety that comes from stateless agent architectures.
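For intuition about what a TTL with restore involves, here is a toy in-memory sketch. It is not how Octavus implements sessions: SessionStore, SessionState, and copyState are hypothetical names, the archive object stands in for durable storage, and the clock is passed in explicitly so the behavior is easy to follow.

```typescript
// Hypothetical shapes for illustration; this is not the Octavus implementation.
type SessionState = { variables: Record<string, string>; history: string[] };
type LiveSession = { state: SessionState; expiresAt: number };

const DAY_MS = 24 * 60 * 60 * 1000;

// Copy a state so the stored snapshot and the live session never share objects.
function copyState(s: SessionState): SessionState {
  return { variables: { ...s.variables }, history: s.history.slice() };
}

class SessionStore {
  private live: Record<string, LiveSession | undefined> = {};
  private archive: Record<string, SessionState | undefined> = {}; // stands in for durable storage
  private nextId = 1;

  constructor(private readonly ttlMs: number = DAY_MS) {}

  create(variables: Record<string, string>, now: number): string {
    const id = `session-${this.nextId++}`;
    const state: SessionState = { variables: { ...variables }, history: [] };
    this.live[id] = { state, expiresAt: now + this.ttlMs };
    this.archive[id] = copyState(state);
    return id;
  }

  // Append to the history and persist a fresh snapshot.
  record(id: string, message: string, now: number): void {
    const state = this.get(id, now);
    state.history.push(message);
    this.archive[id] = copyState(state);
  }

  // Return the live session, restoring from the snapshot if it has expired.
  get(id: string, now: number): SessionState {
    const entry = this.live[id];
    if (entry && now < entry.expiresAt) return entry.state;

    const snapshot = this.archive[id];
    if (!snapshot) throw new Error(`Unknown session: ${id}`);
    const state = copyState(snapshot);
    this.live[id] = { state, expiresAt: now + this.ttlMs };
    return state;
  }
}

const store = new SessionStore();
const id = store.create({ COMPANY_NAME: 'Acme Corp', USER_ID: 'user-123' }, 0);
store.record(id, "What's my account status?", 1000);

// Two days later the live entry has expired, so the state comes from the snapshot.
const restored = store.get(id, 2 * DAY_MS);
console.log(restored.variables.USER_ID, restored.history.length); // prints: user-123 1
```

The caller never checks whether the session was live or restored, and that is the property that takes "did I pass the right context?" off your plate.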

Workers Make Coordination a Protocol, Not Glue Code

The coordination overhead — agent A produces something, agent B consumes it — is where teams burn the most supervision cycles. Without structure, it's custom wiring every time.

Workers in Octavus are agents designed for task-based execution. They run steps sequentially, can use different models at different stages, and return a typed output value. More importantly, interactive agents can call workers as sub-tasks — either deterministically or by letting the LLM decide.

# A worker that researches and analyzes a topic
input:
  TOPIC: { type: string }

variables:
  RESEARCH_DATA: { type: string }
  ANALYSIS: { type: string }

steps:
  Start research:
    block: start-thread
    thread: research
    model: anthropic/claude-sonnet-4-5
    system: research-system
    tools: [web-search]
    maxSteps: 5

  Add research request:
    block: add-message
    thread: research
    role: user
    prompt: research-prompt
    input: [TOPIC]

  Generate research:
    block: next-message
    thread: research
    output: RESEARCH_DATA

  Start analysis:
    block: start-thread
    thread: analysis
    model: anthropic/claude-sonnet-4-5
    system: analysis-system

  Add analysis request:
    block: add-message
    thread: analysis
    role: user
    prompt: analysis-prompt
    input: [RESEARCH_DATA]

  Generate analysis:
    block: next-message
    thread: analysis
    output: ANALYSIS

output: ANALYSIS

The research thread feeds into the analysis thread through a declared variable (RESEARCH_DATA). No glue code. No manual output parsing. The protocol defines the data flow, and the platform executes it.

An interactive agent can invoke this worker with a single block:

handlers:
  user-message:
    Research topic:
      block: run-worker
      worker: research-assistant
      input:
        TOPIC: USER_MESSAGE
      output: RESEARCH_RESULT

This is composability at the orchestration layer. The supervision cost of coordinating multiple agents drops to defining the handoff in YAML.
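If it helps to picture what a platform does with a worker definition like the one above, here is a rough sketch of a sequential step runner that passes data through declared variables. The Step shape and runWorker are hypothetical, and each run function is a stub standing in for a model call; the real execution engine is not described in this article.

```typescript
// Hypothetical step shape: each step names the variables it reads and the one it writes.
type Variables = Record<string, string>;
type Step = {
  name: string;
  input: string[];
  output: string;
  run: (input: Variables) => string; // stub standing in for a model call
};

function runWorker(steps: Step[], input: Variables, outputName: string): string {
  const variables: Variables = { ...input };

  for (const step of steps) {
    // Hand the step only the variables it declared as input.
    const stepInput: Variables = {};
    for (const key of step.input) {
      const value = variables[key];
      if (value === undefined) {
        throw new Error(`Step "${step.name}" needs ${key}, which no earlier step produced`);
      }
      stepInput[key] = value;
    }
    // Store the result under the declared output name for later steps.
    variables[step.output] = step.run(stepInput);
  }

  const result = variables[outputName];
  if (result === undefined) throw new Error(`No step produced ${outputName}`);
  return result;
}

const steps: Step[] = [
  {
    name: 'Generate research',
    input: ['TOPIC'],
    output: 'RESEARCH_DATA',
    run: (v) => `notes on ${v.TOPIC}`,
  },
  {
    name: 'Generate analysis',
    input: ['RESEARCH_DATA'],
    output: 'ANALYSIS',
    run: (v) => `analysis of ${v.RESEARCH_DATA}`,
  },
];

const analysis = runWorker(steps, { TOPIC: 'agent supervision' }, 'ANALYSIS');
console.log(analysis); // prints: analysis of notes on agent supervision
```

Because inputs and outputs are declared, a missing handoff fails loudly at the step that needs it instead of surfacing later as a confused model response.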

Tool Execution on Your Terms

Tool call anxiety — the "what is the agent actually doing?" problem — comes from tools executing in opaque environments. When tool execution happens on someone else's infrastructure, you lose visibility, auth control, and the ability to inject request context.

In Octavus, server-side tools execute in your code, on your infrastructure:

const session = client.agentSessions.attach(sessionId, {
  tools: {
    'get-user-account': async (args) => {
      const userId = args.userId as string;
      const user = await db.users.findById(userId);
      if (!user) throw new Error(`User not found: ${userId}`);
      return { name: user.name, plan: user.subscription.plan };
    },
    'create-support-ticket': async (args) => {
      return await ticketService.create({
        summary: args.summary as string,
        priority: args.priority as string,
        source: 'ai-chat',
      });
    },
  },
});

You control the implementation. You have access to your database, your auth context, your logging. When a tool fails, it throws an error that you can log, trace, and debug — not a vague model hallucination about what went wrong.

Tools without a server handler get forwarded to the client as client-tool-request events, so browser-only operations (confirmations, UI interactions) work without server-side hacks.

This split — server tools for infrastructure, client tools for interaction — means tool execution is never a black box. You can log every call, validate every argument, and trace every failure.
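The routing rule behind that split can be sketched in a few lines. This is an assumption-laden illustration, not the SDK's code: dispatchToolCall, DispatchResult, and the confirm-action tool are hypothetical names (only client-tool-request comes from the text above), and the handlers are synchronous for brevity where real ones would be async.

```typescript
// Hypothetical shapes; only the "client-tool-request" name comes from the article.
type ToolArgs = Record<string, unknown>;
type ToolHandler = (args: ToolArgs) => unknown;

type DispatchResult =
  | { kind: 'result'; tool: string; value: unknown }
  | { kind: 'error'; tool: string; message: string }
  | { kind: 'client-tool-request'; tool: string; args: ToolArgs };

function dispatchToolCall(
  handlers: Record<string, ToolHandler | undefined>,
  tool: string,
  args: ToolArgs,
  log: (line: string) => void,
): DispatchResult {
  const handler = handlers[tool];

  // No server handler registered: forward the call to the client.
  if (!handler) {
    log(`forward ${tool}`);
    return { kind: 'client-tool-request', tool, args };
  }

  try {
    const value = handler(args);
    log(`ok ${tool}`);
    return { kind: 'result', tool, value };
  } catch (err) {
    // A thrown error becomes a structured, loggable failure.
    const message = err instanceof Error ? err.message : String(err);
    log(`fail ${tool}: ${message}`);
    return { kind: 'error', tool, message };
  }
}

const lines: string[] = [];
const log = (line: string): void => {
  lines.push(line);
};

const handlers: Record<string, ToolHandler | undefined> = {
  'get-user-account': (args) => {
    if (args.userId !== 'user-123') throw new Error(`User not found: ${String(args.userId)}`);
    return { name: 'Ada', plan: 'pro' };
  },
};

const found = dispatchToolCall(handlers, 'get-user-account', { userId: 'user-123' }, log);
const missing = dispatchToolCall(handlers, 'get-user-account', { userId: 'user-999' }, log);
const forwarded = dispatchToolCall(handlers, 'confirm-action', {}, log);

console.log(found.kind, missing.kind, forwarded.kind); // prints: result error client-tool-request
console.log(lines);
```

Every call passes through one function you control, which is where logging, argument validation, and tracing naturally live.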

The Pattern

The supervision tax is real, and it's not going away by making models smarter. Better models still need someone to manage their sessions, route their tool calls, coordinate their sub-tasks, and surface their failures.

The pattern for reducing it is the same pattern the industry has landed on for every orchestration problem: declare intent, let the platform execute. Separate what the agent should do from how the execution machinery works.

Stateful sessions eliminate context babysitting. Declarative protocols eliminate orchestration hand-rolling. Composable workers eliminate coordination glue. Server-side tool execution eliminates opacity.

None of this is glamorous. But it's the work that determines whether your agents can run without a developer watching the logs.