Stop Reaching for Multi-Agent — Start With Stateful Sessions

Most teams building AI agents hit the same inflection point. The first prototype works — a single agent, a single prompt, a single conversation. Then requirements pile up: "Can we add a research step?" "Can it hand off to a specialist?" "Can two agents collaborate?"

The instinct is to reach for multi-agent orchestration. Add another agent. Wire up communication. Introduce a supervisor. Before long, you're debugging race conditions in a system where a single stateful session would have been enough.

The pattern we see again and again: teams jump to multi-agent before they've exhausted what a single agent with proper state management can do. And they pay for it in complexity they didn't need to take on.

The complexity cliff is real

A single agent with access to tools, persistent state, and session history is already capable of remarkable work. It can look up user information, update records, maintain context across dozens of turns, and adapt its behavior based on accumulated state.

The moment you introduce a second agent, you're no longer building a chatbot — you're building a distributed system. You inherit coordination overhead, shared-state problems, message-passing semantics, and failure modes that didn't exist before. Every message between agents is a potential point of failure, data loss, or inconsistency.

That's not an argument against multi-agent systems. They solve real problems. But they should be the answer to "I've genuinely exhausted what a single agent can do," not "I think my problem sounds like it needs two agents."

What stateful sessions actually give you

The core insight is that a well-designed session — one with proper state management — collapses a surprising number of "multi-agent" use cases back into single-agent problems.

Consider what a session can track:

Conversation history: the full message thread, persisted across interactions
Resources: persistent state that the agent reads and writes, synced to your application in real time
Variables: internal state for intermediate results that don't need to reach the client
Inputs: immutable context set at session creation (user identity, configuration, policies)

Here's a concrete example. In Octavus, a session stores all of these as first-class concepts:

input:
  COMPANY_NAME:
    type: string
    description: The company name for support context
  USER_ID:
    type: string
    description: Current user's ID
    optional: true

resources:
  CONVERSATION_SUMMARY:
    type: string
    description: Running summary of the conversation
    default: ''
  USER_CONTEXT:
    type: unknown
    description: Cached user information
    default: {}

variables:
  CHAT_TRANSCRIPT:
    type: string
    description: Serialized conversation for processing

The input is set once at session creation — immutable context like user identity and configuration. Resources persist across every interaction and sync to the client, so your frontend can react to state changes (a running conversation summary, for example). Variables are internal — intermediate results the agent needs but the user doesn't see.

This isn't a key-value store bolted onto a prompt. It's a structured state model with clear scoping rules:

Type	Scope	Persistence	Synced to Client
Input	Session	Immutable	At creation
Resources	Session	Across triggers	Yes, via callbacks
Variables	Session	Across triggers	No, internal only
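
To make the scoping rules in the table concrete, here is a minimal TypeScript sketch of a session state container. It is not how Octavus implements sessions; `SessionState`, `setResource`, `setVariable`, and the listener shape are hypothetical names chosen for illustration. The sketch assumes only what the table states: inputs are frozen at creation, resource writes are pushed to the client, and variables stay internal.

```typescript
// Hypothetical sketch of the three scopes; not the actual Octavus implementation.
type ResourceListener = (name: string, value: unknown) => void;

class SessionState {
  // Inputs: set once at creation and frozen afterwards.
  readonly inputs: Readonly<Record<string, unknown>>;
  private resources = new Map<string, unknown>();
  private variables = new Map<string, unknown>();
  private onResourceUpdate: ResourceListener;

  constructor(inputs: Record<string, unknown>, onResourceUpdate: ResourceListener) {
    this.inputs = Object.freeze({ ...inputs });
    this.onResourceUpdate = onResourceUpdate;
  }

  // Resources: kept for the session and pushed to the client on every write.
  setResource(name: string, value: unknown): void {
    this.resources.set(name, value);
    this.onResourceUpdate(name, value);
  }

  getResource(name: string): unknown {
    return this.resources.get(name);
  }

  // Variables: kept for the session but never sent to the client.
  setVariable(name: string, value: unknown): void {
    this.variables.set(name, value);
  }

  getVariable(name: string): unknown {
    return this.variables.get(name);
  }
}

// Usage: only the resource write reaches the client-side listener.
const clientEvents: Array<[string, unknown]> = [];
const state = new SessionState({ COMPANY_NAME: 'Acme Corp' }, (name, value) => {
  clientEvents.push([name, value]);
});
state.setVariable('CHAT_TRANSCRIPT', 'user: hi');
state.setResource('CONVERSATION_SUMMARY', 'User said hello.');
```

After these two writes, `clientEvents` holds a single entry for `CONVERSATION_SUMMARY`; the transcript variable is readable inside the session but was never emitted.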

When a resource updates, your frontend receives a resource-update event in the stream. No polling. No separate state sync layer. The agent writes to CONVERSATION_SUMMARY, and your UI renders it.

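useOctavusChat({
  // Called whenever the agent writes to a resource during a stream
  onResourceUpdate: (name, value) => {
    if (name === 'CONVERSATION_SUMMARY') {
      // setSummary is your own state setter (for example, from React's useState)
      setSummary(value);
    }
  },
});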

This is the kind of state management that, without first-class session support, drives teams toward multi-agent setups. "We need one agent to summarize and another to respond" is often better solved as "we need one agent that writes a summary to a resource after each turn."
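
As a rough illustration of that second framing, the sketch below runs one turn of a single agent: it answers the user, then refreshes a running summary from the same history. Everything here is hypothetical. `runTurn`, `TurnState`, and the `respond` and `summarize` callbacks are stand-ins for whatever model calls your platform makes, not Octavus APIs.

```typescript
// Hypothetical stand-in for a model call; a real agent would call an LLM here.
type ModelCall = (prompt: string) => Promise<string>;

interface TurnState {
  history: string[]; // full message thread
  conversationSummary: string; // plays the role of the CONVERSATION_SUMMARY resource
}

// One agent, one turn: respond first, then refresh the summary in the same session.
async function runTurn(
  state: TurnState,
  userMessage: string,
  respond: ModelCall,
  summarize: ModelCall,
): Promise<string> {
  state.history.push(`user: ${userMessage}`);
  const reply = await respond(state.history.join('\n'));
  state.history.push(`assistant: ${reply}`);
  // Same session, same history: no second agent and no message passing.
  state.conversationSummary = await summarize(state.history.join('\n'));
  return reply;
}
```

Both steps read from one history array, so there is nothing to keep in sync between a "responder" and a "summarizer".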

Handlers replace coordinators

The second pattern that pulls teams toward multi-agent prematurely is sequential task execution. "First we need to research, then analyze, then respond." That sounds like a pipeline of agents.

But look at what a single agent's handler can do:

handlers:
  request-human:
    Serialize conversation:
      block: serialize-thread
      format: markdown
      output: CONVERSATION_TEXT

    Generate summary:
      block: next-message
      output: SUMMARY

    Create ticket:
      block: tool-call
      tool: create-support-ticket
      input:
        summary: SUMMARY
      output: TICKET

This handler serializes the conversation, generates a summary, and creates a support ticket — three distinct steps, executed in sequence, within one session. No inter-agent communication. No coordinator. No shared message bus.

Each block writes its output to a variable that later blocks can read, so state flows through the handler like data through a pipeline. It's declarative, inspectable, and runs within a single session's state boundary.
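
One way to picture this execution model is a loop that runs blocks in order and stores each result under the block's output name. The TypeScript below is a sketch under that assumption only; `Block`, `runHandler`, and the stubbed block bodies are hypothetical and say nothing about how Octavus actually executes handlers.

```typescript
// Hypothetical sketch of sequential block execution; all names are illustrative.
type Variables = Record<string, unknown>;

interface Block {
  name: string;
  output: string; // variable that receives this block's result
  run: (vars: Readonly<Variables>) => Promise<unknown>;
}

// Runs blocks in order and stores each result under the block's output name,
// so later blocks can read what earlier blocks produced.
async function runHandler(blocks: Block[], initial: Variables = {}): Promise<Variables> {
  const vars: Variables = { ...initial };
  for (const block of blocks) {
    vars[block.output] = await block.run(vars);
  }
  return vars;
}

// Mirrors the three steps of the request-human handler with stubbed work.
const requestHuman: Block[] = [
  {
    name: 'Serialize conversation',
    output: 'CONVERSATION_TEXT',
    run: async (vars) => (vars['MESSAGES'] as string[]).join('\n'),
  },
  {
    name: 'Generate summary',
    output: 'SUMMARY',
    // Stub: a real block would ask the model for a summary.
    run: async (vars) => `Summary of ${String(vars['CONVERSATION_TEXT']).length} characters`,
  },
  {
    name: 'Create ticket',
    output: 'TICKET',
    // Stub: a real block would call the create-support-ticket tool.
    run: async (vars) => ({ id: 'T-1', summary: vars['SUMMARY'] }),
  },
];
```

Calling `runHandler(requestHuman, { MESSAGES: [...] })` returns every intermediate variable, which is what makes a run easy to inspect after the fact.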

When you actually need workers

Sometimes a task genuinely doesn't fit within a conversational agent's handler. Maybe it requires a different model. Maybe it needs its own tool set. Maybe it's a discrete unit of work that you want to reuse across multiple agents.

That's what workers are for — task-based agents that execute a sequence of steps and return an output value. They're not multi-agent orchestration. They're composable subtasks.

# A worker: research-assistant
input:
  TOPIC:
    type: string
    description: Topic to research

variables:
  RESEARCH_DATA:
    type: string
  ANALYSIS:
    type: string

steps:
  Start research:
    block: start-thread
    thread: research
    model: anthropic/claude-sonnet-4-5
    system: research-system
    tools: [web-search]
    maxSteps: 5

  Add research request:
    block: add-message
    thread: research
    role: user
    prompt: research-prompt
    input: [TOPIC]

  Generate research:
    block: next-message
    thread: research
    output: RESEARCH_DATA

  Start analysis:
    block: start-thread
    thread: analysis
    model: anthropic/claude-sonnet-4-5
    system: analysis-system

  Add analysis request:
    block: add-message
    thread: analysis
    role: user
    prompt: analysis-prompt
    input: [RESEARCH_DATA]

  Generate analysis:
    block: next-message
    thread: analysis
    output: ANALYSIS

output: ANALYSIS

The key difference between a worker and "another agent" is scope. A worker runs, returns a value, and is done. It doesn't maintain its own conversation. It doesn't coordinate with other agents. It's a function call with LLM steps inside.
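
Read literally, "a function call with LLM steps inside" might look like the following TypeScript sketch. The `Model` type and both function names are hypothetical, and the prompts are placeholders; the point is only that the worker's intermediate state is local and a single value comes back.

```typescript
// Hypothetical sketch: a worker modeled as an async function with private state.
type Model = (system: string, prompt: string) => Promise<string>;

// Runs two isolated steps (research, then analysis) and returns one value.
// Nothing from the worker's internal state outlives the call.
async function researchAssistant(topic: string, model: Model): Promise<string> {
  const researchData = await model('research-system', `Research this topic: ${topic}`);
  const analysis = await model('analysis-system', `Analyze these findings: ${researchData}`);
  return analysis;
}

// The calling agent treats the worker like any other function.
async function handleUserMessage(userMessage: string, model: Model): Promise<string> {
  const researchResult = await researchAssistant(userMessage, model);
  return `Here is what I found: ${researchResult}`;
}
```

Because the model is passed in as a parameter, the same worker can be reused by different agents or exercised in tests with a stub.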

An interactive agent can call workers deterministically from a handler:

handlers:
  user-message:
    Research topic:
      block: run-worker
      worker: research-assistant
      input:
        TOPIC: USER_MESSAGE
      output: RESEARCH_RESULT

Or the LLM can call workers as tools when it decides they're needed:

agent:
  model: anthropic/claude-sonnet-4-5
  system: system
  workers: [research-assistant, generate-title]
  agentic: true

This gives you decomposition without coordination overhead. Each worker gets its own threads, its own model configuration, even its own tool set — but the parent agent's session manages the lifecycle.
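
The tool-style invocation can be sketched as a lookup table owned by the parent session. This is an illustrative assumption, not the platform's mechanism: `workers`, `ToolCall`, and `dispatchToolCall` are hypothetical names, and the worker bodies are stubs.

```typescript
// Hypothetical sketch: exposing workers to the model as named tools.
type Worker = (input: Record<string, string>) => Promise<string>;

const workers: Record<string, Worker> = {
  'research-assistant': async (input) => `Research notes on ${input['TOPIC'] ?? 'unknown topic'}`,
  'generate-title': async (input) => `Title: ${input['TEXT'] ?? ''}`,
};

interface ToolCall {
  worker: string;
  input: Record<string, string>;
}

// The model decides which worker to call; the parent session runs it and
// feeds the returned value back as a tool result.
async function dispatchToolCall(call: ToolCall): Promise<string> {
  const worker = workers[call.worker];
  if (!worker) {
    throw new Error(`Unknown worker: ${call.worker}`);
  }
  return worker(call.input);
}
```

The parent stays the only long-lived thing: workers are looked up, run, and discarded, so there is no peer-to-peer state to reconcile.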

Sessions survive disconnects

One advantage of treating sessions as the primary unit of state: they're designed to survive interruptions.

Sessions expire after a configurable period of inactivity (24 hours by default). When a user returns, you check the session status and then resume the live session, restore it from stored messages, or start fresh:

// Resume, restore, or create a session for a returning user.
// getOrCreateSession is an illustrative wrapper name; `client` is your Octavus
// client instance and `storedMessages` comes from your own storage.
async function getOrCreateSession(sessionId, storedMessages) {
  const result = await client.agentSessions.getMessages(sessionId);

  if (result.status === 'active') {
    // Session is still live: resume where we left off
    return { sessionId: result.sessionId, messages: result.messages };
  }

  // Session expired: restore from stored messages
  if (storedMessages.length > 0) {
    const restored = await client.agentSessions.restore(
      sessionId,
      storedMessages,
      { COMPANY_NAME: 'Acme Corp' },
    );

    if (restored.restored) {
      return { sessionId: restored.sessionId, messages: storedMessages };
    }
  }

  // Nothing to restore: start fresh
  const newSessionId = await client.agentSessions.create('support-chat', {
    COMPANY_NAME: 'Acme Corp',
  });
  return { sessionId: newSessionId, messages: [] };
}

In a multi-agent system, restoring state after a disconnect means restoring state for every agent involved, plus whatever coordination state existed between them. In a single-agent session, it's one restore call.
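
For intuition about the lifecycle, here is a small in-memory model of inactivity-based expiry and restore. It is a sketch under stated assumptions, not the Octavus client: `SessionStore`, its method signatures, and the explicit `now` parameter are hypothetical, and only the 24-hour default comes from the text above.

```typescript
// Hypothetical in-memory model of session expiry and restore; not the Octavus client.
const DEFAULT_TTL_MS = 24 * 60 * 60 * 1000; // 24 hours of inactivity

interface StoredSession {
  messages: string[];
  lastActivity: number; // epoch milliseconds
}

type SessionLookup = { status: 'active'; messages: string[] } | { status: 'expired' };

class SessionStore {
  private sessions = new Map<string, StoredSession>();
  private ttlMs: number;

  constructor(ttlMs: number = DEFAULT_TTL_MS) {
    this.ttlMs = ttlMs;
  }

  create(id: string, now: number): void {
    this.sessions.set(id, { messages: [], lastActivity: now });
  }

  // Reports 'expired' when the session is unknown or has been idle past the TTL.
  getMessages(id: string, now: number): SessionLookup {
    const session = this.sessions.get(id);
    if (!session || now - session.lastActivity > this.ttlMs) {
      this.sessions.delete(id);
      return { status: 'expired' };
    }
    return { status: 'active', messages: session.messages };
  }

  // One call rebuilds the whole session from the caller's stored messages.
  restore(id: string, messages: string[], now: number): void {
    this.sessions.set(id, { messages: [...messages], lastActivity: now });
  }
}
```

Passing `now` explicitly keeps the expiry rule deterministic, which makes the resume and restore paths easy to test without waiting on a real clock.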

The progression that works

Based on what we've seen teams build successfully, the progression looks like this:

1. Start with a single agent and proper sessions. Define your inputs, resources, and variables. Use handlers with multiple blocks for sequential work. This handles more than you think.

2. Extract workers for discrete, reusable tasks. When a subtask needs its own model, tools, or system prompt — and especially when you want to reuse it across agents — pull it into a worker. Workers are composable units, not autonomous entities.

3. Introduce multi-agent only for genuinely autonomous coordination. When you need agents that maintain their own long-running state, make independent decisions, and communicate asynchronously — that's when multi-agent orchestration earns its complexity.

Most teams building production agents today are somewhere between step 1 and step 2. The trap is jumping to step 3 because the architecture sounds right, not because the problem demands it.

Simplicity is not a consolation prize

There's a perception that single-agent systems are somehow less sophisticated. That "real" agent work requires multiple agents talking to each other.

But the engineering that matters most — session lifecycle, state persistence, tool execution, streaming, error handling, restore mechanics — lives entirely within the single-agent boundary. Getting those primitives right is the hard work. Multi-agent adds a coordination layer on top of that work. It doesn't replace it.

If your single-agent sessions aren't solid, your multi-agent system won't be either. Every weakness in your state management gets multiplied by the number of agents involved.

Start with sessions. Get the state model right. Layer complexity only when the problem forces your hand.
