Headless Agents: When Your Agent Doesn't Need a Chat Interface

The most interesting agents won't have a chat interface. They'll run in the background — processing data, executing multi-step workflows, calling tools, returning structured output — all without a user typing a single message.

The industry is starting to figure this out. After two years of shoehorning everything into a chat bubble, the realization is landing: the "assistant" paradigm only covers a fraction of what agents can do. The real value is in headless execution — agents as backend primitives, invoked by your code, not your users.

But running an agent without a conversation UI introduces a set of problems that most frameworks don't address. How do you define a multi-step execution flow declaratively? How do you mix different models at different stages? How do you pipe the output of one LLM call into the next? How do you get a structured return value instead of a stream of chat messages?

This post walks through how headless agents work in practice using Octavus's worker protocol — a format built specifically for task-based, UI-less agent execution.

Chat Agents vs. Worker Agents

Most agent frameworks model everything as a conversation. You send a message, the agent responds, maybe it calls some tools along the way, and the conversation continues. This is the interactive pattern — it assumes a human in the loop and a persistent back-and-forth.

Workers flip this model. A worker takes structured input, runs a defined sequence of steps, and returns an output value. There's no conversation to maintain, no session to keep alive across interactions. It's a function call, except the function body is an LLM pipeline.
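The "function call" framing can be made concrete with a short TypeScript sketch. It is not Octavus code: the `Model` type and the `echoModel` stub are hypothetical stand-ins for a real LLM client. They are used only to show a worker as an async function that takes typed input and returns an output value.

```typescript
// Hypothetical stand-in for an LLM call: prompt in, completion out.
type Model = (prompt: string) => Promise<string>;

// A worker modeled as a plain async function: structured input in, value out.
type Worker<I, O> = (input: I) => Promise<O>;

interface ResearchInput {
  topic: string;
  depth?: string; // optional; defaults to 'medium' below
}

// Two LLM calls in sequence: the first call's output feeds the second.
function makeResearchWorker(model: Model): Worker<ResearchInput, string> {
  return async ({ topic, depth = 'medium' }) => {
    const research = await model(`Research ${topic} at ${depth} depth.`);
    const analysis = await model(`Analyze the following research:\n${research}`);
    return analysis; // a return value, not a chat transcript
  };
}

// Stub model for demonstration: wraps the first line of the prompt in brackets.
const echoModel: Model = async (prompt) => `[${prompt.split('\n')[0]}]`;

makeResearchWorker(echoModel)({ topic: 'AI safety' }).then((out) => console.log(out));
```

Calling it looks like calling any other async function. There is no session object to create, keep alive, or tear down.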

Here's how the two compare:

| Aspect | Interactive Agent | Worker Agent |
| --- | --- | --- |
| Structure | `triggers` + `handlers` + `agent` | `steps` + `output` |
| LLM config | Global `agent:` section | Per-thread via `start-thread` |
| Invocation | Fire a named trigger | Direct execution with input |
| Session | Persists across triggers (24h TTL) | Single execution |
| Result | Streaming chat | Streaming + output value |

The interactive format is right for chatbots, support agents, anything conversational. Workers are right for everything else: background jobs, data pipelines, content generation, classification tasks, research workflows, scheduled automations.

Anatomy of a Worker Protocol

Workers are defined in YAML, same as interactive agents. But the structure is simpler — no triggers, no handlers, just inputs, steps, and an output.

Here's a research worker that takes a topic, does web research in one thread, then hands the findings to a second thread that produces an analysis:

```yaml
input:
  TOPIC:
    type: string
    description: Topic to research
  DEPTH:
    type: string
    optional: true
    default: medium

variables:
  RESEARCH_DATA:
    type: string
  ANALYSIS:
    type: string

tools:
  web-search:
    description: Search the web
    parameters:
      query: { type: string }

steps:
  Start research:
    block: start-thread
    thread: research
    model: anthropic/claude-sonnet-4-5
    system: research-system
    input: [TOPIC, DEPTH]
    tools: [web-search]
    maxSteps: 5

  Add research request:
    block: add-message
    thread: research
    role: user
    prompt: research-prompt
    input: [TOPIC, DEPTH]

  Generate research:
    block: next-message
    thread: research
    output: RESEARCH_DATA

  Start analysis:
    block: start-thread
    thread: analysis
    model: anthropic/claude-sonnet-4-5
    system: analysis-system

  Add analysis request:
    block: add-message
    thread: analysis
    role: user
    prompt: analysis-prompt
    input: [RESEARCH_DATA]

  Generate analysis:
    block: next-message
    thread: analysis
    output: ANALYSIS

output: ANALYSIS
```

A few things to notice:

Multiple threads, independently configured. The research phase uses web-search with maxSteps: 5 (allowing agentic tool loops). The analysis phase uses a different system prompt and no tools. Each thread gets its own model, tools, and settings, so you're not locked into a single configuration for the whole execution. Both threads happen to use the same model here, but either start-thread block could name a different one.

Data flows through variables. RESEARCH_DATA captures the output of the research thread, then gets passed as input to the analysis thread. Variables are the connective tissue between steps.

The output field declares the return value. When this worker finishes, the caller gets back whatever's in ANALYSIS. It's not a chat message; it's a return value your code can use downstream.
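Taken together, these three points describe a small execution model, which can be sketched as an interpreter over the step list. This is a simplified illustration, not the Octavus runtime. The `Step` union mirrors the three block types used in the YAML, `interpretSteps`, `CallModel`, and the `stub` model are hypothetical, and prompt templates are reduced to appending `NAME=value` lines for each named input. Each thread keeps its own model and message history, `next-message` writes its reply into a variable, and the declared output variable becomes the return value.

```typescript
// Simplified block types mirroring start-thread / add-message / next-message.
type Step =
  | { block: 'start-thread'; thread: string; model: string; system: string }
  | { block: 'add-message'; thread: string; prompt: string; input: string[] }
  | { block: 'next-message'; thread: string; output: string };

interface Message { role: 'system' | 'user' | 'assistant'; content: string }

// Hypothetical model call: receives the thread's model id and its messages.
type CallModel = (model: string, messages: Message[]) => Promise<string>;

async function interpretSteps(
  steps: Step[],
  input: Record<string, string>,
  outputVar: string,
  callModel: CallModel,
): Promise<string> {
  // Inputs and variables share one namespace so later steps can read either.
  const vars = new Map<string, string>(Object.entries(input));
  const threads = new Map<string, { model: string; messages: Message[] }>();

  for (const step of steps) {
    switch (step.block) {
      case 'start-thread':
        // Each thread gets its own model and its own message history.
        threads.set(step.thread, {
          model: step.model,
          messages: [{ role: 'system', content: step.system }],
        });
        break;
      case 'add-message': {
        const t = threads.get(step.thread);
        if (!t) throw new Error(`thread ${step.thread} not started`);
        // Stand-in for template rendering: append NAME=value for each input.
        const args = step.input.map((name) => `${name}=${vars.get(name) ?? ''}`);
        t.messages.push({ role: 'user', content: [step.prompt, ...args].join('\n') });
        break;
      }
      case 'next-message': {
        const t = threads.get(step.thread);
        if (!t) throw new Error(`thread ${step.thread} not started`);
        const reply = await callModel(t.model, t.messages);
        t.messages.push({ role: 'assistant', content: reply });
        vars.set(step.output, reply); // variables carry data across threads
        break;
      }
    }
  }
  const result = vars.get(outputVar);
  if (result === undefined) throw new Error(`output ${outputVar} never set`);
  return result;
}

const steps: Step[] = [
  { block: 'start-thread', thread: 'research', model: 'model-a', system: 'research-system' },
  { block: 'add-message', thread: 'research', prompt: 'research-prompt', input: ['TOPIC'] },
  { block: 'next-message', thread: 'research', output: 'RESEARCH_DATA' },
  { block: 'start-thread', thread: 'analysis', model: 'model-b', system: 'analysis-system' },
  { block: 'add-message', thread: 'analysis', prompt: 'analysis-prompt', input: ['RESEARCH_DATA'] },
  { block: 'next-message', thread: 'analysis', output: 'ANALYSIS' },
];

// Stub model: reports which model ran and the last message it saw.
const stub: CallModel = async (model, messages) =>
  `${model}(${messages[messages.length - 1].content.replace(/\n/g, ' ')})`;

interpretSteps(steps, { TOPIC: 'AI safety' }, 'ANALYSIS', stub).then(console.log);
```

The nested stub output makes the data flow visible: the analysis thread's reply contains the research thread's reply, which contains the original `TOPIC`.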

Running Workers from Code

On the server side, you have two ways to execute a worker: generate(), which runs it to completion and returns the output, and execute(), which streams events while the worker runs.

The simple path:

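```typescript
import { OctavusClient } from '@octavus/server-sdk';

const client = new OctavusClient({
  baseUrl: 'https://octavus.ai',
  apiKey: process.env.OCTAVUS_API_KEY!,
});

// Run the worker to completion and wait for its output value.
const { output } = await client.workers.generate(
  'research-assistant-id',
  { TOPIC: 'AI safety', DEPTH: 'detailed' },
  {
    tools: {
      // Runs in your process whenever the worker's LLM calls web-search.
      // searchWeb is your own search function, not part of the SDK.
      'web-search': async ({ query }) => await searchWeb(query),
    },
  },
);

console.log('Result:', output);
```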

generate() runs the worker to completion and hands you the output. Tool handlers execute on your infrastructure — the LLM decides when to call web-search, but your code does the actual searching.
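The split described here, where the model chooses a tool and your process runs it, follows the general tool-call loop pattern. The sketch below illustrates that pattern and is not Octavus's implementation. `ModelTurn`, `runToolLoop`, and `scriptedModel` are hypothetical, and the `maxSteps` cap is borrowed from the YAML's `maxSteps: 5`; how Octavus counts steps exactly may differ.

```typescript
// What a model turn can produce: either a tool request or a final answer.
type ModelTurn =
  | { kind: 'tool-call'; tool: string; args: Record<string, unknown> }
  | { kind: 'text'; text: string };

type ToolHandler = (args: Record<string, unknown>) => Promise<unknown>;

// The model sees prior tool results (as strings) on every turn.
type Model = (history: string[]) => Promise<ModelTurn>;

async function runToolLoop(
  model: Model,
  tools: Record<string, ToolHandler>,
  maxSteps: number,
): Promise<string> {
  const history: string[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const turn = await model(history);
    if (turn.kind === 'text') return turn.text; // the model is done
    const handler = tools[turn.tool];
    if (!handler) throw new Error(`no handler for tool ${turn.tool}`);
    // The handler runs in your process; only its result goes back to the model.
    const result = await handler(turn.args);
    history.push(`${turn.tool} -> ${JSON.stringify(result)}`);
  }
  throw new Error(`exceeded maxSteps (${maxSteps})`);
}

// Scripted stand-in for an LLM: searches once, then answers using the result.
const scriptedModel: Model = async (history) =>
  history.length === 0
    ? { kind: 'tool-call', tool: 'web-search', args: { query: 'AI safety' } }
    : { kind: 'text', text: `Summary based on: ${history[0]}` };

const tools: Record<string, ToolHandler> = {
  // Hypothetical local handler; a real one would call a search API.
  'web-search': async (args) => [`result for ${String(args.query)}`],
};

runToolLoop(scriptedModel, tools, 5).then(console.log);
```

The step cap is the important design choice. Without it, a model that keeps requesting tools would never return.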

When you need visibility into what's happening mid-execution, use execute():

```typescript
// Reuses client and searchWeb from the previous example.
const events = client.workers.execute(
  'research-assistant-id',
  { TOPIC: 'AI safety' },
  {
    tools: {
      'web-search': async ({ query }) => await searchWeb(query),
    },
  },
);

// Consume events as they arrive instead of waiting for the final output.
for await (const event of events) {
  switch (event.type) {
    case 'worker-start':
      console.log(`Started: ${event.workerSlug}`);
      break;
    case 'block-start':
      console.log(`Step: ${event.blockName}`);
      break;
    case 'text-delta':
      process.stdout.write(event.delta);
      break;
    case 'worker-result':
      console.log('Output:', event.output);
      break;
  }
}
```

The streaming API emits fine-grained events: step transitions, text deltas, tool calls, and the final output. This is useful for progress tracking, logging, or piping worker events to a client over SSE.
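Piping to a browser over SSE amounts to serializing each event into the `text/event-stream` format: an `event:` line, a `data:` line, and a blank line. The sketch below rests on assumptions. `WorkerEvent` is a hypothetical subset covering only the four event types used in the `switch` above, and the real event union is likely larger. `fakeEvents` stands in for a live `execute()` stream so the code runs without the SDK.

```typescript
// Hypothetical subset of worker events, mirroring the fields used above.
type WorkerEvent =
  | { type: 'worker-start'; workerSlug: string }
  | { type: 'block-start'; blockName: string }
  | { type: 'text-delta'; delta: string }
  | { type: 'worker-result'; output: unknown };

// Serialize one event as a Server-Sent Events message.
// JSON.stringify escapes newlines, so the payload always fits on one data: line.
function toSSE(event: WorkerEvent): string {
  return `event: ${event.type}\ndata: ${JSON.stringify(event)}\n\n`;
}

// Forward every event from a stream to a sink (e.g. an HTTP response writer).
async function pipeToSSE(
  events: AsyncIterable<WorkerEvent>,
  write: (chunk: string) => void,
): Promise<void> {
  for await (const event of events) write(toSSE(event));
}

// Stand-in for client.workers.execute(...): a fixed sequence of events.
async function* fakeEvents(): AsyncGenerator<WorkerEvent> {
  yield { type: 'worker-start', workerSlug: 'research-assistant' };
  yield { type: 'text-delta', delta: 'Hello\n' };
  yield { type: 'worker-result', output: 'done' };
}

const chunks: string[] = [];
pipeToSSE(fakeEvents(), (c) => chunks.push(c)).then(() => console.log(chunks.join('')));
```

In a real server, `write` would be the response writer of an HTTP handler that has set `Content-Type: text/event-stream`.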

Composing Workers into Larger Systems

Workers get more interesting when you compose them. An interactive agent can call workers as sub-tasks — either deterministically from a handler, or agentically where the LLM decides when to invoke them.
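The difference between the two invocation styles can be sketched in plain TypeScript before turning to the YAML. Everything here is hypothetical: `Worker`, `ToolDef`, `handleUserMessage`, `workerAsTool`, and the stub `research` worker are illustrative names, not Octavus APIs. In the deterministic style the handler always runs the worker. In the agentic style the worker is wrapped as a tool definition, and the parent LLM decides whether to call it.

```typescript
type Worker = (input: Record<string, string>) => Promise<string>;

// A tool the parent agent's LLM may choose to call.
interface ToolDef {
  name: string;
  description: string;
  handler: (args: Record<string, string>) => Promise<string>;
}

// Deterministic: this handler always runs the worker, then uses its output.
async function handleUserMessage(message: string, research: Worker): Promise<string> {
  const findings = await research({ TOPIC: message });
  return `Here is what I found: ${findings}`;
}

// Agentic: the worker is exposed as a tool; the LLM decides when to invoke it.
function workerAsTool(name: string, description: string, worker: Worker): ToolDef {
  return { name, description, handler: (args) => worker(args) };
}

// Hypothetical worker stub standing in for a real worker execution.
const research: Worker = async ({ TOPIC }) => `notes on ${TOPIC}`;

handleUserMessage('AI safety', research).then(console.log);
const tool = workerAsTool('research-assistant', 'Researching topic', research);
tool.handler({ TOPIC: 'agents' }).then(console.log);
```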

First, declare the worker in your interactive agent's protocol:

```yaml
workers:
  research-assistant:
    description: Researching topic
    display: stream
    tools:
      search: web-search  # Map worker's "search" tool to parent's "web-search"
```

Then call it from a handler:

```yaml
handlers:
  user-message:
    Run research:
      block: run-worker
      worker: research-assistant
      input:
        TOPIC: USER_MESSAGE
      output: RESEARCH_RESULT
```

Or let the LLM call it as a tool:

```yaml
agent:
  model: anthropic/claude-sonnet-4-5
  system: system
  workers: [research-assistant]
  agentic: true
```

The tool mapping line, search: web-search, deserves attention. It assumes a worker whose protocol calls its tool search (the research worker shown earlier names it web-search instead), while the parent agent has a handler for web-search. The mapping connects the two names, so when the worker's LLM calls search, the parent's web-search handler executes. Tools stay on your infrastructure, and workers compose cleanly without duplicating handler code.
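One way to picture the mapping is as a lookup that builds the worker's tool table from the parent's handlers. The sketch assumes the direction given in the YAML comment, with the worker's tool name as the key and the parent's tool name as the value. `bindWorkerTools` is a hypothetical helper, not part of the Octavus SDK.

```typescript
type ToolHandler = (args: Record<string, unknown>) => Promise<unknown>;

// Build the worker's tool table: each worker tool name resolves to the
// parent handler named in the mapping. Fails fast on a dangling mapping.
function bindWorkerTools(
  mapping: Record<string, string>, // worker tool -> parent tool
  parentHandlers: Record<string, ToolHandler>,
): Record<string, ToolHandler> {
  const bound: Record<string, ToolHandler> = {};
  for (const [workerTool, parentTool] of Object.entries(mapping)) {
    const handler = parentHandlers[parentTool];
    if (!handler) {
      throw new Error(`worker tool "${workerTool}" maps to unknown parent tool "${parentTool}"`);
    }
    bound[workerTool] = handler; // same function object: no duplicated logic
  }
  return bound;
}

const parentHandlers: Record<string, ToolHandler> = {
  // Hypothetical handler; a real one would query a search provider.
  'web-search': async ({ query }) => `results for ${String(query)}`,
};

const workerTools = bindWorkerTools({ search: 'web-search' }, parentHandlers);
workerTools.search({ query: 'AI safety' }).then(console.log);
```

Failing fast on a dangling mapping means a typo in the YAML surfaces when the worker is wired up, not halfway through a run.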

Where Headless Agents Make Sense

The pattern unlocks a category of work that doesn't fit the chat paradigm:

Scheduled jobs. A worker that runs nightly, pulls metrics from three APIs, synthesizes a summary, and posts it to Slack. No conversation needed — just input, process, output.

Pipeline stages. A content moderation worker that classifies user submissions, extracts metadata, and returns a structured verdict. Plug it into your existing pipeline as an async function call.

Agent-to-agent delegation. An interactive support agent that hands off research to a worker, gets back structured findings, and weaves them into the conversation. The user sees a seamless response; under the hood, two agents collaborated.

Background enrichment. A worker triggered by a webhook that takes a new customer record, enriches it with public data, scores it, and writes the result to your database.
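In the pipeline-stage and enrichment cases, downstream code has to trust the worker's output before acting on it. If a worker returns its result as a string, as the research worker's `ANALYSIS` variable does, one plausible approach is to prompt the worker to emit JSON and validate it at the boundary. The `Verdict` shape and `parseVerdict` helper below are hypothetical, written for a moderation worker like the one described above. A schema-validation library would do the same job more thoroughly.

```typescript
// Hypothetical verdict shape a moderation worker might be prompted to return.
interface Verdict {
  label: 'allow' | 'review' | 'reject';
  confidence: number; // expected range 0..1
  tags: string[];
}

const LABELS = ['allow', 'review', 'reject'] as const;

// Validate raw worker output; return null rather than trust malformed data.
function parseVerdict(raw: string): Verdict | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (typeof data !== 'object' || data === null) return null;
  const { label, confidence, tags } = data as Record<string, unknown>;
  const labelOk = typeof label === 'string' && (LABELS as readonly string[]).includes(label);
  const confOk = typeof confidence === 'number' && confidence >= 0 && confidence <= 1;
  const tagsOk = Array.isArray(tags) && tags.every((t) => typeof t === 'string');
  if (!labelOk || !confOk || !tagsOk) return null;
  return {
    label: label as Verdict['label'],
    confidence: confidence as number,
    tags: tags as string[],
  };
}

console.log(parseVerdict('{"label":"review","confidence":0.72,"tags":["spam"]}'));
console.log(parseVerdict('I think this is probably fine.'));
```

Returning `null` instead of throwing lets the calling pipeline decide whether to retry the worker, route the item to a human, or drop it.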

In each case, the agent is a backend component — invoked by code, returning data, running without a UI. The declarative protocol means you can version it, review it in a PR, and validate it before deployment. The execution is still an LLM pipeline with tool calls and reasoning. You just stripped away the chat chrome.
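The point about validating a declarative protocol before deployment can be made concrete. The sketch below is a hypothetical CI-style check, not an Octavus tool. It walks a simplified step list and flags three problems: threads used before `start-thread`, variables read before any step writes them, and an output variable that is never set. The `Step` type is a reduced assumption modeled on the YAML earlier in this post.

```typescript
// Reduced step shapes: only the fields the checks below need.
type Step =
  | { block: 'start-thread'; thread: string }
  | { block: 'add-message'; thread: string; input: string[] }
  | { block: 'next-message'; thread: string; output: string };

// Static checks a CI job could run on a worker definition before deploying it.
function validateProtocol(inputs: string[], steps: Step[], outputVar: string): string[] {
  const errors: string[] = [];
  const started = new Set<string>();
  const defined = new Set<string>(inputs); // inputs are readable from the start

  steps.forEach((step, i) => {
    if (step.block === 'start-thread') {
      started.add(step.thread);
      return;
    }
    if (!started.has(step.thread)) {
      errors.push(`step ${i}: thread "${step.thread}" used before start-thread`);
    }
    if (step.block === 'add-message') {
      for (const name of step.input) {
        if (!defined.has(name)) errors.push(`step ${i}: "${name}" read before it is set`);
      }
    } else {
      defined.add(step.output); // next-message writes its output variable
    }
  });

  if (!defined.has(outputVar)) errors.push(`output "${outputVar}" is never set`);
  return errors;
}

// The research worker with its "Start analysis" step accidentally deleted.
const steps: Step[] = [
  { block: 'start-thread', thread: 'research' },
  { block: 'add-message', thread: 'research', input: ['TOPIC'] },
  { block: 'next-message', thread: 'research', output: 'RESEARCH_DATA' },
  { block: 'add-message', thread: 'analysis', input: ['RESEARCH_DATA'] },
  { block: 'next-message', thread: 'analysis', output: 'ANALYSIS' },
];

console.log(validateProtocol(['TOPIC', 'DEPTH'], steps, 'ANALYSIS'));
```

Because the definition is data rather than imperative code, checks like these can run on every pull request without executing a single model call.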

The Shift Away from Chat

The chat interface was a useful starting point for agent development. It gave everyone a familiar interaction model and made demos easy. But it also anchored our thinking — we started treating "agent" as a synonym for "chatbot with tools."

The more useful framing: an agent is a program where some of the logic is delegated to a language model. Sometimes that program needs a conversation loop. Often it doesn't. The worker pattern makes the "often it doesn't" case a first-class citizen — defined declaratively, executed programmatically, composed with other agents, and returning structured output your code can use.

Agents are moving into the backend. The interesting question isn't whether they'll have a UI. It's how we make them reliable, composable, and observable when they don't.