Define Your Agent in a Spec File, Not Your Application Code

The case for declarative agent definitions — and what changes when you separate behavior from implementation


Your agent's behavior is defined somewhere. The question is whether it's defined in a place that's readable, versionable, and separable from your application code — or whether it's scattered across string literals, nested function calls, and configuration objects buried three layers deep in your backend.

Most agent codebases look like the latter. Prompts live as template strings. Tool definitions are inline objects. The flow of execution is implicit, spread across callback handlers. The "what does this agent do?" question requires reading hundreds of lines of application code to answer.

There's a better pattern, and it's not new. Infrastructure went through this exact transition — from imperative shell scripts to declarative manifests. Kubernetes, Terraform, GitHub Actions. The principle is the same: separate what something should do from how it gets executed.

Agents are ready for the same shift.

The Problem with Imperative Agent Code

Here's a common pattern. You're building a support agent. You've got a model call, some tools, a system prompt, and logic for when to escalate. In most frameworks, this looks something like:

const agent = new Agent({
  model: 'claude-sonnet-4-5',
  systemPrompt: `You are a support agent for ${companyName}.
    Help users with their ${productName} questions.
    If you can't help, offer to escalate.`,
  tools: [
    {
      name: 'get-user-account',
      description: 'Look up user account',
      parameters: { userId: { type: 'string' } },
      handler: async (args) => {
        return await db.users.findById(args.userId);
      },
    },
    {
      name: 'create-support-ticket',
      description: 'Create a support ticket',
      parameters: {
        summary: { type: 'string' },
        priority: { type: 'string' },
      },
      handler: async (args) => {
        return await ticketService.create(args);
      },
    },
  ],
});

This works. It ships. But notice what's happened: the agent's identity (its prompt, tools, and behavioral constraints) is tangled with the implementation (database queries, service calls, handler logic). If a product manager wants to change the system prompt, they're editing application code. If you want to diff what changed about the agent's behavior between deploys, you're reading through code diffs that mix behavioral changes with implementation changes.

Now multiply this by five agents. Or twenty. The "what does each agent do?" question becomes genuinely hard to answer.

Declarative Agent Definitions

The alternative is to pull the agent's behavioral specification out of your code entirely. Define what the agent is — its inputs, tools, triggers, prompts, and execution flow — in a standalone, versionable format. Let the runtime handle how that specification executes.

This is the approach behind Octavus protocols. An agent is a directory with a YAML protocol, markdown prompts, and a settings file:

support-agent/
├── settings.json
├── protocol.yaml
├── prompts/
│   ├── system.md
│   └── user-message.md
└── references/
    └── support-policies.md

The protocol defines everything about the agent's behavior:

input:
  COMPANY_NAME: { type: string }
  PRODUCT_NAME: { type: string }
  USER_ID: { type: string, optional: true }

resources:
  CONVERSATION_SUMMARY:
    type: string
    default: ''

tools:
  get-user-account:
    description: Look up user account information
    parameters:
      userId: { type: string }

  create-support-ticket:
    description: Create a support ticket
    parameters:
      summary: { type: string }
      priority: { type: string }

agent:
  model: anthropic/claude-sonnet-4-5
  system: system
  input: [COMPANY_NAME, PRODUCT_NAME]
  tools: [get-user-account, create-support-ticket]
  agentic: true
  maxSteps: 10

triggers:
  user-message:
    input:
      USER_MESSAGE: { type: string }

handlers:
  user-message:
    Add message:
      block: add-message
      role: user
      prompt: user-message
      input: [USER_MESSAGE]

    Respond:
      block: next-message

And prompts live in their own markdown files:

<!-- prompts/system.md -->
You are a support agent for {{COMPANY_NAME}}.

Help users with questions about {{PRODUCT_NAME}}.

## Guidelines
- Be helpful and professional
- If you can't help, offer to escalate
- Never share internal information

The tool implementations — the actual database queries and service calls — live in your application code where they belong:

const session = client.agentSessions.attach(sessionId, {
  tools: {
    'get-user-account': async (args) => {
      return await db.users.findById(args.userId);
    },
    'create-support-ticket': async (args) => {
      return await ticketService.create(args);
    },
  },
});

The separation is clean. The protocol describes the agent. Your code implements the tools. They evolve independently.
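Independent evolution has one failure mode worth guarding against: the spec declares a tool that your code never implements, or the reverse. Here is a minimal sketch of a startup check, assuming plain object shapes and a hypothetical `checkToolBindings` helper. It is an illustration, not part of the Octavus SDK.

```typescript
// Hypothetical shapes for illustration; not the Octavus SDK.
type ToolHandler = (args: Record<string, unknown>) => Promise<unknown>;

interface BindingReport {
  missingHandlers: string[]; // declared in the spec, not implemented in code
  unusedHandlers: string[]; // implemented in code, not declared in the spec
}

// Compare the tool names a spec declares with the handlers the app provides.
function checkToolBindings(
  declaredTools: string[],
  handlers: Record<string, ToolHandler>,
): BindingReport {
  const implemented = Object.keys(handlers);
  const implementedSet = new Set(implemented);
  const declaredSet = new Set(declaredTools);
  return {
    missingHandlers: declaredTools.filter((t) => !implementedSet.has(t)),
    unusedHandlers: implemented.filter((t) => !declaredSet.has(t)),
  };
}

// The spec declares two tools; the code implements one of them plus a stray.
const report = checkToolBindings(
  ["get-user-account", "create-support-ticket"],
  {
    "get-user-account": async (args) => ({ id: args["userId"] }),
    "lookup-order": async () => null,
  },
);
// report.missingHandlers -> ["create-support-ticket"]
// report.unusedHandlers  -> ["lookup-order"]
```

Running a check like this when the process boots turns a drifted spec into a startup error rather than a failed tool call mid-conversation.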

Why This Separation Matters

Prompts Are Content, Not Code

System prompts change far more often than application logic. A product manager tweaks the tone. A legal team adds a compliance clause. A support lead updates the escalation criteria.

When prompts are string literals in your codebase, every change requires a code review, a deploy, and (usually) a developer to make the edit. When prompts are standalone markdown files in a versioned directory, they can be edited by anyone who can write prose and open a pull request.

This isn't a minor convenience — it's a fundamental workflow change. The people closest to the agent's behavior (product, ops, domain experts) can iterate on prompts without touching implementation code.
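If you are not on a platform that does this for you, the mechanics are small. The sketch below assumes `{{NAME}}` placeholders like the ones in the prompt file above and a hypothetical `renderPrompt` helper. The template is inlined to keep the example self-contained; in practice you would read it from `prompts/system.md`.

```typescript
// Replace {{NAME}} placeholders with values. An undeclared name throws, so a
// typo in a prompt fails loudly instead of reaching the model as literal text.
function renderPrompt(template: string, values: Record<string, string>): string {
  return template.replace(/\{\{\s*([A-Z0-9_]+)\s*\}\}/g, (_match, key: string) => {
    const value = values[key];
    if (value === undefined) {
      throw new Error(`Prompt references undeclared variable: ${key}`);
    }
    return value;
  });
}

// Inlined stand-in for the contents of a standalone markdown prompt file.
const systemTemplate = [
  "You are a support agent for {{COMPANY_NAME}}.",
  "",
  "Help users with questions about {{PRODUCT_NAME}}.",
].join("\n");

const rendered = renderPrompt(systemTemplate, {
  COMPANY_NAME: "Acme",
  PRODUCT_NAME: "Rocket Skates",
});
```

Because the template is plain text with named slots, whoever edits the prose never has to touch the code that fills it in.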

Diffable Behavior

When an agent's entire specification lives in a protocol file, you get meaningful diffs:

 agent:
-  model: anthropic/claude-sonnet-4-5
+  model: openai/gpt-5
   system: system
   input: [COMPANY_NAME, PRODUCT_NAME]
   tools: [get-user-account, create-support-ticket]
   agentic: true
-  maxSteps: 10
+  maxSteps: 15
+  thinking: medium

That diff tells you exactly what changed about the agent's behavior. Compare that to diffing a 200-line TypeScript file where a model change, a prompt tweak, and a handler refactor are all interleaved.
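The same property makes behavioral changes easy to summarize programmatically, for example in a deploy log or a PR comment. Below is a rough sketch, assuming the agent block has already been parsed into a flat object. The `diffAgentConfig` helper and its types are hypothetical.

```typescript
// Hypothetical flat view of a parsed `agent:` block.
type AgentConfig = Record<string, string | number | boolean | string[]>;

interface FieldChange {
  field: string;
  before: string | undefined; // JSON-encoded value, or undefined if absent
  after: string | undefined;
}

// List every field whose value differs between two versions of the config.
function diffAgentConfig(before: AgentConfig, after: AgentConfig): FieldChange[] {
  const fields = Object.keys(before).concat(
    Object.keys(after).filter((k) => !(k in before)),
  );
  const changes: FieldChange[] = [];
  for (const field of fields) {
    const a = field in before ? JSON.stringify(before[field]) : undefined;
    const b = field in after ? JSON.stringify(after[field]) : undefined;
    if (a !== b) changes.push({ field, before: a, after: b });
  }
  return changes;
}

const configChanges = diffAgentConfig(
  { model: "anthropic/claude-sonnet-4-5", system: "system", maxSteps: 10 },
  { model: "openai/gpt-5", system: "system", maxSteps: 15, thinking: "medium" },
);
// configChanges covers model, maxSteps, and thinking; system is unchanged.
```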

Validation Before Runtime

A declarative spec can be validated statically, before it ever hits a model. Octavus validates protocols at sync time, and the same check can be run on its own:

npx octavus validate ./agents/support-agent

This catches real errors early: a handler references a tool that doesn't exist, a prompt uses a variable that isn't declared, a worker's output points to an undefined variable. These are the kinds of bugs that in imperative code only surface at runtime, usually in production, usually at the worst possible time.
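To make the idea concrete, here is a small sketch of two such checks: undeclared tools and undeclared prompt variables. The `AgentSpec` shape and `validateSpec` function are assumptions made for illustration. This is not how the Octavus validator is implemented, only the general technique.

```typescript
// Hypothetical, simplified view of a parsed spec.
interface AgentSpec {
  inputs: string[]; // declared input names
  tools: string[]; // declared tool names
  agentTools: string[]; // tools the agent block references
  prompts: Record<string, string>; // prompt name -> template text
}

// Return a list of structural errors; an empty list means the spec is consistent.
function validateSpec(spec: AgentSpec): string[] {
  const errors: string[] = [];
  const declaredTools = new Set(spec.tools);
  const declaredInputs = new Set(spec.inputs);

  for (const tool of spec.agentTools) {
    if (!declaredTools.has(tool)) {
      errors.push(`agent references undeclared tool "${tool}"`);
    }
  }

  for (const promptName of Object.keys(spec.prompts)) {
    const template = spec.prompts[promptName] ?? "";
    const pattern = /\{\{\s*([A-Z0-9_]+)\s*\}\}/g;
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(template)) !== null) {
      const variable = match[1] ?? "";
      if (!declaredInputs.has(variable)) {
        errors.push(`prompt "${promptName}" uses undeclared variable "${variable}"`);
      }
    }
  }
  return errors;
}

const problems = validateSpec({
  inputs: ["COMPANY_NAME", "PRODUCT_NAME"],
  tools: ["get-user-account", "create-support-ticket"],
  agentTools: ["get-user-account", "send-refund"],
  prompts: {
    system: "You are a support agent for {{COMPANY_NAME}} on the {{PLAN_NAME}} plan.",
  },
});
// problems holds two entries: the undeclared tool and the undeclared variable.
```

Neither check needs a model call or a running backend, which is what makes them cheap enough to run on every commit.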

Composability

Workers in Octavus are a good example of what declarative specs unlock for composition. A worker is an agent designed for task-based execution — it takes input, runs steps, and returns output:

# Worker: generate-title
input:
  CONVERSATION_SUMMARY:
    type: string

variables:
  TITLE: { type: string }

steps:
  Start thread:
    block: start-thread
    thread: title-gen
    model: anthropic/claude-sonnet-4-5
    system: title-system

  Add request:
    block: add-message
    thread: title-gen
    role: user
    prompt: title-request
    input: [CONVERSATION_SUMMARY]

  Generate:
    block: next-message
    thread: title-gen
    output: TITLE

output: TITLE

An interactive agent can call this worker declaratively, or make it available to the LLM as a tool:

workers:
  generate-title:
    description: Generating conversation title
    display: description

agent:
  workers: [generate-title]
  agentic: true

Workers can use different models, different tools, different thinking levels. Each thread is configured independently. Because the boundaries are explicit in the spec, the runtime knows exactly what resources each unit of work needs — no ambient state leaking between components.
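The "no ambient state" point is easiest to see in code. In the sketch below, a hypothetical `callWorker` function passes a worker only the inputs it declares, so nothing else from the caller's context can leak in. The `run` function is a stand-in for the model-backed steps, and none of these names come from Octavus.

```typescript
// Hypothetical worker shape: declared inputs plus a function standing in
// for the worker's model-backed steps.
interface WorkerSpec<Out> {
  input: string[];
  run: (input: Record<string, string>) => Out;
}

// Invoke a worker with only its declared inputs, copied from the caller's context.
function callWorker<Out>(worker: WorkerSpec<Out>, context: Record<string, string>): Out {
  const scoped: Record<string, string> = {};
  for (const key of worker.input) {
    const value = context[key];
    if (value === undefined) throw new Error(`Missing worker input: ${key}`);
    scoped[key] = value;
  }
  return worker.run(scoped);
}

// Stub worker: "generates" a title by taking the first three words.
const generateTitle: WorkerSpec<{ title: string; visibleKeys: string[] }> = {
  input: ["CONVERSATION_SUMMARY"],
  run: (input) => ({
    title: (input["CONVERSATION_SUMMARY"] ?? "").split(" ").slice(0, 3).join(" "),
    visibleKeys: Object.keys(input),
  }),
};

const titleResult = callWorker(generateTitle, {
  CONVERSATION_SUMMARY: "User cannot reset password after email change",
  USER_ID: "u_123", // present in the caller's context, never seen by the worker
});
// titleResult.visibleKeys contains only "CONVERSATION_SUMMARY".
```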

CI/CD and Multi-Environment Deploys

Declarative specs slot into existing deployment workflows. Sync an agent to staging with one key, to production with another:

# Staging
npx octavus --env .env.staging sync ./agents/support-agent

# Production
npx octavus --env .env.production sync ./agents/support-agent

Add a validation step to your GitHub Actions pipeline and you've got the same review-validate-deploy cycle you already use for infrastructure:

- name: Validate agent
  run: npx octavus validate ./agents/support-agent

- name: Sync agent
  run: npx octavus sync ./agents/support-agent

The agent's behavioral spec is reviewed in the PR. The tool implementations are reviewed separately. Both are tested, validated, and promoted through environments independently.

The Terraform Analogy (And Where It Breaks Down)

It's tempting to map this directly onto infrastructure-as-code: the protocol is the Terraform config, the runtime is the provider, the tools are the resources. The analogy holds up to a point.

Where it diverges: agents are non-deterministic. Given the same configuration and the same state, a Terraform plan produces the same result every time. An agent protocol defines boundaries and capabilities, but the model's behavior within those boundaries is probabilistic. The protocol constrains; it doesn't dictate.

This is actually an argument for declarative definitions, not against them. When behavior is non-deterministic, you need the deterministic parts (which tools exist, what inputs are expected, how execution flows, what the model's instructions are) to be as explicit and reviewable as possible. The spec is the contract. The model operates within it.
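One way to picture "the model operates within it" is a deterministic gate in front of every tool call the model proposes. The sketch below assumes a hypothetical `checkToolCall` function and an `AgentLimits` shape holding the declared tools and the step limit. A real runtime would enforce more than this.

```typescript
// Hypothetical subset of a spec that a runtime can enforce deterministically.
interface AgentLimits {
  tools: string[];
  maxSteps: number;
}

// Return null if the proposed call is allowed, or a reason string if not.
function checkToolCall(
  limits: AgentLimits,
  stepsTaken: number,
  toolName: string,
): string | null {
  if (stepsTaken >= limits.maxSteps) {
    return `step limit of ${limits.maxSteps} reached`;
  }
  if (limits.tools.indexOf(toolName) === -1) {
    return `tool "${toolName}" is not declared in the spec`;
  }
  return null;
}

const limits: AgentLimits = {
  tools: ["get-user-account", "create-support-ticket"],
  maxSteps: 2,
};

// What the model proposes may vary from run to run; the verdicts below do not.
const firstVerdict = checkToolCall(limits, 0, "get-user-account"); // null (allowed)
const secondVerdict = checkToolCall(limits, 1, "delete-user"); // rejected: undeclared tool
const thirdVerdict = checkToolCall(limits, 2, "create-support-ticket"); // rejected: step limit
```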

What Changes When You Work This Way

Teams that adopt declarative agent specs tend to notice a few things:

Agent review becomes possible. When a new agent is a pull request with a YAML protocol and some markdown prompts, the whole team can review what it does. Not just the engineers — product managers, designers, domain experts. The spec is readable.

Prompt iteration accelerates. Nobody needs to understand the tool handler code to improve a system prompt. The two concerns are decoupled.

Debugging gets easier. When something goes wrong, you can check the protocol first. Is the tool declared? Is the variable passed to the prompt? Is the handler wired to the right trigger? These are structural questions with clear answers, before you even look at runtime behavior.

Reuse becomes natural. Tool definitions, workers, and prompt templates are independently addressable. A get-user-account tool defined once gets implemented in your backend once and used across any agent that needs it. A worker that generates summaries can be called by any interactive agent.

Getting Started

If you're building agents today — even if you're not using Octavus — the principle applies: pull your agent's behavioral specification out of your application code.

Start with the prompts. Move them to standalone files. Version them. Then extract the tool definitions. Then the execution flow. At each step, you'll find the separation creates clarity you didn't realize you were missing.

If you want to try this with Octavus specifically, the Protocol Overview is where to start. Define a protocol, sync it with the CLI, and implement your tools in your backend. The protocol describes the agent. Your code runs the tools. The platform handles everything in between.
