The 2026 Discipline

Harness Engineering

Agent = Model + Harness

The discipline of building the infrastructure around an AI model — skills, tools, permissions, memory, and verification — so a capable model becomes a reliable agent.

01 Definition

What Is Harness Engineering?

A harness is everything that surrounds the model: the tool registry it can call, the skills it loads, the permissions that bound it, the memory that persists across sessions, the hooks that fire on every action, and the checks that verify its output. Harness engineering treats this scaffolding as a first-class engineering artifact — one that co-determines agent performance as much as the underlying model does.

Agent = Model + Harness

The model is a frozen utility. Reliability, safety, and multi-step execution live in the harness.

The canonical example: telling an agent “follow our coding standards” in a prompt is probabilistic compliance. Wiring a linter hook that blocks violations is deterministic enforcement. Harness engineering is the systematic practice of converting the first into the second.
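As a minimal sketch of that conversion, assume a hypothetical harness that hands every proposed file write to a hook as a dictionary and refuses the write when the hook reports violations. The event shape, the two lint rules, and the names `pre_write_hook` and `MAX_LINE_LENGTH` are illustrative assumptions, not any particular runtime's API.

```python
from dataclasses import dataclass

MAX_LINE_LENGTH = 100  # hypothetical team standard


@dataclass
class Verdict:
    allowed: bool
    reasons: list[str]


def lint(source: str) -> list[str]:
    """Return one message per violation; an empty list means clean."""
    problems = []
    for number, line in enumerate(source.splitlines(), start=1):
        if len(line) > MAX_LINE_LENGTH:
            problems.append(f"line {number}: longer than {MAX_LINE_LENGTH} characters")
        if line.rstrip() != line:
            problems.append(f"line {number}: trailing whitespace")
    return problems


def pre_write_hook(event: dict) -> Verdict:
    """Decide whether a proposed file write may proceed.

    `event` uses an assumed shape: {"path": str, "content": str}.
    """
    if not event["path"].endswith(".py"):
        return Verdict(allowed=True, reasons=[])
    problems = lint(event["content"])
    return Verdict(allowed=not problems, reasons=problems)
```

The model never gets a vote here: a write with violations is rejected whatever the prompt said, and the reasons can be fed back to the agent as its next input.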

02 Evolution

From Prompts to Harnesses: The Evolution

2023

Prompt Engineering

Optimize the words. One prompt, one answer. Fragile and probabilistic.

2025

Context Engineering

Optimize everything the model sees — retrieval, structure, compaction. Better inputs, same single-shot mindset.

2026

Harness Engineering

Optimize the environment the model operates in — tools, skills, permissions, verification. Agent = Model + Harness.

2026 H2

Loop Engineering

Optimize the runtime that drives the agent — systems that prompt, verify, retry, and stop without a human in every turn.
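The prompt, verify, retry, stop cycle can be sketched in a few lines. This is an illustration under stated assumptions: `attempt` stands in for a call to the agent, `verify` for any deterministic checker, and the feedback format is invented for the example.

```python
from typing import Callable


def run_loop(
    attempt: Callable[[str], str],
    verify: Callable[[str], list[str]],
    task: str,
    max_attempts: int = 3,
) -> tuple[bool, str, int]:
    """Prompt, verify, retry with feedback, and stop on success or budget.

    Returns (succeeded, last_output, attempts_used).
    """
    prompt = task
    output = ""
    for used in range(1, max_attempts + 1):
        output = attempt(prompt)
        failures = verify(output)
        if not failures:
            return True, output, used
        # Feed the checker's findings back instead of waiting for a human.
        prompt = task + "\nFix these problems:\n" + "\n".join(failures)
    return False, output, max_attempts
```

The attempt budget is the stop condition that keeps an unattended loop from running forever; a fuller runtime would likely add cost and time limits as well.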

03 Components

The Harness Stack

Seven component families make up a production harness. Skills are one of them — which is exactly why a skills directory is the natural starting point for harness builders.

Skills

Versioned folders of instructions, scripts, and resources the agent loads on demand — the knowledge and procedure layer.

MCP Connectors

Model Context Protocol servers that give the agent typed, permissioned access to external systems — databases, browsers, SaaS APIs.

Sub-agents

Specialized agents with isolated context that the main agent dispatches for review, research, or parallel work.

Hooks

Deterministic scripts that fire before or after tool calls — the enforcement layer that turns guidelines into guarantees.

Permissions & Sandboxing

Allow-lists, approval modes, and sandboxes that bound what the agent can touch — safety as architecture, not vibes.

Memory & State

Persistent files, task lists, and context-management strategies that survive across sessions and compactions.
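One way to picture external state is a task list kept in a JSON file that any new session can read. The file layout (`title` and `done` fields) and the function names are assumptions made for this sketch.

```python
import json
from pathlib import Path
from typing import Optional


def load_tasks(path: Path) -> list[dict]:
    """Read the task list, or start empty if no state file exists yet."""
    if not path.exists():
        return []
    return json.loads(path.read_text(encoding="utf-8"))


def save_tasks(path: Path, tasks: list[dict]) -> None:
    """Write the full list to a temp file, then swap it into place."""
    tmp = path.parent / (path.name + ".tmp")
    tmp.write_text(json.dumps(tasks, indent=2), encoding="utf-8")
    tmp.replace(path)


def complete(path: Path, title: str) -> None:
    """Mark the first task with this title as done and persist the change."""
    tasks = load_tasks(path)
    for task in tasks:
        if task["title"] == title:
            task["done"] = True
            break
    save_tasks(path, tasks)


def next_task(path: Path) -> Optional[str]:
    """First unfinished task: where a fresh session should resume."""
    return next((t["title"] for t in load_tasks(path) if not t["done"]), None)
```

Because progress lives on disk rather than in the context window, a compaction or a restart loses nothing the agent needs to pick up where it left off.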

Evals & Observability

Automated checks, test suites, and traces that tell you whether the harness actually improved the agent.
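A rough sketch of what "actually improved" can mean in code: run the same cases against the agent under the old and the new harness and compare pass rates. Exact-match scoring and the names `pass_rate` and `harness_improved` are simplifying assumptions; real evals usually need richer graders.

```python
from typing import Callable

Case = tuple[str, str]  # (input, expected output)


def pass_rate(agent: Callable[[str], str], cases: list[Case]) -> float:
    """Fraction of cases where the agent's output matches the expectation."""
    if not cases:
        raise ValueError("an eval needs at least one case")
    passed = sum(1 for prompt, expected in cases if agent(prompt) == expected)
    return passed / len(cases)


def harness_improved(
    baseline: Callable[[str], str],
    candidate: Callable[[str], str],
    cases: list[Case],
    min_gain: float = 0.0,
) -> bool:
    """True only if the candidate beats the baseline by more than min_gain."""
    return pass_rate(candidate, cases) - pass_rate(baseline, cases) > min_gain
```

Here `baseline` and `candidate` would wrap the same model behind two harness configurations, so any difference in pass rate is attributable to the harness change.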

04 Failure Modes

Why Agents Fail in Production

Anthropic’s own guidance on building effective agents makes the same point from the vendor side: most failures are environment failures. Each recurring failure mode maps to a missing harness layer.

| Failure you see | What’s actually missing | Harness fix |
| --- | --- | --- |
| Agent “forgets” your conventions mid-task | Knowledge lives in chat, not artifacts | Package conventions as skills, loaded on demand |
| Claims tests pass when they don’t | Self-graded homework | Hooks + checkers the model cannot skip |
| Touches files it had no business touching | No boundaries | Permissions, sandboxes, scope fences |
| Repeats or abandons work after long sessions | State lives in the context window | External state: plan files, task lists, worktrees |
| Confidently calls APIs that don’t exist | Free-form access to external systems | Typed MCP connectors instead of guessed curl calls |
| Quality regresses and nobody notices | No measurement | Evals wired into the workflow |
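To make one row concrete, a scope fence for the "touches files it had no business touching" failure can be as small as a path check that runs before every write. This is a sketch under assumptions: paths are POSIX-style and relative to the project root, and `in_scope` is a hypothetical name.

```python
from pathlib import PurePosixPath


def in_scope(path: str, allowed_roots: list[str]) -> bool:
    """True if `path` stays inside one of the allowed directories.

    Absolute paths and any path containing ".." are rejected outright.
    """
    candidate = PurePosixPath(path)
    if candidate.is_absolute() or ".." in candidate.parts:
        return False
    return any(
        candidate == PurePosixPath(root) or PurePosixPath(root) in candidate.parents
        for root in allowed_roots
    )
```

With `allowed_roots=["src", "tests"]`, a write to `src/app/main.py` passes while `deploy/secrets.env` and `src/../deploy/secrets.env` are refused. The check is purely lexical, so it does not follow symlinks; a sandbox is still needed for that.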

05 Reference

Anatomy of a Reference Harness: Claude Code

The fastest way to understand harness engineering is to dissect a production harness. Claude Code is the most-documented example, and each of the four component families below has an official, public specification:

  • Skills — procedure packages following the open SKILL.md standard, progressively disclosed so context stays lean. Anthropic’s essay Equipping agents for the real world explains the design rationale.
  • Hooks — lifecycle scripts (PreToolUse, PostToolUse, Stop) that run deterministically; the official docs call out exactly the guarantee-vs-request distinction this page is built on.
  • Sub-agents — named specialists with their own system prompts, tools, and isolated context windows.
  • Model Context Protocol — the open standard for connectors; Anthropic’s Writing tools for agents is the definitive guide to designing the tool surface itself.

Study how these four compose — Claude Code best practices is effectively a harness engineering field manual — then port the patterns to whatever runtime you use.
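For the connector layer, the value of a typed tool surface is that a malformed call is rejected before it reaches the external system. MCP tools describe their arguments with a JSON Schema under `inputSchema`; the sketch below checks only a small subset of that schema by hand. The `run_query` tool and `validate_call` are hypothetical, and a real harness would use an MCP SDK and a full JSON Schema validator.

```python
TYPES = {"string": str, "integer": int, "boolean": bool}

# Hypothetical tool description in the JSON Schema style MCP tools use.
QUERY_TOOL = {
    "name": "run_query",
    "inputSchema": {
        "type": "object",
        "properties": {
            "table": {"type": "string"},
            "limit": {"type": "integer"},
        },
        "required": ["table"],
    },
}


def validate_call(tool: dict, arguments: dict) -> list[str]:
    """Check arguments against a small subset of JSON Schema.

    Handles only `required`, flat `properties`, and three primitive types.
    Unknown arguments are rejected, which is stricter than JSON Schema's default.
    """
    schema = tool["inputSchema"]
    properties = schema.get("properties", {})
    errors = [
        f"missing required argument: {name}"
        for name in schema.get("required", [])
        if name not in arguments
    ]
    for name, value in arguments.items():
        if name not in properties:
            errors.append(f"unknown argument: {name}")
            continue
        expected = TYPES[properties[name]["type"]]
        # bool is a subclass of int in Python, so exclude it for integers.
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            errors.append(f"{name}: expected {properties[name]['type']}")
    return errors
```

An empty list means the call may proceed; anything else goes back to the agent as a precise error rather than a failed request against a guessed endpoint.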

06 Maturity

The Harness Maturity Model

Teams don’t jump from chat to autonomy. Harness capability grows in five levels — knowing your level tells you what to build next.

L0 — Raw model

Chat interface, copy-paste context, no tools. Every session starts from zero.

Next: Add a runtime with tools and version control.

L1 — Tooled

Agent can read/write files and run commands, but improvises all procedure.

Next: Install skills so procedure is packaged, not improvised.

L2 — Skilled

Methodology collections loaded; conventions live in versioned artifacts.

Next: Convert repeated rules into hooks; scope permissions.

L3 — Enforced

Deterministic gates on every action; sub-agents review with fresh context.

Next: Wire evals so harness changes are measured, not vibed.

L4 — Measured & autonomous

Evals gate changes; loops run long-horizon work with human checkpoints at irreversibility.

Next: You're doing loop engineering — scale sideways.
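A human checkpoint at irreversibility can be sketched as a thin wrapper around action execution. The action names in `IRREVERSIBLE` and the `approve` callback are assumptions for illustration; what counts as irreversible is a decision each team has to make for itself.

```python
from typing import Callable

# Hypothetical action names; a real harness would define its own list.
IRREVERSIBLE = {"deploy", "delete_branch", "send_email"}


def run_action(
    name: str,
    execute: Callable[[], str],
    approve: Callable[[str], bool],
) -> str:
    """Run reversible actions directly; pause for a human on irreversible ones."""
    if name in IRREVERSIBLE and not approve(name):
        return f"skipped {name}: approval denied"
    return execute()
```

Everything outside the set runs unattended, so the human is consulted only where a mistake cannot be undone.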

07 Skills

Where Skills Fit in the Harness

Skills are the harness component that packages procedural knowledge: how to review a PR, how to run a TDD loop, how to build a slide deck. Because they are plain folders following the open SKILL.md standard, they are the most portable part of any harness — the same skill works across Claude Code, Codex CLI, and other SKILL.md-compatible runtimes.
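Because a skill is a plain folder, loading one takes very little machinery. The sketch below assumes each skill folder holds a SKILL.md that opens with a frontmatter block containing `name` and `description`, and it parses only flat `key: value` lines. The function names are hypothetical, and a real loader would use a proper YAML parser.

```python
from pathlib import Path


def split_skill(text: str) -> tuple[dict[str, str], str]:
    """Split SKILL.md text into (frontmatter fields, instruction body)."""
    lines = [line.rstrip() for line in text.splitlines()]
    if not lines or lines[0] != "---" or "---" not in lines[1:]:
        return {}, text
    end = lines.index("---", 1)
    meta = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta, "\n".join(lines[end + 1:]).strip()


def index_skills(root: Path) -> dict[str, str]:
    """Cheap pass: map each skill name to its one-line description."""
    index = {}
    for skill_file in sorted(root.glob("*/SKILL.md")):
        meta, _ = split_skill(skill_file.read_text(encoding="utf-8"))
        index[meta.get("name", skill_file.parent.name)] = meta.get("description", "")
    return index


def load_skill_body(root: Path, folder: str) -> str:
    """Expensive pass: read the full instructions only when a skill is chosen."""
    _, body = split_skill((root / folder / "SKILL.md").read_text(encoding="utf-8"))
    return body
```

Only the index needs to sit in context up front; the body of a skill is read when the agent decides it is relevant, which is the idea behind loading skills on demand.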

Flagship collections show what harness-grade skills look like: Superpowers encodes an entire engineering methodology (brainstorm → plan → TDD → review) as composable skills, and Everything Claude Code ships skills alongside the hooks, sub-agents, and commands that complete the harness.

08 Playbook

Build Your First Harness

  1. Start from a proven base

    Use an existing harness runtime (Claude Code, Codex CLI) instead of building from scratch — you get tools, permissions, and a loop for free.

  2. Install a methodology collection

    Add Superpowers or Anthropic’s official skills so the agent has disciplined procedures before you grant autonomy.

  3. Convert guidelines into hooks

    Every rule you keep repeating in prompts is a hook candidate: linters, test gates, commit checks. Deterministic beats probabilistic.

  4. Add evals before autonomy

    Wire eval-driven development so every harness change is measured, then graduate to loop engineering.

09 FAQ

Harness Engineering FAQ

What is harness engineering?

Harness engineering is the discipline of designing the infrastructure around an AI model — tool registries, skills, permissions, memory, hooks, sub-agents, and verification — so that a capable model becomes a reliable production agent. The core equation is Agent = Model + Harness: the model provides raw intelligence, the harness determines whether that intelligence ships reliable work.

How is harness engineering different from prompt engineering?

Prompt engineering optimizes the words you send a model and relies on probabilistic compliance. Harness engineering wires deterministic constraints around the model: a linter that blocks bad code enforces standards in a way no prompt can. Prompts ask; harnesses enforce.

Are agent skills part of the harness?

Yes. Skills are the knowledge-and-procedure layer of the harness: versioned folders of instructions and scripts the agent loads on demand. They sit alongside MCP connectors, hooks, sub-agents, permissions, memory, and evals as one of the seven core harness components.

What is the difference between harness engineering and loop engineering?

Harness engineering builds the environment an agent runs in; loop engineering designs the runtime system that prompts, verifies, retries, and stops the agent without a human driving every turn. The harness is the car, the loop is the driver — loop engineering sits on top of a well-built harness.

How do I start building an agent harness?

Start with an existing harness like Claude Code, then layer components: install a skills collection such as Superpowers for workflow discipline, add hooks for deterministic checks, configure permissions, and wire MCP connectors for external tools. Measure with evals before adding autonomy.

10 Sources

Further Reading: Primary Sources

Harness engineering is young enough that you can read essentially all of the foundational material in an afternoon.
