The 2026 Discipline

Harness Engineering

Agent = Model + Harness

The discipline of building the infrastructure around an AI model — skills, tools, permissions, memory, and verification — so a capable model becomes a reliable agent.

01 — Definition

What Is Harness Engineering?

A harness is everything that surrounds the model: the tool registry it can call, the skills it loads, the permissions that bound it, the memory that persists across sessions, the hooks that fire on every action, and the checks that verify its output. Harness engineering treats this scaffolding as a first-class engineering artifact — one that co-determines agent performance as much as the underlying model does.

Agent = Model + Harness

The model is a frozen utility. Reliability, safety, and multi-step execution live in the harness.

The canonical example: telling an agent “follow our coding standards” in a prompt is probabilistic compliance. Wiring a linter hook that blocks violations is deterministic enforcement. Harness engineering is the systematic practice of converting the first into the second.
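
A minimal sketch of that conversion, in Python. The two rules (no tabs, a `MAX_LINE_LENGTH` cap) and the names `check_standards` and `gate_write` are hypothetical stand-ins for whatever linter a team actually runs. The point is only that code refuses the write, whatever the model intended.

```python
from dataclasses import dataclass

MAX_LINE_LENGTH = 100  # hypothetical team standard, for illustration only


@dataclass
class GateResult:
    allowed: bool
    violations: list[str]


def check_standards(source: str) -> list[str]:
    """Return one message per violated rule; an empty list means clean."""
    violations = []
    for number, line in enumerate(source.splitlines(), start=1):
        if "\t" in line:
            violations.append(f"line {number}: tab character")
        if len(line) > MAX_LINE_LENGTH:
            violations.append(f"line {number}: longer than {MAX_LINE_LENGTH} chars")
    return violations


def gate_write(source: str) -> GateResult:
    """Deterministic gate: the write is allowed only if no rule is violated."""
    violations = check_standards(source)
    return GateResult(allowed=not violations, violations=violations)
```

A harness would call `gate_write` on every proposed file write and hand the violation list back to the agent, so the same input always gets the same verdict.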

02 — Evolution

From Prompts to Harnesses: The Evolution

2023

Prompt Engineering

Optimize the words. One prompt, one answer. Fragile and probabilistic.

2025

Context Engineering

Optimize everything the model sees — retrieval, structure, compaction. Better inputs, same single-shot mindset.

2026

Harness Engineering

Optimize the environment the model operates in — tools, skills, permissions, verification. Agent = Model + Harness.

2026 H2

Loop Engineering

Optimize the runtime that drives the agent — systems that prompt, verify, retry, and stop without a human in every turn.
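
To make "prompt, verify, retry, and stop" concrete, here is a rough sketch in Python. `attempt` stands in for a model call and `verify` for any external checker; both are assumptions, as is the retry prompt format.

```python
from typing import Callable, Optional


def run_loop(
    attempt: Callable[[str], str],
    verify: Callable[[str], bool],
    task: str,
    max_attempts: int = 3,
) -> Optional[str]:
    """Prompt, verify, retry, and stop. Returns the first verified output, or None."""
    prompt = task
    for _ in range(max_attempts):
        output = attempt(prompt)
        if verify(output):
            return output  # stop: the checker, not the model, decided it is done
        # retry: feed the failed output back so the next attempt can correct it
        prompt = f"{task}\n\nPrevious attempt failed verification:\n{output}"
    return None  # stop: attempt budget exhausted, hand off to a human
```

The loop has two exits, success and an exhausted budget, so it always terminates without a human stepping in on each turn.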

03 — Components

The Harness Stack

Seven component families make up a production harness. Skills are one of them — which is exactly why a skills directory is the natural starting point for harness builders.

Skills

Versioned folders of instructions, scripts, and resources the agent loads on demand — the knowledge and procedure layer.
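
"Loads on demand" can be sketched as a two-pass lookup. This example assumes each skill is a folder containing a SKILL.md whose front matter holds simple `key: value` lines such as `name`; the parser is deliberately naive and the function names are hypothetical, not part of any runtime's API.

```python
from pathlib import Path


def read_front_matter(skill_file: Path) -> dict[str, str]:
    """Parse simple `key: value` lines between the leading `---` markers."""
    lines = skill_file.read_text(encoding="utf-8").splitlines()
    meta: dict[str, str] = {}
    if not lines or lines[0].strip() != "---":
        return meta
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def index_skills(root: Path) -> dict[str, Path]:
    """Startup pass: map each skill name to its file, keeping only metadata."""
    index = {}
    for skill_file in sorted(root.glob("*/SKILL.md")):
        name = read_front_matter(skill_file).get("name", skill_file.parent.name)
        index[name] = skill_file
    return index


def load_skill(index: dict[str, Path], name: str) -> str:
    """On-demand pass: pull the full instructions into context only when needed."""
    return index[name].read_text(encoding="utf-8")
```

Only names and paths are held up front; the body of a skill enters the context when a task calls for it.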

MCP Connectors

Model Context Protocol servers that give the agent typed, permissioned access to external systems — databases, browsers, SaaS APIs.
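
The value of "typed" access shows up in a small sketch. This is not the MCP SDK: the `TOOLS` registry, `register`, and `call_tool` are hypothetical names used only to show the idea of rejecting unknown tools and mistyped arguments before anything runs.

```python
from typing import Any, Callable

# Hypothetical registry: tool name -> (expected argument types, implementation)
TOOLS: dict[str, tuple[dict[str, type], Callable[..., Any]]] = {}


def register(name: str, params: dict[str, type], fn: Callable[..., Any]) -> None:
    TOOLS[name] = (params, fn)


def call_tool(name: str, args: dict[str, Any]) -> Any:
    """Reject unknown tools and mistyped arguments before anything executes."""
    if name not in TOOLS:
        raise LookupError(f"unknown tool: {name}")
    params, fn = TOOLS[name]
    if set(args) != set(params):
        raise TypeError(f"{name} expects arguments {sorted(params)}")
    for key, expected in params.items():
        if not isinstance(args[key], expected):
            raise TypeError(f"{name}.{key} must be {expected.__name__}")
    return fn(**args)


# Stand-in for a real connector, e.g. a database lookup
register("get_user", {"user_id": int}, lambda user_id: {"id": user_id})
```

A call to a tool that was never registered fails immediately with a clear error instead of turning into a guessed request against a live system.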

Sub-agents

Specialized agents with isolated context that the main agent dispatches for review, research, or parallel work.
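
Isolated context is easy to show in miniature. In this sketch `run_model` is a stand-in for a real model call, and the `SubAgent` and `MainAgent` classes are hypothetical; the thing to notice is what the sub-agent does and does not receive.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SubAgent:
    name: str
    system_prompt: str
    run_model: Callable[[list[str]], str]  # stand-in for a real model call

    def dispatch(self, brief: str) -> str:
        # Fresh context: only the system prompt and the brief, no parent history
        return self.run_model([self.system_prompt, brief])


@dataclass
class MainAgent:
    history: list[str] = field(default_factory=list)

    def delegate(self, agent: SubAgent, brief: str) -> str:
        result = agent.dispatch(brief)
        # Only the sub-agent's final answer re-enters the parent context
        self.history.append(f"[{agent.name}] {result}")
        return result
```

The reviewer never sees the parent's long conversation, and the parent's context grows by one line per delegation rather than by the sub-agent's whole transcript.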

Hooks

Deterministic scripts that fire before or after tool calls — the enforcement layer that turns guidelines into guarantees.
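
One way to picture "before or after tool calls" is a thin wrapper around every tool invocation. This is a generic sketch, not any runtime's hook API: `run_tool`, `ToolBlocked`, and the `no_force_push` rule are hypothetical.

```python
from typing import Any, Callable, Optional

# A pre-hook returns a reason string to block the call, or None to allow it
PreHook = Callable[[str, dict[str, Any]], Optional[str]]
PostHook = Callable[[str, dict[str, Any], Any], None]


class ToolBlocked(Exception):
    """Raised when a pre-hook vetoes a tool call."""


def run_tool(
    name: str,
    args: dict[str, Any],
    tool: Callable[..., Any],
    pre_hooks: list[PreHook],
    post_hooks: list[PostHook],
) -> Any:
    for hook in pre_hooks:
        reason = hook(name, args)
        if reason is not None:
            raise ToolBlocked(reason)  # the tool never runs
    result = tool(**args)
    for hook in post_hooks:
        hook(name, args, result)  # e.g. logging or follow-up checks
    return result


def no_force_push(name: str, args: dict[str, Any]) -> Optional[str]:
    """Example pre-hook: refuse shell commands that contain a force push."""
    if name == "shell" and "--force" in args.get("command", ""):
        return "force push is not allowed"
    return None
```

Because the hook runs in the harness rather than in the prompt, the model has no way to talk its way past it.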

Permissions & Sandboxing

Allow-lists, approval modes, and sandboxes that bound what the agent can touch — safety as architecture, not vibes.
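
An allow-list can be as small as a path check. The sketch below assumes a harness that consults a hypothetical `is_path_allowed` before every file operation; real sandboxes do much more, but the shape is the same.

```python
from pathlib import Path


def is_path_allowed(target: str, allowed_roots: list[str]) -> bool:
    """True only if `target` resolves to a location inside an allowed root."""
    resolved = Path(target).resolve()
    for root in allowed_roots:
        root_path = Path(root).resolve()
        if resolved == root_path or root_path in resolved.parents:
            return True
    return False
```

Resolving the path first matters: `..` segments are normalized before the comparison, and the check works on whole path components, so `/tmp/project-evil` does not pass as a child of `/tmp/project`.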

Memory & State

Persistent files, task lists, and context-management strategies that survive across sessions and compactions.
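
A task list that survives a session can be a plain JSON file. The file layout (`title`, `done`) and the function names here are assumptions made for the sketch, not a standard format.

```python
import json
from pathlib import Path
from typing import Optional


def load_tasks(state_file: Path) -> list[dict]:
    """Read the task list from disk; a missing file means a fresh start."""
    if not state_file.exists():
        return []
    return json.loads(state_file.read_text(encoding="utf-8"))


def save_tasks(state_file: Path, tasks: list[dict]) -> None:
    """Write the whole list back so the file is always the source of truth."""
    state_file.write_text(json.dumps(tasks, indent=2), encoding="utf-8")


def mark_done(state_file: Path, title: str) -> None:
    """Flip one task to done and persist immediately."""
    tasks = load_tasks(state_file)
    for task in tasks:
        if task["title"] == title:
            task["done"] = True
    save_tasks(state_file, tasks)


def next_task(state_file: Path) -> Optional[str]:
    """What a new session asks first: the earliest task not yet done."""
    for task in load_tasks(state_file):
        if not task["done"]:
            return task["title"]
    return None
```

A fresh session, or one whose context was just compacted, calls `next_task` and picks up where the last one stopped instead of relying on what it remembers.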

Evals & Observability

Automated checks, test suites, and traces that tell you whether the harness actually improved the agent.
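
"Whether the harness actually improved the agent" reduces to a before-and-after comparison on fixed cases. In this sketch an agent is any callable from string to string and scoring is exact match; both are simplifying assumptions, and the function names are hypothetical.

```python
from typing import Callable

Agent = Callable[[str], str]
Case = tuple[str, str]  # (input, expected output)


def pass_rate(agent: Agent, cases: list[Case]) -> float:
    """Fraction of cases where the agent's output matches exactly."""
    if not cases:
        raise ValueError("need at least one eval case")
    passed = sum(1 for prompt, expected in cases if agent(prompt) == expected)
    return passed / len(cases)


def harness_change_ok(baseline: Agent, candidate: Agent, cases: list[Case]) -> bool:
    """Accept a harness change only if the pass rate does not drop."""
    return pass_rate(candidate, cases) >= pass_rate(baseline, cases)
```

Run the same cases against the agent before and after a harness change; a drop in pass rate is a regression someone now notices.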

04 — Failure Modes

Why Agents Fail in Production

Anthropic’s own guidance on building effective agents makes the point from the vendor side: most failures are environment failures. Each recurring failure mode below maps to a missing harness layer.

Failure you see	What’s actually missing	Harness fix
Agent “forgets” your conventions mid-task	Knowledge lives in chat, not artifacts	Package conventions as skills, loaded on demand
Claims tests pass when they don’t	Self-graded homework	Hooks + checkers the model cannot skip
Touches files it had no business touching	No boundaries	Permissions, sandboxes, scope fences
Repeats or abandons work after long sessions	State lives in the context window	External state: plan files, task lists, worktrees
Confidently calls APIs that don’t exist	Free-form access to external systems	Typed MCP connectors instead of guessed curl calls
Quality regresses and nobody notices	No measurement	Evals wired into the workflow

05 — Reference

Anatomy of a Reference Harness: Claude Code

The fastest way to understand harness engineering is to dissect a production harness. Claude Code is among the best-documented examples, with official, public documentation for four of its component families:

→Skills — procedure packages following the open SKILL.md standard, progressively disclosed so context stays lean. Anthropic’s essay Equipping agents for the real world explains the design rationale.
→Hooks — lifecycle scripts (PreToolUse, PostToolUse, Stop) that run deterministically; the official docs call out exactly the guarantee-vs-request distinction this page is built on.
→Sub-agents — named specialists with their own system prompts, tools, and isolated context windows.
→Model Context Protocol — the open standard for connectors; Anthropic’s Writing tools for agents is the definitive guide to designing the tool surface itself.

Study how these four compose — Claude Code best practices is effectively a harness engineering field manual — then port the patterns to whatever runtime you use.

06 — Maturity

The Harness Maturity Model

Teams don’t jump from chat to autonomy. Harness capability grows in five levels — knowing your level tells you what to build next.

L0 — Raw model

Chat interface, copy-paste context, no tools. Every session starts from zero.

Next: Add a runtime with tools and version control.

L1 — Tooled

Agent can read/write files and run commands, but improvises all procedure.

Next: Install skills so procedure is packaged, not improvised.

L2 — Skilled

Methodology collections loaded; conventions live in versioned artifacts.

Next: Convert repeated rules into hooks; scope permissions.

L3 — Enforced

Deterministic gates on every action; sub-agents review with fresh context.

Next: Wire evals so harness changes are measured, not vibed.

L4 — Measured & autonomous

Evals gate changes; loops run long-horizon work with human checkpoints at irreversibility.

Next: You're doing loop engineering — scale sideways.
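
"Human checkpoints at irreversibility" can be sketched as a single branch in the executor. The `IRREVERSIBLE` set and its action names are hypothetical, and `ask_human` stands in for whatever approval channel a team uses.

```python
from typing import Callable

# Hypothetical set of actions that cannot be undone once executed
IRREVERSIBLE = {"deploy_production", "drop_table", "send_email"}


def execute(
    action: str,
    run: Callable[[str], None],
    ask_human: Callable[[str], bool],
) -> str:
    """Run reversible actions unattended; pause for approval on the rest."""
    if action in IRREVERSIBLE and not ask_human(action):
        return "blocked"
    run(action)
    return "done"
```

Everything that can be undone runs without interruption, so the human is consulted only where a mistake would be permanent.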

07 — Skills

Where Skills Fit in the Harness

Skills are the harness component that packages procedural knowledge: how to review a PR, how to run a TDD loop, how to build a slide deck. Because they are plain folders following the open SKILL.md standard, they are the most portable part of any harness — the same skill works across Claude Code, Codex CLI, and other SKILL.md-compatible runtimes.

Flagship collections show what harness-grade skills look like: Superpowers encodes an entire engineering methodology (brainstorm → plan → TDD → review) as composable skills, and Everything Claude Code ships skills alongside the hooks, sub-agents, and commands that complete the harness.

08 — Playbook

Build Your First Harness

1. Start from a proven base
Use an existing harness runtime (Claude Code, Codex CLI) instead of building from scratch: you get tools, permissions, and a loop for free.

2. Install a methodology collection
Add Superpowers or Anthropic’s official skills so the agent has disciplined procedures before you grant autonomy.

3. Convert guidelines into hooks
Every rule you keep repeating in prompts is a hook candidate: linters, test gates, commit checks. Deterministic beats probabilistic.

4. Add evals before autonomy
Wire eval-driven development so every harness change is measured, then graduate to loop engineering.

09 — FAQ

Harness Engineering FAQ

What is harness engineering?

Harness engineering is the discipline of designing the infrastructure around an AI model — tool registries, skills, permissions, memory, hooks, sub-agents, and verification — so that a capable model becomes a reliable production agent. The core equation is Agent = Model + Harness: the model provides raw intelligence, the harness determines whether that intelligence ships reliable work.

How is harness engineering different from prompt engineering?

Prompt engineering optimizes the words you send a model and relies on probabilistic compliance. Harness engineering wires deterministic constraints around the model: a linter that blocks bad code enforces standards in a way no prompt can. Prompts ask; harnesses enforce.

Are agent skills part of the harness?

Yes. Skills are the knowledge-and-procedure layer of the harness: versioned folders of instructions and scripts the agent loads on demand. They sit alongside MCP connectors, sub-agents, hooks, permissions, memory, and evals as one of the seven core harness components.

What is the difference between harness engineering and loop engineering?

Harness engineering builds the environment an agent runs in; loop engineering designs the runtime system that prompts, verifies, retries, and stops the agent without a human driving every turn. The harness is the car and the loop is the driver: loop engineering sits on top of a well-built harness.

How do I start building an agent harness?

Start with an existing harness like Claude Code, then layer components: install a skills collection such as Superpowers for workflow discipline, add hooks for deterministic checks, configure permissions, and wire MCP connectors for external tools. Measure with evals before adding autonomy.

10 — Sources

Further Reading: Primary Sources

Building Effective Agents, Anthropic Engineering

Claude Code Best Practices, Anthropic Engineering

Equipping Agents for the Real World with Skills, Anthropic

Writing Tools for Agents, Anthropic Engineering

Harnessing Agent Skills (arXiv 2606.20631)

Awesome Harness Engineering, GitHub

Harness Engineering for AI Coding Agents, Augment Code

SKILL.md, the open Agent Skills standard
