The 2026 Discipline
Harness Engineering
Agent = Model + Harness
The discipline of building the infrastructure around an AI model — skills, tools, permissions, memory, and verification — so a capable model becomes a reliable agent.
01 — Definition
What Is Harness Engineering?
A harness is everything that surrounds the model: the tool registry it can call, the skills it loads, the permissions that bound it, the memory that persists across sessions, the hooks that fire on every action, and the checks that verify its output. Harness engineering treats this scaffolding as a first-class engineering artifact — one that co-determines agent performance as much as the underlying model does.
Agent = Model + Harness
The model is a frozen utility. Reliability, safety, and multi-step execution live in the harness.
The canonical example: telling an agent “follow our coding standards” in a prompt is probabilistic compliance. Wiring a linter hook that blocks violations is deterministic enforcement. Harness engineering is the systematic practice of converting the first into the second.
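A minimal sketch of that conversion, assuming a single illustrative rule ("no bare `except:` clauses") and a hypothetical `gate_write` function that the harness would call before accepting a file write. The rule and the function names are assumptions for illustration, not part of any particular runtime.

```python
import ast


def find_bare_excepts(source: str) -> list[int]:
    """Return the line numbers of bare `except:` clauses in Python source."""
    tree = ast.parse(source)
    return [
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.ExceptHandler) and node.type is None
    ]


def gate_write(source: str) -> tuple[bool, str]:
    """Hypothetical gate: allow the write only if the rule holds."""
    lines = find_bare_excepts(source)
    if lines:
        return False, f"blocked: bare except on line(s) {lines}"
    return True, "ok"


# The agent proposes this file content; the gate decides, not the prompt.
proposed = "try:\n    risky()\nexcept:\n    pass\n"
allowed, reason = gate_write(proposed)
```

The prompt version of this rule can be ignored by the model. The gate version cannot, because the write never happens unless `allowed` is true.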
02 — Evolution
From Prompts to Harnesses: The Evolution
Prompt Engineering
Optimize the words. One prompt, one answer. Fragile and probabilistic.
Context Engineering
Optimize everything the model sees — retrieval, structure, compaction. Better inputs, same single-shot mindset.
Harness Engineering
Optimize the environment the model operates in — tools, skills, permissions, verification. Agent = Model + Harness.
Loop Engineering
Optimize the runtime that drives the agent — systems that prompt, verify, retry, and stop without a human in every turn.
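The loop-engineering stage can be sketched as a small driver that prompts, verifies, retries, and stops. This is an illustration under stated assumptions: `run_loop`, `fake_agent`, and the retry budget are hypothetical names, and the "model" is a stub so the example runs without any API.

```python
from typing import Callable


def run_loop(
    attempt: Callable[[str], str],
    verify: Callable[[str], bool],
    task: str,
    max_attempts: int = 3,
) -> tuple[bool, str, int]:
    """Drive attempt -> verify -> retry until success or the budget runs out."""
    feedback = task
    output = ""
    for n in range(1, max_attempts + 1):
        output = attempt(feedback)
        if verify(output):
            return True, output, n
        # Feed the failure back so the next attempt has something to act on.
        feedback = f"{task}\nPrevious attempt failed verification: {output!r}"
    return False, output, max_attempts


# Stub "model": it only succeeds once it has seen failure feedback.
def fake_agent(prompt: str) -> str:
    return "42" if "failed verification" in prompt else "forty-two"


ok, result, attempts = run_loop(fake_agent, str.isdigit, "Answer with digits only")
```

The stop condition lives in the driver rather than in the model, which is what lets the loop run without a human in every turn.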
03 — Components
The Harness Stack
Seven component families make up a production harness. Skills are one of them — which is exactly why a skills directory is the natural starting point for harness builders.
Skills
Versioned folders of instructions, scripts, and resources the agent loads on demand — the knowledge and procedure layer.
MCP Connectors
Model Context Protocol servers that give the agent typed, permissioned access to external systems — databases, browsers, SaaS APIs.
Sub-agents
Specialized agents with isolated context that the main agent dispatches for review, research, or parallel work.
Hooks
Deterministic scripts that fire before or after tool calls — the enforcement layer that turns guidelines into guarantees.
Permissions & Sandboxing
Allow-lists, approval modes, and sandboxes that bound what the agent can touch — safety as architecture, not vibes.
Memory & State
Persistent files, task lists, and context-management strategies that survive across sessions and compactions.
Evals & Observability
Automated checks, test suites, and traces that tell you whether the harness actually improved the agent.
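To make the composition concrete, here is a toy sketch of three of these layers working together: a tool registry, a permission allow-list, and pre-call hooks, plus a trace for observability. Every name here (`Harness`, `Blocked`, `no_env_files`) is hypothetical, and a production harness would be considerably richer.

```python
from dataclasses import dataclass, field
from typing import Callable

Tool = Callable[[str], str]
Hook = Callable[[str, str], None]  # (tool_name, argument); raises Blocked to refuse


class Blocked(Exception):
    """Raised when a permission rule or a hook refuses a tool call."""


@dataclass
class Harness:
    tools: dict[str, Tool] = field(default_factory=dict)
    allowed: set[str] = field(default_factory=set)
    pre_hooks: list[Hook] = field(default_factory=list)
    trace: list[str] = field(default_factory=list)

    def call(self, name: str, arg: str) -> str:
        if name not in self.tools:
            raise Blocked(f"unknown tool: {name}")
        if name not in self.allowed:
            raise Blocked(f"tool not permitted: {name}")
        for hook in self.pre_hooks:
            hook(name, arg)  # any hook may raise Blocked
        result = self.tools[name](arg)
        self.trace.append(f"{name}({arg!r})")  # only completed calls are traced
        return result


def no_env_files(name: str, arg: str) -> None:
    """Example hook: refuse any call whose argument names a .env file."""
    if arg.endswith(".env"):
        raise Blocked("hook: .env files are off limits")


harness = Harness(
    tools={"read": lambda path: f"<contents of {path}>", "shell": lambda cmd: "ran"},
    allowed={"read"},
    pre_hooks=[no_env_files],
)
```

With this setup, `harness.call("read", "README.md")` succeeds, while `harness.call("shell", "ls")` and `harness.call("read", ".env")` both raise `Blocked` before the tool runs.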
04 — Failure Modes
Why Agents Fail in Production
Most production agent failures are environment failures, a point Anthropic’s own guidance on building effective agents also makes from the vendor side. Each recurring failure mode maps to a missing harness layer.
| Failure you see | What’s actually missing | Harness fix |
|---|---|---|
| Agent “forgets” your conventions mid-task | Knowledge lives in chat, not artifacts | Package conventions as skills, loaded on demand |
| Claims tests pass when they don’t | Self-graded homework | Hooks + checkers the model cannot skip |
| Touches files it had no business touching | No boundaries | Permissions, sandboxes, scope fences |
| Repeats or abandons work after long sessions | State lives in the context window | External state: plan files, task lists, worktrees |
| Confidently calls APIs that don’t exist | Free-form access to external systems | Typed MCP connectors instead of guessed curl calls |
| Quality regresses and nobody notices | No measurement | Evals wired into the workflow |
05 — Reference
Anatomy of a Reference Harness: Claude Code
The fastest way to understand harness engineering is to dissect a production harness. Claude Code is the most-documented example, and each of the four component families below has official public documentation:
- Skills: procedure packages following the open SKILL.md standard, progressively disclosed so context stays lean. Anthropic’s essay Equipping agents for the real world explains the design rationale.
- Hooks: lifecycle scripts (PreToolUse, PostToolUse, Stop) that run deterministically; the official docs call out exactly the guarantee-vs-request distinction this page is built on.
- Sub-agents: named specialists with their own system prompts, tools, and isolated context windows.
- Model Context Protocol: the open standard for connectors; Anthropic’s Writing tools for agents is the definitive guide to designing the tool surface itself.
Study how these four compose — Claude Code best practices is effectively a harness engineering field manual — then port the patterns to whatever runtime you use.
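For a feel of what a PreToolUse hook can look like, here is a hedged sketch of the decision logic. It assumes, based on our reading of the Claude Code hooks documentation, that the hook receives a JSON payload on stdin containing `tool_name` and `tool_input`, and that exit code 2 blocks the call with stderr returned to the model. Verify both against the current docs before relying on them. The protected-suffix policy is purely illustrative.

```python
import json

PROTECTED_SUFFIXES = (".env", ".pem")  # hypothetical policy, not from any docs


def decide(payload: dict) -> tuple[int, str]:
    """Return (exit_code, stderr_message) for one hook invocation."""
    tool_input = payload.get("tool_input", {})
    path = str(tool_input.get("file_path", ""))
    if path.endswith(PROTECTED_SUFFIXES):
        # Assumed convention: exit code 2 blocks the tool call.
        return 2, f"Refusing to modify protected file: {path}"
    return 0, ""


def run_hook(stdin_text: str) -> tuple[int, str]:
    """Parse the raw stdin payload and apply the policy."""
    return decide(json.loads(stdin_text))


# A real hook script would finish with something like:
#   code, message = run_hook(sys.stdin.read())
#   print(message, file=sys.stderr)
#   sys.exit(code)
sample = '{"tool_name": "Write", "tool_input": {"file_path": "config/.env"}}'
exit_code, message = run_hook(sample)
```

Keeping the policy in a pure function like `decide` makes the hook itself unit-testable, which matters once hooks become the enforcement layer.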
06 — Maturity
The Harness Maturity Model
Teams don’t jump from chat to autonomy. Harness capability grows in five levels — knowing your level tells you what to build next.
L0 — Raw model
Chat interface, copy-paste context, no tools. Every session starts from zero.
Next: Add a runtime with tools and version control.
L1 — Tooled
Agent can read/write files and run commands, but improvises all procedure.
Next: Install skills so procedure is packaged, not improvised.
L2 — Skilled
Methodology collections loaded; conventions live in versioned artifacts.
Next: Convert repeated rules into hooks; scope permissions.
L3 — Enforced
Deterministic gates on every action; sub-agents review with fresh context.
Next: Wire evals so harness changes are measured, not vibed.
L4 — Measured & autonomous
Evals gate changes; loops run long-horizon work with human checkpoints at irreversibility.
Next: You're doing loop engineering — scale sideways.
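The ladder above can be read as a simple cumulative check: each level counts only once every level below it is in place. A small sketch, assuming four hypothetical boolean capability flags and paraphrased next-step strings:

```python
from dataclasses import dataclass

# (code, name, next step), paraphrased from the levels above.
LEVELS = [
    ("L0", "Raw model", "Add a runtime with tools and version control."),
    ("L1", "Tooled", "Install skills so procedure is packaged, not improvised."),
    ("L2", "Skilled", "Convert repeated rules into hooks; scope permissions."),
    ("L3", "Enforced", "Wire evals so harness changes are measured."),
    ("L4", "Measured & autonomous", "Scale sideways with loop engineering."),
]


@dataclass
class HarnessProfile:
    has_tools: bool = False
    has_skills: bool = False
    has_enforcement: bool = False  # hooks plus scoped permissions
    has_evals: bool = False


def assess(profile: HarnessProfile) -> tuple[str, str, str]:
    """Return the highest level whose prerequisites are all satisfied."""
    rungs = (
        profile.has_tools,
        profile.has_skills,
        profile.has_enforcement,
        profile.has_evals,
    )
    level = 0
    for reached in rungs:
        if not reached:
            break  # a missing rung caps the level, whatever sits above it
        level += 1
    return LEVELS[level]


code, name, next_step = assess(HarnessProfile(has_tools=True, has_skills=True))
```

Treating the levels as strictly cumulative is a modeling choice: under it, a team with hooks but no packaged skills still assesses as L1.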
07 — Skills
Where Skills Fit in the Harness
Skills are the harness component that packages procedural knowledge: how to review a PR, how to run a TDD loop, how to build a slide deck. Because they are plain folders following the open SKILL.md standard, they are the most portable part of any harness — the same skill works across Claude Code, Codex CLI, and other SKILL.md-compatible runtimes.
Flagship collections show what harness-grade skills look like: Superpowers encodes an entire engineering methodology (brainstorm → plan → TDD → review) as composable skills, and Everything Claude Code ships skills alongside the hooks, sub-agents, and commands that complete the harness.
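Because a skill is a plain folder with a SKILL.md file, a runtime can index skills cheaply and read the full instructions only when one is needed. The sketch below assumes the file opens with a `---` fenced metadata block containing `name` and `description`; it parses only flat `key: value` lines, whereas a real loader would use a proper YAML parser. The function names and the `pr-review` skill are hypothetical.

```python
import tempfile
from pathlib import Path


def read_frontmatter(skill_md: Path) -> dict[str, str]:
    """Parse flat `key: value` pairs between the leading `---` fences."""
    lines = skill_md.read_text(encoding="utf-8").splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def index_skills(root: Path) -> dict[str, str]:
    """Startup pass: collect only name -> description, never the bodies."""
    index: dict[str, str] = {}
    for skill_md in sorted(root.glob("*/SKILL.md")):
        meta = read_frontmatter(skill_md)
        index[meta.get("name", skill_md.parent.name)] = meta.get("description", "")
    return index


def load_skill_body(root: Path, folder: str) -> str:
    """On-demand pass: read the full file only once the skill is selected."""
    return (root / folder / "SKILL.md").read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "pr-review").mkdir()
    (root / "pr-review" / "SKILL.md").write_text(
        "---\nname: pr-review\ndescription: Review a pull request\n---\n"
        "# Steps\n1. Read the diff\n",
        encoding="utf-8",
    )
    index = index_skills(root)
    body = load_skill_body(root, "pr-review")
```

Only the short descriptions in `index` need to sit in context up front; the body is loaded when the task calls for it.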
08 — Playbook
Build Your First Harness
1. Start from a proven base
   Use an existing harness runtime (Claude Code, Codex CLI) instead of building from scratch: you get tools, permissions, and a loop for free.
2. Install a methodology collection
   Add Superpowers or Anthropic’s official skills so the agent has disciplined procedures before you grant autonomy.
3. Convert guidelines into hooks
   Every rule you keep repeating in prompts is a hook candidate: linters, test gates, commit checks. Deterministic beats probabilistic.
4. Add evals before autonomy
   Wire eval-driven development so every harness change is measured, then graduate to loop engineering.
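The last step can start very small. As a sketch, assume each eval run yields a list of pass/fail results for a fixed task set, and a harness change is accepted only if the pass rate does not regress. `pass_rate`, `eval_gate`, and the tolerance parameter are hypothetical names.

```python
def pass_rate(results: list[bool]) -> float:
    """Fraction of eval tasks that passed; an empty run scores 0.0."""
    return sum(results) / len(results) if results else 0.0


def eval_gate(
    baseline: list[bool],
    candidate: list[bool],
    max_regression: float = 0.0,
) -> bool:
    """Accept a harness change only if the pass rate drops by at most max_regression."""
    return pass_rate(candidate) >= pass_rate(baseline) - max_regression


before_change = [True, True, False, False]  # 2 of 4 tasks pass
after_change = [True, True, True, False]    # 3 of 4 tasks pass
accepted = eval_gate(before_change, after_change)
```

A single pass rate hides per-task regressions, so a fuller version would also compare results task by task. Even this coarse gate makes the effect of a harness change visible.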
09 — FAQ
Harness Engineering FAQ
What is harness engineering?
Harness engineering is the discipline of designing the infrastructure around an AI model — tool registries, skills, permissions, memory, hooks, sub-agents, and verification — so that a capable model becomes a reliable production agent. The core equation is Agent = Model + Harness: the model provides raw intelligence, the harness determines whether that intelligence ships reliable work.
How is harness engineering different from prompt engineering?
Prompt engineering optimizes the words you send a model and relies on probabilistic compliance. Harness engineering wires deterministic constraints around the model: a linter that blocks bad code enforces standards in a way no prompt can. Prompts ask; harnesses enforce.
Are agent skills part of the harness?
Yes. Skills are the knowledge-and-procedure layer of the harness: versioned folders of instructions and scripts the agent loads on demand. They sit alongside MCP connectors, sub-agents, hooks, permissions, memory, and evals as one of the seven core harness components.
What is the difference between harness engineering and loop engineering?
Harness engineering builds the environment an agent runs in; loop engineering designs the runtime system that prompts, verifies, retries, and stops the agent without a human driving every turn. The harness is the car and the loop is the driver: loop engineering sits on top of a well-built harness.
How do I start building an agent harness?
Start with an existing harness like Claude Code, then layer components: install a skills collection such as Superpowers for workflow discipline, add hooks for deterministic checks, configure permissions, and wire MCP connectors for external tools. Measure with evals before adding autonomy.
10 — Sources
Further Reading: Primary Sources
Harness engineering is young enough that you can read essentially all of the foundational material in an afternoon. Start here:
Building Effective Agents, Anthropic Engineering
The foundational essay on agent architecture: workflows vs agents, and why simple composable patterns beat frameworks.
Claude Code Best Practices, Anthropic Engineering
A field manual for working inside a production harness: context, tools, permissions, verification.
Equipping Agents for the Real World with Skills, Anthropic
The design rationale behind skills as a harness component: progressive disclosure, portability, composition.
Writing Tools for Agents, Anthropic Engineering
How to design the tool surface itself, the deepest layer of the harness.
Harnessing Agent Skills (arXiv 2606.20631)
Academic reference architecture for skill-mediated LLM agents; formalizes skills' place in the harness.
Awesome Harness Engineering, GitHub
Community-curated list of harness tools, patterns, evals, memory systems, and observability.
Harness Engineering for AI Coding Agents, Augment Code
An industry guide focused on constraints that ship reliable code.
SKILL.md, the open Agent Skills standard
The specification that makes the skills layer portable across runtimes.