Coined June 2026 · Beyond Prompting

Loop Engineering

prompt → verify → retry → stop

Designing systems that prompt, verify, retry, and stop AI agents — so the unit of work is no longer a prompt you type, but a loop you engineer.

01 · Definition

What Is Loop Engineering?

For three years, working with AI meant typing a prompt, reading the answer, and typing again: the human was the loop. By mid-2026, coding agents became reliable enough at long-horizon work that the bottleneck moved. The scarce skill is no longer writing prompts; it is designing the system that writes them.

“I don’t prompt Claude anymore. I have loops that are running. They’re the ones that are prompting Claude and figuring out what to do.” (Boris Cherny, creator of Claude Code)

Loop engineering is the discipline that emerged from that shift: designing the outer system — contracts, state, checkers, stop conditions — that lets an agent run for hours and still ship work you can trust.

02 · Origin

The Origin: Three Essays, One Month

The term crystallized in June 2026, when three developments landed within weeks of each other:

  1. Boris Cherny described his day as running a couple hundred agents driven by loops that read his GitHub, Slack, and task queues and decide what to do next. The New Stack’s coverage framed it as the moment prompting stopped being the job.
  2. Addy Osmani published the essay that named the practice and gave it an anatomy: automations, worktrees, skills, connectors, sub-agents, and external state (the six blocks below).
  3. The community turned it into tooling within weeks: open-source loop starters and audit CLIs inspired directly by Cherny’s and Osmani’s setups.

Why it happened then: agents crossed the reliability threshold for long-horizon work. Once a model can recover from its own mistakes for hours at a stretch, the human bottleneck moves from answer quality to orchestration design — and orchestration design is loop engineering.

03 · The Stack

The Operator Loop Stack

Five layers separate a demo loop from a production loop. Each layer answers one failure mode of autonomous agents.

1. Harness

The environment: tools, skills, permissions, memory. Everything the loop can act with.


2. Loop Contract

A machine-checkable definition of done: tests green, lints clean, spec satisfied. Without it the loop cannot know when to stop.
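A minimal sketch of a loop contract in Python, assuming each contract line maps to one shell command and that exit status 0 means the line is satisfied. `ContractCheck`, `evaluate_contract`, and `contract_satisfied` are hypothetical names, not part of any agent framework.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class ContractCheck:
    """One line of a loop contract: a label plus a command whose exit code decides it."""
    name: str
    command: list[str]


def evaluate_contract(checks: list[ContractCheck]) -> dict[str, bool]:
    """Run every check; a check passes only if its command exits with status 0."""
    results: dict[str, bool] = {}
    for check in checks:
        try:
            completed = subprocess.run(check.command, capture_output=True, text=True)
            results[check.name] = completed.returncode == 0
        except FileNotFoundError:
            # A missing tool is a failed check, never a silent pass.
            results[check.name] = False
    return results


def contract_satisfied(checks: list[ContractCheck]) -> bool:
    """The loop may stop only when every line passes; an empty contract never counts as done."""
    return bool(checks) and all(evaluate_contract(checks).values())
```

For a Python project the contract might be `ContractCheck("tests green", ["pytest", "-q"])` plus `ContractCheck("lints clean", ["ruff", "check", "."])`; "spec satisfied" needs its own command too, such as an acceptance-test script. Treating an empty contract and a missing tool as failures is a deliberate choice: both are ways a loop could otherwise declare victory without checking anything.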


3. State Layer

Task lists, plan files, and worktrees that live outside the context window — so the loop survives restarts and compactions.
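A sketch of the state layer, assuming a JSON task list in a file such as `tasks.json` (a hypothetical name and schema: each task is an object with `id` and `status`). Because every completion is written to disk as it happens, a restarted or compacted session recovers its position from the file rather than from the transcript.

```python
from __future__ import annotations

import json
import os
from pathlib import Path


def load_tasks(path: Path) -> list[dict]:
    """Read the task list; a missing file means a fresh loop with nothing recorded yet."""
    if not path.exists():
        return []
    return json.loads(path.read_text(encoding="utf-8"))


def save_tasks(path: Path, tasks: list[dict]) -> None:
    """Write to a temp file, then rename it over the original, so a crash mid-write cannot truncate the list."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(tasks, indent=2), encoding="utf-8")
    os.replace(tmp, path)


def next_open_task(tasks: list[dict]) -> dict | None:
    """Return the first task not marked done, or None when the list is exhausted."""
    return next((t for t in tasks if t["status"] != "done"), None)


def mark_done(path: Path, task_id: str) -> None:
    """Persist completion immediately; the transcript is not the record."""
    tasks = load_tasks(path)
    for task in tasks:
        if task["id"] == task_id:
            task["status"] = "done"
    save_tasks(path, tasks)
```

On restart, the loop calls `load_tasks` and then `next_open_task` and resumes from the first unfinished item, regardless of what the context window still remembers.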


4. Checker

Automated verification the agent cannot skip: test suites, hooks, adversarial review agents. Trust the checker, not the transcript.


5. Human Checkpoint

The narrow gate where a person reviews: merges, deploys, irreversible actions. Autonomy everywhere else.
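The human checkpoint can be sketched as a gate in front of the loop's action dispatcher. This assumes the loop names its actions as strings; the `IRREVERSIBLE` set and the `approve` callback (which would page a person, open a review request, or similar) are hypothetical.

```python
from typing import Callable

# Hypothetical action names: list whatever your loop can do that cannot be undone.
IRREVERSIBLE = {"merge", "deploy", "delete_branch", "drop_table"}


def run_action(action: str, execute: Callable[[], str], approve: Callable[[str], bool]) -> str:
    """Run reversible actions directly; route irreversible ones through a human first."""
    if action in IRREVERSIBLE and not approve(action):
        return f"blocked: {action} awaits human review"
    return execute()
```

Everything outside the set runs without asking. The design only holds if `approve` genuinely waits for a person: an `approve` that always returns `True` turns the checkpoint back into auto-approval.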


04 · Anatomy

Anatomy of an Autonomous Coding Loop

Six building blocks recur in every serious loop setup — skills are one of them, which is why loop engineers end up curating skill libraries.

Automations

Cron jobs, CI triggers, and schedulers that start loops without a human.

Worktrees

Isolated git workspaces so parallel loops never trample each other.
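One way to give each loop its own checkout, assuming the loops start from a shared clone at `repo` and that a `loop/<task-id>` branch name and a sibling-directory layout are acceptable (both are illustrative choices). `git worktree add -b <branch> <path> <base>` is standard git; the Python wrapper around it is a sketch.

```python
import subprocess
from pathlib import Path


def worktree_command(repo: Path, task_id: str, base: str = "main") -> list[str]:
    """Build `git worktree add -b <branch> <path> <base>` for one loop's isolated checkout."""
    branch = f"loop/{task_id}"                     # hypothetical branch naming scheme
    path = repo.parent / f"{repo.name}-{task_id}"  # one sibling directory per loop
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path), base]


def create_worktree(repo: Path, task_id: str, base: str = "main") -> Path:
    """Create the worktree; check=True raises if git refuses (e.g. the branch already exists)."""
    cmd = worktree_command(repo, task_id, base)
    subprocess.run(cmd, check=True)
    return Path(cmd[-2])
```

When the loop's work is merged or abandoned, `git worktree remove <path>` deletes the checkout.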

Skills

Packaged procedures the loop executes — the knowledge layer.

Connectors

MCP servers wiring the loop to browsers, databases, and SaaS tools.

Sub-agents

Fresh-context specialists for review, research, and verification.

External State

Files, task queues, and issues that carry intent across sessions.

05 · Patterns

Core Loop Patterns

Plan → Execute → Verify

The base loop: write a plan to a file, implement step by step, verify against the contract before marking done. Superpowers packages this whole cycle as skills.
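A minimal sketch of the base loop, assuming the plan is a checklist in a file (for example a hypothetical `PLAN.md`) and that the caller supplies `execute`, which hands one step to the agent, and `verify`, which runs the loop contract. Only a passing verification ticks a box.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable


def write_plan(plan_path: Path, steps: list[str], completed: int) -> None:
    """Render the plan as a checklist; the first `completed` steps are ticked."""
    lines = [f"[{'x' if i < completed else ' '}] {step}\n" for i, step in enumerate(steps)]
    plan_path.write_text("".join(lines), encoding="utf-8")


def plan_execute_verify(
    plan_path: Path,
    steps: list[str],
    execute: Callable[[str], None],
    verify: Callable[[], bool],
) -> list[str]:
    """Write the plan first, then implement step by step, ticking a step only after it verifies.

    Returns the completed steps. Stops at the first step that fails verification,
    leaving it unticked in the plan file for a retry loop to pick up.
    """
    write_plan(plan_path, steps, completed=0)
    done: list[str] = []
    for step in steps:
        execute(step)     # e.g. hand this one step to the agent
        if not verify():  # e.g. run the contract's commands
            break
        done.append(step)
        write_plan(plan_path, steps, completed=len(done))
    return done
```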

Loop-until-dry

For discovery work (bug hunts, audits): keep spawning finder passes until N consecutive rounds return nothing new. A fixed number of passes stops too early and misses the long tail of findings.
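The loop-until-dry stopping rule, sketched in Python. `find` stands in for one finder pass (for example a sub-agent asked to report bugs) and returns hashable finding identifiers; deduplicating by identifier is an assumption, since real findings usually need fuzzier matching. `max_rounds` is only a cost cap: the loop normally stops after `dry_rounds` consecutive passes with nothing new.

```python
from collections.abc import Callable, Hashable, Iterable


def loop_until_dry(
    find: Callable[[int], Iterable[Hashable]],
    dry_rounds: int = 3,
    max_rounds: int = 50,
) -> set:
    """Run finder passes until `dry_rounds` consecutive passes surface nothing new."""
    found: set = set()
    dry_streak = 0
    for round_no in range(max_rounds):
        new = set(find(round_no)) - found
        if new:
            found |= new
            dry_streak = 0  # any new finding resets the streak
        else:
            dry_streak += 1
            if dry_streak >= dry_rounds:
                break
    return found
```

With `dry_rounds=3`, a finder that only surfaces an issue on its third pass still gets it recorded, which a fixed two-pass budget would miss.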

Two-stage review

Every change gets a spec-compliance review and a code-quality review from fresh sub-agents before it counts as complete.
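A sketch of two-stage review, assuming each reviewer is a callable that, in a real loop, would start a fresh sub-agent that sees only the diff and the spec. The `Review` type and the ordering (spec compliance first, then quality) are illustrative choices: there is little value in polishing code that does not meet the spec.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Review:
    approved: bool
    notes: str = ""


# (diff, spec) -> verdict. Each call stands in for a fresh sub-agent
# that never sees the author's transcript.
Reviewer = Callable[[str, str], Review]


def two_stage_review(diff: str, spec: str, spec_reviewer: Reviewer, quality_reviewer: Reviewer) -> Review:
    """Complete only if both stages approve; spec compliance is checked first."""
    spec_verdict = spec_reviewer(diff, spec)
    if not spec_verdict.approved:
        return Review(False, f"spec: {spec_verdict.notes}")
    quality_verdict = quality_reviewer(diff, spec)
    if not quality_verdict.approved:
        return Review(False, f"quality: {quality_verdict.notes}")
    return Review(True)
```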

Self-correcting retry

On failure, the loop feeds the error back with the original contract — bounded retries with escalation to a human when the budget runs out.
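A sketch of bounded, self-correcting retry. `run_agent` is a hypothetical callable that sends a prompt to the agent and reports whether the contract passed; `escalate` defaults to `print` but would normally file an issue or notify a person. Each retry restates the original contract and attaches the most recent error instead of resending the same prompt; the exact retry wording is an illustrative assumption.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Attempt:
    ok: bool
    error: str = ""


def retry_with_diagnosis(
    contract: str,
    run_agent: Callable[[str], Attempt],
    max_attempts: int = 3,
    escalate: Callable[[str], None] = print,
) -> bool:
    """Bounded retries; each retry prompt carries the contract plus the last error."""
    prompt = contract
    errors: list[str] = []
    for attempt in range(1, max_attempts + 1):
        result = run_agent(prompt)
        if result.ok:
            return True
        errors.append(result.error)
        # Never resend the same prompt verbatim: restate the contract and attach the failure.
        prompt = (
            f"{contract}\n\nAttempt {attempt} failed with:\n{result.error}\n"
            "Diagnose the cause before changing code."
        )
    # Budget exhausted: hand every collected error to a human instead of looping forever.
    escalate(f"Gave up after {max_attempts} attempts. Errors:\n" + "\n---\n".join(errors))
    return False
```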

06 · In Production

Loops in the Wild

Four loop shapes cover most production setups today. All five stack layers apply to each; only the trigger and cadence change.

Scheduled maintenance loops

Cron-triggered: dependency bumps, flaky-test hunts, doc drift checks. Runs nightly, opens a PR when it finds work, files an issue when it can't finish.

CI-triggered repair loops

A failing build wakes the loop with the error log as its contract: reproduce, fix, verify green, push. Bounded retries, then escalate with a diagnosis.

Issue-driven feature loops

The backlog is the queue. Each issue becomes a contract; each loop gets a worktree; humans review finished PRs instead of prompting turns.

Review & audit sweeps

Loop-until-dry over a codebase: parallel finder passes, adversarial verification of each finding, stop after N dry rounds. The pattern behind serious security audits.

07 · Anti-Patterns

Loop Anti-Patterns

The failure modes are as convergent as the patterns. If your loop misbehaves, it’s almost certainly one of these:

The unterminated contract

“Improve the codebase” cannot halt. Every contract line must be checkable by a command with an exit code.

Checker theater

The agent reviews its own work in the same context window. Same blind spots in, same blind spots out — verification requires fresh context or deterministic execution.

The 3 a.m. yes-man

Auto-approving everything to keep the loop moving, including the one irreversible action that needed a human. Autonomy budget ≠ approval budget.

Context-window state

The loop's memory lives in the transcript. At the first compaction, the loop forgets what it finished. Externalize state to files or it didn't happen.

Retry without diagnosis

Feeding the same failing prompt back verbatim. Retries must carry the error plus the contract, and must be bounded with escalation.

08 · Metrics

Measuring Your Loops

Loop engineering inherits harness engineering’s rule: unmeasured changes are vibes. Four metrics tell you whether a loop is earning its autonomy:

  • Intervention rate: human touches per merged change. The headline autonomy metric; it should trend down as checkers mature.
  • Checker catch rate: failures caught by hooks, tests, or review agents vs. failures that reached the human checkpoint. Your safety margin.
  • Rework ratio: loop-produced changes later reverted or rewritten. The honest quality signal: low intervention with high rework means the checkpoint is being fooled.
  • Cost per merged change: tokens and wall-clock time per unit of shipped work. Loops that retry blindly show up here first.

Instrument these with the same eval discipline you’d apply to the harness — eval-driven development extends naturally from single runs to loop-level measurement.
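The four metrics can be computed from per-change records. The `ChangeRecord` fields are a hypothetical schema, and two interpretations are assumptions: human touches, tokens, and time are summed over all changes (including unmerged ones, since that work still cost something) before dividing by the number of merged changes, and checker catch rate is taken as checker-caught failures divided by all failures caught anywhere (by the checker or at the human checkpoint). Ratios with a zero denominator return NaN rather than raising.

```python
from dataclasses import dataclass


@dataclass
class ChangeRecord:
    """One loop-produced change (hypothetical schema)."""
    merged: bool
    human_touches: int   # prompts, edits, or approvals a person had to supply
    reworked: bool       # later reverted or substantially rewritten
    tokens: int
    wall_clock_s: float


def _ratio(numerator: float, denominator: float) -> float:
    """Divide, returning NaN instead of raising when there is nothing to divide by."""
    return numerator / denominator if denominator else float("nan")


def loop_metrics(changes: list[ChangeRecord], caught_by_checker: int, reached_human: int) -> dict[str, float]:
    merged = sum(1 for c in changes if c.merged)
    return {
        # Human touches per merged change: should trend down as checkers mature.
        "intervention_rate": _ratio(sum(c.human_touches for c in changes), merged),
        # Share of failures stopped by hooks, tests, or review agents before a human saw them.
        "checker_catch_rate": _ratio(caught_by_checker, caught_by_checker + reached_human),
        # Merged changes that were later reverted or rewritten.
        "rework_ratio": _ratio(sum(1 for c in changes if c.merged and c.reworked), merged),
        # Cost per merged change, counting spend on attempts that never merged.
        "tokens_per_merged": _ratio(sum(c.tokens for c in changes), merged),
        "seconds_per_merged": _ratio(sum(c.wall_clock_s for c in changes), merged),
    }
```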

09 · Skills

Skills: The Procedure Layer of Every Loop

A loop without disciplined procedures just makes mistakes faster. Skills give the loop its methodology: Superpowers packages brainstorm → plan → TDD → review as composable skills built for exactly this kind of subagent-driven execution, while Everything Claude Code ships loop infrastructure — hooks, sub-agents, and even a dedicated loop-operator agent — alongside its skills.

For verification, pair these with webapp-testing, playwright-skill, and eval-driven-dev: checkers the loop can run without you.

10 · FAQ

Loop Engineering FAQ

What is loop engineering?

Loop engineering is the practice of designing systems that prompt, verify, retry, and stop AI agents — instead of a human prompting the agent turn by turn. The term took off in June 2026 after Claude Code creator Boris Cherny described his workflow as loops that prompt Claude and figure out what to do, rather than manual prompting.

How is loop engineering different from harness engineering?

Harness engineering builds the environment the agent operates in (skills, tools, hooks, permissions). Loop engineering designs the runtime that drives the agent through that environment: the loop contract, state, checkers, and stop conditions. A good loop needs a good harness underneath it.

What is the operator loop stack?

A five-layer model for autonomous agent loops: the harness (environment), the loop contract (what done means), the state layer (files and task lists that survive restarts), the checker (automated verification), and the human checkpoint (where a person reviews before irreversible actions).

What role do skills play in agent loops?

Skills encode the procedures a loop executes — test-driven development, code review, debugging workflows. In Addy Osmani's loop anatomy, skills sit alongside automations, worktrees, connectors, sub-agents, and external state as one of the six building blocks of an autonomous coding loop.

How do I build my first agent loop?

Start with a bounded task and a verifiable definition of done (tests pass, lints clean). Give the agent a disciplined skill set, externalize state into files or task lists, add a checker the agent cannot skip, and keep a human checkpoint before merges or deploys. Expand autonomy only as your checkers catch real failures.

11 · Sources

Further Reading: Primary Sources

The field is one month old — these are the documents that defined it, plus the foundational agent-architecture material underneath.
