Coined June 2026 · Beyond Prompting
Loop Engineering
prompt → verify → retry → stop
Designing systems that prompt, verify, retry, and stop AI agents — so the unit of work is no longer a prompt you type, but a loop you engineer.
01 — Definition
What Is Loop Engineering?
For three years, working with AI meant typing a prompt, reading the answer, and typing again: the human was the loop. By mid-2026, coding agents had become reliable enough at long-horizon work that the bottleneck moved. The scarce skill is no longer writing prompts; it is designing the system that writes them.
“I don’t prompt Claude anymore. I have loops that are running. They’re the ones that are prompting Claude and figuring out what to do.” (Boris Cherny, creator of Claude Code)
Loop engineering is the discipline that emerged from that shift: designing the outer system — contracts, state, checkers, stop conditions — that lets an agent run for hours and still ship work you can trust.
02 — Origin
The Origin: Three Essays, One Month
The term crystallized in June 2026, when three developments landed within weeks of each other:
1. Boris Cherny described his day as running a couple hundred agents driven by loops that read his GitHub, Slack, and task queues and decide what to do next. The New Stack’s coverage framed it as the moment prompting stopped being the job.
2. Addy Osmani published the essay that named the practice and gave it an anatomy: automations, worktrees, skills, connectors, sub-agents, and external state (the six blocks below).
3. The community turned it into tooling within weeks: open-source loop starters and audit CLIs inspired directly by Cherny’s and Osmani’s setups.
Why it happened then: agents crossed the reliability threshold for long-horizon work. Once a model can recover from its own mistakes for hours at a stretch, the human bottleneck moves from answer quality to orchestration design — and orchestration design is loop engineering.
03 — The Stack
The Operator Loop Stack
Five layers separate a demo loop from a production loop. Each layer answers one failure mode of autonomous agents.
1. Harness
The environment: tools, skills, permissions, memory. Everything the loop can act with.
Harness engineering guide →
2. Loop Contract
A machine-checkable definition of done: tests green, lints clean, spec satisfied. Without it the loop cannot know when to stop.
Eval-driven development →
3. State Layer
Task lists, plan files, and worktrees that live outside the context window — so the loop survives restarts and compactions.
File-based knowledge →
4. Checker
Automated verification the agent cannot skip: test suites, hooks, adversarial review agents. Trust the checker, not the transcript.
Automated testing skills →
5. Human Checkpoint
The narrow gate where a person reviews: merges, deploys, irreversible actions. Autonomy everywhere else.
Code review workflows →
04 — Anatomy
Anatomy of an Autonomous Coding Loop
Six building blocks recur in every serious loop setup — skills are one of them, which is why loop engineers end up curating skill libraries.
Automations
Cron jobs, CI triggers, and schedulers that start loops without a human.
Worktrees
Isolated git workspaces so parallel loops never trample each other.
Skills
Packaged procedures the loop executes — the knowledge layer.
Connectors
MCP servers wiring the loop to browsers, databases, and SaaS tools.
Sub-agents
Fresh-context specialists for review, research, and verification.
External State
Files, task queues, and issues that carry intent across sessions.
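The worktree block is the easiest of the six to make concrete. The sketch below assumes a hypothetical naming scheme (a `loop/<task-id>` branch checked out under a `.worktrees/` directory); neither name comes from the essays above. It only builds the `git worktree add` command and leaves running it to the caller.

```python
from pathlib import Path


def worktree_command(repo_root: str, task_id: str) -> list[str]:
    """Build a `git worktree add` command that gives one loop its own checkout.

    The branch prefix and directory layout are illustrative assumptions.
    """
    branch = f"loop/{task_id}"
    path = Path(repo_root) / ".worktrees" / task_id
    # `git worktree add -b <branch> <path>` creates a new branch and checks it
    # out in a separate directory backed by the same repository.
    return ["git", "-C", repo_root, "worktree", "add", "-b", branch, str(path)]
```

Passing the returned list to `subprocess.run` would create the checkout; two loops given different task ids then never share a working directory.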
05 — Patterns
Core Loop Patterns
Plan → Execute → Verify
The base loop: write a plan to a file, implement step by step, verify against the contract before marking done. Superpowers packages this whole cycle as skills.
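A minimal sketch of that base loop, assuming the plan is a text file of `- [ ]` checkbox lines. The `execute` and `verify` callables are hypothetical hooks standing in for an agent run and the contract's checks; the plan format is likewise an assumption, not something the sources prescribe.

```python
from pathlib import Path
from typing import Callable


def run_plan(plan_file: Path,
             execute: Callable[[str], None],
             verify: Callable[[], bool]) -> bool:
    """Work through unchecked '- [ ] ' steps, verifying after each one."""
    lines = plan_file.read_text().splitlines()
    for i, line in enumerate(lines):
        if not line.startswith("- [ ] "):
            continue                      # already done, or not a step line
        step = line[len("- [ ] "):]
        execute(step)                     # hypothetical hook: agent does the step
        if not verify():                  # hypothetical hook: run contract checks
            return False                  # step stays unchecked for a later retry
        lines[i] = "- [x] " + step
        # Persist progress immediately so a restart resumes from this point.
        plan_file.write_text("\n".join(lines) + "\n")
    return True
```

Because progress is written back to the file after every verified step, the plan file doubles as external state: a restarted loop skips everything already marked `[x]`.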
Loop-until-dry
For discovery work (bug hunts, audits): keep spawning finder passes until N consecutive rounds return nothing new. A fixed number of passes stops too early and misses the long tail of findings.
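The stop rule can be sketched in a few lines. Here `find` is a hypothetical callable representing one finder pass, and `max_rounds` is an added assumption: a hard ceiling kept only as a cost safety net, separate from the dry-round rule that actually decides when to stop.

```python
from typing import Callable, Hashable, Iterable


def loop_until_dry(find: Callable[[], Iterable[Hashable]],
                   dry_rounds: int = 2,
                   max_rounds: int = 50) -> set:
    """Run finder passes until `dry_rounds` consecutive passes add nothing new."""
    seen: set = set()
    dry = 0
    for _ in range(max_rounds):       # ceiling only; normally the break fires first
        new = set(find()) - seen
        if new:
            seen |= new
            dry = 0                   # any new finding resets the dry streak
        else:
            dry += 1
            if dry >= dry_rounds:
                break
    return seen
```

A pass that only repeats known findings counts as dry, so duplicates from parallel finders do not keep the loop alive.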
Two-stage review
Every change gets a spec-compliance review and a code-quality review from fresh sub-agents before it counts as complete.
Self-correcting retry
On failure, the loop feeds the error back with the original contract — bounded retries with escalation to a human when the budget runs out.
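One way to sketch the retry pattern, under stated assumptions: `attempt` is a hypothetical hook for a single agent run, `check` returns `None` on success or an error string on failure, and the prompt wording is purely illustrative.

```python
from typing import Callable, Optional


class Escalation(Exception):
    """Raised when the retry budget is spent and a human should step in."""


def run_with_retries(contract: str,
                     attempt: Callable[[str], None],
                     check: Callable[[], Optional[str]],
                     max_retries: int = 3) -> int:
    """Return the number of attempts used; raise Escalation when out of budget."""
    prompt = contract
    error: Optional[str] = None
    for n in range(1, max_retries + 1):
        attempt(prompt)               # hypothetical hook: one agent run
        error = check()               # None means the contract is satisfied
        if error is None:
            return n
        # Retry with the original contract plus the latest error, never the
        # same prompt verbatim.
        prompt = f"{contract}\n\nPrevious attempt failed with:\n{error}"
    raise Escalation(f"gave up after {max_retries} attempts; last error: {error}")
```

The exception carries the last error, so whoever picks up the escalation starts from a diagnosis instead of a bare failure.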
06 — In Production
Loops in the Wild
Four loop shapes cover most production setups today. All five stack layers apply to each; only the trigger and cadence change.
Scheduled maintenance loops
Cron-triggered: dependency bumps, flaky-test hunts, doc drift checks. Runs nightly, opens a PR when it finds work, files an issue when it can't finish.
CI-triggered repair loops
A failing build wakes the loop with the error log as its contract: reproduce, fix, verify green, push. Bounded retries, then escalate with a diagnosis.
Issue-driven feature loops
The backlog is the queue. Each issue becomes a contract; each loop gets a worktree; humans review finished PRs instead of prompting turns.
Review & audit sweeps
Loop-until-dry over a codebase: parallel finder passes, adversarial verification of each finding, stop after N dry rounds. The pattern behind serious security audits.
07 — Anti-Patterns
Loop Anti-Patterns
The failure modes are as convergent as the patterns. If your loop misbehaves, it’s almost certainly one of these:
✗ The unterminated contract
“Improve the codebase” cannot halt. Every contract line must be checkable by a command with an exit code.
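As a rough illustration of a contract made of commands with exit codes, the sketch below treats each contract line as an argument list and accepts the work only when every command exits with 0. The tool names in `EXAMPLE_CONTRACT` are placeholders for whatever a given project uses.

```python
import subprocess
import sys


def contract_satisfied(checks: list[list[str]]) -> bool:
    """Run each check command; the contract holds only if all exit with 0."""
    if not checks:
        raise ValueError("contract has no checkable lines")
    for cmd in checks:
        try:
            result = subprocess.run(cmd, capture_output=True, text=True)
        except FileNotFoundError:
            print(f"contract line could not run: {' '.join(cmd)}", file=sys.stderr)
            return False
        if result.returncode != 0:
            print(f"contract line failed: {' '.join(cmd)}", file=sys.stderr)
            return False
    return True


# Example contract; the specific tools are assumptions, not requirements.
EXAMPLE_CONTRACT = [
    ["pytest", "-q"],
    ["ruff", "check", "."],
]
```

An empty contract raises instead of returning `True`, since a contract with nothing to check is exactly the unterminated case.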
✗ Checker theater
The agent reviews its own work in the same context window. Same blind spots in, same blind spots out — verification requires fresh context or deterministic execution.
✗ The 3 a.m. yes-man
Auto-approving everything to keep the loop moving, including the one irreversible action that needed a human. Autonomy budget ≠ approval budget.
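Keeping the two budgets separate can be as simple as a gate in front of every action. In this sketch the set of irreversible actions and the `ask_human` callback are both hypothetical; which actions belong in the set is a project decision.

```python
from typing import Callable

# Illustrative list only; each team decides what counts as irreversible.
IRREVERSIBLE = {"merge", "deploy", "delete_branch", "db_migration"}


def gate(action: str, ask_human: Callable[[str], bool]) -> bool:
    """Auto-approve reversible actions; irreversible ones need a person."""
    if action in IRREVERSIBLE:
        return ask_human(action)      # the loop waits here for an answer
    return True
```

The loop keeps full autonomy for everything outside the set, and nothing inside the set can be approved by the loop itself.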
✗ Context-window state
The loop's memory lives in the transcript. First compaction, the loop forgets what it finished — externalize state to files or it didn't happen.
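A minimal sketch of file-backed state, assuming a JSON file with hypothetical `done` and `pending` keys. The write goes through a temporary file so that an interrupted loop never leaves a half-written state file behind.

```python
import json
import os
from pathlib import Path


def load_state(path: Path) -> dict:
    """Return saved loop state, or a fresh state if no file exists yet."""
    if path.exists():
        return json.loads(path.read_text())
    return {"done": [], "pending": []}


def save_state(path: Path, state: dict) -> None:
    """Write state via a temp file, then swap it into place."""
    tmp = path.parent / (path.name + ".tmp")
    tmp.write_text(json.dumps(state, indent=2))
    os.replace(tmp, path)             # atomic when both are on one filesystem
```

A loop that calls `load_state` at startup and `save_state` after each finished task can be compacted or restarted without losing track of what it completed.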
✗ Retry without diagnosis
Feeding the same failing prompt back verbatim. Retries must carry the error plus the contract, and must be bounded with escalation.
08 — Metrics
Measuring Your Loops
Loop engineering inherits harness engineering’s rule: unmeasured changes are vibes. Four metrics tell you whether a loop is earning its autonomy:
- Intervention rate: Human touches per merged change. The headline autonomy metric; it should trend down as checkers mature.
- Checker catch rate: Failures caught by hooks/tests/review agents vs. failures that reached the human checkpoint. Your safety margin.
- Rework ratio: Loop-produced changes later reverted or rewritten. The honest quality signal: low intervention with high rework is a fooled checkpoint.
- Cost per merged change: Tokens and wall-clock per unit of shipped work. Loops that retry blindly show up here first.
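The four metrics reduce to simple arithmetic over per-change records. The `Change` record and its field names below are hypothetical, and the catch rate is computed as one reasonable reading of the definition above: checker catches divided by all failures seen by either the checker or the human. Wall-clock cost is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class Change:
    human_touches: int      # interventions before the change merged
    checker_catches: int    # failures stopped by tests, hooks, review agents
    human_catches: int      # failures first noticed at the human checkpoint
    reworked: bool          # later reverted or rewritten
    tokens: int             # tokens spent producing the change


def loop_metrics(changes: list[Change]) -> dict[str, float]:
    """Aggregate the four loop metrics over a batch of merged changes."""
    if not changes:
        raise ValueError("need at least one merged change")
    n = len(changes)
    caught = sum(c.checker_catches for c in changes)
    escaped = sum(c.human_catches for c in changes)
    total = caught + escaped
    return {
        "intervention_rate": sum(c.human_touches for c in changes) / n,
        "checker_catch_rate": caught / total if total else 1.0,
        "rework_ratio": sum(c.reworked for c in changes) / n,
        "tokens_per_change": sum(c.tokens for c in changes) / n,
    }
```

Tracking these per week or per loop shape makes it visible when a low intervention rate is being paid for with a rising rework ratio.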
Instrument these with the same eval discipline you’d apply to the harness — eval-driven development extends naturally from single runs to loop-level measurement.
09 — Skills
Skills: The Procedure Layer of Every Loop
A loop without disciplined procedures just makes mistakes faster. Skills give the loop its methodology: Superpowers packages brainstorm → plan → TDD → review as composable skills built for exactly this kind of subagent-driven execution, while Everything Claude Code ships loop infrastructure — hooks, sub-agents, and even a dedicated loop-operator agent — alongside its skills.
For the verification layers, pair them with webapp-testing, playwright-skill and eval-driven-dev — checkers the loop can run without you.
10 — FAQ
Loop Engineering FAQ
What is loop engineering?
Loop engineering is the practice of designing systems that prompt, verify, retry, and stop AI agents — instead of a human prompting the agent turn by turn. The term took off in June 2026 after Claude Code creator Boris Cherny described his workflow as loops that prompt Claude and figure out what to do, rather than manual prompting.
How is loop engineering different from harness engineering?
Harness engineering builds the environment the agent operates in (skills, tools, hooks, permissions). Loop engineering designs the runtime that drives the agent through that environment: the loop contract, state, checkers, and stop conditions. A good loop needs a good harness underneath it.
What is the operator loop stack?
A five-layer model for autonomous agent loops: the harness (environment), the loop contract (what done means), the state layer (files and task lists that survive restarts), the checker (automated verification), and the human checkpoint (where a person reviews before irreversible actions).
What role do skills play in agent loops?
Skills encode the procedures a loop executes — test-driven development, code review, debugging workflows. In Addy Osmani's loop anatomy, skills sit alongside automations, worktrees, connectors, sub-agents, and external state as one of the six building blocks of an autonomous coding loop.
How do I build my first agent loop?
Start with a bounded task and a verifiable definition of done (tests pass, lints clean). Give the agent a disciplined skill set, externalize state into files or task lists, add a checker the agent cannot skip, and keep a human checkpoint before merges or deploys. Expand autonomy only as your checkers catch real failures.
11 — Sources
Further Reading: Primary Sources
The field is one month old — these are the documents that defined it, plus the foundational agent-architecture material underneath.
Loop Engineering — Addy Osmani ↗
The essay that named the practice and defined the six-block anatomy: automations, worktrees, skills, connectors, sub-agents, external state.
Boris Cherny on Acquired Unplugged — WorkOS takeaways ↗
The interview behind the quote — how the creator of Claude Code runs loops instead of prompts.
The New Stack: Loop Engineering ↗
Industry coverage framing the shift from prompting to loop design.
loop-engineering — patterns & CLI starters (GitHub) ↗
Practical open-source starting points: loop-audit, loop-init, loop-cost.
Building Effective Agents — Anthropic Engineering ↗
The agent-architecture foundation every loop stands on: workflows vs agents, composable patterns.
Claude Code Best Practices — Anthropic Engineering ↗
Operating manual for the harness layer of the stack, from the team that runs the most loops.
12 — Next
Keep Going
Harness Engineering →
The environment layer underneath every loop: skills, hooks, permissions, memory.
Build Your First Loop →
A practical walkthrough: from single prompts to a self-verifying loop with skills.
Everything Claude Code →
Skills, hooks, sub-agents, and a loop-operator agent in one collection.