Origin Intelligence
research · deep dive

Agent Loops Deep Dive

What VB Srivastav (OpenAI DevX Lead) and Peter Steinberger are actually building when they talk about "agent loops" — Codex App features, real production setups, OSS repos with loop architecture, orchestration frameworks, and the four levels of agent engineering.

🔁 Agent Loops Deep Dive

What VB Srivastav and Peter Steinberger are talking about — with real code, repos, and production examples. June 2026.

1. The Two Posts & Why They Matter

🟦 VB Srivastav (@reach_vb) — OpenAI DevX Lead

Codex App
"You can do this directly in the Codex App: 1) Automations for autonomous discovery/triage, 2) Worktrees for isolated features, 3) Skills to codify project knowledge, 4) Plugins/Connectors, 5) Sub-agents to ideate and verify. All with simple markdown/linear for state tracking."

What he means: The Codex App (OpenAI's cloud coding agent) already has all the infrastructure to build autonomous loops — you don't need to code your own orchestrator. Each feature maps to a specific capability.

🦞 Peter Steinberger (@steipete)

Agent Loops
"Here's your monthly reminder that you shouldn't be prompting coding agents anymore. You should be designing loops that prompt your agents."

What he means: Stop being the human in the loop. Instead, build systems (automations, CI triggers, scheduled scripts) that generate prompts, feed them to agents, verify the output, and loop. The human designs the loop once; the system runs continuously.

Stats: 2.5M views, 10.1K bookmarks, ~55% skeptical / ~44% supportive. His follow-up: "Don't worry it'll take 3 months until it's there. We'll be talking about fleets that design your loops then."

2. OpenAI Codex App — Feature by Feature

Source: Official Codex Developer Docs · GitHub (89.4K ⭐)

2.1 Automations — Always-On Background Agents

📚 Official Docs

Scheduled recurring tasks that run unattended. Two types:

  • Standalone automations — cron-scheduled, independent runs (e.g., daily triage)
  • Thread automations — heartbeat-style recurring wake-ups that preserve context

Use cases: issue triage, CI failure monitoring, alert response, PR babysitting, Sentry error triage, daily project briefings.

Results go to a Triage inbox — Codex archives if nothing to report.

Real Example — Self-Improving Skills

Scan all of the ~/.codex/sessions files from the past day
and if there have been any issues using particular skills,
update the skills to be more helpful.

If there's anything we've been doing often that we should
save as a skill, let's do it.

2.2 Worktrees — Isolated Parallel Environments

📚 Official Docs

Uses Git worktrees under the hood — separate checkouts sharing the same .git metadata but with their own file copies.

  • Each agent works in its own worktree — no file conflicts between parallel agents
  • Handoff: move threads between Local ↔ Worktree seamlessly
  • Starts in detached HEAD state (avoids polluting branches)
  • Default limit: 15 managed worktrees with snapshot backup before deletion
  • Stored in $CODEX_HOME/worktrees

2.3 Skills — Codified Project Knowledge

📚 Official Docs · Skills Catalog · Open Standard (agentskills.io)

SKILL.md files that package instructions, resources, and optional scripts. Progressive disclosure: Codex loads only name/description initially, full instructions on invocation.

  • Activation: Explicit ($skill-name) or Implicit (auto-matched by description)
  • Scope: Repo ($CWD/.agents/skills), User ($HOME/.agents/skills), Admin (/etc/codex/skills), System (bundled)
  • Built-in $skill-creator for generating new skills

Example SKILL.md

---
name: commit
description: Stage and commit changes in semantic groups.
---
1. Never run `git add .` — stage files in logical groups
2. Group into separate commits: feat → test → docs → refactor → chore
3. Write concise commit messages in Conventional Commit format
4. Run `npm run lint && npm test` before committing

2.4 Plugins & Connectors

📚 Official Docs

Plugins bundle skills + app integrations + MCP servers into installable packages.

Built-in Plugins

  • Gmail
  • Google Drive
  • Slack
  • Sora
  • Playwright
  • Codex Security

Connectors (event-driven)

  • Linear: Assign issues to @Codex → cloud agent spins up → posts updates
  • GitHub: PR workflows, issue triage

Linear MCP Setup

codex mcp add linear --url https://mcp.linear.app/mcp

Source: Linear Integration Docs

2.5 Sub-agents — Parallel Ideation & Verification

📚 Official Docs

Spawn specialized agents in parallel, collect results, return consolidated response.

  • Built-in: default, worker (execution-focused), explorer (read-heavy)
  • Max 6 concurrent threads, max depth 1 (configurable)
  • Inherit sandbox policy from parent; can override per agent

Custom Agent Definition (pr-explorer.toml)

[agent]
name = "pr_explorer"
description = "Read-only codebase explorer for gathering evidence."
model = "gpt-5.3-codex-spark"
model_reasoning_effort = "medium"
sandbox_mode = "read-only"
developer_instructions = """
Stay in exploration mode. Trace execution paths, cite files and symbols.
Prefer fast search and targeted file reads over broad scans.
"""

Multi-Agent Prompt Example

Spawn one agent per review point, wait for all, summarize:
1. Security issues   2. Code quality   3. Bugs
4. Race conditions   5. Test flakiness  6. Maintainability

2.6 Markdown/Linear State Tracking

A) Markdown — Durable Project Memory

From the Long-Horizon Tasks Guide:

  • Prompt.md — Spec + deliverables
  • Plan.md — Milestones + validation criteria
  • Implement.md — Execution runbook
  • Documentation.md — Status + audit log

Example repo: github.com/derrickchoi-openai/design-desk

B) Linear as State Machine — Symphony

Symphony (GitHub) · Blog Post

Open-source orchestration spec: every open Linear issue → dedicated agent workspace. Ticket statuses drive workflow transitions (Todo → In Progress → Review → Done). Agents transition issues, create sub-issues, file discoveries.

Result: 500% increase in landed PRs on some teams.

3. The "Loop That Prompts Agents" Concept

THE SHIFT: From Human-in-the-Loop to Human-Designs-the-Loop ┌──────────────────────────────────────────────────────────┐ │ OLD WAY: You prompt → Agent does → You review → Repeat │ │ │ │ Human ──prompt──▶ Agent ──code──▶ Human ──review──▶ ... │ └──────────────────────────────────────────────────────────┘ ▼ ┌──────────────────────────────────────────────────────────┐ │ NEW WAY: You design the loop → System runs continuously │ │ │ │ Trigger ──prompt──▶ Agent ──code──▶ Verifier ──pass?──┐ │ │ (cron/ (auto- (runs, (CI, lint, │ │ │ event/ generated tests, tests) │ │ │ webhook) from commits) │ │ │ context) ▼ │ │ ┌─────────┐│ │ ◀──fail + error──── │ Re-prompt││ │ └─────────┘│ │ Human reviews PRs only. Loop runs 24/7. │ └──────────────────────────────────────────────────────────┘

The core idea: every autonomous coding agent is fundamentally a while-loop:

while not done:
    observation = environment.observe()     # read files, errors, issues
    action = llm.decide(context + observation)  # what to do next
    result = environment.execute(action)    # run command, edit file
    context.append(result)                  # learn from outcome

The shift is about who/what triggers and manages this loop. Instead of a human typing a prompt, it's a system: a cron job, a GitHub webhook, a CI failure, a Slack message, or another agent.

4. Peter Steinberger's Actual Setup

His Stack — Solo Dev, ~300K LOC TypeScript

OpenClaw

From his blog "Just Talk To It" and Pragmatic Engineer interview:

  • Runs 3-8 parallel Codex CLI instances in a 3×3 terminal grid
  • Uses OpenClaw (openclaw.ai) as a supervisor over Codex instances
  • Each agent does atomic git commits guided by an agent file
  • Uses VISION.md per project as strategic guidance for agents
  • Cost: ~$1K/month on subscriptions

Key Repos

His Principles

  • "Almost all MCPs should be CLIs" — MCPs cost 23K+ context tokens; CLIs cost zero (agent learns via --help)
  • Agents must close the loop — compile, lint, execute, validate their own work
  • "PRs are dead — long live Prompt Requests" — review the prompt, not the code
  • Under-prompt intentionally — sometimes vague prompts let AI explore better directions
  • Stop models mid-way freely — file changes are atomic, models resume

5. Open-Source Agent Repos (with Loop Architecture)

OpenHands (formerly OpenDevin)

65K ⭐
What:Full autonomous AI software engineer in sandboxed Docker
Loop:Event-driven agent loop inside container. LLM receives observations → decides action (shell, browser, file edit) → executes → loops. CodeAct agent is primary architecture.

aider

41K ⭐
What:AI pair programming in terminal with auto-commits
Loop:REPL-style: prompt → LLM generates diffs → applies edits → auto-runs linter/tests → feeds errors back for self-correction. "Architect mode" = one model plans, another executes.

Codex CLI (OpenAI)

89.4K ⭐
Loop:User input → prompt assembly → model inference via Responses API → tool call or final message → if tool call: execute (shell, file edit) → append output → re-query model → repeat. Only 1.6% of codebase is AI logic; 98.4% is operational infrastructure.

SWE-agent (Princeton)

19K ⭐
What:Takes a GitHub issue, autonomously fixes it
Loop:Observe-act: LLM gets a "shell" with custom ACI (Agent-Computer Interface) commands → iterates: read issue → explore repo → edit → test → submit patch.

Sweep AI

7K ⭐
Loop:Pure "automated prompt" pattern: GitHub webhook (issue labeled) → reads codebase via embeddings → plans changes → implements → creates PR → runs CI → self-corrects on failure → loops. Zero human prompting.

Pi Agent

418 lines
What:Terminal-native agent loop in 418 lines of TypeScript. Ranks with Claude Code and Cursor on Terminal-Bench 2.0.

The "Ralph Loop"

Pattern

Named after Ralph Wiggum. The dominant autonomous coding loop pattern:

Key insight from Huntley: monolithic > multi-agent because non-deterministic microservices = "a red hot mess".

6. Orchestration Frameworks

🔗 LangGraph

Graph-based state machines for agent orchestration. Supports cycles (true iterative loops), checkpointing for long-running agents.

🤖 AutoGen (Microsoft)

Multi-agent orchestration. Agents converse and collaborate to solve tasks.

🚢 CrewAI

Repo:crewaiinc/crewai 30K+ ⭐

Role-based multi-agent framework. Crews with roles, goals, tools. Sequential & parallel execution.

🎵 Symphony (OpenAI)

Linear ↔ Codex orchestration. Ticket statuses drive agent workflow transitions.

📦 DSPy

"Program, Don't Prompt" — compile declarative modules into optimized prompts. Conceptual ancestor of the whole philosophy.

🧠 Mozilla cq

Shared agent learning store — agents store and query discoveries across sessions.

GitHub Agentic Workflows

Blog · Docs

This IS the "loop that prompts agents" built into GitHub. Workflow files in .github/workflows/<name>.md — YAML frontmatter + Markdown instructions. Triggered by schedule or events.

Six categories: Continuous Triage, Documentation, Code Simplification, Test Improvement, Quality Hygiene, Reporting.

No human triggers individual runs. The system IS the loop.

7. Companies Doing This in Production

🟠 Cognition (Devin)

$73M ARR
  • Single-threaded continuous-context agent for coding
  • Code-Review Loop: Coding agent writes → separate review agent finds bugs (avg 2 per PR, 58% severe) → coding agent fixes
  • Key insight: Coding + review agents work best when they do NOT share context
  • "Smart Friend" pattern: Small fast model as primary, frontier model as on-demand consultant tool
  • Users created scripts for "Devins to manage other Devins"
  • $1M → $73M ARR in 9 months (Sep 2024 → Jun 2025)

🟢 Spotify — Honk System

1,500+ PRs merged
  • Internal agent built on Claude Code / Claude Agent SDK
  • Engineers assign tasks via Slack → Honk agent runs in background → generates PR → CI validates → human reviews
  • Agent loops on CI failures automatically
  • 1,500+ AI-generated PRs merged across hundreds of repos since mid-2024

🏭 Factory.ai

Agent-Native
  • Multi-agent "Droid" system with coordinator pattern
  • Core loop: Explore → Plan → Code → Verify
  • Key principles: (1) Requests precise enough that success is demonstrable, (2) Tasks small enough that wrong assumptions don't compound, (3) Environments for automatic objective verification

🔮 Augment Code

Harness Engineering
  • Remote Agents that run in parallel, autonomously
  • Their agent built itself — continuously improves its own codebase
  • Three harness layers: (1) Constraint Harnesses (feedforward — rules, lint), (2) Feedback Loops (corrective — structured error signals), (3) Quality Gates (CI blocks non-compliant code)

🔵 OpenAI Internal Usage

88 AGENTS.md files

OpenAI uses 88 AGENTS.md files across their monorepo for constraint composition. They enforce "taste invariants" as hard CI failures, not warnings.

8. The Four Levels of Agent Engineering

Source: Daniel Demmel — Feedback Loop Engineering

Level 4 │ HARNESS ENGINEERING ◀── Everything in an agent except the model │ (guides + sensors) (Augment, OpenAI, Factory) │ Level 3 │ FEEDBACK LOOP ENGINEERING ◀── ★ HIGHEST ROI TODAY ★ │ (tools for agents to verify (CI, lint, test runners, │ their own work at runtime) structured error signals) │ Level 2 │ CONTEXT ENGINEERING ◀── What goes in the prompt │ (CLAUDE.md, AGENTS.md, (steipete agent-scripts, │ docs, skills) Codex Skills) │ Level 1 │ PROMPT ENGINEERING ◀── How you ask │ (manual prompting) (where most people are stuck)

Inner loop = agent runs code → reads result → iterates within one session

Outer loop = one session's lesson becomes knowledge for all future sessions (e.g., Mozilla cq, Codex self-improving skills automation)

Official Codex

Codex Features Docs

Peter Steinberger

Open-Source Agent Repos

Frameworks & Patterns

Company Engineering Blogs

Research compiled June 8, 2026. All links verified at time of research. Star counts approximate.