Agent Loops Deep Dive
What VB Srivastav (OpenAI DevX Lead) and Peter Steinberger are actually building when they talk about "agent loops" — Codex App features, real production setups, OSS repos with loop architecture, orchestration frameworks, and the four levels of agent engineering.
What VB Srivastav and Peter Steinberger are talking about — with real code, repos, and production examples. June 2026.
- The Two Posts & Why They Matter
- OpenAI Codex App — Feature by Feature
- The "Loop That Prompts Agents" Concept
- Peter Steinberger's Actual Setup
- Open-Source Agent Repos (with Loop Architecture)
- Orchestration Frameworks
- Companies Doing This in Production
- The Four Levels of Agent Engineering
- All Links & Resources
1. The Two Posts & Why They Matter
🟦 VB Srivastav (@reach_vb) — OpenAI DevX Lead
Codex App
What he means: The Codex App (OpenAI's cloud coding agent) already has all the infrastructure to build autonomous loops, so you don't need to code your own orchestrator. Each feature maps to a specific capability.
🦞 Peter Steinberger (@steipete)
Agent Loops
What he means: Stop being the human in the loop. Instead, build systems (automations, CI triggers, scheduled scripts) that generate prompts, feed them to agents, verify the output, and loop. The human designs the loop once; the system runs continuously.
Stats: 2.5M views, 10.1K bookmarks, ~55% skeptical / ~44% supportive. His follow-up: "Don't worry it'll take 3 months until it's there. We'll be talking about fleets that design your loops then."
2. OpenAI Codex App — Feature by Feature
Source: Official Codex Developer Docs · GitHub (89.4K ⭐)
2.1 Automations — Always-On Background Agents
Scheduled recurring tasks that run unattended. Two types:
- Standalone automations — cron-scheduled, independent runs (e.g., daily triage)
- Thread automations — heartbeat-style recurring wake-ups that preserve context
Use cases: issue triage, CI failure monitoring, alert response, PR babysitting, Sentry error triage, daily project briefings.
Results go to a Triage inbox — Codex archives if nothing to report.
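The inbox behaviour can be modelled in a few lines of Python. This is an illustrative sketch only, not the Codex API: `AutomationResult`, `TriageInbox`, and `deliver` are hypothetical names, and the rule "archive when there are no findings" is assumed from the sentence above.

```python
from dataclasses import dataclass, field


@dataclass
class AutomationResult:
    """Outcome of one unattended run (hypothetical shape, not a Codex type)."""
    automation: str
    findings: list[str] = field(default_factory=list)


@dataclass
class TriageInbox:
    """Holds runs that need attention; empty runs are archived instead."""
    items: list[AutomationResult] = field(default_factory=list)
    archived: list[AutomationResult] = field(default_factory=list)

    def deliver(self, result: AutomationResult) -> None:
        # A run with nothing to report never reaches the human.
        target = self.items if result.findings else self.archived
        target.append(result)


inbox = TriageInbox()
inbox.deliver(AutomationResult("daily-triage", ["CI red on main"]))
inbox.deliver(AutomationResult("sentry-scan"))
print(len(inbox.items), len(inbox.archived))
```

The point of the design is that silence is the default: only runs with findings cost the human any attention.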
Real Example — Self-Improving Skills
Scan all of the ~/.codex/sessions files from the past day and if there have been any issues using particular skills, update the skills to be more helpful. If there's anything we've been doing often that we should save as a skill, let's do it.
2.2 Worktrees — Isolated Parallel Environments
Uses Git worktrees under the hood — separate checkouts sharing the same .git metadata but with their own file copies.
- Each agent works in its own worktree — no file conflicts between parallel agents
- Handoff: move threads between Local ↔ Worktree seamlessly
- Starts in detached HEAD state (avoids polluting branches)
- Default limit: 15 managed worktrees with snapshot backup before deletion
- Stored in $CODEX_HOME/worktrees
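For readers who want to reproduce the isolation by hand, the underlying git feature can be driven from Python. `git worktree add --detach` is standard git; whether Codex issues exactly this command is an assumption, and the `worktrees/agent-N` layout below is hypothetical.

```python
import subprocess
from pathlib import Path


def worktree_add_command(worktree_path: Path, commit: str = "HEAD") -> list[str]:
    """Build a `git worktree add` call that checks out `commit` in detached HEAD state."""
    return ["git", "worktree", "add", "--detach", str(worktree_path), commit]


def create_worktree(repo: Path, worktree_path: Path, commit: str = "HEAD") -> None:
    """Run the command inside `repo`; raises CalledProcessError if git fails."""
    subprocess.run(worktree_add_command(worktree_path, commit), cwd=repo, check=True)


# Hypothetical layout: one directory per agent under a shared worktrees root.
root = Path("worktrees")
commands = [worktree_add_command(root / f"agent-{i}") for i in range(3)]
for cmd in commands:
    print(" ".join(cmd))
```

Building the command separately from running it keeps the sketch testable without touching a real repository; call `create_worktree` inside an actual checkout to create the directories.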
2.3 Skills — Codified Project Knowledge
📚 Official Docs · Skills Catalog · Open Standard (agentskills.io)
SKILL.md files that package instructions, resources, and optional scripts. Progressive disclosure: Codex loads only name/description initially, full instructions on invocation.
- Activation: Explicit ($skill-name) or Implicit (auto-matched by description)
- Scope: Repo ($CWD/.agents/skills), User ($HOME/.agents/skills), Admin (/etc/codex/skills), System (bundled)
- Built-in $skill-creator for generating new skills
Example SKILL.md
---
name: commit
description: Stage and commit changes in semantic groups.
---
1. Never run `git add .`; stage files in logical groups
2. Group into separate commits: feat → test → docs → refactor → chore
3. Write concise commit messages in Conventional Commit format
4. Run `npm run lint && npm test` before committing
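Progressive disclosure is easy to picture in code: read the frontmatter up front, touch the body only when the skill is invoked. The Python sketch below is a simplified illustration, not Codex's loader. `Skill`, `load_metadata`, and `instructions` are hypothetical names, and the parser handles only flat `key: value` lines rather than full YAML.

```python
from dataclasses import dataclass

SKILL_MD = """---
name: commit
description: Stage and commit changes in semantic groups.
---
1. Never run `git add .`
2. Group into separate commits
"""


@dataclass
class Skill:
    name: str
    description: str
    _source: str  # full file text, kept but not parsed until invoked

    def instructions(self) -> str:
        """Return the body after the frontmatter; called only on invocation."""
        return self._source.split("---", 2)[2].strip()


def load_metadata(text: str) -> Skill:
    """Read only the `key: value` frontmatter lines (simplified, not full YAML)."""
    _, frontmatter, _ = text.split("---", 2)
    fields = {}
    for line in frontmatter.strip().splitlines():
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return Skill(fields["name"], fields["description"], text)


skill = load_metadata(SKILL_MD)
print(skill.name, "-", skill.description)
print(skill.instructions())
```

With many skills installed, only the short name and description of each would sit in the model's context until one is matched.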
2.4 Plugins & Connectors
Plugins bundle skills + app integrations + MCP servers into installable packages.
Built-in Plugins
- Gmail
- Google Drive
- Slack
- Sora
- Playwright
- Codex Security
Connectors (event-driven)
- Linear: Assign issues to @Codex → cloud agent spins up → posts updates
- GitHub: PR workflows, issue triage
Linear MCP Setup
codex mcp add linear --url https://mcp.linear.app/mcp
Source: Linear Integration Docs
2.5 Sub-agents — Parallel Ideation & Verification
Spawn specialized agents in parallel, collect results, return consolidated response.
- Built-in: default, worker (execution-focused), explorer (read-heavy)
- Max 6 concurrent threads, max depth 1 (configurable)
- Inherit sandbox policy from parent; can override per agent
Custom Agent Definition (pr-explorer.toml)
[agent]
name = "pr_explorer"
description = "Read-only codebase explorer for gathering evidence."
model = "gpt-5.3-codex-spark"
model_reasoning_effort = "medium"
sandbox_mode = "read-only"
developer_instructions = """
Stay in exploration mode. Trace execution paths, cite files and symbols. Prefer fast search and targeted file reads over broad scans.
"""
Multi-Agent Prompt Example
Spawn one agent per review point, wait for all, summarize:
1. Security issues
2. Code quality
3. Bugs
4. Race conditions
5. Test flakiness
6. Maintainability
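The fan-out and collect shape behind that prompt can be sketched with Python's asyncio. This is a generic illustration, not how Codex schedules sub-agents: `run_subagent` is a stub standing in for a real agent call, and the cap of 6 simply mirrors the thread limit quoted above.

```python
import asyncio

MAX_CONCURRENT = 6  # mirrors the thread cap quoted above

REVIEW_POINTS = ["Security issues", "Code quality", "Bugs",
                 "Race conditions", "Test flakiness", "Maintainability"]


async def run_subagent(topic: str) -> str:
    """Stand-in for a real sub-agent call; replace with an actual agent client."""
    await asyncio.sleep(0.01)  # simulate work
    return f"{topic}: no findings (stub)"


async def fan_out(topics: list[str]) -> list[str]:
    """Spawn one sub-agent per topic, never more than MAX_CONCURRENT at once."""
    limit = asyncio.Semaphore(MAX_CONCURRENT)

    async def guarded(topic: str) -> str:
        async with limit:
            return await run_subagent(topic)

    # gather preserves input order, so results line up with topics.
    return await asyncio.gather(*(guarded(t) for t in topics))


reports = asyncio.run(fan_out(REVIEW_POINTS))
print("\n".join(reports))
```

The semaphore only matters once there are more topics than slots; with exactly six review points, everything runs at once.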
2.6 Markdown/Linear State Tracking
A) Markdown — Durable Project Memory
From the Long-Horizon Tasks Guide:
- Prompt.md: Spec + deliverables
- Plan.md: Milestones + validation criteria
- Implement.md: Execution runbook
- Documentation.md: Status + audit log
Example repo: github.com/derrickchoi-openai/design-desk
B) Linear as State Machine — Symphony
Open-source orchestration spec: every open Linear issue → dedicated agent workspace. Ticket statuses drive workflow transitions (Todo → In Progress → Review → Done). Agents transition issues, create sub-issues, file discoveries.
Result: 500% increase in landed PRs on some teams.
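Treating ticket statuses as a state machine might look like the Python sketch below. The four statuses come from the description above. The transition table is an assumption made for illustration (in particular the Review → In Progress bounce), and none of this is the Symphony spec or the Linear API.

```python
from enum import Enum


class Status(Enum):
    TODO = "Todo"
    IN_PROGRESS = "In Progress"
    REVIEW = "Review"
    DONE = "Done"


# Allowed moves; Review may bounce back to In Progress (an assumption).
TRANSITIONS = {
    Status.TODO: {Status.IN_PROGRESS},
    Status.IN_PROGRESS: {Status.REVIEW},
    Status.REVIEW: {Status.DONE, Status.IN_PROGRESS},
    Status.DONE: set(),
}


def transition(current: Status, target: Status) -> Status:
    """Return the new status, or raise if the move is not allowed."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target


status = Status.TODO
for step in (Status.IN_PROGRESS, Status.REVIEW, Status.DONE):
    status = transition(status, step)
print(status.value)
```

Rejecting illegal moves is what lets the ticket act as the source of truth: an agent cannot mark work Done without passing through Review.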
3. The "Loop That Prompts Agents" Concept
The core idea: every autonomous coding agent is fundamentally a while-loop:
while not done:
    observation = environment.observe()         # read files, errors, issues
    action = llm.decide(context + observation)  # what to do next
    result = environment.execute(action)        # run command, edit file
    context.append(result)                      # learn from outcome
The shift is about who/what triggers and manages this loop. Instead of a human typing a prompt, it's a system: a cron job, a GitHub webhook, a CI failure, a Slack message, or another agent.
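A runnable toy version makes the shift concrete: the loop is the same, but the entry point is an event handler rather than a chat box. Everything below is a sketch under stated assumptions. `agent_loop`, `on_trigger`, and the counter environment are hypothetical, and `decide` is a hard-coded stand-in for a model call.

```python
from typing import Callable


def agent_loop(
    observe: Callable[[], str],
    decide: Callable[[list[str], str], str],
    execute: Callable[[str], str],
    max_steps: int = 10,
) -> list[str]:
    """Observe, decide, execute, remember; stop on "done" or after max_steps."""
    context: list[str] = []
    for _ in range(max_steps):
        observation = observe()
        action = decide(context, observation)
        if action == "done":
            break
        context.append(execute(action))
    return context


# Toy environment: a counter the "agent" must raise to 3.
state = {"count": 0}


def observe() -> str:
    return f"count={state['count']}"


def decide(context: list[str], observation: str) -> str:
    # Stand-in for an LLM call; a real agent would reason over the context.
    return "done" if observation == "count=3" else "increment"


def execute(action: str) -> str:
    state["count"] += 1
    return f"incremented to {state['count']}"


def on_trigger(event: str) -> list[str]:
    """Entry point a cron job or webhook would call instead of a human prompt."""
    print(f"triggered by {event}")
    return agent_loop(observe, decide, execute)


history = on_trigger("ci_failure")
print(history)
```

The `max_steps` bound is worth keeping in any real version, since an unattended loop has no human to notice that it is stuck.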
4. Peter Steinberger's Actual Setup
His Stack — Solo Dev, ~300K LOC TypeScript
OpenClaw
From his blog "Just Talk To It" and Pragmatic Engineer interview:
- Runs 3-8 parallel Codex CLI instances in a 3×3 terminal grid
- Uses OpenClaw (openclaw.ai) as a supervisor over Codex instances
- Each agent does atomic git commits guided by an agent file
- Uses VISION.md per project as strategic guidance for agents
- Cost: ~$1K/month on subscriptions
Key Repos
Current. Shared agent instructions, skills, portable helpers. Contains AGENTS.MD + skills/ + scripts/
His Principles
- "Almost all MCPs should be CLIs": MCPs cost 23K+ context tokens; CLIs cost zero (agent learns via --help)
- Agents must close the loop: compile, lint, execute, validate their own work
- "PRs are dead — long live Prompt Requests" — review the prompt, not the code
- Under-prompt intentionally — sometimes vague prompts let AI explore better directions
- Stop models mid-way freely — file changes are atomic, models resume
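"Closing the loop" from the list above boils down to running the project's own checks and handing any failures back to the agent. A minimal Python sketch, assuming a hypothetical `run_checks` helper; the two commands are stand-ins so the example runs anywhere, not Steinberger's actual tooling.

```python
import subprocess
import sys


def run_checks(checks: dict[str, list[str]]) -> dict[str, str]:
    """Run each command; return {check name: combined output} for failures only."""
    failures = {}
    for name, command in checks.items():
        proc = subprocess.run(command, capture_output=True, text=True)
        if proc.returncode != 0:
            failures[name] = (proc.stdout + proc.stderr).strip()
    return failures


# Stand-in commands so the sketch runs anywhere; a real project would list
# its own compile, lint, and test commands here.
checks = {
    "compile": [sys.executable, "-c", "print('ok')"],
    "tests": [sys.executable, "-c", "import sys; print('1 failed'); sys.exit(1)"],
}
failures = run_checks(checks)
print(failures)
```

An empty result means the work is verified; a non-empty one is exactly the text an agent needs as its next observation.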
5. Open-Source Agent Repos (with Loop Architecture)
OpenHands (formerly OpenDevin) · 65K ⭐
aider · 41K ⭐
Codex CLI (OpenAI) · 89.4K ⭐
SWE-agent (Princeton) · 19K ⭐
Sweep AI · 7K ⭐
Pi Agent · 418 lines
The "Ralph Loop" · Pattern
Named after Ralph Wiggum. The dominant autonomous coding loop pattern.
Key insight from Huntley: monolithic > multi-agent because non-deterministic microservices = "a red hot mess".
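As the pattern is usually described, one prompt file is fed to a single agent again and again, with progress living in the repository rather than in the conversation. The Python sketch below illustrates that shape under stated assumptions: `ralph_loop`, `run_agent`, the stop marker, and the iteration cap are all hypothetical and are not taken from Huntley's write-up.

```python
from pathlib import Path
from typing import Callable
import tempfile


def ralph_loop(prompt_file: Path, run_agent: Callable[[str], str],
               stop_marker: str = "ALL TASKS COMPLETE",
               max_iterations: int = 50) -> int:
    """Feed the same prompt to one agent until it reports completion.

    Returns the number of iterations used. The prompt is re-read every pass
    so edits to the file take effect on the next iteration.
    """
    for iteration in range(1, max_iterations + 1):
        output = run_agent(prompt_file.read_text())
        if stop_marker in output:
            return iteration
    return max_iterations


# Stub agent: pretends to finish one task per call, three tasks in total.
remaining = ["parser", "tests", "docs"]


def stub_agent(prompt: str) -> str:
    remaining.pop()
    return "ALL TASKS COMPLETE" if not remaining else f"{len(remaining)} left"


with tempfile.TemporaryDirectory() as tmp:
    prompt_path = Path(tmp) / "PROMPT.md"
    prompt_path.write_text("Pick the most important unfinished task and do it.")
    iterations = ralph_loop(prompt_path, stub_agent)
print(iterations)
```

A single agent and a single prompt keep the control flow deterministic even though each agent call is not, which is the monolithic side of the trade-off quoted above.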
6. Orchestration Frameworks
🔗 LangGraph
Graph-based state machines for agent orchestration. Supports cycles (true iterative loops), checkpointing for long-running agents.
🤖 AutoGen (Microsoft)
Multi-agent orchestration. Agents converse and collaborate to solve tasks.
🚢 CrewAI
Role-based multi-agent framework. Crews with roles, goals, tools. Sequential & parallel execution.
🎵 Symphony (OpenAI)
Linear ↔ Codex orchestration. Ticket statuses drive agent workflow transitions.
📦 DSPy
"Program, Don't Prompt" — compile declarative modules into optimized prompts. Conceptual ancestor of the whole philosophy.
🧠 Mozilla cq
Shared agent learning store — agents store and query discoveries across sessions.
GitHub Agentic Workflows
This IS the "loop that prompts agents" built into GitHub. Workflow files in .github/workflows/<name>.md — YAML frontmatter + Markdown instructions. Triggered by schedule or events.
Six categories: Continuous Triage, Documentation, Code Simplification, Test Improvement, Quality Hygiene, Reporting.
No human triggers individual runs. The system IS the loop.
7. Companies Doing This in Production
🟠 Cognition (Devin)
$73M ARR
- Single-threaded continuous-context agent for coding
- Code-Review Loop: Coding agent writes → separate review agent finds bugs (avg 2 per PR, 58% severe) → coding agent fixes
- Key insight: Coding + review agents work best when they do NOT share context
- "Smart Friend" pattern: Small fast model as primary, frontier model as on-demand consultant tool
- Users created scripts for "Devins to manage other Devins"
- $1M → $73M ARR in 9 months (Sep 2024 → Jun 2025)
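The write, review, fix cycle with separate contexts can be sketched in Python as follows. This is an illustration of the pattern, not Cognition's implementation: `write_review_fix`, the `Coder` and `Reviewer` signatures, and both stubs are hypothetical.

```python
from typing import Callable

Coder = Callable[[str, list[str]], str]   # (task, bug reports) -> code
Reviewer = Callable[[str], list[str]]     # code only -> bug reports


def write_review_fix(task: str, coder: Coder, reviewer: Reviewer,
                     max_rounds: int = 3) -> tuple[str, int]:
    """Alternate coder and reviewer until the reviewer finds nothing.

    The reviewer receives only the code, never the coder's task or history,
    which models the "do not share context" point above.
    """
    bugs: list[str] = []
    code = ""
    for round_no in range(1, max_rounds + 1):
        code = coder(task, bugs)
        bugs = reviewer(code)
        if not bugs:
            return code, round_no
    return code, max_rounds


# Stubs: the first draft divides without a zero check; the reviewer spots it.
def stub_coder(task: str, bugs: list[str]) -> str:
    if bugs:
        return "def ratio(a, b):\n    return a / b if b else 0.0"
    return "def ratio(a, b):\n    return a / b"


def stub_reviewer(code: str) -> list[str]:
    return [] if "if b" in code else ["division by zero when b == 0"]


final_code, rounds = write_review_fix("implement ratio", stub_coder, stub_reviewer)
print(rounds)
print(final_code)
```

Keeping the reviewer's signature to `code` alone enforces the separation in the type system: it cannot inherit the coder's assumptions because it never sees them.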
🟢 Spotify — Honk System
1,500+ PRs merged
- Internal agent built on Claude Code / Claude Agent SDK
- Engineers assign tasks via Slack → Honk agent runs in background → generates PR → CI validates → human reviews
- Agent loops on CI failures automatically
- 1,500+ AI-generated PRs merged across hundreds of repos since mid-2024
🏭 Factory.ai
Agent-Native
- Multi-agent "Droid" system with coordinator pattern
- Core loop: Explore → Plan → Code → Verify
- Key principles: (1) Requests precise enough that success is demonstrable, (2) Tasks small enough that wrong assumptions don't compound, (3) Environments for automatic objective verification
🔮 Augment Code
Harness Engineering
- Remote Agents that run in parallel, autonomously
- Their agent built itself — continuously improves its own codebase
- Three harness layers: (1) Constraint Harnesses (feedforward — rules, lint), (2) Feedback Loops (corrective — structured error signals), (3) Quality Gates (CI blocks non-compliant code)
🔵 OpenAI Internal Usage
88 AGENTS.md files
OpenAI uses 88 AGENTS.md files across their monorepo for constraint composition. They enforce "taste invariants" as hard CI failures, not warnings.
8. The Four Levels of Agent Engineering
Source: Daniel Demmel — Feedback Loop Engineering
Inner loop = agent runs code → reads result → iterates within one session
Outer loop = one session's lesson becomes knowledge for all future sessions (e.g., Mozilla cq, Codex self-improving skills automation)
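The difference between the two loops shows up clearly in a toy Python example: the inner loop retries inside one session, while the outer loop writes what was learned to disk so the next session starts ahead. `LessonStore`, `run_session`, and the JSON file are hypothetical illustrations, not the design of Mozilla cq or of Codex skills.

```python
import json
import tempfile
from pathlib import Path

LESSON = "run migrations before tests"


class LessonStore:
    """Outer loop: lessons persist on disk and are visible to later sessions."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def load(self) -> list[str]:
        return json.loads(self.path.read_text()) if self.path.exists() else []

    def add(self, lesson: str) -> None:
        lessons = self.load()
        if lesson not in lessons:
            lessons.append(lesson)
            self.path.write_text(json.dumps(lessons))


def run_session(store: LessonStore) -> int:
    """Inner loop: retry within one session; return attempts needed."""
    known = store.load()
    attempts = 0
    while True:
        attempts += 1
        # Stub task: it succeeds only once the lesson is known; a failed
        # attempt teaches the lesson and records it for future sessions.
        if LESSON in known:
            return attempts
        known.append(LESSON)
        store.add(LESSON)


with tempfile.TemporaryDirectory() as tmp:
    store = LessonStore(Path(tmp) / "lessons.json")
    first = run_session(store)
    second = run_session(store)
print(first, second)
```

The first session needs a failed attempt to discover the lesson; the second reads it from the store and succeeds immediately, which is the whole payoff of the outer loop.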
9. All Links & Resources
Official Codex
Codex Features Docs
Peter Steinberger
Open-Source Agent Repos
Frameworks & Patterns
Company Engineering Blogs
Research compiled June 8, 2026. All links verified at time of research. Star counts approximate.