research · deep dive

Agent Loops Deep Dive

June 8, 2026 Luis Sanchez 16 min read EN

What VB Srivastav (OpenAI DevX Lead) and Peter Steinberger are actually building when they talk about "agent loops" — Codex App features, real production setups, OSS repos with loop architecture, orchestration frameworks, and the four levels of agent engineering.

🔁 Agent Loops Deep Dive

What VB Srivastav and Peter Steinberger are talking about — with real code, repos, and production examples. June 2026.

Contents

The Two Posts & Why They Matter
OpenAI Codex App — Feature by Feature
The "Loop That Prompts Agents" Concept
Peter Steinberger's Actual Setup
Open-Source Agent Repos (with Loop Architecture)
Orchestration Frameworks
Companies Doing This in Production
The Four Levels of Agent Engineering
All Links & Resources

1. The Two Posts & Why They Matter

🟦 VB Srivastav (@reach_vb) — OpenAI DevX Lead

Codex App

"You can do this directly in the Codex App: 1) Automations for autonomous discovery/triage, 2) Worktrees for isolated features, 3) Skills to codify project knowledge, 4) Plugins/Connectors, 5) Sub-agents to ideate and verify. All with simple markdown/linear for state tracking."

What he means: The Codex App (OpenAI's cloud coding agent) already has all the infrastructure to build autonomous loops — you don't need to code your own orchestrator. Each feature maps to a specific capability.

🦞 Peter Steinberger (@steipete)

Agent Loops

"Here's your monthly reminder that you shouldn't be prompting coding agents anymore. You should be designing loops that prompt your agents."

What he means: Stop being the human in the loop. Instead, build systems (automations, CI triggers, scheduled scripts) that generate prompts, feed them to agents, verify the output, and loop. The human designs the loop once; the system runs continuously.

Stats: 2.5M views, 10.1K bookmarks, ~55% skeptical / ~44% supportive. His follow-up: "Don't worry it'll take 3 months until it's there. We'll be talking about fleets that design your loops then."

2. OpenAI Codex App — Feature by Feature

Source: Official Codex Developer Docs · GitHub (89.4K ⭐)

2.1 Automations — Always-On Background Agents

📚 Official Docs

Scheduled recurring tasks that run unattended. Two types:

Standalone automations — cron-scheduled, independent runs (e.g., daily triage)
Thread automations — heartbeat-style recurring wake-ups that preserve context

Use cases: issue triage, CI failure monitoring, alert response, PR babysitting, Sentry error triage, daily project briefings.

Results go to a Triage inbox — Codex archives if nothing to report.

Real Example — Self-Improving Skills

Scan all of the ~/.codex/sessions files from the past day
and if there have been any issues using particular skills,
update the skills to be more helpful.

If there's anything we've been doing often that we should
save as a skill, let's do it.

2.2 Worktrees — Isolated Parallel Environments

📚 Official Docs

Uses Git worktrees under the hood — separate checkouts sharing the same .git metadata but with their own file copies.

Each agent works in its own worktree — no file conflicts between parallel agents
Handoff: move threads between Local ↔ Worktree seamlessly
Starts in detached HEAD state (avoids polluting branches)
Default limit: 15 managed worktrees with snapshot backup before deletion
Stored in $CODEX_HOME/worktrees

2.3 Skills — Codified Project Knowledge

📚 Official Docs · Skills Catalog · Open Standard (agentskills.io)

SKILL.md files that package instructions, resources, and optional scripts. Progressive disclosure: Codex loads only name/description initially, full instructions on invocation.

Activation: Explicit ($skill-name) or Implicit (auto-matched by description)
Scope: Repo ($CWD/.agents/skills), User ($HOME/.agents/skills), Admin (/etc/codex/skills), System (bundled)
Built-in $skill-creator for generating new skills

Example SKILL.md

---
name: commit
description: Stage and commit changes in semantic groups.
---
1. Never run `git add .` — stage files in logical groups
2. Group into separate commits: feat → test → docs → refactor → chore
3. Write concise commit messages in Conventional Commit format
4. Run `npm run lint && npm test` before committing

2.4 Plugins & Connectors

📚 Official Docs

Plugins bundle skills + app integrations + MCP servers into installable packages.

Built-in Plugins

Gmail
Google Drive
Slack
Sora
Playwright
Codex Security

Connectors (event-driven)

Linear: Assign issues to @Codex → cloud agent spins up → posts updates
GitHub: PR workflows, issue triage

Linear MCP Setup

codex mcp add linear --url https://mcp.linear.app/mcp

Source: Linear Integration Docs

2.5 Sub-agents — Parallel Ideation & Verification

📚 Official Docs

Spawn specialized agents in parallel, collect results, return consolidated response.

Built-in: default, worker (execution-focused), explorer (read-heavy)
Max 6 concurrent threads, max depth 1 (configurable)
Inherit sandbox policy from parent; can override per agent

Custom Agent Definition (pr-explorer.toml)

[agent]
name = "pr_explorer"
description = "Read-only codebase explorer for gathering evidence."
model = "gpt-5.3-codex-spark"
model_reasoning_effort = "medium"
sandbox_mode = "read-only"
developer_instructions = """
Stay in exploration mode. Trace execution paths, cite files and symbols.
Prefer fast search and targeted file reads over broad scans.
"""

Multi-Agent Prompt Example

Spawn one agent per review point, wait for all, summarize:
1. Security issues   2. Code quality   3. Bugs
4. Race conditions   5. Test flakiness  6. Maintainability

2.6 Markdown/Linear State Tracking

A) Markdown — Durable Project Memory

From the Long-Horizon Tasks Guide:

Prompt.md — Spec + deliverables
Plan.md — Milestones + validation criteria
Implement.md — Execution runbook
Documentation.md — Status + audit log

Example repo: github.com/derrickchoi-openai/design-desk

B) Linear as State Machine — Symphony

Symphony (GitHub) · Blog Post

Open-source orchestration spec: every open Linear issue → dedicated agent workspace. Ticket statuses drive workflow transitions (Todo → In Progress → Review → Done). Agents transition issues, create sub-issues, file discoveries.

Result: 500% increase in landed PRs on some teams.

3. The "Loop That Prompts Agents" Concept

THE SHIFT: From Human-in-the-Loop to Human-Designs-the-Loop ┌──────────────────────────────────────────────────────────┐ │ OLD WAY: You prompt → Agent does → You review → Repeat │ │ │ │ Human ──prompt──▶ Agent ──code──▶ Human ──review──▶ ... │ └──────────────────────────────────────────────────────────┘ ▼ ┌──────────────────────────────────────────────────────────┐ │ NEW WAY: You design the loop → System runs continuously │ │ │ │ Trigger ──prompt──▶ Agent ──code──▶ Verifier ──pass?──┐ │ │ (cron/ (auto- (runs, (CI, lint, │ │ │ event/ generated tests, tests) │ │ │ webhook) from commits) │ │ │ context) ▼ │ │ ┌─────────┐│ │ ◀──fail + error──── │ Re-prompt││ │ └─────────┘│ │ Human reviews PRs only. Loop runs 24/7. │ └──────────────────────────────────────────────────────────┘

The core idea: every autonomous coding agent is fundamentally a while-loop:

while not done:
    observation = environment.observe()     # read files, errors, issues
    action = llm.decide(context + observation)  # what to do next
    result = environment.execute(action)    # run command, edit file
    context.append(result)                  # learn from outcome

The shift is about who/what triggers and manages this loop. Instead of a human typing a prompt, it's a system: a cron job, a GitHub webhook, a CI failure, a Slack message, or another agent.

4. Peter Steinberger's Actual Setup

His Stack — Solo Dev, ~300K LOC TypeScript

OpenClaw

From his blog "Just Talk To It" and Pragmatic Engineer interview:

Runs 3-8 parallel Codex CLI instances in a 3×3 terminal grid
Uses OpenClaw (openclaw.ai) as a supervisor over Codex instances
Each agent does atomic git commits guided by an agent file
Uses VISION.md per project as strategic guidance for agents
Cost: ~$1K/month on subscriptions

Key Repos

agent-scripts 4.3K ⭐
Current. Shared agent instructions, skills, portable helpers. Contains AGENTS.MD + skills/ + scripts/

agent-rules 5.7K ⭐
Archived May 2026. Old unified .mdc rules for Claude Code/Cursor.

His Principles

"Almost all MCPs should be CLIs" — MCPs cost 23K+ context tokens; CLIs cost zero (agent learns via --help)
Agents must close the loop — compile, lint, execute, validate their own work
"PRs are dead — long live Prompt Requests" — review the prompt, not the code
Under-prompt intentionally — sometimes vague prompts let AI explore better directions
Stop models mid-way freely — file changes are atomic, models resume

5. Open-Source Agent Repos (with Loop Architecture)

OpenHands (formerly OpenDevin)

65K ⭐

Repo:github.com/OpenHands/openhands

What:Full autonomous AI software engineer in sandboxed Docker

Loop:Event-driven agent loop inside container. LLM receives observations → decides action (shell, browser, file edit) → executes → loops. CodeAct agent is primary architecture.

aider

41K ⭐

Repo:github.com/aider-ai/aider

What:AI pair programming in terminal with auto-commits

Loop:REPL-style: prompt → LLM generates diffs → applies edits → auto-runs linter/tests → feeds errors back for self-correction. "Architect mode" = one model plans, another executes.

Codex CLI (OpenAI)

89.4K ⭐

Repo:github.com/openai/codex

Blog:Unrolling the Codex Agent Loop

Loop:User input → prompt assembly → model inference via Responses API → tool call or final message → if tool call: execute (shell, file edit) → append output → re-query model → repeat. Only 1.6% of codebase is AI logic; 98.4% is operational infrastructure.

SWE-agent (Princeton)

19K ⭐

Repo:github.com/swe-agent/swe-agent

What:Takes a GitHub issue, autonomously fixes it

Loop:Observe-act: LLM gets a "shell" with custom ACI (Agent-Computer Interface) commands → iterates: read issue → explore repo → edit → test → submit patch.

Sweep AI

7K ⭐

Repo:github.com/sweepai/sweep

Loop:Pure "automated prompt" pattern: GitHub webhook (issue labeled) → reads codebase via embeddings → plans changes → implements → creates PR → runs CI → self-corrects on failure → loops. Zero human prompting.

Pi Agent

418 lines

Repo:github.com/earendil-works/pi

What:Terminal-native agent loop in 418 lines of TypeScript. Ranks with Claude Code and Cursor on Terminal-Bench 2.0.

Rust port:github.com/Dicklesworthstone/pi_agent_rust

The "Ralph Loop"

Pattern

Named after Ralph Wiggum. The dominant autonomous coding loop pattern:

ralph-claude-code — Open-source Ralph loop for Claude Code

ghuntley.com/loop — Geoffrey Huntley's "Loom" — monolithic orchestrator

aihero.dev Ralph tutorial

Cursor forum: Ralph feature request

Key insight from Huntley: monolithic > multi-agent because non-deterministic microservices = "a red hot mess".

6. Orchestration Frameworks

🔗 LangGraph

Repo:langchain-ai/langgraph

Graph-based state machines for agent orchestration. Supports cycles (true iterative loops), checkpointing for long-running agents.

🤖 AutoGen (Microsoft)

Repo:microsoft/autogen

Multi-agent orchestration. Agents converse and collaborate to solve tasks.

🚢 CrewAI

Repo:crewaiinc/crewai 30K+ ⭐

Role-based multi-agent framework. Crews with roles, goals, tools. Sequential & parallel execution.

🎵 Symphony (OpenAI)

Repo:openai/symphony

Linear ↔ Codex orchestration. Ticket statuses drive agent workflow transitions.

📦 DSPy

URL:dspy.ai

"Program, Don't Prompt" — compile declarative modules into optimized prompts. Conceptual ancestor of the whole philosophy.

🧠 Mozilla cq

Repo:mozilla-ai/cq

Shared agent learning store — agents store and query discoveries across sessions.

GitHub Agentic Workflows

Blog · Docs

This IS the "loop that prompts agents" built into GitHub. Workflow files in .github/workflows/<name>.md — YAML frontmatter + Markdown instructions. Triggered by schedule or events.

Six categories: Continuous Triage, Documentation, Code Simplification, Test Improvement, Quality Hygiene, Reporting.

No human triggers individual runs. The system IS the loop.

7. Companies Doing This in Production

🟠 Cognition (Devin)

$73M ARR

Blog:Multi-Agents: What's Working

Single-threaded continuous-context agent for coding
Code-Review Loop: Coding agent writes → separate review agent finds bugs (avg 2 per PR, 58% severe) → coding agent fixes
Key insight: Coding + review agents work best when they do NOT share context
"Smart Friend" pattern: Small fast model as primary, frontier model as on-demand consultant tool
Users created scripts for "Devins to manage other Devins"
$1M → $73M ARR in 9 months (Sep 2024 → Jun 2025)

🟢 Spotify — Honk System

1,500+ PRs merged

Blog:Background Coding Agent Part 1

Internal agent built on Claude Code / Claude Agent SDK
Engineers assign tasks via Slack → Honk agent runs in background → generates PR → CI validates → human reviews
Agent loops on CI failures automatically
1,500+ AI-generated PRs merged across hundreds of repos since mid-2024

🏭 Factory.ai

Agent-Native

Blog:Build With Agents

Multi-agent "Droid" system with coordinator pattern
Core loop: Explore → Plan → Code → Verify
Key principles: (1) Requests precise enough that success is demonstrable, (2) Tasks small enough that wrong assumptions don't compound, (3) Environments for automatic objective verification

🔮 Augment Code

Harness Engineering

Guide:Harness Engineering

Remote Agents that run in parallel, autonomously
Their agent built itself — continuously improves its own codebase
Three harness layers: (1) Constraint Harnesses (feedforward — rules, lint), (2) Feedback Loops (corrective — structured error signals), (3) Quality Gates (CI blocks non-compliant code)

🔵 OpenAI Internal Usage

88 AGENTS.md files

Blog:Harness Engineering at OpenAI

OpenAI uses 88 AGENTS.md files across their monorepo for constraint composition. They enforce "taste invariants" as hard CI failures, not warnings.

8. The Four Levels of Agent Engineering

Source: Daniel Demmel — Feedback Loop Engineering

Level 4 │ HARNESS ENGINEERING ◀── Everything in an agent except the model │ (guides + sensors) (Augment, OpenAI, Factory) │ Level 3 │ FEEDBACK LOOP ENGINEERING ◀── ★ HIGHEST ROI TODAY ★ │ (tools for agents to verify (CI, lint, test runners, │ their own work at runtime) structured error signals) │ Level 2 │ CONTEXT ENGINEERING ◀── What goes in the prompt │ (CLAUDE.md, AGENTS.md, (steipete agent-scripts, │ docs, skills) Codex Skills) │ Level 1 │ PROMPT ENGINEERING ◀── How you ask │ (manual prompting) (where most people are stuck)

Inner loop = agent runs code → reads result → iterates within one session

Outer loop = one session's lesson becomes knowledge for all future sessions (e.g., Mozilla cq, Codex self-improving skills automation)

9. All Links & Resources

Official Codex

openai.com/codex/ — Product page

developers.openai.com/codex/ — Dev docs

github.com/openai/codex — CLI (89.4K ⭐)

github.com/openai/symphony — Orchestrator

github.com/openai/skills — Skills catalog

Unrolling the Agent Loop — Architecture

Harness Engineering — OpenAI internal

Long-Horizon Tasks — Markdown state tracking

Research compiled June 8, 2026. All links verified at time of research. Star counts approximate.

Agent Loops Deep Dive

🔁 Agent Loops Deep Dive

1. The Two Posts & Why They Matter

🟦 VB Srivastav (@reach_vb) — OpenAI DevX Lead

🦞 Peter Steinberger (@steipete)

2. OpenAI Codex App — Feature by Feature

2.1 Automations — Always-On Background Agents

Real Example — Self-Improving Skills

2.2 Worktrees — Isolated Parallel Environments

2.3 Skills — Codified Project Knowledge

Example SKILL.md

2.4 Plugins & Connectors

Built-in Plugins

Connectors (event-driven)

Linear MCP Setup

2.5 Sub-agents — Parallel Ideation & Verification

Custom Agent Definition (pr-explorer.toml)

Multi-Agent Prompt Example

2.6 Markdown/Linear State Tracking

A) Markdown — Durable Project Memory

B) Linear as State Machine — Symphony

3. The "Loop That Prompts Agents" Concept

4. Peter Steinberger's Actual Setup

His Stack — Solo Dev, ~300K LOC TypeScript

Key Repos

His Principles

5. Open-Source Agent Repos (with Loop Architecture)

OpenHands (formerly OpenDevin)

aider

Codex CLI (OpenAI)

SWE-agent (Princeton)

Sweep AI

Pi Agent

The "Ralph Loop"

6. Orchestration Frameworks

🔗 LangGraph

🤖 AutoGen (Microsoft)

🚢 CrewAI

🎵 Symphony (OpenAI)

📦 DSPy

🧠 Mozilla cq

GitHub Agentic Workflows

7. Companies Doing This in Production

🟠 Cognition (Devin)

🟢 Spotify — Honk System

🏭 Factory.ai

🔮 Augment Code

🔵 OpenAI Internal Usage

8. The Four Levels of Agent Engineering

9. All Links & Resources

Official Codex

Codex Features Docs

Peter Steinberger

Open-Source Agent Repos

Frameworks & Patterns

Company Engineering Blogs