An open-source spec for Codex orchestration: Symphony
By Alex Kotliarskyi, Victor Zhu, and Zach Brock
Six months ago, while working on an internal productivity tool, our team made a controversial (at the time) decision: we’d build our repo with no human-written code. Every line in our project repository had to be generated by Codex.
To make that work, we redesigned our engineering workflow from the ground up. We built an agent-friendly repository, invested heavily in automated tests and guardrails, and treated Codex as a full-fledged teammate. We documented that journey in our previous blog post on harness engineering.
And it worked, but then we ran into the next bottleneck: context switching.
To solve this new problem, we built a system called Symphony. Symphony is an agent orchestrator that turns a project-management board like Linear into a control plane for coding agents. Every open task gets an agent, agents run continuously, and humans review the results.
This post explains how we created Symphony—resulting in a 500% increase in landed pull requests on some teams—and how to use it to turn your own issue tracker into an always-on agent orchestrator.
The ceiling of interactive coding agents
Even as they get easier to use, coding agents, whether accessed through a web app or a CLI, are still interactive tools.
As the scale of agentic work increased at OpenAI, we found a new kind of burden. Each engineer would open a few Codex sessions, assign tasks, review the output, steer the agent, and repeat. In practice, most people could comfortably manage three to five sessions at a time before context switching became painful. Beyond that, productivity dropped. We'd forget which session was doing what, jump between terminals to nudge agents back on track, and debug long-running tasks that stalled halfway through.
The agents were fast, but we had a system bottleneck: human attention. We had effectively built a team of extremely capable junior engineers, then assigned our human engineers to micromanage them. That wasn’t going to scale.
A shift in perspective
We realized we were optimizing the wrong thing. We were orienting our system around coding sessions and merged PRs, when PRs and sessions are really a means to an end. Software workflows are largely organized around deliverables: issues, tasks, tickets, milestones.
So we asked ourselves what would happen if we stopped supervising agents directly and instead let them pull work from our task tracker.
That idea became Symphony, a written spec that functions as a supervisor to orchestrate agentic work.
Turning our issue tracker into an agent orchestrator
Symphony started with a simple concept: any open task should get picked up and completed by an agent. Instead of managing Codex sessions in multiple tabs, we made our issue tracker the control plane.
In this setup, each open Linear issue maps to a dedicated agent workspace. Symphony continuously watches the task board and ensures that every active task has an agent running in the loop until it’s done. If an agent crashes or stalls, Symphony restarts it. If new work appears, Symphony picks it up and puts an agent to work on it.
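The continuous check described above can be pictured as a reconciliation loop. The following is a minimal Python sketch, not Symphony's reference implementation (which is written in Elixir): the issue keys, the `AgentRun` and `Orchestrator` types, and the five-minute `stall_timeout` are hypothetical, and "starting an agent" is reduced to recording that it happened.

```python
from dataclasses import dataclass, field


@dataclass
class AgentRun:
    """Hypothetical record of one agent working on one issue."""
    issue_id: str
    last_heartbeat: float  # time of the agent's last observed progress
    alive: bool = True


@dataclass
class Orchestrator:
    stall_timeout: float = 300.0  # assumed: seconds without progress before a run counts as stalled
    runs: dict[str, AgentRun] = field(default_factory=dict)
    started: list[tuple[str, str]] = field(default_factory=list)  # (issue_id, reason) log

    def start_agent(self, issue_id: str, now: float, reason: str) -> None:
        # Stand-in for launching a real coding agent in the issue's workspace.
        self.runs[issue_id] = AgentRun(issue_id, last_heartbeat=now)
        self.started.append((issue_id, reason))

    def reconcile(self, active_issue_ids: list[str], now: float) -> None:
        """One pass: every active issue gets a healthy agent; closed issues are dropped."""
        active = set(active_issue_ids)
        for issue_id in sorted(active):
            run = self.runs.get(issue_id)
            if run is None:
                self.start_agent(issue_id, now, "new")
            elif not run.alive:
                self.start_agent(issue_id, now, "crashed")
            elif now - run.last_heartbeat > self.stall_timeout:
                self.start_agent(issue_id, now, "stalled")
        for issue_id in list(self.runs):
            if issue_id not in active:
                # Issue is done or closed; a real orchestrator would also stop the agent.
                del self.runs[issue_id]
```

Because each pass compares the desired state (open issues) with the actual state (running agents), calling `reconcile` again on a timer is safe: a crash anywhere only means the next pass restarts the missing agent.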
We built our workflow based on ticket statuses, using the task manager Linear as a state machine.
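Here is a small Python sketch of treating ticket statuses as a state machine. The In Progress, Review, and Merging statuses come from this post. `Todo`, `Done`, the allowed transitions, and the choice of which statuses keep an agent running are assumptions for illustration, since every Linear team configures its own workflow.

```python
from enum import Enum


class Status(Enum):
    TODO = "Todo"            # assumed name
    IN_PROGRESS = "In Progress"
    REVIEW = "Review"
    MERGING = "Merging"
    DONE = "Done"            # assumed name


# Hypothetical transition table; a real Linear team defines its own statuses.
TRANSITIONS: dict[Status, set[Status]] = {
    Status.TODO: {Status.IN_PROGRESS},
    Status.IN_PROGRESS: {Status.REVIEW},
    Status.REVIEW: {Status.IN_PROGRESS, Status.MERGING},  # a reviewer can send work back
    Status.MERGING: {Status.DONE, Status.IN_PROGRESS},    # e.g. CI keeps failing
    Status.DONE: set(),
}

# Assumed: statuses in which the orchestrator keeps an agent running.
# Review is excluded because a human is looking at the result.
AGENT_ACTIVE = {Status.TODO, Status.IN_PROGRESS, Status.MERGING}


def transition(current: Status, target: Status) -> Status:
    """Validate a status change against the table and return the new status."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```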
In practice, Symphony decouples work from sessions and from pull requests. Some issues produce multiple PRs across repos; others are pure investigation or analysis that never touch the codebase.
Once work is abstracted this way, tickets can represent much larger units of work.
We regularly use Symphony to orchestrate complex features and infrastructure migrations. For example, we might file a task asking the agent to analyze the codebase, Slack, or Notion and produce an implementation plan. Once we’re happy with the plan, the agent generates a tree of tasks, breaking the work into stages and defining dependencies between tasks.
Agents only start working on tasks that aren't blocked, so execution unfolds naturally in parallel across this DAG (directed acyclic graph) of tasks, with as much parallelism as the dependencies allow. For example, we marked the React upgrade as blocked on a migration to Vite. As expected, agents started upgrading React only after the migration to Vite was complete.
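Deciding what can start is a readiness check over the dependency graph. This hypothetical Python sketch uses the Vite and React tasks from the example above plus an invented unrelated task. `ready_tasks` is the rule agents follow; `execution_waves` is a simplified simulation that assumes every task in a wave finishes before the next check.

```python
def ready_tasks(tasks: list[str], blocked_by: dict[str, list[str]], done: set[str]) -> list[str]:
    """Open tasks whose blockers are all done; agents may start all of them at once."""
    return sorted(
        t for t in tasks
        if t not in done and all(b in done for b in blocked_by.get(t, ()))
    )


def execution_waves(tasks: list[str], blocked_by: dict[str, list[str]]) -> list[list[str]]:
    """Simulate execution, assuming each wave completes before readiness is rechecked."""
    done: set[str] = set()
    waves: list[list[str]] = []
    while len(done) < len(tasks):
        wave = ready_tasks(tasks, blocked_by, done)
        if not wave:
            # Nothing can start but work remains: a cycle or a blocker that is not a task.
            raise ValueError("cycle or unknown blocker: tasks do not form a DAG")
        waves.append(wave)
        done.update(wave)
    return waves


tasks = ["Migrate to Vite", "Upgrade React", "Fix flaky test"]  # last task is invented
deps = {"Upgrade React": ["Migrate to Vite"]}
waves = execution_waves(tasks, deps)
```

Here `waves` is `[["Fix flaky test", "Migrate to Vite"], ["Upgrade React"]]`. In practice an orchestrator would re-run `ready_tasks` whenever any single task finishes, so a blocked task starts as soon as its last blocker lands rather than waiting for a whole wave.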
Agents can also create work themselves. During implementation or review, they often notice improvements that fall outside the scope of the current task: a performance issue, a refactoring opportunity, or a better architecture. When that happens, they simply file a new issue that we can evaluate and schedule later—many of these follow-up tasks also get picked up by agents. While we oversee this process, agents stay organized and keep work moving forward.
This way of working dramatically reduces the cognitive cost of kicking off ambiguous work. If the agent gets something wrong, that’s still useful information, and the cost to us is near zero. We can very cheaply file tickets for the agent to go prototype and explore, and throw away any explorations we don’t like.
Because the orchestrator runs on devboxes and never sleeps, we can add tasks from anywhere and know an agent will pick them up. For instance, one engineer on our team made three significant changes from the Linear app on his phone while staying in a cozy cabin with shoddy Wi-Fi.
An increase in exploration from working this way
When observing the effects of working with Symphony, the most obvious change was output. Among some teams at OpenAI, we saw the number of landed PRs increase by 500% in the first three weeks. Outside of OpenAI, Linear founder Karri Saarinen highlighted a spike in workspaces created as we released Symphony. However, the deeper shift is how teams think about work.
When our engineers no longer spend time supervising Codex sessions, the economics of code changes shift completely. The perceived cost of each change drops because we’re no longer investing human effort in driving the implementation itself.
That changed our behavior. It's become trivial to spin up speculative tasks in Symphony. Try an idea, explore a refactor, test a hypothesis, and only keep the results that look promising.
It also broadens who can initiate work. Our product manager and designer can now file feature requests directly into Symphony. They don’t need to check out the repo or manage a Codex session. They describe the feature and get back a review packet that includes a video walkthrough of the feature working inside the real product.
Symphony also shines in large monorepos (like the one we have at OpenAI) where the last mile of landing a PR is slow and fragile. The system watches CI, rebases when needed, resolves conflicts, retries flaky checks, and generally shepherds changes through the pipeline. By the time a ticket reaches Merging, we have high confidence the change will make it into the main branch without human babysitting.
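Shepherding a PR through CI amounts to choosing the next step on each pass. The sketch below is a hypothetical Python decision rule, not how Symphony actually decides: the `PullRequestState` fields, the action names, and the retry budget of two per known-flaky check are all assumptions.

```python
from dataclasses import dataclass, field

MAX_RETRIES = 2  # assumed retry budget per known-flaky check


@dataclass
class PullRequestState:
    """Hypothetical snapshot of a PR's merge readiness."""
    has_conflicts: bool = False
    behind_base: bool = False
    failed_checks: list[str] = field(default_factory=list)
    retries: dict[str, int] = field(default_factory=dict)  # check name -> retries spent


def next_action(pr: PullRequestState, known_flaky: set[str]) -> str:
    """Pick the single next step that moves the PR toward merge."""
    if pr.has_conflicts:
        return "resolve_conflicts"
    if pr.behind_base:
        return "rebase"
    # A failure is "real" if the check is not known to be flaky or its retry budget is spent.
    real = [c for c in pr.failed_checks
            if c not in known_flaky or pr.retries.get(c, 0) >= MAX_RETRIES]
    if real:
        return f"investigate:{real[0]}"  # read the CI logs and push a fix
    if pr.failed_checks:
        return f"retry:{pr.failed_checks[0]}"
    return "merge"
```

Investigating real failures before retrying flaky ones avoids spending CI time on retries when the PR would fail anyway.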
Since adopting Symphony, we delegate more work to agents and focus on harder, more exploratory tasks.
Progress comes with new, different problems
Operating at this level comes with tradeoffs. When we moved from steering agents interactively to assigning them work at the ticket level, we lost the ability to constantly nudge them mid-flight and course-correct when needed. Sometimes the agent produced something that completely missed the mark. That was useful—those failures revealed gaps in the system and helped us make it more robust.
Instead of patching the result manually, we added guardrails and skills so the agents could succeed the next time. Over time, this led us to add new capabilities to our harness, like running end-to-end tests, driving the app through Chrome DevTools, and managing QA smoke tests. We significantly improved our documentation and clarified what good looks like.
Not every task fits the Symphony style of work. Some problems still require engineers working directly with interactive Codex sessions, especially ambiguous problems or work that requires strong judgment and expertise. In practice, these are usually the most interesting and enjoyable tasks for our engineers to spend time on.
The difference is that Symphony can handle the bulk of routine implementation work. That lets engineers focus on a single hard problem at a time instead of constantly context-switching between smaller tasks.
We also learned that treating agents as rigid nodes in a state machine doesn’t work well. Models keep getting smarter and can solve bigger problems than the boxes we try to fit them into. In our early versions, we only asked Codex to implement the task. That approach proved too limiting. Codex is perfectly capable of creating multiple PRs as well as reading review feedback and addressing it. So we gave it tools (the gh CLI, skills to read CI logs, and more), and now we can ask Codex to do more, like closing old PRs or pulling reports on completed vs. abandoned work. These types of tasks fell way outside the initial feature-implementation box.
So we eventually moved toward giving agents objectives instead of strict transitions, much like a good manager would assign a goal to a direct report on their team. The power of models comes from their ability to reason, so give them tools and context and let them cook.
Using Symphony to build Symphony
When you open the Symphony repository, the first thing you’ll notice is that Symphony is technically just a SPEC.md file: a definition of the problem and the intended solution. Rather than building a complex supervision system, we described what we wanted and gave agents high-level steering.
The reference implementation is written in Elixir—because when code is effectively free, you can finally pick languages for their strengths, like Elixir's concurrency—but the core idea can be expressed in a simple Markdown document. We encourage you to point your favorite coding agent at the spec and have it implement its own version.
The first version of Symphony was just a Codex session running in tmux, polling Linear and spawning sub-agents for new tasks. It worked, but it wasn’t particularly reliable. The second version lived inside our main project repository, which was built with agents in mind. We had already built the agent harness to give agents the skills and context to do high-quality work in this repo, so Symphony simply connects it all.
Once the basic functionality existed, we used Symphony to build Symphony.
When we internally demoed the system managing tasks and attaching its proof-of-work video, the reaction was overwhelmingly positive: our Symphony project channel grew, and teams across the organization started using it organically. Internal product-market fit is a prerequisite for launching externally at OpenAI. Based on the usage we saw at OpenAI, it became clear we should share Symphony beyond company walls.
So we extracted the idea into a standalone SPEC.md and asked Codex to implement it. For the reference implementation, we chose Elixir, a relatively niche language with excellent primitives for orchestrating and supervising concurrent processes. Codex built the Elixir implementation in one shot, and we kept iterating on both spec and implementation from there. To polish the spec, we even asked Codex to implement it in several other languages—TypeScript, Go, Rust, Java, Python—and use the results to identify ambiguities and simplify the system. It succeeded in every language.
Through the process of building Symphony, we removed a lot of incidental complexity, like dependencies on specific repositories or Linear MCP. Symphony no longer depends on our internal repositories or workflows. The core approach became simple:
For every open task, guarantee that an agent is running in its own workspace.
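One way to honor the "its own workspace" part of that guarantee is a directory per issue, created idempotently. This Python sketch assumes a local filesystem layout keyed by issue identifier; `workspace_for` and its sanitizing rule are hypothetical, and a real setup might use containers or separate checkouts instead.

```python
import re
from pathlib import Path


def workspace_for(issue_key: str, root: Path) -> Path:
    """Return the dedicated workspace directory for one issue, creating it if needed."""
    # Replace anything outside a conservative character set so the key is one path component.
    safe = re.sub(r"[^A-Za-z0-9._-]", "_", issue_key)
    if safe in {"", ".", ".."}:
        raise ValueError(f"unusable issue key: {issue_key!r}")
    path = root / safe
    # exist_ok makes this idempotent, so it can run on every reconciliation pass.
    path.mkdir(parents=True, exist_ok=True)
    return path
```

Under this rule, distinct keys such as `a/b` and `a_b` would share a directory; Linear identifiers like `ENG-42` are already safe, so the rule mainly guards against path traversal.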
Beyond the active work itself, agents now know and follow our development workflow. That workflow (work on an issue, check out a repo, put the issue in progress so the PM knows it's being worked on, add the PR, move it to the Review status, attach videos, and so on) is now captured in a simple WORKFLOW.md file. Humans had always followed this process, but it was never documented. Rather than relying on this implicit set of steps, we now write it down, and Symphony ensures agents follow it. This lets us build agents that work alongside us. If we decide that agents should also attach a self-reflection to finished work, we'll add that step to WORKFLOW.md, and Symphony will guide the agents through it.
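To illustrate how a written workflow can steer agents, here is a Python sketch that reads numbered steps from a Markdown file and tells the agent which step comes next. The file contents and the numbered-list format are hypothetical and are not the format of Symphony's actual WORKFLOW.md; the step wording paraphrases the workflow described above.

```python
import re

# Hypothetical WORKFLOW.md contents; the real file's format may differ.
WORKFLOW_MD = """\
# Workflow
1. Move the issue to In Progress
2. Check out the repo
3. Open a PR and link it to the issue
4. Attach a video walkthrough
5. Move the issue to Review
"""

# Matches a numbered list item such as "3. Open a PR" and captures the step text.
STEP_RE = re.compile(r"^\s*\d+\.\s+(.*\S)\s*$")


def parse_steps(markdown: str) -> list[str]:
    """Extract the ordered steps from numbered list items, ignoring other lines."""
    return [m.group(1) for line in markdown.splitlines() if (m := STEP_RE.match(line))]


def next_step_prompt(steps: list[str], completed: set[str]) -> str:
    """Build a nudge that points the agent at the first step it has not finished."""
    for i, step in enumerate(steps, start=1):
        if step not in completed:
            return f"Next workflow step ({i}/{len(steps)}): {step}"
    return "All workflow steps are complete."
```

With this scheme, adding a self-reflection step means adding one numbered line to the file; the orchestrator code does not change.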
We also got to use Codex in app server mode, a built-in headless mode for Codex. This mode allowed us to run Codex and talk to it programmatically via a well-documented JSON-RPC API for things like starting a thread or reacting to turns. It’s more convenient and scalable than trying to interact with Codex via the CLI or live tmux sessions.
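Talking to a headless server over JSON-RPC mostly comes down to sending requests with ids and matching responses back to them. The Python sketch below assumes one JSON message per line and standard JSON-RPC 2.0 fields; the class is hypothetical, and the exact framing, method names, and params should be taken from the Codex App Server documentation rather than from this example.

```python
import itertools
import json
from collections.abc import Callable


class JsonRpcClient:
    """Minimal JSON-RPC client; assumes newline-delimited JSON framing."""

    def __init__(self, write_line: Callable[[str], None]):
        self._write_line = write_line           # e.g. writes a line to the server's stdin
        self._ids = itertools.count(1)          # monotonically increasing request ids
        self.pending: dict[int, str] = {}       # request id -> method, awaiting a response
        self.results: dict[int, object] = {}    # request id -> result (or error object)
        self.incoming: list[dict] = []          # server-initiated messages

    def request(self, method: str, params: dict) -> int:
        """Send a request and return its id so the response can be matched later."""
        req_id = next(self._ids)
        self.pending[req_id] = method
        message = {"jsonrpc": "2.0", "id": req_id, "method": method, "params": params}
        self._write_line(json.dumps(message))
        return req_id

    def handle_line(self, line: str) -> None:
        """Route one line read from the server's stdout."""
        msg = json.loads(line)
        if "method" not in msg and msg.get("id") in self.pending:
            # A response: match it to the request that produced it.
            self.pending.pop(msg["id"])
            self.results[msg["id"]] = msg["result"] if "result" in msg else msg.get("error")
        elif "method" in msg:
            # A notification (no id) or a server-to-client request (has an id).
            self.incoming.append(msg)
```

For example, with a placeholder method name, `client.request("thread/start", {})` writes one JSON line and returns the id to look for in `client.results`.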
Codex App Server was a perfect fit for our use case: we take advantage of the harness Codex provides while having knobs and hooks to plug into. For example, to avoid exposing the Linear access token to subagents, we use dynamic tool calls to expose the raw linear_graphql function that executes arbitrary requests against Linear, without relying on MCP or exposing the access token to containers.
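The token-isolation idea can be sketched as a tool handler that closes over the credential. In this hypothetical Python version, the agent supplies only `query` and `variables`; the orchestrator process adds the Authorization header and posts to Linear's GraphQL endpoint. Registering the handler through Codex App Server's dynamic tool calls is not shown, and the transport is injectable so the logic can be exercised without network access.

```python
import json
import urllib.request
from typing import Callable

LINEAR_GRAPHQL_URL = "https://api.linear.app/graphql"

Transport = Callable[[bytes, str], dict]


def http_transport(body: bytes, token: str) -> dict:
    """POST a GraphQL payload to Linear from the orchestrator process."""
    req = urllib.request.Request(
        LINEAR_GRAPHQL_URL,
        data=body,
        method="POST",
        # Assumption: a personal API key sent as-is; OAuth tokens use "Bearer <token>".
        headers={"Content-Type": "application/json", "Authorization": token},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


def make_linear_graphql_tool(token: str, transport: Transport = http_transport):
    """Build the tool handler; the token lives in this closure and is never sent to the agent."""

    def linear_graphql(arguments: dict) -> dict:
        query = arguments.get("query")
        if not isinstance(query, str) or not query.strip():
            return {"error": "missing 'query' string"}
        payload = {"query": query, "variables": arguments.get("variables") or {}}
        # Whatever Linear returns (data or GraphQL errors) is passed back to the agent.
        return transport(json.dumps(payload).encode("utf-8"), token)

    return linear_graphql
```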
What’s next
Symphony is an intentionally minimal orchestration layer. We’re open sourcing it to demonstrate the power of Codex App Server when paired with different workflow tools, like Linear. As such, we don't plan to maintain Symphony as a standalone product. Think of it as a reference implementation. Similar to how many developers pointed their coding agents at the harness engineering post to scaffold their repositories, we hope you point your favorite coding agent at the Symphony spec and repository to build your own versions tailored to your environments.
The power comes from Codex and its app server. Symphony was a way to connect Codex to Linear, two things we already used, to solve the work management problem. As coding agents become better at reasoning and following instructions, we suspect the bottleneck at other companies will shift from writing code toward managing agentic work, too. The exciting part is that the barrier to experimenting with these coding agent systems is now surprisingly low. You can just build things with Codex.
Community shoutouts
We're thrilled to see the engineering community using Symphony in the weeks since release, garnering over 15K GitHub stars as of April 23.