The Architect-Execute Split

May 2026 · Obsta Labs

Most operators discover the same trap the expensive way. A frontier session starts with architecture, drifts into implementation, absorbs review, accumulates side quests, and by the end one agent is trying to be strategist, memory, dispatcher, reviewer, and typist at the same time. The bill hurts. The drift hurts more.

There is a better split. Let the long-running session do architecture. Let fresh bounded sessions do execution. Let the shared substrate carry the truth between them.

A frontier session is most valuable when it is deciding, not when it is grinding through routine code lifts. The expensive model should spend its tokens on the hard judgment, then hand the bounded work to a cheaper or more specialized execution surface.

The pattern

Architect session: long-running, exploratory, and allowed to span weeks if necessary. It frames the problem, names invariants, writes work orders, records decisions, and freezes the parts of the reasoning that must survive. It does not need to write every line of code itself.

Execution sessions: fresh, bounded, and scoped to one work order or one clean review batch. They start from a mission briefing, not from the full transcript that produced the architecture. They commit, note the result, and end clean.

In practice that execution tier might be Codex CLI for bounded implementation, Opus on OAuth for a slice that still needs frontier quality, tokencontrol for routine ready work, or Sonnet for the default middle band. The important rule is not brand loyalty. It is that execution starts fresh and stays scoped.

Substrate: Hiveram carries the canonical work graph, authority boundary, portable bundles, rejected paths, checkpoints, and provenance of what moved out and what came back. NeuroRouter shapes the live model window when an execution surface needs the right slice of context. The truth does not live in the transcript anymore.

C1  mechanical or obvious work        -> tokencontrol / fast Codex
C2  bounded implementation or review  -> Codex CLI / Sonnet
C3  scoped but non-trivial execution  -> fresh session + mission briefing
C4  architecture and ambiguity        -> Opus architect session, then split

The point is not that one model is noble and another is cheap. The point is that the work shape changes across a project, and the session should change with it.
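The tier table can be read as a small routing function. The sketch below is illustrative only: the surface names are copied from the table, but the `WorkItem` flags and the order in which `classify` checks them are assumptions made for this example, not part of any Obsta Labs tool.

```python
from dataclasses import dataclass
from enum import Enum


class WorkClass(Enum):
    C1 = "mechanical or obvious work"
    C2 = "bounded implementation or review"
    C3 = "scoped but non-trivial execution"
    C4 = "architecture and ambiguity"


# Surface names are copied from the table; the dict itself is illustrative.
ROUTES = {
    WorkClass.C1: "tokencontrol / fast Codex",
    WorkClass.C2: "Codex CLI / Sonnet",
    WorkClass.C3: "fresh session + mission briefing",
    WorkClass.C4: "Opus architect session, then split",
}


@dataclass(frozen=True)
class WorkItem:
    """Hypothetical description of a unit of work; the flags are assumptions."""
    title: str
    ambiguous: bool = False    # open design questions remain
    mechanical: bool = False   # rename, reformat, obvious lift
    non_trivial: bool = False  # scoped, but needs real reasoning to execute


def classify(item: WorkItem) -> WorkClass:
    # Assumed heuristic: ambiguity always wins, so unclear work never
    # reaches a cheap tier by accident.
    if item.ambiguous:
        return WorkClass.C4
    if item.mechanical:
        return WorkClass.C1
    if item.non_trivial:
        return WorkClass.C3
    return WorkClass.C2


def route(item: WorkItem) -> str:
    return ROUTES[classify(item)]


if __name__ == "__main__":
    print(route(WorkItem("rename a config key", mechanical=True)))
    print(route(WorkItem("choose the storage model", ambiguous=True)))
```

Checking ambiguity first is a deliberate choice in this sketch: misrouting unclear work to a cheap tier costs more than occasionally overpaying for a turn.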

The cost arithmetic

Pricing moves, so treat the numbers here as a shape rather than a forecast. As of May 2026, OpenAI's API pricing page lists GPT-5.5 at $5 / $30 and GPT-5.4 mini at $0.75 / $4.50 per million input / output tokens. Anthropic's API pricing docs list Sonnet 4 at $3 / $15 and Opus 4.1 at $15 / $75 on the same basis. Anthropic also publishes lower prices for some newer Opus generations on other pricing surfaces, but the lesson here does not depend on the exact frontier sticker price.

Take a simplified ten-turn task with roughly 200K input tokens and 30K output tokens per turn. On the older Opus 4.1 pricing shape, each turn costs about $5.25 ($3.00 of input plus $2.25 of output), so all ten turns cost about $52.50. Keeping two architecture turns on Opus 4.1 and moving eight bounded execution turns to Sonnet, at about $1.05 per turn, drops that to about $18.90. Moving those execution turns to GPT-5.4 mini instead, at roughly $0.29 per turn, drops it to about $12.78.
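The arithmetic is easy to reproduce. The sketch below uses the per-million-token prices and token counts quoted in this section. The dictionary keys are labels chosen for this example, not official API model identifiers.

```python
# USD per million tokens as (input, output), as quoted in the text above.
# The keys are illustrative labels, not API model IDs.
PRICES = {
    "opus-4.1": (15.00, 75.00),
    "sonnet-4": (3.00, 15.00),
    "gpt-5.4-mini": (0.75, 4.50),
}

INPUT_TOKENS_PER_TURN = 200_000
OUTPUT_TOKENS_PER_TURN = 30_000


def turn_cost(model: str) -> float:
    """Cost in USD of one turn at the assumed token counts."""
    price_in, price_out = PRICES[model]
    return (
        INPUT_TOKENS_PER_TURN * price_in + OUTPUT_TOKENS_PER_TURN * price_out
    ) / 1_000_000


def plan_cost(plan: dict[str, int]) -> float:
    """Total cost in USD of a plan mapping model label -> number of turns."""
    return round(sum(turn_cost(model) * turns for model, turns in plan.items()), 2)


if __name__ == "__main__":
    plans = {
        "all Opus 4.1": {"opus-4.1": 10},
        "Opus architect + Sonnet execution": {"opus-4.1": 2, "sonnet-4": 8},
        "Opus architect + GPT-5.4 mini execution": {"opus-4.1": 2, "gpt-5.4-mini": 8},
    }
    for name, plan in plans.items():
        print(f"{name}: ${plan_cost(plan):.2f}")
```

Changing the prices or the split between architecture and execution turns shows how sensitive the totals are to each assumption.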

On a newer, cheaper frontier pricing shape, the absolute numbers fall. The product lesson does not. Once the work is bounded, routine turns can safely move to cheaper execution tiers, and the savings compound across every turn that no longer needs frontier judgment.

The cheapest useful optimization is often not a cheaper model. It is making the work bounded enough that a cheaper model is now obviously safe.

These figures are illustrative, not universal. They are a simple demonstration that "one giant premium session" is often paying premium prices for the wrong turns.

The quality argument

Cost is only half of the argument. The deeper reason for the split is quality.

A single long-lived session accumulates exploration baggage. Every abandoned branch, every vague instruction, every wrong-but-useful probe remains in the live context and competes with the task that actually matters now. The same session is then asked to implement, review, remember, and route. That is where drift enters.

Fresh execution runs are different. They begin after the architectural work has already been compressed into a mission briefing or bounded payload. They do not need to carry the architect's entire search path, only the parts that still matter. That makes them cheaper, but it also makes them cleaner.

We have already seen a fresh branch hold up in an internal capture. In our oauth-smoke-5 capture, a forked execution session held RCS 100 across 51 requests after the handoff. That matters because the proof is not "the context window was huge." The proof is that the fresh branch stayed aligned without replaying the whole past.

How to do it today

The ergonomics are still manual, and it is better to say that plainly.

architect on Opus
-> create or refine the WO
-> write the decisions into workledger
-> generate a briefing, checkpoint, or bundle
-> start a fresh execution session
-> commit, note, and end

Today that usually means forking a fresh session by hand, loading the work order or bundle, and keeping the result explicit in the ledger. The workflow already works. The polish is what is still catching up.
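Until a launch tool exists, the briefing step can be approximated with a few lines of script. The sketch below is a hypothetical stand-in: the `WorkOrder` fields, the `render_briefing` function, and the example ID are invented for illustration and do not reflect the actual Hiveram or workledger schema. It only shows the idea of handing an execution session the frozen decisions rather than the transcript.

```python
from dataclasses import dataclass, field


@dataclass
class WorkOrder:
    """Hypothetical work-order shape; field names are assumptions."""
    wo_id: str
    goal: str
    invariants: list[str] = field(default_factory=list)
    decisions: list[str] = field(default_factory=list)
    rejected_paths: list[str] = field(default_factory=list)
    done_when: list[str] = field(default_factory=list)


def _section(title: str, items: list[str]) -> list[str]:
    # Empty sections are skipped so the briefing carries only what matters.
    if not items:
        return []
    return [f"{title}:"] + [f"  - {item}" for item in items] + [""]


def render_briefing(wo: WorkOrder) -> str:
    """Render a plain-text mission briefing for a fresh execution session."""
    lines = [f"Mission briefing for {wo.wo_id}", "", f"Goal: {wo.goal}", ""]
    lines += _section("Invariants (do not break)", wo.invariants)
    lines += _section("Decisions already made", wo.decisions)
    lines += _section("Rejected paths (do not retry)", wo.rejected_paths)
    lines += _section("Done when", wo.done_when)
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    wo = WorkOrder(
        wo_id="WO-EXAMPLE",  # placeholder ID, not a real work order
        goal="Add retry handling to the upload client",
        invariants=["Public function signatures stay unchanged"],
        decisions=["Use exponential backoff with a fixed retry cap"],
        rejected_paths=["Retrying inside the HTTP layer"],
        done_when=["Existing tests pass", "New retry tests added"],
    )
    print(render_briefing(wo))
```

The rejected paths are included on purpose: they are the part of the architect's search that a fresh session is most likely to repeat if nobody tells it otherwise.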

The future ergonomic version is a launch primitive that can render a clean execution run from the work order itself. That is the obvious tool shape. It is not the thing this post is asking you to pretend already exists.

What changes for operators

The hard part is not memorizing commands. The hard part is changing the mental model.

The default mental model is one session, one tool, one bill. The architect-execute split asks for something else: multiple sessions, multiple tiers, one substrate. Once that clicks, the workflow stops feeling like overhead and starts feeling like cost control with better reasoning hygiene.

You are not downgrading the frontier model. You are using it where it is strongest: naming the hard thing correctly once, then letting bounded runs carry the execution load.

If the last year taught operators anything, it is that bigger context windows do not save an incoherent workflow. A better workflow does.