
Decision Observability Without Transcript Replay

May 2026 · Obsta Labs

The original decision-telemetry argument was right. Token counts still hide the shape of reasoning. Decision boundaries still matter. Branch factor still matters. What changed is where those signals live.

In the long-session model, the only place to inspect the reasoning trail was the session itself. If you wanted to know where the system committed, what it rejected, or how broadly it searched, you had to mine one very large transcript while it was still alive.

That is not a durable observability model. It makes the decision trail disappear at the same moment the session gets too expensive, too stale, or too compacted to trust.

The problem was never just missing metrics. The problem was storing the decision trail in the least stable place in the system.

The metrics were right

Thinking Telemetry named the useful signals: decision boundaries, branch factor, exploration cost, and the difference between convergence and thrashing. None of that becomes less true just because the system stops leaning on one giant session.

What changes in the v2 model is the artifact shape. The decisions no longer live only as traces inside a transcript. They live as work-order scope, mission briefings, rejected paths, checkpoints, provenance edges, and recovery capsules that can survive a handoff.

The observability moved from the transcript to the structure around the transcript.

Decision boundaries become explicit

A decision boundary is no longer inferred only from a change in token entropy or tool choice. In the architectural model, a decision boundary is often visible because someone froze a mission briefing, wrote the objective and constraints into the work order, or cut a checkpoint before a branch.

The old telemetry model asked: when did the session stop exploring and start executing? The structural model asks: what artifact captured that commitment, and what had to survive the handoff?

That is a better question. It does not depend on the original session remaining alive. It does not require replaying the whole chat to detect the commit. The decision becomes visible because the system had to serialize it to move work forward at all.

Mission briefing: objective, constraints, rejected paths, evidence, reporting contract. A decision boundary the next session can actually inherit.
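To make that concrete, a frozen briefing could be modeled as an immutable record that round-trips through JSON. This is a minimal sketch, not an actual schema: the class name `MissionBriefing`, the field names, and the sample values are all assumptions derived from the five items named above.

```python
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)
class MissionBriefing:
    """Hypothetical shape for a frozen mission briefing (illustrative only)."""

    objective: str
    constraints: Tuple[str, ...]
    rejected_paths: Tuple[str, ...]
    evidence: Tuple[str, ...]
    reporting_contract: str

    def to_json(self) -> str:
        # Serializing is what makes the commitment visible outside the session.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, payload: str) -> "MissionBriefing":
        raw = json.loads(payload)
        return cls(
            objective=raw["objective"],
            constraints=tuple(raw["constraints"]),
            rejected_paths=tuple(raw["rejected_paths"]),
            evidence=tuple(raw["evidence"]),
            reporting_contract=raw["reporting_contract"],
        )


# Made-up example values, used only to show the round trip.
briefing = MissionBriefing(
    objective="Migrate the export job to the streaming writer",
    constraints=("no schema changes", "keep the nightly window"),
    rejected_paths=("rewrite the exporter from scratch",),
    evidence=("benchmark notes from the checkpoint",),
    reporting_contract="Return a diff summary and a list of open risks",
)

# A fresh session can rebuild the same briefing from the serialized form.
restored = MissionBriefing.from_json(briefing.to_json())
```

The record is frozen on purpose in this sketch: once the briefing is cut, a later session inherits it rather than edits it. Whether a real system would enforce that is a design choice, not something this example settles.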

Branch factor becomes a graph problem

Branch factor used to be a hidden property of a transcript. The model may have explored seven architectures, but the operator only saw the surviving one unless they dug through everything.

Once you have checkpoints, portable bundles, provenance, and non-destructive branch returns, the search shape stops being folklore. The operator can inspect what left, what came back, what was rejected, and which branch produced the final answer.

That is still decision observability. It is just no longer trapped inside a temporary session. The graph becomes the record of exploration width.

work order
  → checkpoint
  → branch
  → bounded reply
  → provenance edge
  → reviewed apply

The old branch-factor intuition stays useful here. Some work still branches too narrowly. Some still thrashes. But now the evidence can be attached to the work graph instead of disappearing into a transcript nobody wants to reread.
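One way to read exploration width off such a graph is to count the branches cut from each checkpoint and note which of them never came back. The sketch below assumes a hypothetical edge list of `(parent, child, kind)` triples; the node names, the `kind` labels, and both helper functions are invented for illustration.

```python
from collections import defaultdict

# Hypothetical work-graph edges as (parent, child, kind) triples.
EDGES = [
    ("work-order-1", "checkpoint-1", "checkpoint"),
    ("checkpoint-1", "branch-a", "branch"),
    ("checkpoint-1", "branch-b", "branch"),
    ("checkpoint-1", "branch-c", "branch"),
    ("branch-a", "reply-a", "bounded_reply"),
    ("branch-b", "reply-b", "bounded_reply"),
    ("reply-b", "apply-1", "reviewed_apply"),
]


def branch_factor(edges):
    """Count how many branches were cut from each parent node."""
    counts = defaultdict(int)
    for parent, _child, kind in edges:
        if kind == "branch":
            counts[parent] += 1
    return dict(counts)


def unreturned_branches(edges):
    """List branches that never produced a bounded reply."""
    branches = {child for _parent, child, kind in edges if kind == "branch"}
    replied = {parent for parent, _child, kind in edges if kind == "bounded_reply"}
    return sorted(branches - replied)


width = branch_factor(EDGES)         # {"checkpoint-1": 3}
silent = unreturned_branches(EDGES)  # ["branch-c"]
```

A width of one at every checkpoint would suggest narrow search; a large width with many unreturned branches would be one possible signature of thrashing. The thresholds are left open here because the source does not define them.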

Recovery capsules are observability artifacts

The live session used to be responsible for both execution and memory. That made every recovery event feel like an amnesia event. Once recovery capsules and mission briefings exist, a fresh branch can resume with a bounded explanation of what still matters.

That changes what a recovery event means. It is no longer proof that the system forgot. It can be proof that the system remembered the right things in the right shape.

This is also why the compiler framing matured into a broader context-operating-system frame. Projection is still necessary, but projection alone cannot carry the full decision trail across sessions, machines, or authority modes. For that you need substrate, transfer, projection, and retrieval working together. The broader bridge is in "From Context Compiler to Context Operating System."

What becomes measurable now

The observability surface gets better, not worse, when the transcript stops being sacred. You can ask different questions:

Which checkpoint locked the decision? Which branch returned the winning answer? Which rejected paths survived into the mission briefing? Which work order carried the final reporting contract?

Those are not token counters. They are decision artifacts. They tell you what the system believed, what it tried, and what the next session was allowed to inherit.
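Questions like these reduce to walking provenance edges backward from the final apply. The following is a sketch under stated assumptions: each node records a kind and a single parent, "the checkpoint that locked the decision" is taken to mean the nearest checkpoint ancestor of the reviewed apply, and every identifier is hypothetical.

```python
# Hypothetical provenance records: node -> (kind, parent or None).
NODES = {
    "work-order-1": ("work_order", None),
    "checkpoint-1": ("checkpoint", "work-order-1"),
    "branch-a": ("branch", "checkpoint-1"),
    "branch-b": ("branch", "checkpoint-1"),
    "reply-b": ("bounded_reply", "branch-b"),
    "apply-1": ("reviewed_apply", "reply-b"),
}


def lineage(node, nodes):
    """Walk provenance edges from a node back to its root; return root first."""
    path, seen = [], set()
    while node is not None:
        if node in seen:
            raise ValueError(f"cycle in provenance at {node!r}")
        seen.add(node)
        path.append(node)
        node = nodes[node][1]
    return path[::-1]


def nearest(kind, node, nodes):
    """Return the closest ancestor of the given kind, or None if absent."""
    for ancestor in reversed(lineage(node, nodes)):
        if nodes[ancestor][0] == kind:
            return ancestor
    return None


locking_checkpoint = nearest("checkpoint", "apply-1", NODES)  # "checkpoint-1"
winning_branch = nearest("branch", "apply-1", NODES)          # "branch-b"
carrying_order = nearest("work_order", "apply-1", NODES)      # "work-order-1"
```

None of these lookups touch a transcript. They only need the graph to have been written down when the work moved.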

Why this is the better durability model

Telemetry on one live session is still useful. It catches loops, drift, and runaway cost while they are happening. But it is no longer the only place where the reasoning record exists. That is the meaningful change.

The session can die. The branch can end. The operator can spin up a fresh agent. The work graph still carries the decisions that mattered, the paths that were ruled out, and the evidence needed to continue without pretending the whole past must be replayed.