AI Breaks. Your Work Shouldn't.

April 2026 · Obsta Labs

We lost $300 and half a week to a single AI coding session. Not because the model was bad. Because the system around it had no guardrails.

The session ran away. An agent looped on the same task for hours. No spend alert. No runaway detection. No way to know it was happening until the budget was gone and the cooldown locked us out.

We filed bugs. The vendor closed them as stale. So we built NeuroRouter — a local proxy that cleans, repairs, and controls AI requests before they hit the API.

This will happen to you. Here's what we found.

The bugs they won't fix

Session corruption from parallel tool calls. When multiple agents write to the same session file concurrently, the JSONL writer can drop an assistant message. The corresponding tool result becomes an orphan. Every subsequent API call returns 400. The session is permanently unresumable. Eleven orphaned entries in one 45MB session. The vendor's response: closed by a bot after 30 days of inactivity.
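The failure mode is mechanical, and so is the check. A minimal sketch, assuming a simplified session schema — `role`, `tool_use`, and `tool_use_id` here are illustrative stand-ins for the real JSONL entry format:

```python
import json

def find_orphaned_tool_results(entries):
    """Scan parsed session entries for tool results whose originating
    assistant tool_use message is missing (e.g. dropped by a concurrent
    writer). Any hit means the next API call will 400."""
    issued_ids = set()  # tool_use ids actually emitted by assistant turns
    orphans = []
    for entry in entries:
        role = entry.get("role")
        if role == "assistant":
            for block in entry.get("content", []):
                if block.get("type") == "tool_use":
                    issued_ids.add(block["id"])
        elif role == "user":
            for block in entry.get("content", []):
                if (block.get("type") == "tool_result"
                        and block.get("tool_use_id") not in issued_ids):
                    orphans.append(block["tool_use_id"])
    return orphans

# Usage: entries = [json.loads(line) for line in open("session.jsonl")]
```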

Content filter false positives. Working with the Business Source License triggers "output blocked by content filtering policy." Four duplicate issues. All closed as stale. The client never retries 400 errors.

These aren't edge cases. They're structural. The client sends requests with no validation — no preflight checks, no tool chain integrity checks, no recovery logic. The request is serialized and fired. If it fails, it fails.

What $300 taught us

We stopped waiting for upstream fixes and built a proxy that sits between the AI tool and the API. Every request passes through it. Every response comes back through it.

The proxy does three things:

Block secrets before they leak. Credentials, tokens, and connection strings intercepted inline. Flagged with rotation steps, not just blocked.
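A sketch of the interception step — a handful of illustrative patterns, nowhere near a complete scanner:

```python
import re

# Illustrative patterns only; a real scanner covers far more credential formats.
SECRET_PATTERNS = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token":   re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "postgres_dsn":   re.compile(r"postgres(?:ql)?://[^\s:]+:[^\s@]+@[^\s/]+"),
}

def scan_for_secrets(text):
    """Return (kind, match) pairs for anything that looks like a credential,
    so the proxy can flag it with rotation steps instead of silently passing
    it upstream."""
    hits = []
    for kind, pattern in SECRET_PATTERNS.items():
        for m in pattern.finditer(text):
            hits.append((kind, m.group(0)))
    return hits
```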

Strip waste before you pay for it. Stale reads, thinking blocks, orphaned tool results, failed retries, duplicate reminders. The API only sees clean context.
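In sketch form, assuming simplified block types (`thinking` and `file_read` are stand-ins for the real message schema): drop thinking blocks entirely and keep only the latest read of each file.

```python
def strip_dead_context(messages):
    """Remove blocks the API never needs to see again: thinking blocks,
    and file reads superseded by a later read of the same path."""
    latest_read = {}  # file path -> index of the most recent read
    for i, msg in enumerate(messages):
        for block in msg.get("content", []):
            if block.get("type") == "file_read":
                latest_read[block["path"]] = i

    cleaned = []
    for i, msg in enumerate(messages):
        blocks = [
            b for b in msg.get("content", [])
            if b.get("type") != "thinking"
            and not (b.get("type") == "file_read" and latest_read[b["path"]] != i)
        ]
        if blocks:  # drop messages emptied by the filter
            cleaned.append({**msg, "content": blocks})
    return cleaned
```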

Repair failures before they break your session. Malformed tool chains fixed before they become 400s. Role alternation repaired. Content filter false positives retried with reduced context.
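Role alternation is the simplest of these repairs to show. A sketch, assuming the usual chat-message shape: merge consecutive same-role messages so the API sees strict alternation instead of a 400.

```python
def repair_role_alternation(messages):
    """Merge consecutive messages with the same role so roles strictly
    alternate, which is what the chat API requires."""
    repaired = []
    for msg in messages:
        if repaired and repaired[-1]["role"] == msg["role"]:
            # Fold this message's blocks into the previous one.
            repaired[-1]["content"] = repaired[-1]["content"] + msg["content"]
        else:
            repaired.append({"role": msg["role"], "content": list(msg["content"])})
    return repaired
```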

The math

Context decay is physics, not a bug. Fixed-size window plus unbounded session equals a quadratic cost curve. Every dead token — a stale file read, an abandoned thinking block, a failed retry — gets re-sent on every subsequent turn. A 1,000-token dead block at turn 10 in a 200-turn session costs 1,000 × 190 = 190,000 tokens of re-read waste. Not 1,000. The cost compounds.

The proxy strips 0–2% of the context per turn. Sounds small. But across a 200-turn Opus session, that's roughly 20 million tokens of compounded re-reads avoided — about $60 in savings you barely notice accumulating. The $29/month subscription pays for itself in the first long session.
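The arithmetic behind both numbers, in a few lines:

```python
def compounded_waste(dead_tokens, injected_at_turn, total_turns):
    # A dead block injected at turn t is re-sent on every later turn.
    return dead_tokens * (total_turns - injected_at_turn)

# The example above: a 1,000-token dead block at turn 10 of a 200-turn session.
print(compounded_waste(1_000, 10, 200))  # → 190000

# The snowball: ~1,000 dead tokens added every turn, compounded over 200 turns.
total = sum(compounded_waste(1_000, t, 200) for t in range(1, 201))
print(total)  # → 19900000, roughly the 20M figure above
```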

The per-turn number is the visible metric. The real value is underneath:

Per turn:     ~1,000 tokens stripped (thinking, retries, reminders)
Snowball:     ~20M tokens of re-reads avoided per 200-turn session
Secrets:      up to 28 caught per request (credentials, keys, DSNs)
Session heal: 15+ broken tool chains repaired automatically

We used to report bigger numbers — 67% noise removed, $280 saved. Those numbers were real but dishonest. Our filter was too aggressive: it stripped content the model needed, causing the model to re-read it, creating the waste it claimed to remove. A feedback loop we built ourselves. We fixed it. The savings are smaller now. The sessions don't crash.

The discipline tax

The free proxy works, but it demands discipline. One protocol per instance. One session at a time. You restart it for every tool switch. You notice the runaway yourself.

One missed runaway, one corrupted tool chain — and you're filing a bug that gets closed as stale.

The pro version removes that tax:

Session multiplexing. Claude and Codex in one daemon. Isolated sessions, no cross-contamination. Start once in the morning, forget about it.

Continuity repair. Broken tool chains fixed locally before they become upstream 400s. The session file is healed automatically when degradation is detected. No manual contextspectre runs.

Model routing. Mechanical work goes to cheaper models by default. Subagent running a file search? Haiku. Lint check? GPT-4o-mini. Foreground reasoning stays on Opus. Opt out with one flag.
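As a sketch — the model names and task labels here are illustrative, not the actual routing table:

```python
# Illustrative routing table: mechanical subagent work goes to cheap models.
ROUTES = {
    "file_search": "claude-haiku",
    "lint_check":  "gpt-4o-mini",
}
DEFAULT_MODEL = "claude-opus"  # foreground reasoning stays here

def route(task_kind, force_default=False):
    """Pick a model for a task; force_default models the one-flag opt-out."""
    if force_default:
        return DEFAULT_MODEL
    return ROUTES.get(task_kind, DEFAULT_MODEL)
```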

Runaway gating. When the proxy detects the same byte delta repeating, it gates further requests. The budget stops burning.
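The core check is small. A sketch with an illustrative threshold: hash each response delta and gate when the last N in a row are identical.

```python
from collections import deque
import hashlib

class RunawayGate:
    """Gate further requests when the same byte delta repeats N times in
    a row — the signature of an agent looping on one task. The threshold
    here is illustrative."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.recent = deque(maxlen=threshold)  # rolling window of delta hashes

    def observe(self, delta_bytes):
        self.recent.append(hashlib.sha256(delta_bytes).hexdigest())

    def should_gate(self):
        # Window is full and every hash in it is identical: stop the burn.
        return len(self.recent) == self.threshold and len(set(self.recent)) == 1
```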

Cooldown warning. At 80% quota usage: wrap up. At 90%: commit and push now. At 95%: work is automatically rescued to a checkpoint file. You're never locked out without a save.
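The threshold ladder, as a sketch — the percentages are the ones above; the action names are illustrative:

```python
def quota_action(used, budget):
    """Escalating response as quota burns down: warn, then push, then
    rescue work to a checkpoint before the cooldown locks you out."""
    frac = used / budget
    if frac >= 0.95:
        return "rescue-to-checkpoint"  # work saved automatically
    if frac >= 0.90:
        return "commit-and-push"
    if frac >= 0.80:
        return "wrap-up"
    return "ok"
```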

What we learned by breaking our own product

The most important bug we found was in our own code. Our stale-reads filter was stripping file content the model needed to remember. The model would re-read the files. The filter would strip them again. A feedback loop that grew the session to 2MB while the filtered output stayed flat at 275KB — small enough that the client's compaction never triggered.

The filter designed to save money was causing the runaway.

We found it because we use the product ourselves, on real work, every day. The $200 runaway on our own session is what exposed the bug. That's the advantage of building infrastructure you depend on: the feedback loop between failure and fix is measured in hours, not quarters.

If extraordinary effort is needed to maintain safety, the architecture is broken. The fix should be structural, not behavioral.

Who this is for

If you've burned credits on a runaway session, lost work to a corrupted tool chain, or had a content filter kill your output mid-task — you already know the problem.

You don't need convincing. You need insurance.