AI Breaks. Your Work Shouldn't.

April 2026 · Obsta Labs

We lost $300 and half a week to a single AI coding session. Not because the model was bad. Because the system around it had no guardrails.

The session ran away. An agent looped on the same task for hours. No spend alert. No runaway detection. No way to know it was happening until the budget was gone and the cooldown locked us out.

We filed bugs. The vendor closed them as stale. So we built NeuroRouter — a local proxy that cleans, repairs, and controls AI requests before they hit the API.

This will happen to you. Here's what we found.

The bugs they won't fix

Session corruption from parallel tool calls. When multiple agents write to the same session file concurrently, the JSONL writer can drop an assistant message. The corresponding tool result becomes an orphan. Every subsequent API call returns 400. The session is permanently unresumable. Eleven orphaned entries in one 45MB session. The vendor's response: closed by a bot after 30 days of inactivity.
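The failure mode is mechanical, and so is the check. A minimal sketch, assuming a simplified session schema — `role`, `tool_use`, and `tool_use_id` here are illustrative stand-ins for the real JSONL entry format:

```python
import json

def find_orphaned_tool_results(entries):
    """Scan parsed session entries for tool results whose originating
    assistant tool_use message is missing (e.g. dropped by a concurrent
    writer). Any hit means the next API call will 400."""
    issued_ids = set()  # tool_use ids actually emitted by assistant turns
    orphans = []
    for entry in entries:
        role = entry.get("role")
        if role == "assistant":
            for block in entry.get("content", []):
                if block.get("type") == "tool_use":
                    issued_ids.add(block["id"])
        elif role == "user":
            for block in entry.get("content", []):
                if (block.get("type") == "tool_result"
                        and block.get("tool_use_id") not in issued_ids):
                    orphans.append(block["tool_use_id"])
    return orphans

# Usage: entries = [json.loads(line) for line in open("session.jsonl")]
```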

Content filter false positives. Working with the Business Source License triggers "output blocked by content filtering policy." Four duplicate issues. All closed as stale. The client never retries 400 errors.

These aren't edge cases. They're structural. The client sends requests with no validation — no preflight checks, no tool chain integrity checks, no recovery logic. The request is serialized and fired. If it fails, it fails.

What $300 taught us

We stopped waiting for upstream fixes and built a proxy that sits between the AI tool and the API. Every request passes through it. Every response comes back through it.

The proxy does three things:

Block secrets before they leak. Credentials, tokens, and connection strings intercepted inline. Flagged with rotation steps, not just blocked.
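A sketch of the interception step — a handful of illustrative patterns, nowhere near a complete scanner:

```python
import re

# Illustrative patterns only; a real scanner covers far more credential formats.
SECRET_PATTERNS = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token":   re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "postgres_dsn":   re.compile(r"postgres(?:ql)?://[^\s:]+:[^\s@]+@[^\s/]+"),
}

def scan_for_secrets(text):
    """Return (kind, match) pairs for anything that looks like a credential,
    so the proxy can flag it with rotation steps instead of silently passing
    it upstream."""
    hits = []
    for kind, pattern in SECRET_PATTERNS.items():
        for m in pattern.finditer(text):
            hits.append((kind, m.group(0)))
    return hits
```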

Strip waste before you pay for it. Stale reads, thinking blocks, orphaned tool results, failed retries, duplicate reminders. The API only sees clean context.
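In sketch form, assuming simplified block types (`thinking` and `file_read` are stand-ins for the real message schema): drop thinking blocks entirely and keep only the latest read of each file.

```python
def strip_dead_context(messages):
    """Remove blocks the API never needs to see again: thinking blocks,
    and file reads superseded by a later read of the same path."""
    latest_read = {}  # file path -> index of the most recent read
    for i, msg in enumerate(messages):
        for block in msg.get("content", []):
            if block.get("type") == "file_read":
                latest_read[block["path"]] = i

    cleaned = []
    for i, msg in enumerate(messages):
        blocks = [
            b for b in msg.get("content", [])
            if b.get("type") != "thinking"
            and not (b.get("type") == "file_read" and latest_read[b["path"]] != i)
        ]
        if blocks:  # drop messages emptied by the filter
            cleaned.append({**msg, "content": blocks})
    return cleaned
```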

Repair failures before they break your session. Malformed tool chains fixed before they become 400s. Role alternation repaired. Content filter false positives retried with reduced context.
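Role alternation is the simplest of these repairs to show. A sketch, assuming the usual chat-message shape: merge consecutive same-role messages so the API sees strict alternation instead of a 400.

```python
def repair_role_alternation(messages):
    """Merge consecutive messages with the same role so roles strictly
    alternate, which is what the chat API requires."""
    repaired = []
    for msg in messages:
        if repaired and repaired[-1]["role"] == msg["role"]:
            # Fold this message's blocks into the previous one.
            repaired[-1]["content"] = repaired[-1]["content"] + msg["content"]
        else:
            repaired.append({"role": msg["role"], "content": list(msg["content"])})
    return repaired
```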

The math

Context decay is physics, not a bug. Fixed-size window plus unbounded session equals a quadratic cost curve. Every dead token — a stale file read, an abandoned thinking block, a failed retry — gets re-sent on every subsequent turn. A 1,000-token dead block at turn 10 in a 200-turn session costs 1,000 × 190 = 190,000 tokens of re-read waste. Not 1,000. The cost compounds.

The proxy strips 0–2% of the context per turn. Sounds small. But across a 200-turn Opus session, that's roughly 20 million tokens of compounded re-reads avoided — about $60 in savings you barely notice accumulating. The $29/month subscription pays for itself in the first long session.
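The arithmetic behind both numbers, in a few lines:

```python
def compounded_waste(dead_tokens, injected_at_turn, total_turns):
    # A dead block injected at turn t is re-sent on every later turn.
    return dead_tokens * (total_turns - injected_at_turn)

# The example above: a 1,000-token dead block at turn 10 of a 200-turn session.
print(compounded_waste(1_000, 10, 200))  # → 190000

# The snowball: ~1,000 dead tokens added every turn, compounded over 200 turns.
total = sum(compounded_waste(1_000, t, 200) for t in range(1, 201))
print(total)  # → 19900000, roughly the 20M figure above
```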

The per-turn number is the visible metric. The real value is underneath:

Per turn:     ~1,000 tokens stripped (thinking, retries, reminders)
Snowball:     ~20M tokens of re-reads avoided per 200-turn session
Secrets:      up to 28 caught per request (credentials, keys, DSNs)
Session heal: 15+ broken tool chains repaired automatically

We used to report bigger numbers — 67% noise removed, $280 saved. Those numbers were real but dishonest. Our filter was too aggressive: it stripped content the model needed, causing the model to re-read it, creating the waste it claimed to remove. A feedback loop we built ourselves. We fixed it. The savings are smaller now. The sessions don't crash.

The discipline tax

The free proxy works, but it demands discipline. One protocol per instance. One session at a time. You restart it for every tool switch. You notice the runaway yourself.

One missed runaway, one corrupted tool chain — and you're filing a bug that gets closed as stale.

The pro version removes that tax:

Session multiplexing. Claude and Codex in one daemon. Isolated sessions, no cross-contamination. Start once in the morning, forget about it.

Continuity repair. Broken tool chains fixed locally before they become upstream 400s. The session file is healed automatically when degradation is detected. No manual contextspectre runs.

Model routing. Mechanical work goes to cheaper models by default. Subagent running a file search? Haiku. Lint check? GPT-4o-mini. Foreground reasoning stays on Opus. Opt out with one flag.
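As a sketch — the model names and task labels here are illustrative, not the actual routing table:

```python
# Illustrative routing table: mechanical subagent work goes to cheap models.
ROUTES = {
    "file_search": "claude-haiku",
    "lint_check":  "gpt-4o-mini",
}
DEFAULT_MODEL = "claude-opus"  # foreground reasoning stays here

def route(task_kind, force_default=False):
    """Pick a model for a task; force_default models the one-flag opt-out."""
    if force_default:
        return DEFAULT_MODEL
    return ROUTES.get(task_kind, DEFAULT_MODEL)
```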

Runaway gating. When the proxy detects the same byte delta repeating, it gates further requests. The budget stops burning.
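The core check is small. A sketch with an illustrative threshold: hash each response delta and gate when the last N in a row are identical.

```python
from collections import deque
import hashlib

class RunawayGate:
    """Gate further requests when the same byte delta repeats N times in
    a row — the signature of an agent looping on one task. The threshold
    here is illustrative."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.recent = deque(maxlen=threshold)  # rolling window of delta hashes

    def observe(self, delta_bytes):
        self.recent.append(hashlib.sha256(delta_bytes).hexdigest())

    def should_gate(self):
        # Window is full and every hash in it is identical: stop the burn.
        return len(self.recent) == self.threshold and len(set(self.recent)) == 1
```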

Cooldown warning. At 80% quota usage: wrap up. At 90%: commit and push now. At 95%: work is automatically rescued to a checkpoint file. You're never locked out without a save.
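The threshold ladder, as a sketch — the percentages are the ones above; the action names are illustrative:

```python
def quota_action(used, budget):
    """Escalating response as quota burns down: warn, then push, then
    rescue work to a checkpoint before the cooldown locks you out."""
    frac = used / budget
    if frac >= 0.95:
        return "rescue-to-checkpoint"  # work saved automatically
    if frac >= 0.90:
        return "commit-and-push"
    if frac >= 0.80:
        return "wrap-up"
    return "ok"
```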

What we learned by breaking our own product

The most important bug we found was in our own code. Our stale-reads filter was stripping file content the model needed to remember. The model would re-read the files. The filter would strip them again. A feedback loop that grew the session to 2MB while the filtered output stayed flat at 275KB — small enough that the client's compaction never triggered.

The filter designed to save money was causing the runaway.

We found it because we use the product ourselves, on real work, every day. The $200 runaway on our own session is what exposed the bug. That's the advantage of building infrastructure you depend on: the feedback loop between failure and fix is measured in hours, not quarters.

If extraordinary effort is needed to maintain safety, the architecture is broken. The fix should be structural, not behavioral.

Who this is for

If you've burned credits on a runaway session, lost work to a corrupted tool chain, or had a content filter kill your output mid-task — you already know the problem.

You don't need convincing. You need insurance.