Your AI session didn't cost $400 because it was brilliant. It cost $400 because it kept re-reading itself.
Most engineers don't know this because the bill arrives as an opaque monthly total with no breakdown by session, by conversation, or by mistake. We ran the numbers on a real 57MB session with 25 compaction events. Total cost: $368.45.
83% of the cost wasn't thinking. It was re-thinking. Cache reads: the model re-processing the same context on every turn.

$32 for a debugging detour that was later compacted away. The work vanished; the bill didn't.

The cheapest epoch cost $2.52. The most expensive: $32.03. Epochs, the periods between compaction events, have wildly different economics.

Long AI sessions have three invisible taxes. Understanding them is the first step to controlling them.
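To make the epoch numbers concrete, here is a minimal sketch of per-epoch cost attribution. The usage records, the `compaction` marker, and the per-million-token rates are all illustrative assumptions, not Claude's actual schema or pricing:

```python
# Sketch: attribute session cost to epochs (spans between compaction events).
# Record shapes and per-million-token rates below are illustrative assumptions.

RATES = {  # USD per million tokens (hypothetical values)
    "input": 3.00,
    "output": 15.00,
    "cache_read": 0.30,
}

def turn_cost(usage: dict) -> float:
    """Cost of one turn from its token counts."""
    return sum(usage.get(k, 0) * RATES[k] / 1_000_000 for k in RATES)

def epoch_costs(turns: list[dict]) -> list[float]:
    """Group turn costs into epochs, splitting at compaction markers."""
    costs = [0.0]
    for t in turns:
        if t.get("compaction"):  # a new epoch starts at each compaction
            costs.append(0.0)
        costs[-1] += turn_cost(t["usage"])
    return costs

turns = [
    {"usage": {"input": 2_000, "cache_read": 50_000, "output": 800}},
    {"usage": {"input": 500, "cache_read": 120_000, "output": 1_200}},
    {"compaction": True, "usage": {"input": 8_000, "output": 2_000}},
]
print([round(c, 4) for c in epoch_costs(turns)])
```

Even in this toy session, cache reads dominate the pre-compaction epoch: most of the spend is re-reading, not generating.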
The context window is not memory. It is a recurring expense. Every token you keep is a token you pay to re-read on every subsequent turn. Every compaction is a lossy compression event that erases specificity. After ten compactions, the model is reasoning from a summary of a summary of a summary — while you continue paying full price for less precise thinking.
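The recurring-expense claim is just arithmetic: a kept token is paid for again on every subsequent turn. A sketch, assuming a hypothetical flat cache-read rate:

```python
# Sketch: cumulative cost of *keeping* tokens in context, assuming every
# subsequent turn re-reads the whole window at a hypothetical cache-read rate.

CACHE_READ_PER_MTOK = 0.30  # USD per million tokens, illustrative

def reread_cost(kept_tokens: int, turns: int) -> float:
    """Each of `turns` subsequent turns pays to re-read every kept token."""
    return kept_tokens * turns * CACHE_READ_PER_MTOK / 1_000_000

# A 34K-token tangent that lingers for 40 more turns:
print(f"${reread_cost(34_000, 40):.2f}")
```

The per-turn cost is negligible; the product of window size and turn count is what shows up on the bill.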
The CLI shows ctx:41% — a single number. It doesn't tell you what's filling it, how much it costs, or what happens when compaction triggers. It certainly doesn't tell you that the 15-minute tangent into an unrelated repository just displaced 34K tokens of your actual project context.
LLM sessions have three reasoning phases:
Exploratory — temporary, unstable. You're trying approaches, reading files, asking questions. This is scaffolding.
Decision — the commit point. You've decided on an approach, an architecture, a fix. This is signal.
Operational — forward-only execution. You're implementing the decision. The exploration that got you here is now noise.
Claude Code mixes all three permanently. Once you've decided, the scaffolding that got you there doesn't just waste tokens — it biases future responses. The model sees your rejected approaches and old reasoning alongside your final decision. That's not token waste. That's reasoning contamination. And contamination compounds.
Here's a pattern we see in every long session: you're working on Project A. A question comes up about Project B. You check a file, read a config, ask a follow-up. Twenty minutes later, compaction triggers. The summary is about Project B — because that's what filled the recent context. Project A's architecture, decisions, and constraints got compressed into a sentence.
The model did exactly what you told it to do. You just didn't mean to tell it that.
Now you're paying the re-explanation tax: re-reading the same files, re-stating the same constraints, re-establishing the same context you had an hour ago. This tax is invisible and cumulative. Over a multi-day project spanning dozens of sessions, it can exceed the cost of the actual productive work.
This is not a tooling problem. It's a practice problem.
The fix is not bigger context windows — those just delay compaction while increasing cache read costs. The fix is treating your context window the way you treat your codebase: with intentional decisions about what stays and what gets removed.
We call this reasoning hygiene:
Keep conclusions, remove scaffolding. After a decision is made, the exploration that got you there becomes dead weight. Collapse exploration into decisions.
Clean at decision boundaries, not when the window is full. By the time you're at 90%, compaction is imminent and you've lost control over what gets compressed. Waiting until 90% is like refactoring only when production is on fire.
Separate exploration from execution. Exploratory reasoning in one session, structured execution in another. The exploration session can be messy — the execution session carries forward only the decisions.
Track where the money goes. Tokens are abstract. Percentages are abstract. Dollars are visceral. Knowing that a debugging detour cost $32 changes behavior faster than knowing you were at 82% context usage.
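"Keep conclusions, remove scaffolding" can be sketched mechanically. The turn records and the `phase` tag below are hypothetical; the point is that a decision supersedes the exploration that produced it:

```python
# Sketch: collapse exploratory scaffolding once a decision closes it out.
# The "phase" tag on each turn is a hypothetical annotation, not a real field.

def collapse_scaffolding(turns: list[dict]) -> list[dict]:
    """Drop exploratory turns that a later decision turn has superseded."""
    kept, pending = [], []
    for t in turns:
        if t["phase"] == "exploratory":
            pending.append(t)   # scaffolding: provisional until decided
        elif t["phase"] == "decision":
            pending.clear()     # the decision replaces its exploration
            kept.append(t)
        else:                   # operational turns always survive
            kept.append(t)
    kept.extend(pending)        # unresolved exploration stays for now
    return kept

turns = [
    {"phase": "exploratory", "text": "try approach A"},
    {"phase": "exploratory", "text": "try approach B"},
    {"phase": "decision", "text": "go with B, because it isolates the bug"},
    {"phase": "operational", "text": "implement B"},
]
print([t["text"] for t in collapse_scaffolding(turns)])
```

Running this keeps only the decision and the implementation turn: the two rejected-approach turns are exactly the contamination described above.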
We wanted visibility into where reasoning and money were going. So we built ContextSpectre. It reads Claude Code's local session files and gives you control over what fills your context window — what it costs, what's noise, and what happens if you remove it.
It doesn't use AI to analyze your sessions. It uses structural signals: token counts, file paths, compaction boundaries, cost attribution from actual usage data. When it doesn't know something, it says so.
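As one example of a purely structural signal, here is a sketch of locating compaction boundaries in a JSONL session log. The `"type": "summary"` marker is an assumption about the log format for illustration, not Claude Code's documented schema:

```python
import json

# Sketch: find compaction boundaries in a JSONL session log using only
# structural signals. The "type": "summary" marker is an assumed field,
# not a documented Claude Code schema.

def compaction_boundaries(path: str) -> list[int]:
    """Return line indices where a compaction summary entry appears."""
    boundaries = []
    with open(path) as f:
        for i, line in enumerate(f):
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip malformed lines rather than guess
            if entry.get("type") == "summary":
                boundaries.append(i)
    return boundaries
```

Everything between two boundaries is one epoch, which is what makes per-epoch cost attribution possible without any model in the loop.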
Reasoning hygiene layer for Claude Code. Open source, MIT licensed.