Your AI session didn't cost $400 because it was brilliant. It cost $400 because it kept re-reading itself.
Most engineers don't know this because the bill arrives as an opaque monthly total with no breakdown by session, by conversation, or by mistake. We ran the numbers on a real 57MB session with 25 compaction events. Total cost: $368.45.
83% of the cost wasn't thinking. It was re-thinking. Cache reads: the model re-processing the same context on every turn.

$32 for a debugging detour that was later compacted away. The work vanished; the bill didn't.

The cheapest epoch cost $2.52. The most expensive: $32.03. Epochs, the periods between compaction events, have wildly different economics.

Long AI sessions have three invisible taxes. Understanding them is the first step to controlling them.
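To make the epoch numbers concrete, here is a minimal sketch of per-epoch cost attribution. The usage records, the `compaction` marker, and the per-million-token rates are all illustrative assumptions, not Claude's actual schema or pricing:

```python
# Sketch: attribute session cost to epochs (spans between compaction events).
# Record shapes and per-million-token rates below are illustrative assumptions.

RATES = {  # USD per million tokens (hypothetical values)
    "input": 3.00,
    "output": 15.00,
    "cache_read": 0.30,
}

def turn_cost(usage: dict) -> float:
    """Cost of one turn from its token counts."""
    return sum(usage.get(k, 0) * RATES[k] / 1_000_000 for k in RATES)

def epoch_costs(turns: list[dict]) -> list[float]:
    """Group turn costs into epochs, splitting at compaction markers."""
    costs = [0.0]
    for t in turns:
        if t.get("compaction"):  # a new epoch starts at each compaction
            costs.append(0.0)
        costs[-1] += turn_cost(t["usage"])
    return costs

turns = [
    {"usage": {"input": 2_000, "cache_read": 50_000, "output": 800}},
    {"usage": {"input": 500, "cache_read": 120_000, "output": 1_200}},
    {"compaction": True, "usage": {"input": 8_000, "output": 2_000}},
]
print([round(c, 4) for c in epoch_costs(turns)])
```

Even in this toy session, cache reads dominate the pre-compaction epoch: most of the spend is re-reading, not generating.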
The context window is not memory. It is a recurring expense. Every token you keep is a token you pay to re-read on every subsequent turn. Every compaction is a lossy compression event that erases specificity. After ten compactions, the model is reasoning from a summary of a summary of a summary — while you continue paying full price for less precise thinking.
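The recurring-expense claim is just arithmetic: a kept token is paid for again on every subsequent turn. A sketch, assuming a hypothetical flat cache-read rate:

```python
# Sketch: cumulative cost of *keeping* tokens in context, assuming every
# subsequent turn re-reads the whole window at a hypothetical cache-read rate.

CACHE_READ_PER_MTOK = 0.30  # USD per million tokens, illustrative

def reread_cost(kept_tokens: int, turns: int) -> float:
    """Each of `turns` subsequent turns pays to re-read every kept token."""
    return kept_tokens * turns * CACHE_READ_PER_MTOK / 1_000_000

# A 34K-token tangent that lingers for 40 more turns:
print(f"${reread_cost(34_000, 40):.2f}")
```

The per-turn cost is negligible; the product of window size and turn count is what shows up on the bill.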
The CLI shows ctx:41% — a single number. It doesn't tell you what's filling it, how much it costs, or what happens when compaction triggers. It certainly doesn't tell you that the 15-minute tangent into an unrelated repository just displaced 34K tokens of your actual project context.
LLM sessions have three reasoning phases:
Exploratory — temporary, unstable. You're trying approaches, reading files, asking questions. This is scaffolding.
Decision — the commit point. You've decided on an approach, an architecture, a fix. This is signal.
Operational — forward-only execution. You're implementing the decision. The exploration that got you here is now noise.
Claude Code mixes all three permanently. Once you've decided, the scaffolding that got you there doesn't just waste tokens — it biases future responses. The model sees your rejected approaches and old reasoning alongside your final decision. That's not token waste. That's reasoning contamination. And contamination compounds.
Here's a pattern we see in every long session: you're working on Project A. A question comes up about Project B. You check a file, read a config, ask a follow-up. Twenty minutes later, compaction triggers. The summary is about Project B — because that's what filled the recent context. Project A's architecture, decisions, and constraints got compressed into a sentence.
The model did exactly what you told it to do. You just didn't mean to tell it that.
Now you're paying the re-explanation tax: re-reading the same files, re-stating the same constraints, re-establishing the same context you had an hour ago. This tax is invisible and cumulative. Over a multi-day project spanning dozens of sessions, it can exceed the cost of the actual productive work.
This is not a tooling problem. It's a practice problem.
The fix is not bigger context windows — those just delay compaction while increasing cache read costs. The fix is treating your context window the way you treat your codebase: with intentional decisions about what stays and what gets removed.
We call this reasoning hygiene:
Keep conclusions, remove scaffolding. After a decision is made, the exploration that got you there becomes dead weight. Collapse exploration into decisions.
Clean at decision boundaries, not when the window is full. By the time you're at 90%, compaction is imminent and you've lost control over what gets compressed. Waiting until 90% is like refactoring only when production is on fire.
Separate exploration from execution. Exploratory reasoning in one session, structured execution in another. The exploration session can be messy — the execution session carries forward only the decisions.
Track where the money goes. Tokens are abstract. Percentages are abstract. Dollars are visceral. Knowing that a debugging detour cost $32 changes behavior faster than knowing you were at 82% context usage.
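"Keep conclusions, remove scaffolding" can be sketched mechanically. The turn records and the `phase` tag below are hypothetical; the point is that a decision supersedes the exploration that produced it:

```python
# Sketch: collapse exploratory scaffolding once a decision closes it out.
# The "phase" tag on each turn is a hypothetical annotation, not a real field.

def collapse_scaffolding(turns: list[dict]) -> list[dict]:
    """Drop exploratory turns that a later decision turn has superseded."""
    kept, pending = [], []
    for t in turns:
        if t["phase"] == "exploratory":
            pending.append(t)   # scaffolding: provisional until decided
        elif t["phase"] == "decision":
            pending.clear()     # the decision replaces its exploration
            kept.append(t)
        else:                   # operational turns always survive
            kept.append(t)
    kept.extend(pending)        # unresolved exploration stays for now
    return kept

turns = [
    {"phase": "exploratory", "text": "try approach A"},
    {"phase": "exploratory", "text": "try approach B"},
    {"phase": "decision", "text": "go with B, because it isolates the bug"},
    {"phase": "operational", "text": "implement B"},
]
print([t["text"] for t in collapse_scaffolding(turns)])
```

Running this keeps only the decision and the implementation turn: the two rejected-approach turns are exactly the contamination described above.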
We wanted visibility into where reasoning and money were going. So we built ContextSpectre. It reads Claude Code's local session files and gives you control over what fills your context window — what it costs, what's noise, and what happens if you remove it.
It doesn't use AI to analyze your sessions. It uses structural signals: token counts, file paths, compaction boundaries, cost attribution from actual usage data. When it doesn't know something, it says so.
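As one example of a purely structural signal, here is a sketch of locating compaction boundaries in a JSONL session log. The `"type": "summary"` marker is an assumption about the log format for illustration, not Claude Code's documented schema:

```python
import json

# Sketch: find compaction boundaries in a JSONL session log using only
# structural signals. The "type": "summary" marker is an assumed field,
# not a documented Claude Code schema.

def compaction_boundaries(path: str) -> list[int]:
    """Return line indices where a compaction summary entry appears."""
    boundaries = []
    with open(path) as f:
        for i, line in enumerate(f):
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip malformed lines rather than guess
            if entry.get("type") == "summary":
                boundaries.append(i)
    return boundaries
```

Everything between two boundaries is one epoch, which is what makes per-epoch cost attribution possible without any model in the loop.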
Reasoning hygiene layer for Claude Code. Open source, MIT licensed.