Obsta Labs writes about agent safety, context systems, provenance, and decision governance — the same control model behind Bulwark, Verdict, NeuroRouter, Hiveram, VectorCourt, and CoreGate. These are working notes from building that stack, grouped by theme rather than date.
It does not prove what world they observed. Why authorship signatures manufacture false confidence between AI agents, and what to bind into the claim instead.
Safety tools classify the words and ignore everything around them. Harm is not a property of text alone — it is a property of text, actor, object, and authority. Classify the environment before you classify intent.
A failure mode that appears only after your tools become powerful enough to gate anything. Prevent what is dangerous. Reconcile what is fuzzy. The proxy lies.
26 LLM proxies caught stealing credentials. A supply-chain breach compromised thousands. One operator with Claude Code hacked 9 government agencies. The proxy layer is the new front door.
An agent executed terraform destroy and wiped a production database. The agent worked correctly. The system around it was incomplete.
AI sessions corrupt, burn budgets, and die to false positives. The vendors close the bugs as stale. Here's what we built after losing $300 in one week.
Where the money goes in long Claude Code sessions, and why reasoning hygiene matters more than bigger context windows.
We mined contemplative texts for search algorithms and found TF-IDF. The detour did not produce the algorithm — it forced the benchmark, and the benchmark revealed the boring fix. Recall went from 45% to 91% by downweighting the words that appear everywhere.
Why long coding sessions need compilation, not verbatim memory: source transcript, semantic field, optimization, target model context, and proof of continuity.
Why the compiler framing was one slice of a larger system: shared truth, mission briefings, portable bundles, live-window projection, and retrieval without transcript replay.
Token counts measure volume, not structure. Decision boundaries and branch factor reveal how the model reasoned — not just what it cost.
AI collapsed the cost of writing code. It did not collapse the cost of knowing what to write. The shift from compute budgets to decision budgets.
Decision boundaries and branch factor were the right metrics. The next step is making those signals structural artifacts instead of a property of one long live session.
Why long AI coding sessions silently degrade — and what session tokendynamics means for human-AI collaboration. The context window is the first hard resource constraint in collaborative reasoning.
AI coding agents can write code while you sleep. They should not be allowed to decide the work is done. A research note on the missing closure layer for headless AI development.
Issue trackers work because a human is always reading them. When agents do the work and no one watches every transition, the supervision layer — verification, dedup, identity, and an immune system against runaway work — has to become structure.
Why AI teams need structured work artifacts, evidence convergence, and workflow discipline instead of flatter ticket databases with AI layered on top.
Filing tickets too early creates noise. Waiting too long creates surprise migrations. Preheat WOs are the structured early-warning object that prevents both — with investigation before promotion and an applicability gate before fanout.
Why long-running frontier sessions should do architecture, not every code lift: architect once, hand bounded work to cheaper execution tiers, and let the substrate carry truth between sessions.
Agentic coding will not make programmers useless. It will punish people who only execute instructions and amplify people who can project capability toward the right target.
The graphical interface existed to render machine state into pictures humans could point at. Agents do not need the pictures. For agent-operated systems the GUI becomes overhead — and that is what makes legacy systems suddenly reachable.
For seventeen days, every billing monitor was green. Every button on the product page was dead. Component health is not flow health.