Research Notes — Obsta Labs

Short technical notes on agent boundaries, provenance, context, and decision safety.

Obsta Labs writes about agent safety, context systems, provenance, and decision governance: the same control model that underlies Bulwark, Verdict, NeuroRouter, Hiveram, VectorCourt, and CoreGate. These are working notes from building that stack, grouped by theme rather than by date.

Agent security & boundaries

A Signature Says Who Spoke

It does not prove what world they observed. Why authorship signatures manufacture false confidence between AI agents, and what to bind into the claim instead.

The False-Positive Tax

Safety tools classify the words and ignore everything around them. Harm is not a property of text alone — it is a property of text, actor, object, and authority. Classify the environment before you classify intent.

When You Can Build Any Gate, Don't

A failure mode that appears only after your tools become powerful enough to gate anything. Prevent what is dangerous. Reconcile what is fuzzy. The proxy lies.

Your LLM Proxy Is Your Biggest Attack Surface

Twenty-six LLM proxies were caught stealing credentials. A supply-chain breach compromised thousands. One operator using Claude Code hacked nine government agencies. The proxy layer is the new front door.

The Terraform Destroy Problem: Why AI Agents Need Hard Boundaries

An agent executed terraform destroy and wiped a production database. The agent worked correctly. The system around it was incomplete.

AI Breaks. Your Work Shouldn't.

AI sessions get corrupted, burn through budgets, and get killed by false positives. The vendors close the bugs as stale. Here's what we built after losing $300 in one week.

Context & reasoning systems

Your AI Session Costs $400

Where the money goes in long Claude Code sessions, and why reasoning hygiene matters more than bigger context windows.

Math Did Not Beat the Sutras. It Gave Them Unit Tests.

We mined contemplative texts for search algorithms and found TF-IDF. The detour did not produce the algorithm — it forced the benchmark, and the benchmark revealed the boring fix. Recall went from 45% to 91% by downweighting the words that appear everywhere.
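The fix named above is standard inverse document frequency (IDF) weighting. As a minimal sketch, assuming a toy three-document corpus and plain whitespace tokenization (the `idf` and `score` helpers are hypothetical illustrations, not the benchmark's code), the following shows how a term that appears in every document stops contributing to a match:

```python
import math
from collections import Counter


def idf(corpus: list[list[str]]) -> dict[str, float]:
    """Inverse document frequency: log(N / df). A term found in every doc scores 0."""
    n = len(corpus)
    # Count each term once per document it appears in.
    df = Counter(term for doc in corpus for term in set(doc))
    return {term: math.log(n / count) for term, count in df.items()}


def score(query: list[str], doc: list[str], weights: dict[str, float]) -> float:
    """TF-IDF overlap: sum of (term frequency in doc) * (term IDF) over query terms."""
    tf = Counter(doc)
    return sum(tf[t] * weights.get(t, 0.0) for t in query)


# Hypothetical toy corpus; "the" appears in every document.
docs = [
    "the mind is still".split(),
    "the breath is slow".split(),
    "the mind wanders".split(),
]
weights = idf(docs)
print(weights["the"])  # 0.0

query = "the breath".split()
ranked = sorted(range(len(docs)), key=lambda i: score(query, docs[i], weights), reverse=True)
print(ranked[0])  # 1
```

Without the IDF factor, every document would earn credit for "the"; with it, only the rare term "breath" separates the documents.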

NeuroRouter Is a Context Compiler

Why long coding sessions need compilation, not verbatim memory: source transcript, semantic field, optimization, target model context, and proof of continuity.

From Context Compiler to Context Operating System

Why the compiler framing was one slice of a larger system: shared truth, mission briefings, portable bundles, live-window projection, and retrieval without transcript replay.

Your AI Explored Seven Architectures. You Only Saw One.

Token counts measure volume, not structure. Decision boundaries and branch factor reveal how the model reasoned — not just what it cost.

Your Token Bill Is a Decision Receipt

AI collapsed the cost of writing code. It did not collapse the cost of knowing what to write. The shift from compute budgets to decision budgets.

Decision Observability Without Transcript Replay

Decision boundaries and branch factor were the right metrics. The next step is making those signals structural artifacts instead of a property of one long live session.

Context Decay in Long AI Sessions

Why long AI coding sessions silently degrade, and what session token dynamics mean for human-AI collaboration. The context window is the first hard resource constraint in collaborative reasoning.

Work coordination & operations

The Agent Must Not Close Its Own Ticket

AI coding agents can write code while you sleep. They should not be allowed to decide the work is done. A research note on the missing closure layer for headless AI development.

A Board Nobody Reads Is Just a Database

Issue trackers work because a human is always reading them. When agents do the work and no one watches every transition, the supervision layer — verification, dedup, identity, and an immune system against runaway work — has to become structure.

AI-Native Work Coordination, Beyond Ticket Databases

Why AI teams need structured work artifacts, evidence convergence, and workflow discipline instead of flat ticket databases with AI layered on top.

Preheat Work Orders: The Missing Primitive Between Intuition and Tickets

Filing tickets too early creates noise. Waiting too long creates surprise migrations. Preheat WOs are the structured early-warning object that prevents both — with investigation before promotion and an applicability gate before fanout.

The Architect-Execute Split

Why long-running frontier sessions should do architecture, not every code lift: architect once, hand bounded work to cheaper execution tiers, and let the substrate carry truth between sessions.

When Execution Becomes Cheap, Direction Becomes Scarce

Agentic coding will not make programmers useless. It will punish people who only execute instructions and amplify people who can project capability toward the right target.

The GUI Was a Translation Layer for Human Eyes

The graphical interface existed to render machine state into pictures humans could point at. Agents do not need the pictures. For agent-operated systems the GUI becomes overhead — and that is what makes legacy systems suddenly reachable.

Your Checkout Endpoint Is Not Your Selling Flow

For seventeen days, every billing monitor was green. Every button on the product page was dead. Component health is not flow health.