In March 2026, an AI agent wiped a production database along with its VPC, ECS cluster, load balancers, and bastion host. Two and a half years of course submissions—gone. The command was terraform destroy. This is not a rare failure mode. It is the default outcome of running agents without enforcement.
The natural reaction: AI is dangerous. Disable autonomy. Go back to doing everything manually.
That reaction is wrong. And it is the most expensive mistake an engineering team can make after an incident like this.
Claude executed the instruction it was given. Terraform executed the command it received. Nothing in the chain malfunctioned. Every component worked as designed.
The failure happened in the absence of control loops—specifically, no enforcement boundary existed between “agent decides to act” and “irreversible action executes.”
Not: “AI made a mistake.”
But: “AI was allowed to act without the control loops humans already rely on.”
lint → CI → code review → staging → approval → production deploy
These layers exist because humans make mistakes. Nobody finds this controversial. We don’t argue about whether code review is worth the latency—we argue about how many approvals are enough.
When we gave agents operational power, we discovered which safety layers were implicit human habits rather than explicit system controls. The habit of reading a terraform plan before typing yes is a human control loop. The agent had no equivalent.
Some operations are irreversible by definition:
terraform destroy—infrastructure gone.
DROP DATABASE—data gone.
rm -rf /—filesystem gone.
Sending an email—message received.
These are deterministic facts about the operations, not probabilistic predictions about intent. An ML model that is 99.5% accurate at classifying destructive commands will, given enough agent sessions, miss one. The missed one will be the one that matters.
The correct model is monotonic: risk can only increase, never decrease within a session.
SAFE → SENSITIVE → COMMITMENT → IRREVERSIBLE
terraform plan is SENSITIVE. terraform apply is COMMITMENT. terraform destroy is IRREVERSIBLE. The transitions are one-way. An agent that has entered the COMMITMENT zone cannot drift back to SAFE by running a harmless command.
Do not rely on intent. Enforce structure.
At the IRREVERSIBLE boundary, the system must stop. Not warn. Not log. Stop. A human crosses that boundary, or nobody does. Anything else is a design failure.
Agent → policy engine → terraform destroy
            |
      [ Decision ]  terraform destroy → IRREVERSIBLE
                    → require_approval
            |
      BLOCKED — operator approval required
      “terraform destroy targets 5 resource types. Approve?”
The agent never gets to execute. The operator sees what will be destroyed. The decision is deterministic—no model needed to classify terraform destroy as irreversible. The audit trail is immutable.
If the operator approves, the action executes under their authority. If they deny, the agent proposes an alternative. The agent retains full autonomy for everything that isn’t irreversible.
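The one-way zone model and the approval gate described above, as an added example that is not chainwatch's code. The names `classify`, `Session` and `chain_intact`, the handling of `terraform apply -destroy` and of unparsable input, the SAFE default for non-terraform commands, and the hash-chained audit log are assumptions of this example; only the plan/apply/destroy zones come from the article.

```python
import hashlib, json, shlex
from enum import IntEnum

class Zone(IntEnum):
    SAFE, SENSITIVE, COMMITMENT, IRREVERSIBLE = range(4)

# plan/apply/destroy zones are the article's; every other choice below is assumed.
TERRAFORM = {"plan": Zone.SENSITIVE, "apply": Zone.COMMITMENT, "destroy": Zone.IRREVERSIBLE}

def _digest(entry):
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()

def classify(command):
    """Deterministic lookup on the parsed command line; no scoring, no model."""
    try:
        argv = shlex.split(command)
    except ValueError:
        return Zone.IRREVERSIBLE              # unparsable input fails closed
    if not argv or argv[0] != "terraform":
        return Zone.SAFE                      # simplification: only terraform is modelled
    flags = {a.lstrip("-") for a in argv[1:] if a.startswith("-")}
    words = [a for a in argv[1:] if not a.startswith("-")]
    sub = words[0] if words else ""
    if sub == "apply" and "destroy" in flags:
        return Zone.IRREVERSIBLE              # `apply -destroy` is destroy by another name
    return TERRAFORM.get(sub, Zone.SENSITIVE)

class Session:
    def __init__(self):
        self.zone, self.audit, self.head = Zone.SAFE, [], "0" * 64

    def decide(self, command, approved_by=None):
        risk = classify(command)
        gated = risk == Zone.IRREVERSIBLE and not approved_by
        if not gated:
            self.zone = max(self.zone, risk)  # one-way: the zone is never lowered
        entry = {"command": command, "risk": risk.name, "zone": self.zone.name,
                 "approved_by": approved_by, "gated": gated, "prev": self.head}
        self.head = _digest(entry)            # hash chain makes edits to the log detectable
        self.audit.append(entry)
        return "require_approval" if gated else "allow"

def chain_intact(audit, head):
    prev = "0" * 64
    for entry in audit:
        if entry["prev"] != prev:
            return False
        prev = _digest(entry)
    return prev == head
```

The gate keys on the parsed subcommand and flags rather than the raw string, so `terraform -chdir=prod apply` and `terraform apply -destroy` land in the right zones.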
The operator’s post-incident fixes tell a story:
1. Moved Terraform state to S3 (operational fix)
2. Enabled deletion protection on RDS (AWS-level fix)
3. Daily restore testing via Lambda (recovery fix)
4. Disabled automatic execution—now requires manual approval for all commands (process fix)
Fix 4 is the overcorrection. It throws away the productivity gains of agent-assisted work because one boundary was missing. The agent now needs approval for echo "hello" and terraform destroy equally. This is what most teams will do—and it quietly kills all productivity gains.
You don’t remove circuit breakers because they slow down electricity. You install better ones.
The middle path: agents retain full autonomy for safe operations. They hit deterministic gates at irreversible boundaries. This is what chainwatch enforces. The gate is cheap—milliseconds of latency. The absence of the gate is expensive—24 hours of recovery and an AWS support upgrade. This is not a rare edge case. It is guaranteed at scale.
This is not a classification problem. Nobody needs a model to determine that terraform destroy is dangerous. The danger is structural, not statistical.
Deterministic policy for irreversible boundaries. Same reason circuit breakers use physics instead of predictions: the cost of a false negative at an irreversible boundary is unbounded.
ML is useful for the ambiguous middle—“is this API call suspicious?”—where false positives have low cost and false negatives have bounded cost. At the irreversible boundary, there is no ambiguous middle. The command either destroys infrastructure or it doesn’t.
Userspace enforcement handles the 90% case—agent makes a tool call, policy evaluates it, decision executes. But an adversary (or a confused agent) can bypass userspace wrappers with direct syscalls.
The complete enforcement stack:
Userspace: agent → policy engine → allow/deny/approve
Kernel:    eBPF sensor (observe) + seccomp filter (enforce)
           35 blocked syscalls across 5 groups:
           privilege escalation, file mutation, mount admin,
           network egress, baseline
The eBPF sensor sees what the agent process actually does at the syscall level—not what it says it’s doing. Seccomp filters prevent the syscalls from executing. If the agent finds a way to bypass the policy engine, the kernel stops it.
This is defense in depth applied to agent containment. The userspace gate catches the tool calls. The kernel gate catches the syscalls. Neither trusts the other.
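One way to use the two layers against each other, as an added example that is not chainwatch's code: reconcile what the policy engine allowed with the `execve` events a syscall-level sensor recorded, and flag anything that ran without a decision behind it. The event schema, the session id and the sample records are constructed for this example.

```python
from collections import Counter

# Constructed sample data (assumed schema): userspace decisions for one agent
# session, and the exec events a kernel sensor attributed to the same session.
DECISIONS = [
    {"session": "s-1", "argv": ["terraform", "plan"], "decision": "allow"},
    {"session": "s-1", "argv": ["terraform", "destroy"], "decision": "require_approval"},
]
OBSERVED = [
    {"session": "s-1", "syscall": "execve", "argv": ["terraform", "plan"]},
    {"session": "s-1", "syscall": "execve", "argv": ["terraform", "destroy"]},
    {"session": "s-1", "syscall": "execve", "argv": ["terraform", "plan"]},
]

def unsanctioned_execs(decisions, observed):
    """Return observed execve events that no unused 'allow' decision accounts for."""
    budget = Counter(
        (d["session"], tuple(d["argv"]))
        for d in decisions
        if d["decision"] == "allow"
    )
    flagged = []
    for event in observed:
        if event["syscall"] != "execve":
            continue
        key = (event["session"], tuple(event["argv"]))
        if budget[key] > 0:
            budget[key] -= 1          # each allow covers exactly one execution
        else:
            flagged.append(event)     # ran without passing the userspace gate
    return flagged
```

On the sample data the blocked `terraform destroy` and the second, unrequested `terraform plan` are both flagged, because a pending approval is not an allow and an allow cannot be reused.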
This incident was the same kind of failure that happens when a deployment pipeline lacks a staging environment, or when a database lacks deletion protection.
The fix is not less autonomy. The fix is better boundaries.
Approval gates are not bureaucracy. They are circuit breakers.
Autonomy without enforcement is trust without verification. Trust is not a safety mechanism.
The cost of enforcement is milliseconds. The cost of missing enforcement is irreversible.
This is not an AI problem. It is a systems problem.
Principiis obsta — resist the beginnings. Don’t detect the destruction after it happens. Prevent it structurally.
Open-source runtime control plane for AI agent safety. Intercepts tool calls at irreversible boundaries. Enforcement, not observability.