Bulwark — Policy-Driven, Auditable Consent for AI Agents

AI coding agents can read what your shell can read. That is powerful — and dangerous.

A local agent running in your development environment may be able to inspect source code, credentials, customer data, private notes, build artifacts, SSH config, cloud credentials, and anything else reachable from the account it runs under.

Bulwark is a consent layer for that world. Instead of trusting an agent process with ambient read access, Bulwark places a policy gate in front of sensitive file reads. When an agent tries to access protected material, the read can be paused, evaluated, audited, and approved out-of-band by a human.

Technical preview

Bulwark v0.8.0 is available now as a technical preview on Linux and macOS.

Before opening the enforcement publicly, we put it through repeated adversarial review: attacking the gate, fixing what we found, then attacking the fixes, until two consecutive rounds turned up nothing an agent could reach. The review surfaced around eighteen issues, including a few that our own first fixes introduced. Every one is closed with a regression test that fails on the old code and passes on the new, verified on real hardware (Linux kernel 6.12, macOS 26). We don't claim the gate is unbreakable. We claim we tried hard to break it ourselves, and wrote down both what it stops and what it does not.

This release proves the core primitive end-to-end:

Kernel-level read gate

A protected file open is suspended before any bytes reach the agent, and the gate returns a verdict to the kernel. Decisions are made by inode, so a symlink or a rename with an innocent name still hits the same protected file.

Off-band consent

The approval travels over a channel the agent has no handle on. The agent sees only the result — the file opens, or it does not. It is structurally unable to approve its own access.

Local policy & audit

A policy file defines what is protected. Every decision is recorded — the process chain that led there, the file, allow or deny, and why — never the file content.
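
As a rough illustration of what such a record could hold, the sketch below models one decision as a small Python structure. The class name and every field name (process_chain, path, inode, verdict, reason) are assumptions made for this example, not Bulwark's actual log schema.

```python
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass(frozen=True)
class AuditRecord:
    """One gate decision. Hypothetical schema, not Bulwark's real format."""
    process_chain: List[str]  # outermost process first, e.g. shell -> agent -> tool
    path: str                 # path as observed at decision time
    inode: int                # identity the decision was made on
    verdict: str              # "allow" or "deny"
    reason: str               # why the gate decided this way

    def to_json(self) -> str:
        # Only metadata is serialized; file content is never a field.
        return json.dumps(asdict(self), sort_keys=True)


# Illustrative values only.
record = AuditRecord(
    process_chain=["bash", "agent", "cat"],
    path="/home/dev/.aws/credentials",
    inode=131077,
    verdict="deny",
    reason="protected path, no grant",
)
line = record.to_json()
```

Keeping content out of the record type itself, rather than filtering it at write time, means a logging bug cannot accidentally leak what the gate was protecting.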

Public binaries

Checksummed artifacts for x86_64 and aarch64 Linux, plus signed and notarized builds for Intel and Apple Silicon macOS.

This is an MVP. It is intended for technical evaluators, early users, and design partners who want to inspect the primitive and understand the direction. macOS support has landed — the same open local enforcement engine, proven on real hardware and installable with brew install. On macOS, the boundary is enforced through Apple Endpoint Security: a protected open by the supervised tree is denied before file contents reach the agent, decided by inode, so a symlink or a hard link to a protected file is gated the same way.

Why this matters

Traditional developer security assumes a human is driving the terminal. AI agents change that assumption.

An agent can move faster than a human, inspect more files than a human would manually open, and combine local context in ways that are difficult to review after the fact.

For pipelines: give an agent exactly one path

The off-band prompt is for a human at a terminal. In CI, or when you dispatch an agent unattended, there is no one to ask — so Bulwark runs in default-deny allowlist mode: the agent may read only the paths you grant, and every other read is denied. The policy is the decision.

The concrete shape: dispatch a triage agent to a production host with read access to the logs only. It can investigate a ClickHouse incident — read the server logs, run its tools — while the data directory, the credentials, and every other database stay out of reach. You get least-privilege without a human in the loop, and a receipt for every read it attempted.

A program must read its linker, libc, and a few system files just to start, so allowlist mode permits a small runtime base set in addition to your grants — bulwark base-set prints exactly what that is. It is a stated trade-off, not a magic wand: wide enough to run a program, narrow enough that sensitive material stays out of reach.
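
The default-deny rule can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the function names are invented for the example, the grant and base-set entries are hypothetical (the real base set is whatever bulwark base-set prints), and matching is done on already-resolved path strings, whereas the real gate decides in the kernel.

```python
from pathlib import PurePosixPath
from typing import Iterable


def is_within(path: PurePosixPath, root: PurePosixPath) -> bool:
    """True if `path` is `root` itself or lies beneath it."""
    return path == root or root in path.parents


def allowlist_verdict(path: str, grants: Iterable[str], base_set: Iterable[str]) -> str:
    """Default-deny: allow only what a grant or the runtime base set covers."""
    target = PurePosixPath(path)
    for root in list(grants) + list(base_set):
        if is_within(target, PurePosixPath(root)):
            return "allow"
    return "deny"


# Hypothetical values for illustration.
GRANTS = ["/var/log/clickhouse-server"]
BASE_SET = ["/lib/x86_64-linux-gnu/libc.so.6", "/etc/ld.so.cache"]
```

With these values, a read of a log file under the granted directory is allowed, while a read under the data directory falls through to deny because nothing grants it. No rule has to name the data directory for it to be out of reach.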

For dispatch where the worst case includes the supervisor itself being killed, run the same allowlist as a kernel-enforced floor with --hardened. The restriction is applied to the agent in the kernel and Bulwark then becomes the agent — there is no supervisor to kill, so a crash or a SIGKILL cannot widen what the agent can read. It is the crash-safe form of the same one-path dispatch.

When one agent clamps another

The dispatcher does not have to be a human. In a fleet, an orchestrating agent hands work to a sub-agent on a sensitive host — and the safe way to do that is to clamp the sub-agent's reach at the moment it dispatches. So Bulwark is built to be operated by an agent, not only a person: it ships a machine-readable contract an orchestrator can read and invoke.

Handing an agent that lever is only safe because the lever turns one way. Tightening a clamp (protecting another path, narrowing the allowlist) is something an agent may do on its own. Loosening one (widening or removing a restriction) is not: it routes to a human over the off-band consent channel, which the supervised agent cannot see or reach. An agent can clamp; it cannot un-clamp. The machine may tighten the restraint, but only a person may loosen it, which is what makes it safe to let one agent bound another.
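
One way to picture the one-way lever is as a check on every proposed policy change. The sketch below assumes a hypothetical policy made of two sets, protected paths and allowlisted paths. The Policy type and requires_human function are illustrative names, not part of Bulwark's machine-readable contract.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy shape: what is protected, and what is readable."""
    protected: FrozenSet[str]
    allowlist: FrozenSet[str]


def requires_human(old: Policy, new: Policy) -> bool:
    """A change is agent-safe only if it never widens reach.

    Tightening means the protected set can only grow and the allowlist can
    only shrink. Anything else is treated as loosening and goes to a person.
    """
    keeps_protection = old.protected <= new.protected
    keeps_allowlist_narrow = new.allowlist <= old.allowlist
    return not (keeps_protection and keeps_allowlist_narrow)
```

The comparison is deliberately conservative: it works on exact entries, so swapping an allowlisted directory for one of its subdirectories also routes to a human even though it narrows reach. Erring toward asking is the safe direction for this kind of check.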

Let an agent SSH into a server, gated

"My AI agent can SSH into production and read everything" is the sentence every security person understands. A local guard cannot help here: SSH is encrypted transport, so it cannot tell whether the agent is reading a log or a database dump. Enforcement has to run on the remote machine.

A protected read on the remote host is denied immediately, because the kernel's deadline for a verdict is too short to hold open while a human thinks. A prompt then appears on your machine with the host, the path, and the process chain. Your answer lets the next read of that file through; the agent on the remote host never sees the prompt and cannot forge it. Grants are scoped to the requesting identity, the session, and the policy version, so an approval cannot leak to another process.

The prompt is answered on your machine, not the remote host: the operator loop runs locally, and your reply travels back over a separate control channel — the agent's own output never carries it. If the remote host doesn't have Bulwark, it is set up automatically, so you can run checks over SSH without installing anything there first; --auto answers every prompt the same way for unattended dispatch.
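
The deny-first, grant-later flow described above can be sketched as a lookup keyed on everything an approval is bound to. The GrantKey and GrantStore names and their fields are assumptions for illustration. The sketch also keeps a grant until the store is discarded and does not model expiry or one-shot use, which the real mechanism may handle differently.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class GrantKey:
    """Everything an approval is bound to. Field names are illustrative."""
    identity: str        # who asked, e.g. a process or user identity
    session: str         # the dispatch session the prompt belongs to
    policy_version: int  # approvals do not survive a policy change
    inode: int           # the file the human actually approved


class GrantStore:
    def __init__(self) -> None:
        self._grants: Set[GrantKey] = set()

    def approve(self, key: GrantKey) -> None:
        """Record the operator's answer for exactly this key."""
        self._grants.add(key)

    def verdict(self, key: GrantKey) -> str:
        """Answer at once: allow only on an exact, pre-existing grant."""
        return "allow" if key in self._grants else "deny"
```

Because the verdict never waits, the first read is always denied. The approval only changes what a later read with the same identity, session, policy version, and file will see.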

This is a preview of the remote tier. SSH carries transport and authentication today; a signed, time-bounded trust channel is the production hardening to come. The core — enforcement on the remote kernel, consent at your terminal, default-deny that respects the kernel deadline — is real.

Security boundary and non-goals

A security tool is only as useful as its honesty about what it does not do. Bulwark intercepts the file open — that is the boundary, and it has edges.

What it gates. Opens of a protected file by the supervised process tree, decided by inode. Because the decision is by inode and not by name, renames, symlinks, and hard links to a protected file are all still gated — they resolve to the same inode.
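
The inode claim is easy to check with ordinary POSIX tools. The following Python sketch is not Bulwark code: it creates a throwaway file, then shows that a symlink, a hard link, and a rename all lead back to the same (device, inode) pair. The file names are made up for the example.

```python
import os
import tempfile


def file_identity(path: str) -> tuple:
    """Identify a file by (device, inode), following symlinks like open()."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


with tempfile.TemporaryDirectory() as workdir:
    secret = os.path.join(workdir, "secret.env")
    with open(secret, "w") as handle:
        handle.write("TOKEN=example\n")
    protected = file_identity(secret)

    soft = os.path.join(workdir, "notes.txt")
    os.symlink(secret, soft)            # innocent-looking symlink
    hard = os.path.join(workdir, "readme.md")
    os.link(secret, hard)               # hard link shares the inode

    via_symlink = file_identity(soft) == protected
    via_hardlink = file_identity(hard) == protected

    renamed = os.path.join(workdir, "harmless.log")
    os.rename(secret, renamed)          # rename keeps the inode
    via_rename = file_identity(renamed) == protected
```

All three comparisons come out true on Linux and macOS, which is why a gate keyed on the inode cannot be sidestepped by giving the file a different name.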

What it does not gate. Bulwark controls reach, not consequences. It does not stop an agent from using data it has already read, from acting on credentials already present in its environment, from reading a copy of a secret made elsewhere, or from sending anything over the network. It governs one question — may this process open this inode — and nothing downstream of the answer.

It intercepts the open, not the past. A file descriptor that was already open before Bulwark started is not gated; Bulwark sees opens, not descriptors that predate it. In the normal flow the gate is installed before the agent is launched, so the supervised tree cannot pre-open a protected file — but a descriptor handed in from outside the tree is out of scope.
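
The underlying POSIX behavior can be seen without Bulwark at all: access is checked when a file is opened, not on each later read. In the sketch below, chmod is only a stand-in for "a restriction that arrives after the open". It is an analogy for the edge described above, not a model of how the gate itself works.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "token.txt")
    with open(path, "w") as handle:
        handle.write("example-token")

    fd = os.open(path, os.O_RDONLY)   # opened before any restriction exists
    os.chmod(path, 0)                 # stand-in for a gate installed afterwards
    try:
        # The descriptor was validated at open time, so the read still works.
        data_after_lockdown = os.read(fd, 64)
    finally:
        os.close(fd)
        os.chmod(path, 0o600)         # restore permissions before cleanup
```

The read succeeds even though a fresh open of the same path would now be refused for an unprivileged user. A descriptor that predates the gate behaves the same way.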

It assumes the host root is trusted. Bulwark runs as a privileged process. The same root that runs it can stop it. Bulwark reduces what an agent can reach; it is not a defense against a compromised host or a malicious administrator. This is a deliberate boundary, not an accident — and the kind of tamper-resistance and managed-daemon hardening a fleet deployment needs is part of the Pro/Fleet direction below.

The decision context is ours, not the agent's. The process chain, the path, and the reason shown at decision time are derived by Bulwark from the kernel and the process tree. They are not text supplied by the agent. The agent is the subject of the decision, never a participant in it.

What happens if Bulwark dies. The default gate uses the Linux permission-event mechanism, which has a documented behavior: on a graceful shutdown Bulwark denies any read it was holding (it fails closed), but if the supervising process is hard-killed (SIGKILL), crashes, is OOM-killed, or the machine loses power while a read is held, the kernel releases that held read as allowed. This is a property of that kernel interface, not a Bulwark choice. For workloads where that matters, hardened mode (above) removes the supervisor from the critical path entirely — the restriction is enforced by the kernel on the agent itself, so nothing can be killed to widen it. We would rather state this plainly than imply a wall that is not there.

What happens on the next run. Bulwark cannot un-release a read the kernel already let through at the moment of a hard kill — but it does not pretend the next run starts from a clean slate either. Each run records its integrity context — whether it shut down cleanly, and the identity of every protected file. If the previous run ended uncleanly, or a protected file has been swapped for a different one underneath it, the next run starts tainted: protected reads are denied by default, and an interactive operator is re-asked for every one rather than trusting a cached approval. The taint persists across restarts until a human reviews it and runs bulwark reset. It is a circuit-breaker, not a repair — it bounds what a recovery can quietly resume, and makes the operator acknowledge the break.
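
The taint decision can be sketched as a comparison between what the previous run recorded and what the next run observes. The RunContext type, its fields, and the starts_tainted function are hypothetical names for this example. The sketch treats a missing protected file as a change, which is an assumption, and it leaves out persistence and the bulwark reset step.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

FileId = Tuple[int, int]  # (device, inode)


@dataclass(frozen=True)
class RunContext:
    """What one run leaves behind. A hypothetical shape for illustration."""
    clean_shutdown: bool
    protected_ids: Dict[str, FileId]  # protected path -> identity during that run


def starts_tainted(previous: RunContext, current_ids: Dict[str, FileId]) -> bool:
    """Tainted if the last run ended uncleanly or a protected file was swapped."""
    if not previous.clean_shutdown:
        return True
    for path, old_id in previous.protected_ids.items():
        if current_ids.get(path) != old_id:
            return True
    return False
```

A tainted start would then switch protected reads to deny by default and bypass cached approvals until an operator clears it.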

How to use it well

Bulwark is one control in a defense-in-depth posture, not a single wall. Used with its grain, it meaningfully reduces what an agent can reach. Used as if it were absolute, its edges will bite. A few practical rules:

Launch the agent under Bulwark from the start.

Start the agent through bulwark run so the gate is in place before the agent’s first open. A descriptor opened before the gate exists is not gated — don’t hand the agent pre-opened files or start it loose and attach later.

Keep secrets out of the workspace in the first place.

Bulwark bounds what the agent can reach; it does not un-leak a secret that already lives in the working tree, a build artifact, or an environment variable. Minimize what is present to be read. The strongest protected file is the one that isn’t in reach at all.

Pair it with an egress control.

Bulwark gates reads, not the network. To bound what leaves after a read, run the agent behind a default-deny egress gate as well. Reach-control and exfiltration-control are different jobs; you want both.

Treat allow decisions as real decisions.

Off-band consent only helps if a human actually weighs the request. Protect a focused set of genuinely sensitive paths rather than everything, so each prompt means something and approval does not become reflexive.

For crash-safety on Linux, use hardened mode.

If your threat model includes an adversary who can hard-kill the supervisor at the exact moment of a read, the fanotify supervisor path is not sufficient on its own — use --hardened, where the restriction is applied as a kernel-enforced Landlock floor on the agent itself (Linux only; macOS has no Landlock equivalent). For an honest developer guarding against an agent over-reaching, the default mode already does its job.

Try the preview

The same command runs the agent under the gate on both platforms — a protected open by the supervised tree is denied at the kernel, by inode:

On Linux the gate is fanotify permission events; on macOS it is an Apple Endpoint Security client launched per session by the CLI: a signed, notarized component that carries Apple's Endpoint Security entitlement. Both require root. On Linux, off-band consent holds a read while a human decides; macOS today is deny-by-default for protected paths, and the default-deny allowlist mode described above works the same on both. Run bulwark doctor for a platform readiness check.

Download signed binaries, notarized macOS installers, and checksums from the latest release.

Local enforcement is open. Managed trust is paid.

The line is not arbitrary pricing — it is architectural. If it runs entirely on your own machines, it is open source. If it depends on managed trust, identity, fleet policy, or audit infrastructure, it is the commercial tier.

Bulwark Core is open source (AGPL) and available now as a technical preview: the read gate, local off-band consent, the CI allowlist, the crash-safe hardened mode, and the peer bulwark ssh mechanism when you own both ends. Local functionality is never gated by a license check — you can read the source and see exactly how the enforcement works.

Bulwark Pro / Fleet is the commercial tier — managed trust for teams, the part you do not want to run yourself (design-partner stage):

· The remote trust channel — mutual-TLS and signed-grant authority — the production hardening of bulwark ssh
· Fleet policy distribution and managed daemon identity
· A centralized, tamper-evident audit pipeline with proof trails
· Team approval flows and an operator cockpit
· An SLA on the consent and trust channel

People can inspect and run the local enforcement primitive for free. Teams pay for the part that makes agents governable at organizational scale: reliable authority across machines, users, and agents — with uptime, auditability, and administration handled for them. We are especially interested in teams adopting AI coding agents who need a practical consent and audit layer before those agents touch sensitive source, credentials, or production-adjacent material.