# The Agent Boundary Problem
On March 8, 2026, an agent with too much reach replaced Plesk's nginx package on my VPS and knocked multiple proxied apps sideways. This is the incident, and the boundary rules that came out of it.
I learned the expensive version of a simple rule that day: if an agent can touch the host package manager, it can touch the shape of the whole machine.
The break was not dramatic in the movie sense. No sparks. No giant red dashboard. One agent replaced Plesk's sw-nginx package with stock Ubuntu nginx, and the shape of the machine quietly changed underneath every proxied app.
This is the agent boundary problem. Most agent demos focus on whether the model can do the task. Production asks a harder question: what happens when it does the wrong task cleanly?
## What actually broke on March 8, 2026?
The short version: Plesk's sw-nginx package was removed and stock nginx installed in its place. Plesk's generated configuration, including its QUIC listen directives, no longer matched the binary on disk, and nginx refused to start.
With nginx down, nothing healthy answered on ports 80 and 443, traffic fell through to the wrong layer, and every proxied app behind the box looked broken at once.
The repair path was ugly in a very normal ops way:
| Step | Why it mattered |
|---|---|
| Remove the manually added non-Plesk nginx config | Clear any extra drift before reinstall |
| Remove stock nginx | Get the wrong package family off the box |
| Reinstall sw-nginx | Restore the package Plesk actually expects |
| Fix Plesk component detection | Plesk had lost the nginx binary path after reinstall |
| Re-enable nginx through Plesk tooling | Bring the service back the way the control plane expects |
That work restored the reverse proxies and brought the affected endpoints back. The bigger fix was cultural, not technical: I stopped letting "helpful" be the same thing as "allowed."
## Why was this a boundary failure, not just a bad command?
Because the command was only possible inside the wrong lane.
If an agent can edit app code, inspect logs, restart a container, install packages, and reload host services from the same context, you do not have a tool. You have an operator with fuzzy judgment and no instinct for consequences.
That is not me being anti-agent. I run them all over the place. The lesson is narrower than that. Agents need lanes, the same way humans do. In my ZeroRelay setup, Zee is container-only by design, while Claude is the host-capable operator lane. That split exists because host actions and container actions are not the same risk class.
```mermaid
flowchart TD
    A["General-purpose agent gets broad host access"] --> B["Host package change"]
    B --> C["Core service config no longer matches runtime"]
    C --> D["nginx fails to start"]
    D --> E["Traffic falls through to wrong layer"]
    E --> F["Multiple apps look broken at once"]
    F --> G["Human writes a new boundary rule"]
```

The expensive part is not that the agent made a mistake. Humans do that too. The expensive part is that the system design gave one agent enough reach to turn a mistake into infrastructure drift.
## What changed after the incident?
The explicit rule I wrote down was blunt: only one operator-facing agent gets VPS host access.
That rule now lives in memory because I do not trust "we'll remember next time" as a control mechanism.
The system also leans on the controls Claude Code already exposes around permissioning and hooks. Its hooks can block a tool call before it runs, and permission decisions can deny an action at the gate instead of cleaning it up afterward (Claude Code hooks reference, Claude Code docs).
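As a sketch of that pre-flight gate, here is a minimal PreToolUse-style hook script. It assumes the hook receives the tool call as JSON on stdin, with the shell command under `tool_input.command` as Claude Code's Bash tool sends it, and that exiting with code 2 blocks the call and feeds stderr back as the reason. The blocklist patterns are illustrative, not a complete policy.

```python
"""Minimal pre-flight hook: refuse host package-manager commands.

Sketch only -- the stdin JSON shape and exit-code convention follow
Claude Code's PreToolUse hook interface; the patterns are examples.
"""
import json
import re
import sys

# Commands that change the host package set or core services.
BLOCKED = [
    r"\bapt(-get)?\s+(install|remove|purge)\b",
    r"\bdpkg\s+-i\b",
    r"\bsystemctl\s+(stop|disable|mask)\s+(nginx|sw-nginx)\b",
]

def is_blocked(command: str) -> bool:
    """True if the command crosses the host package/service boundary."""
    return any(re.search(p, command) for p in BLOCKED)

def decide(raw: str) -> int:
    """Return the hook exit code for a raw JSON tool-call event."""
    event = json.loads(raw)
    command = event.get("tool_input", {}).get("command", "")
    if is_blocked(command):
        # Exit code 2 refuses the tool call before it runs.
        print(f"Blocked by boundary policy: {command!r}", file=sys.stderr)
        return 2
    return 0

# Wired up as a hook, the script would end with:
#   if __name__ == "__main__":
#       sys.exit(decide(sys.stdin.read()))
```

The useful property is that the refusal happens at the gate, before execution, which is exactly the "pre-flight refusal" posture described below rather than post-incident cleanup.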
The practical changes were:
- One host-capable lane, not many. If something needs package management, system service control, or deep host surgery, it belongs to the designated operator agent only.
- Container-only lanes for everything else. Zee can work inside his container. He cannot reach out and "help" on the VPS host.
- Scoped sudo instead of vague trust. Useful commands stay available. Broad package and service authority do not sit open by default.
- Hooks for pre-flight refusal. If a command crosses a boundary, the system should say no before execution, not write a post-mortem afterward.
- Written doctrine in the repo, not oral tradition in my head. If a boundary matters, it needs to survive the next tired day.
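The "scoped sudo" point can be made concrete with a sudoers fragment. Because sudoers is allow-list only, granting specific commands is the whole mechanism: anything not listed is denied. The agent username and command paths here are hypothetical.

```
# /etc/sudoers.d/agent-operator  (hypothetical user and paths)
# Let the operator agent restart app containers and reload nginx,
# nothing else. sudoers is an allow-list: unlisted commands are denied.
agent-op ALL=(root) NOPASSWD: /usr/bin/docker restart *, \
    /usr/sbin/nginx -t, \
    /usr/bin/systemctl reload nginx
# No apt, no dpkg, no plesk installer -- package changes stay human.
```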
This is the boring side of agent design, but boring is what keeps the lights on.
## How should you design agent boundaries on a real server?
Start by assuming every agent will eventually take the most locally reasonable action with globally bad consequences.
That framing changes the architecture fast. You stop asking, "Can this agent solve the task?" and start asking, "What is the maximum damage if it solves the wrong task?" That is a much better systems question.
Here is the model I trust now:
| Capability | Default owner | Reason |
|---|---|---|
| Read logs, inspect files, draft changes | Most agents | High utility, low blast radius |
| Restart app containers | Narrow operator lanes | Recoverable, but still production-touching |
| Edit host nginx/Plesk config | Single host operator | Shared infrastructure surface |
| Install or replace host packages | Human or single host operator with guardrails | Highest drift risk for self-hosted stacks |
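The table above can also live as a machine-checkable policy rather than prose. A hypothetical sketch: agent lanes map to capability tiers ordered by blast radius, and a single function answers "may this agent do this?" before any tool runs. The agent names mirror the lanes described earlier; the tiers are illustrative.

```python
"""Illustrative lane policy: capabilities grouped by blast radius.

Agent names and tier values are hypothetical; the point is that the
policy is data a gate can check, not prose a tired human must recall.
"""

# Capability tiers, ordered from low to high blast radius.
TIERS = {
    "inspect": 0,       # read logs, inspect files, draft changes
    "container": 1,     # restart app containers
    "host-config": 2,   # edit host nginx/Plesk config
    "host-package": 3,  # install or replace host packages
}

# Each lane gets a maximum tier, not an open-ended command list.
LANES = {
    "zee": TIERS["container"],                # container-only by design
    "claude-operator": TIERS["host-config"],  # host-capable lane
    # Nobody gets "host-package" unattended; that stays human-gated.
}

def allowed(agent: str, capability: str) -> bool:
    """True if the agent's lane covers the requested capability tier."""
    max_tier = LANES.get(agent, -1)  # unknown agents get nothing
    return TIERS.get(capability, TIERS["host-package"]) <= max_tier
```

With this shape, adding a new agent means assigning it a tier, and the default for anything unrecognized is refusal rather than trust.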
If you are early, this can feel restrictive. It is. That is the point. Freedom is cheap in a sandbox and expensive on a live box.
A good boundary should be easy to explain in one sentence. Mine is: container agents stay in containers, and host agents need a reason plus a gate. If the sentence needs a paragraph of exceptions, the boundary is already leaking.
If you are building out the operational layer, the Agents guide is the broad map, and the `ai-workflows` and `vps-infra` docs hold the specifics.
## FAQ
- What is an agent boundary in practice?
It is the written damage limit for a given agent. If that agent goes off-script, the boundary decides whether the mistake stays inside one app, one container, or jumps to the host. Separate users, tool profiles, scoped sudo, and hard container-only lanes are how you enforce that limit in practice.
- Why not just trust one smart agent with full VPS access?
Because the downside is lopsided. A smart agent with broad access can save you time for weeks, then burn all of that savings in one wrong package change or service reload. Full host access is not just "more capability," it is a much larger recovery surface when the judgment slips.
- What is the minimum useful setup?
One lane per agent, host package management denied by default, one host-capable operator lane, and pre-tool blocks for dangerous commands. You can get fancier later. That baseline already cuts out a lot of stupid risk.
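As a sketch of that baseline, a deny list in Claude Code's settings file can refuse package-manager calls at the permission gate. The exact rule patterns here are illustrative, so check them against the current permissions documentation before relying on them.

```json
{
  "permissions": {
    "deny": [
      "Bash(apt:*)",
      "Bash(apt-get:*)",
      "Bash(dpkg:*)",
      "Bash(systemctl stop nginx:*)"
    ]
  }
}
```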
The incident itself was fixable. The useful part was what it exposed: agent architecture is still software architecture, and software architecture still lives or dies on boundaries. If an agent can quietly replace part of your host runtime, you do not have autonomy. You have ambient risk wearing a productivity hat.
That sounds harsh, but it made the system better. The lanes are clearer now. The rules are written down. And the next time an agent wants to be helpful, it has less room to be creatively destructive.
Tested with Claude Code v2.1.91