What Is an AI Agent Sandbox?

An AI agent sandbox is an isolated execution environment that bounds what an autonomous agent can read, run, or send. Unlike a chat assistant that only generates text, a desktop AI agent reads real files and executes real commands — so its sandbox is the difference between a safe tool and a loaded weapon.

This post explains what an agent sandbox is, the three patterns in use today, why desktop agents need a different sandbox model than cloud agents, and how to evaluate any vendor's claim.

What is an AI agent sandbox?

An agent sandbox is the boundary between what the agent can attempt and what it can actually reach. The agent still receives tools — usually some combination of a file API, a shell, a headless browser, and HTTP egress — but each call is intercepted, checked against a policy, and either allowed, denied, or escalated for human approval. If the agent ignores its instructions, hallucinates a destructive command, or follows an injected prompt, the damage stops at the sandbox boundary.
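The intercept, check, and decide loop described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: the tool names, the `Policy` shape, and the `approve` callback (standing in for a human approval prompt) are all assumptions. A real policy would also inspect each call's arguments, not just the tool name.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    ASK = "ask"  # escalate to a human for this one call


@dataclass(frozen=True)
class ToolCall:
    tool: str  # hypothetical tool names, e.g. "read_file", "shell", "http_get"
    arg: str   # the path, command, or URL the agent asked for


class PermissionDenied(Exception):
    """Raised when a call is denied; the caller should halt rather than retry another route."""


@dataclass
class Policy:
    allow: set[str] = field(default_factory=set)  # tools the task may use without asking
    ask: set[str] = field(default_factory=set)    # tools that need per-call approval

    def decide(self, call: ToolCall) -> Decision:
        if call.tool in self.allow:
            return Decision.ALLOW
        if call.tool in self.ask:
            return Decision.ASK
        return Decision.DENY  # deny by default: tools outside the policy never run


def gate(call: ToolCall, policy: Policy, approve: Callable[[ToolCall], bool]) -> ToolCall:
    """Intercept a tool call: return it if it may execute, otherwise raise PermissionDenied."""
    decision = policy.decide(call)
    if decision is Decision.ASK:
        decision = Decision.ALLOW if approve(call) else Decision.DENY
    if decision is Decision.DENY:
        raise PermissionDenied(f"{call.tool}({call.arg!r}) blocked by policy")
    return call
```

With `Policy(allow={"read_file"}, ask={"shell"})`, file reads pass straight through, every shell command waits on `approve`, and anything else (an `http_get` the agent was never granted, say) is denied without asking.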

This is the same concept browsers have used for two decades. A web page runs JavaScript with full access to its own document and almost nothing else; the browser exposes no way to read your home directory or open arbitrary sockets, and modern browsers back that up by confining their renderer processes with OS-level sandboxes. Agent sandboxing applies the same idea to a process whose "code" is a stream of model-generated tool calls.

A sandbox is usually built from four pieces:

A kernel boundary — a microVM (Firecracker, Kata Containers), a user-space kernel like gVisor, or OS-level isolation primitives (macOS Seatbelt, Windows AppContainer, Linux Landlock + Seccomp).
A filesystem policy — explicit allowlists of paths the agent can read or write, plus deny-by-default for everything else.
A network policy — egress rules that limit which domains and ports the agent can reach; many designs only allow LLM API traffic out and block the rest.
An audit log — a record of every tool call, with parameters, outcome, and any human approval. This is the forensics layer when something goes wrong.

The combination of these four is what people mean by "an AI agent sandbox." Any one alone is insufficient.
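Two of the four pieces, the filesystem policy and the network policy, reduce to deny-by-default checks. The sketch below is a hypothetical Python illustration (the function names and the example hostname `llm.example.com` are assumptions). In a real sandbox these checks would be backed by an OS primitive rather than enforced only inside the agent's own process.

```python
import os
from urllib.parse import urlsplit


def path_allowed(path: str, allowed_roots: list[str]) -> bool:
    """Filesystem policy: deny by default; allow only paths that resolve under an allowed root."""
    # Resolve symlinks and ".." first, so a link such as project/keys -> ~/.ssh cannot escape.
    real = os.path.realpath(path)
    for root in allowed_roots:
        real_root = os.path.realpath(root)
        try:
            if os.path.commonpath([real, real_root]) == real_root:
                return True
        except ValueError:  # e.g. paths on different drives on Windows
            continue
    return False


def egress_allowed(url: str, allowed_hosts: set[str]) -> bool:
    """Network policy: deny by default; allow HTTPS only to exact hostnames on the allowlist."""
    parts = urlsplit(url)
    return parts.scheme == "https" and parts.hostname in allowed_hosts
```

Comparing whole path components with `commonpath`, rather than string prefixes, also keeps a sibling folder such as `project2` from matching an allowed root named `project`.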

Why an agent needs a sandbox at all

The honest answer: because the agent's instructions are not trustworthy. They mix system prompts the developer wrote with content the agent fetches from the open web, and modern LLMs still cannot reliably separate the two.

The OWASP GenAI Security Project's State of Agentic AI Security and Governance (version 2.01) reports that prompt injection is the common thread in most documented agentic incidents and maps to six of the ten categories in OWASP's Top 10 for Agentic Applications. Prompt injection is not a bug that can be patched; it follows from how models process instructions and data as a single stream of tokens. As Help Net Security summarized the report, "there is no reliable mechanism to enforce privilege boundaries between system prompts, user queries and content retrieved by an agent."

The implication is uncomfortable but clear: if the model cannot be trusted to refuse an instruction it was not supposed to receive, the only durable defense is to make sure the instruction cannot reach anything dangerous. That is what the sandbox is for.

The 2025 academic survey A Survey on the Safety and Security Threats of Computer-Using Agents catalogs the threats more formally, drawing from five research domains (NLP, AI, security, computer vision, and software engineering). Its taxonomy points to the same conclusion: vulnerabilities stem from the LLM reasoning layer combined with the complexity of integrating multiple software components and multimodal inputs, and the mitigation that consistently works is environmental — constrain what the agent can do, do not just hope it behaves.

The three sandbox patterns in use today

Most production systems pick one of three patterns, depending on where the agent runs and how much capability it needs.

Pattern	Isolation primitive	Used by	Best for
Ephemeral container	gVisor + per-session FS	Anthropic claude.ai code execution	Server-side, stateless tasks
Human-in-the-loop sandbox	OS-level (Seatbelt, bubblewrap) + per-action approval	Anthropic Claude Code	Local dev work with user oversight
Sealed VM	Full hypervisor (Apple Virtualization, Windows HCS)	Anthropic Claude Cowork	Autonomous multi-hour tasks

Anthropic's engineering team describes its containment architecture in How we contain Claude across products, and the three patterns above map directly to three of its products. The pattern shifts as the agent's autonomy increases: more autonomy, more isolation.

The trade-off is real and worth naming. An ephemeral container is cheap and disposable but cannot do anything to your real machine. A human-in-the-loop sandbox can touch your real files, but only with your consent. Anthropic's own telemetry showed users approving roughly 93% of permission prompts, a rate that signals approval fatigue and motivated a classifier that catches about 83% of overeager behaviors before execution. A sealed VM is the strongest containment but moves all the work into a guest OS and requires syncing credentials and files in and out.

                    less autonomy ◀───────────▶ more autonomy
                    less reach    ◀───────────▶ more reach
ephemeral container   ────────  human-in-loop sandbox  ────────  sealed VM
(gVisor, server-side)            (OS sandbox, approvals)         (hypervisor)

None of these patterns, on its own, was designed for a native desktop AI agent that needs to use the apps and files already on your machine. That is why the desktop case looks different.

Sandboxing on the desktop is a different problem

The cloud-VM sandbox protects the provider from the agent. The desktop sandbox has to protect the user from the agent — and the user's threat surface is everything the user already has access to: documents, browser cookies, SSH keys, saved passwords, signed-in apps.

A desktop AI agent that does its job (reading your spreadsheets, sending emails through your real Gmail tab, editing files in your project folder) cannot run in a sealed cloud VM without defeating its purpose. It has to operate inside your real OS user session, which means the sandbox is built from OS-level primitives, not virtualization.

On macOS, that means Seatbelt sandbox profiles, Endpoint Security framework hooks, and TCC (Transparency, Consent, and Control) for filesystem and device access. On Windows, the equivalent layer consists of AppContainer, job objects, and Mandatory Integrity Control. On Linux desktops, the equivalents are Landlock and Seccomp-BPF. These primitives existed long before agents and were designed to restrict what a process running as the user can do, which is exactly the right shape for the problem.
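On macOS, the Seatbelt layer can be exercised from the command line with `sandbox-exec`, which takes a profile written in Apple's Scheme-like SBPL. The Python sketch below generates a deny-by-default profile for one task folder. The helper names are hypothetical, the profile is deliberately minimal, and real commands will likely need extra rules (for example `mach-lookup` services or writes to `/dev/null`) before they run cleanly under it. It also leaves reads broad, so on its own it does not yet keep `~/.ssh` or browser profiles out of reach.

```python
import os
import subprocess
import sys


def sbpl_string(path: str) -> str:
    """Quote a path as an SBPL string literal, escaping backslashes and double quotes."""
    return '"' + path.replace("\\", "\\\\").replace('"', '\\"') + '"'


def seatbelt_profile(workdir: str) -> str:
    """Deny-by-default profile: reads allowed, writes only under workdir, no network."""
    root = os.path.realpath(workdir)  # e.g. /tmp resolves to /private/tmp on macOS
    return "\n".join([
        "(version 1)",
        "(deny default)",       # anything not allowed below is denied, network included
        "(allow process-exec)",
        "(allow process-fork)",
        "(allow sysctl-read)",
        "(allow file-read*)",   # broad for brevity; a stricter profile would narrow reads too
        f"(allow file-write* (subpath {sbpl_string(root)}))",
    ])


def run_confined(cmd: list[str], workdir: str) -> subprocess.CompletedProcess:
    """Run cmd under the profile with macOS's (deprecated but still shipped) sandbox-exec."""
    if sys.platform != "darwin":
        raise RuntimeError("sandbox-exec is only available on macOS")
    return subprocess.run(
        ["sandbox-exec", "-p", seatbelt_profile(workdir), *cmd],
        cwd=workdir, capture_output=True, text=True,
    )
```

Resolving the folder with `realpath` matters here because Seatbelt matches on resolved paths, so a profile written against `/tmp` would not match writes that land in `/private/tmp`.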

The table below shows the difference: a cloud VM puts a hypervisor between agent and host, while a desktop sandbox layers the agent's permission gate on top of the OS user session.

Cloud VM sandbox vs desktop agent sandbox — where the layers sit

Aspect	Cloud VM sandbox	Desktop agent sandbox
Isolation primitive	Full hypervisor between agent and host	OS-level primitives (Seatbelt, AppContainer, Landlock); no virtualization
Who it protects	The provider from the agent	The user from the agent
Where the work runs	Inside a disposable guest OS	Inside your real OS user session
Containment strategy	Worst case: throw the VM away	Check every call against the task scope when it is issued

The trade-off is real. A desktop sandbox is closer to the user's data, so the isolation has to be tighter at every individual call site. It cannot fall back on "the worst case is we throw the VM away," because there is no VM. Every read, every write, and every shell command has to be checked against the current task scope at the moment it is issued.

NIST's AI Agent Standards Initiative, launched in early 2026 with a Request for Information on AI agent security (deadline March 9) and a Draft Concept Paper on agent identity and authorization, treats this as a standards-track problem: agents need enterprise-grade identities, short-lived tokens, deny-by-default access, and continuous runtime evaluation rather than a single pre-deployment check. The federal framing matches what the desktop case needs in practice.
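The themes of short-lived tokens, deny-by-default access, and continuous runtime evaluation can be illustrated with a small token broker. This is a hypothetical sketch, not anything NIST specifies: the `TokenBroker` name, the scope strings, and the five-minute default lifetime are assumptions.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Grant:
    scopes: frozenset[str]  # hypothetical scope strings, e.g. "files:read"
    expires_at: float       # deadline on the broker's clock


class TokenBroker:
    """Issues opaque, short-lived tokens and re-checks them on every use."""

    def __init__(self, ttl_seconds: float = 300.0, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._grants: dict[str, Grant] = {}

    def issue(self, scopes: set[str]) -> str:
        token = secrets.token_urlsafe(32)  # random, unguessable identifier
        self._grants[token] = Grant(frozenset(scopes), self._clock() + self._ttl)
        return token

    def check(self, token: str, scope: str) -> bool:
        grant = self._grants.get(token)
        if grant is None:
            return False  # deny by default: unknown or revoked token
        if self._clock() >= grant.expires_at:
            del self._grants[token]  # expired grants are dropped, never silently renewed
            return False
        return scope in grant.scopes  # evaluated at call time, not once at deployment
```

A task would receive a token scoped to, say, `{"files:read"}` when it starts; once the lifetime lapses, every further call fails until a new grant is issued, which is the opposite of a permission granted once at install time.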

What a good desktop agent sandbox actually does

If you are evaluating a desktop agent, here is the checklist that separates a sandbox from a marketing claim.

OS-level isolation, not just an in-process check. The agent's process should be confined by Seatbelt, AppContainer, or Landlock, so that even if the process is compromised (for example, through a memory-corruption bug), it cannot reach beyond the sandbox.
Per-action permissions, not session-level. Granting "file access" once at install time fails the same way granting an OS app full-disk access fails: it is a single decision that covers thousands of future actions. Each destructive action should require its own check.
Allowlisted filesystem scope. The current task's working folder is in scope; the home directory, browser profile, and SSH directory are out. Symlinks should be resolved before path validation to block escape attempts.
Network egress policy. A clear rule about which domains the agent can reach. For most tasks, that list is short: the LLM API, the specific app the user told the agent to use, and nothing else.
A real audit trail. Every tool call, every permission prompt, every approval or denial — stored locally, queryable later. If the agent did something surprising, you should be able to see exactly what it did.
Credentials stay outside the sandbox. API keys and passwords live in the OS keychain; the sandbox can request a signed action through a broker, but the secret itself never enters the agent's address space. Anthropic's Claude Cowork architecture does this with vsock-bounded credential brokers; the same pattern fits a native desktop sandbox.
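The credential-broker item can be sketched with HMAC request signing from Python's standard library. This is a hypothetical illustration: in a real deployment the broker would run in a separate process across an IPC or vsock boundary and load the secret from the OS keychain, whereas here both sides share one process purely to keep the example self-contained.

```python
import hashlib
import hmac


class CredentialBroker:
    """Holds the secret outside the sandbox; the agent only ever receives signatures."""

    def __init__(self, secret: bytes, allowed_actions: set[str]):
        self._secret = secret  # in practice loaded from the OS keychain, never sent to the agent
        self._allowed = allowed_actions

    def sign(self, action: str, payload: bytes) -> str:
        """Sign one permitted action; refuse anything outside the task's action list."""
        if action not in self._allowed:
            raise PermissionError(f"action {action!r} is not permitted for this task")
        message = action.encode("utf-8") + b"\n" + payload
        return hmac.new(self._secret, message, hashlib.sha256).hexdigest()


def verify(secret: bytes, action: str, payload: bytes, signature: str) -> bool:
    """Receiving-service check; compare_digest avoids leaking timing information."""
    message = action.encode("utf-8") + b"\n" + payload
    expected = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)
```

What crosses back into the sandbox is a signature bound to one action and one payload, which is useless for any other request. A production design would also include a nonce or timestamp in the signed message to prevent replay.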

Lapu AI is designed against this checklist. The agent runs as a permissioned local process on macOS or Windows, confined by OS sandbox primitives, with each tool call checked against the current task's permission scope before execution. The audit trail records every decision, and destructive actions request explicit per-action approval. The design rests on the same assumption as Anthropic's containment philosophy, applied to a native desktop process instead of a cloud VM.

For a longer walk through how the permission gate works at runtime, see least privilege AI agent on the desktop and what a desktop AI agent actually is. For the four permission tiers and the "is a desktop AI agent safe?" question head-on, see the desktop AI permission models walkthrough. The permission model and the sandbox model are complementary — together they are what "permissioned execution" means in practice.

Questions to ask any desktop agent vendor

Five questions that will tell you, quickly, whether a desktop agent has a real sandbox or a fig leaf:

What OS primitive confines the agent process? Seatbelt, AppContainer, Landlock, or none?
Is every tool call checked against a per-task scope, or is access granted once at install time?
Where do credentials live? If the agent process has plaintext access to your keychain, it is one prompt-injection away from being exfiltrated.
Can I read the audit trail? A vendor that cannot show you what the agent did is not a vendor that can show you what went wrong.
What does the agent do when a permission prompt is denied? Halt cleanly, or quietly try a different path?
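A readable audit trail does not need exotic infrastructure. The sketch below is a hypothetical local log in JSON Lines format (one JSON object per line, appended and never rewritten) with a simple field-match query. The class name, field names, and decision labels are assumptions, and a production log would also need tamper protection and rotation.

```python
import json
import time
from pathlib import Path
from typing import Any, Optional


class AuditLog:
    """Append-only, locally stored record of tool calls and permission decisions."""

    def __init__(self, path: Path):
        self.path = path

    def record(self, tool: str, params: dict[str, Any], decision: str,
               approved_by: Optional[str] = None) -> None:
        entry = {
            "ts": time.time(),
            "tool": tool,
            "params": params,
            "decision": decision,        # hypothetical labels: "allowed", "denied", "approved"
            "approved_by": approved_by,  # None when no human was asked
        }
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(entry) + "\n")  # append only; earlier lines are never rewritten

    def query(self, **filters: Any) -> list[dict[str, Any]]:
        """Return every entry whose fields equal all of the given filter values."""
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as f:
            entries = [json.loads(line) for line in f if line.strip()]
        return [e for e in entries if all(e.get(k) == v for k, v in filters.items())]
```

A call such as `log.query(decision="denied")` lists every blocked action, which also answers the last question: reading what the agent did right after each denial shows whether it halted or went looking for another path.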

Sandboxing is not a single technique; it is a discipline. The best desktop agent designs treat every tool call as untrusted, every model output as a hint rather than an instruction, and every sensitive action as something that earns its own explicit grant. Anything less is a chat assistant pretending to be safe enough to act.
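Treating model output as a hint rather than an instruction has a concrete shape at the parsing step: the proposed tool call is validated against a fixed schema and rejected if anything is off. The sketch below assumes a hypothetical JSON format with `tool` and `args` fields and two made-up tools. Passing this check only means the call is well-formed; it still has to clear the permission check and the sandbox.

```python
import json
from typing import Any

# Hypothetical schema: tool name -> exact argument names and their expected JSON types.
TOOL_SCHEMAS: dict[str, dict[str, type]] = {
    "read_file": {"path": str},
    "run_command": {"argv": list},
}


def parse_tool_call(raw: str) -> tuple[str, dict[str, Any]]:
    """Parse a model-proposed tool call strictly; raise ValueError on anything off-schema."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc.msg}") from None
    if not isinstance(call, dict):
        raise ValueError("a tool call must be a JSON object")
    name, args = call.get("tool"), call.get("args")
    if not isinstance(name, str) or name not in TOOL_SCHEMAS:
        raise ValueError(f"unknown tool {name!r}")  # no fuzzy matching to a "close" tool
    schema = TOOL_SCHEMAS[name]
    if not isinstance(args, dict) or set(args) != set(schema):
        raise ValueError(f"{name} expects exactly the arguments {sorted(schema)}")
    for key, expected in schema.items():
        if not isinstance(args[key], expected):
            raise ValueError(f"{name}: argument {key!r} must be {expected.__name__}")
    return name, args
```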

The sandbox sits inside the wider agent security and permissions model — isolation, least privilege, and the audit trail are the three controls that make a desktop agent safe to run. If you want to try a desktop agent built around this model, download Lapu AI for macOS or Windows or read the pricing page for plan details. Everything stays on your machine; every action goes through the sandbox.

FAQ

What is an AI agent sandbox?: An AI agent sandbox is an isolated execution environment that limits what an autonomous agent can read, write, run, or send over the network. The agent still gets useful tools — a shell, a browser, a file API — but the blast radius of any single action is bounded to that environment. If the agent makes a mistake or gets prompt-injected, the damage stops at the sandbox boundary instead of reaching your real files, credentials, or production systems.
Why can't a regular container or VM be used as a sandbox?: A regular Docker container shares the host kernel and was designed for trusted code, not adversarial code. Once an agent has shell access inside a normal container, kernel-level exploits can escape to the host. That is why the 2026 sandbox stack has moved to microVMs (Firecracker, Kata) and user-space kernels like gVisor — they put a real isolation boundary between the agent's syscalls and the host. Anthropic itself uses gVisor for server-side Claude execution and full hypervisor isolation for Claude Cowork.
Is sandboxing the same thing as permissions?: No. Permissions decide what the agent is allowed to ask for; the sandbox decides what is reachable even if the agent asks for something else. Permissions are the front door; the sandbox is the wall. A good design uses both. Sandboxing without permissions still lets the agent take any action inside the box; permissions without sandboxing rely on the agent to behave, which prompt injection breaks regularly.
Do desktop AI agents need a sandbox if they only run locally?: Yes — arguably more than cloud agents. A cloud agent sandbox protects the provider's servers from the agent. A desktop agent sandbox protects the user's own files, browser cookies, SSH keys, and saved passwords from the agent. The threat model is closer: the agent is running inside the same OS user account where your real work lives. Without OS-level sandboxing (Seatbelt on macOS, AppContainer or job objects on Windows) plus permissioned execution, every tool call has the full reach of your login session.
What is the 'lethal trifecta' OWASP warns about?: OWASP's State of Agentic AI Security and Governance report describes a 'lethal trifecta': an agent that has access to private data, exposure to untrusted content, and the ability to communicate externally. Any agent with all three is a prompt-injection exfiltration risk. Meta's 'Agents Rule of Two' codifies the mitigation: an autonomous agent should satisfy at most two of the three properties; the third requires human-in-the-loop approval. Sandbox boundaries are how you enforce that the third property is actually constrained.
How does Lapu AI sandbox the agent on macOS and Windows?: Lapu AI runs the agent as a permissioned local process — not a cloud VM. The OS provides the outer boundary: macOS Seatbelt and Endpoint Security restrict file and network reach; Windows uses AppContainer and job objects for the same purpose. Inside that boundary, every tool call (read this file, run this command, hit this URL) is checked against the current task's permission scope, and destructive actions surface a per-action approval. The audit trail records every decision. The combination — OS sandbox plus permissioned execution plus audit — is closer to how a careful human assistant works than to how a containerized cloud agent works.
Is a cloud-VM sandbox safer than a desktop sandbox?: Safer for the provider; not necessarily safer for the user. A cloud VM gives the agent its own kernel and disposes of it cleanly when the task ends, which is excellent containment. But the user's files, credentials, and apps then have to be copied into the VM (or the agent has to call back to the user's machine) to do real work. A desktop sandbox keeps the work local, leaves credentials in the OS keychain where they belong, and asks the user to approve sensitive actions in the same window they are already using. The right answer depends on whether the task is 'analyze data you give the agent' (cloud VM fine) or 'do work across the apps already on your computer' (desktop sandbox wins).
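The Rule of Two from the lethal-trifecta answer can be checked mechanically before a task starts. The Python sketch below is a hypothetical illustration; the tool names and the mapping from tools to properties are assumptions that each product would define for itself.

```python
# Hypothetical mapping from tool names to the trifecta properties each one confers.
TOOL_PROPERTIES: dict[str, frozenset[str]] = {
    "read_local_files": frozenset({"private_data"}),
    "fetch_web_page": frozenset({"untrusted_input"}),
    "read_inbox": frozenset({"private_data", "untrusted_input"}),  # private AND attacker-writable
    "send_email": frozenset({"external_comms"}),
    "http_post": frozenset({"external_comms"}),
}

TRIFECTA = frozenset({"private_data", "untrusted_input", "external_comms"})


def trifecta_properties(tools: list[str]) -> set[str]:
    props: set[str] = set()
    for tool in tools:
        # Unknown tools are assumed to confer all three properties: deny by default.
        props |= TOOL_PROPERTIES.get(tool, TRIFECTA)
    return props


def requires_human_in_loop(tools: list[str]) -> bool:
    """Rule of Two: an agent holding all three properties must not act autonomously."""
    return trifecta_properties(tools) >= TRIFECTA
```

Note that a single tool can carry two properties: reading an inbox exposes private data and untrusted content at once, so pairing it with any outbound channel already needs a human in the loop.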

Sources

How we contain Claude across products — Anthropic (2026-04-15) · accessed 2026-06-16
Computer use tool — Anthropic (2025-11-24) · accessed 2026-06-16
AI Agent Standards Initiative — NIST (2026-04-20) · accessed 2026-06-16
A Survey on the Safety and Security Threats of Computer-Using Agents: JARVIS or Ultron? (2025-05-16) · accessed 2026-06-16
Prompt injection still drives most agentic AI security failures in production — Help Net Security / OWASP GenAI Security Project (2026-06-11) · accessed 2026-06-16
