Your Agent Has Access. The Question Is: How Much?
Agent sandboxing isn't one thing — it's a spectrum from lightweight process isolation to full policy-governed runtimes. This page maps every level, every tool, and helps you choose what fits your threat model.
The Isolation Spectrum
Before diving into the full taxonomy, here's the landscape. Every sandbox sits somewhere on this spectrum — the tradeoff is always isolation strength vs. overhead.
Protect Your Machine
Keep agents out of your files, secrets, and home directory. Stop accidental reads of ~/.ssh and .env.
Bubblewrap, Firejail
Near-zero overhead. Instant. Linux-only.
Isolate Workloads
Disposable environments that spin up and throw away. Agents get a fresh sandbox each task — state doesn't bleed between runs.
Docker, E2B, Fly.io
Ephemeral. Cross-platform.
Govern Access
Control exactly which APIs, methods, and endpoints an agent can reach. Policy-as-code enforcement at the network level.
NVIDIA OpenShell
Policy-as-code. Enterprise.
Isolation Levels — The Full Taxonomy
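Seven levels, roughly ordered by isolation strength. Moving down the table generally buys a stronger boundary at a higher operational cost. L7 breaks the pattern: a policy engine adds fine-grained access control on top of an underlying runtime, so whether it shares the kernel depends on that runtime.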
| Level | Tech | Kernel Shared? | Escape Risk | Overhead |
|---|---|---|---|---|
| L1: Process | seccomp-bpf, Landlock | Yes | Medium | Near-zero |
| L2: Namespace | Bubblewrap, Firejail | Yes | Low-Medium | Near-zero |
| L3: Container | Docker, Podman, Incus | Yes | Low (CVEs exist) | Low |
| L4: Userspace Kernel | gVisor | Partial | Low | Medium |
| L5: MicroVM | Firecracker, Cloud Hypervisor | No | Very Low | Medium-High |
| L6: Full VM | QEMU/KVM, Hyper-V | No | Minimal | High |
| L7: Policy Engine | NVIDIA OpenShell | Varies | Very Low | Medium |
L1: Process (seccomp-bpf, Landlock)
Protects
- Filesystem paths
- Syscall surface
When to Use
Low-trust local scripts, helper processes, tight latency requirements
Trade-off
Shared kernel; a kernel exploit can escape the sandbox
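To make "syscall surface" concrete: seccomp-bpf lets a process declare which system calls it may make. In practice you rarely write the BPF by hand. Runtimes such as Docker accept a JSON profile and compile it for you (`docker run --security-opt seccomp=agent-seccomp.json ...`). The sketch below builds a default-deny profile in that JSON format. The helper name `build_seccomp_profile` and the tiny allowlist are hypothetical and purely illustrative, because a real interpreter needs far more syscalls than this.

```python
import json


def build_seccomp_profile(allowed_syscalls):
    """Return a Docker-style seccomp profile that denies every syscall
    except the ones explicitly allowed (default-deny)."""
    return {
        # Any syscall not matched below fails with an errno instead of running.
        "defaultAction": "SCMP_ACT_ERRNO",
        "architectures": ["SCMP_ARCH_X86_64", "SCMP_ARCH_AARCH64"],
        "syscalls": [
            # Deduplicate and sort so the profile diffs cleanly in code review.
            {"names": sorted(set(allowed_syscalls)), "action": "SCMP_ACT_ALLOW"},
        ],
    }


# Illustrative allowlist only: not enough to run a real agent.
profile = build_seccomp_profile(["read", "write", "openat", "close", "exit_group"])
print(json.dumps(profile, indent=2))
```

The default-deny shape is the point. Anything you forget to allow fails loudly, instead of anything you forget to block silently succeeding.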
L2: Namespace (Bubblewrap, Firejail)
Protects
- Filesystem
- Network
- Process tree
- User IDs
When to Use
Local agent execution, dev machines, Linux-native workflows
Trade-off
Linux-only; still shares kernel with host
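Here is what "keep agents out of ~/.ssh" can look like with Bubblewrap. This is a minimal sketch under these assumptions: the host root is exposed read-only, `$HOME` is replaced by an empty tmpfs, and only the task workspace is writable. The function name `bwrap_command` and the example paths are hypothetical. The flags (`--ro-bind`, `--tmpfs`, `--bind`, `--unshare-all`, `--share-net`, `--die-with-parent`, `--new-session`, `--chdir`) are standard bwrap options.

```python
import os
import shlex


def bwrap_command(workspace, argv, allow_network=False, home=None):
    """Build a Bubblewrap command line that runs `argv` with a read-only view
    of the host, an empty $HOME, and only `workspace` writable."""
    home = home or os.path.expanduser("~")
    workspace = os.path.abspath(workspace)
    cmd = [
        "bwrap",
        "--ro-bind", "/", "/",           # whole host filesystem, read-only
        "--dev", "/dev",                 # fresh minimal /dev
        "--proc", "/proc",               # /proc for the new PID namespace
        "--tmpfs", "/tmp",               # private scratch space
        "--tmpfs", home,                 # empty $HOME hides ~/.ssh, ~/.aws, dotfiles
        "--bind", workspace, workspace,  # the only writable host path
        "--unshare-all",                 # new user/pid/net/ipc/uts/cgroup namespaces
        "--die-with-parent",             # sandbox dies if the launcher dies
        "--new-session",                 # detach from the controlling terminal
        "--chdir", workspace,
    ]
    if allow_network:
        cmd.append("--share-net")        # opt back in to the host network
    return cmd + list(argv)


print(shlex.join(bwrap_command("/home/dev/project", ["python3", "agent.py"], home="/home/dev")))
```

Later mounts overlay earlier ones, so the workspace stays visible even when it lives under the hidden `$HOME`. Secrets inside the workspace itself, such as a project `.env`, are still readable unless you keep them out of that directory.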
L3: Container (Docker, Podman, Incus)
Protects
- Filesystem
- Network
- Process tree
- Resource limits
When to Use
Cross-platform agent workloads, ephemeral environments, most cloud deployments
Trade-off
Documented container-escape CVEs (runc, containerd); no hardware isolation
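A container is only as tight as its run flags. The sketch below builds a disposable, locked-down `docker run` invocation. It assumes a hypothetical image name, a fixed non-root UID of 1000, and example resource limits. The flags themselves (`--rm`, `--network none`, `--read-only`, `--cap-drop ALL`, `--security-opt no-new-privileges`, `--pids-limit`) are standard Docker options.

```python
import shlex


def docker_sandbox_command(image, workspace, argv, network="none",
                           memory="2g", cpus=2, pids_limit=256):
    """Build a `docker run` invocation for a disposable, locked-down agent container."""
    return [
        "docker", "run",
        "--rm",                                 # throw the container away after the task
        "--network", network,                   # "none" means no network interface at all
        "--read-only",                          # immutable root filesystem
        "--tmpfs", "/tmp",                      # writable scratch that dies with the container
        "--cap-drop", "ALL",                    # no Linux capabilities
        "--security-opt", "no-new-privileges",  # setuid binaries can't escalate
        "--pids-limit", str(pids_limit),        # fork-bomb guard
        "--memory", memory,
        "--cpus", str(cpus),
        "--user", "1000:1000",                  # assumed non-root UID:GID inside the image
        "-v", f"{workspace}:/workspace:rw",     # only the task workspace is mounted
        "-w", "/workspace",
        image,
        *argv,
    ]


print(shlex.join(docker_sandbox_command("python:3.12-slim", "/tmp/task-42", ["python3", "agent.py"])))
```

None of these flags change the shared-kernel trade-off above. They shrink what a compromised process can reach before it would need a kernel exploit.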
L4: Userspace Kernel (gVisor)
Protects
- Filesystem
- Network
- Syscall surface
- Host kernel
When to Use
Untrusted code execution in the cloud, GKE workloads, tasks that can tolerate moderate overhead
Trade-off
Performance overhead on syscall-heavy workloads; not all syscalls supported
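On Kubernetes, gVisor is usually selected per pod through a RuntimeClass. GKE Sandbox exposes it as `runtimeClassName: gvisor`, and self-managed clusters may name it differently. The sketch below emits a pod manifest as JSON, which `kubectl apply -f` accepts. The pod name, image, command, and resource limits are hypothetical.

```python
import json


def gvisor_pod(name, image, command):
    """Return a Pod manifest that runs one agent container under gVisor."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": "agent-sandbox"}},
        "spec": {
            "runtimeClassName": "gvisor",           # GKE Sandbox's RuntimeClass name
            "restartPolicy": "Never",               # one-shot task, no restarts
            "automountServiceAccountToken": False,  # no cluster credentials in the sandbox
            "containers": [{
                "name": "agent",
                "image": image,
                "command": command,
                "securityContext": {
                    "runAsNonRoot": True,
                    "runAsUser": 1000,              # assumed non-root UID
                    "allowPrivilegeEscalation": False,
                    "capabilities": {"drop": ["ALL"]},
                },
                "resources": {"limits": {"cpu": "1", "memory": "1Gi"}},
            }],
        },
    }


print(json.dumps(gvisor_pod("agent-task-1", "registry.example.com/agent:latest",
                            ["python3", "agent.py"]), indent=2))
```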
L5: MicroVM (Firecracker, Cloud Hypervisor)
Protects
- Filesystem
- Network
- Kernel
- Memory isolation
When to Use
Cloud sandboxes (E2B, Fly.io), multi-tenant agent platforms, high-security requirements
Trade-off
Requires KVM; 100ms–1s cold start; harder to run on local dev machines
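Firecracker can boot a microVM from a single JSON config file instead of a series of API calls. This is a minimal sketch under these assumptions: you already have an uncompressed guest kernel and an ext4 root filesystem, and you copy the rootfs fresh for each task. Both paths and the helper name `firecracker_config` are hypothetical. You would launch the VM with `firecracker --api-sock /tmp/firecracker.socket --config-file vm_config.json` on a host with `/dev/kvm`.

```python
import json


def firecracker_config(kernel_path, rootfs_path, vcpus=2, mem_mib=1024):
    """Build a Firecracker --config-file document for a single-task microVM
    with one writable root drive and no network interface."""
    return {
        "boot-source": {
            "kernel_image_path": kernel_path,
            # Serial console; reboot=k and panic=1 make a guest crash end the VM quickly.
            "boot_args": "console=ttyS0 reboot=k panic=1 pci=off",
        },
        "drives": [{
            "drive_id": "rootfs",
            "path_on_host": rootfs_path,  # a fresh per-task copy, so no state carries over
            "is_root_device": True,
            "is_read_only": False,
        }],
        "machine-config": {"vcpu_count": vcpus, "mem_size_mib": mem_mib},
        # No "network-interfaces" entry: the guest gets no NIC at all.
    }


print(json.dumps(firecracker_config("/srv/vm/vmlinux.bin",
                                    "/srv/vm/task-123-rootfs.ext4"), indent=2))
```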
L6: Full VM (QEMU/KVM, Hyper-V)
Protects
- Full OS stack
- Hardware abstraction
- Network
- Storage
When to Use
Maximum security, legacy workloads, compliance-mandated isolation
Trade-off
High overhead; slow cold start (minutes); operational complexity
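With a full VM, QEMU's `-snapshot` flag gives you disposability almost for free. Guest writes go to temporary files, and the base disk image is left untouched when the VM exits. The sketch below assumes a hypothetical qcow2 image and example CPU/memory sizes. All options used (`-enable-kvm`, `-m`, `-smp`, `-drive`, `-snapshot`, `-nic none`, `-nographic`) are standard QEMU flags.

```python
import shlex


def qemu_command(disk_image, memory_mb=4096, cpus=2):
    """Build a QEMU/KVM command for a throwaway VM with no network."""
    return [
        "qemu-system-x86_64",
        "-enable-kvm",                                         # hardware virtualization via KVM
        "-m", str(memory_mb),
        "-smp", str(cpus),
        "-drive", f"file={disk_image},format=qcow2,if=virtio",
        "-snapshot",                                           # writes discarded on exit
        "-nic", "none",                                        # no network device
        "-nographic",                                          # serial console only
    ]


print(shlex.join(qemu_command("agent-base.qcow2")))
```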
L7: Policy Engine (NVIDIA OpenShell)
Protects
- Filesystem
- Network (method+path)
- Processes
- API endpoints
- GPU access
When to Use
Enterprise agent governance, GPU inference, multi-tenant, policy-as-code requirement
Trade-off
Per-endpoint policy maintenance; k3s cluster required; cold start measured in seconds
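To make "network (method+path)" policy concrete, here is a toy default-deny evaluator. It is not OpenShell's policy schema or engine. It is a hypothetical sketch of the idea: a request passes only if some rule matches its host, HTTP method, and path. The `Rule` type, `is_allowed`, and the example hosts and paths are all assumptions for illustration.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Rule:
    host: str
    methods: frozenset  # upper-case HTTP methods, e.g. frozenset({"GET"})
    path_glob: str      # fnmatch pattern; note "*" also matches "/"


def is_allowed(policy, host, method, path):
    """Default-deny: allow only if some rule matches host, method, and path."""
    return any(
        rule.host == host
        and method.upper() in rule.methods
        and fnmatchcase(path, rule.path_glob)
        for rule in policy
    )


# Hypothetical policy for one agent: read one org's repos, open PRs on one repo,
# and fetch package indexes. Everything else is denied.
POLICY = [
    Rule("api.github.com", frozenset({"GET"}), "/repos/acme/*"),
    Rule("api.github.com", frozenset({"POST"}), "/repos/acme/widgets/pulls"),
    Rule("pypi.org", frozenset({"GET"}), "/simple/*"),
]

print(is_allowed(POLICY, "api.github.com", "DELETE", "/repos/acme/widgets"))
```

Each new endpoint an agent legitimately needs means another rule, and that is the per-endpoint maintenance cost the trade-off above describes.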
Every Tool Compared
19 tools across the entire spectrum — from Linux-native namespace wrappers to managed cloud MicroVM platforms.
| Tool | Level | OS | GPU | Cold Start |
|---|---|---|---|---|
| Bubblewrap (bwrap) | L2 | Linux | — | Instant |
| Firejail | L2 | Linux | — | Instant |
| Docker Sandboxes | L3 | Linux/macOS/Windows | Limited | Sub-second |
| Clampdown | L3 | Linux/macOS | — | Fast |
| code-on-incus | L3 | Linux | — | Fast |
| Daytona | L3 | Cross-platform | Yes | <90ms |
| BoxLite | L3+ | Linux/macOS | — | Fast |
| Sandbox0 | L3 | Linux | — | Fast |
| ContainAI | L3 | Linux | — | Fast |
| Alibaba OpenSandbox | L3 | Linux | — | Varies |
| Cagent | L3+ | Linux | — | Fast |
| Modal | L4 | Cloud | Best | Sub-second |
| Google Agent Sandbox | L4+ | GKE | Yes | Varies |
| k8s-sigs/agent-sandbox | L4+ | Any K8s | Yes | Varies |
| E2B | L5 | Cloud | — | ~150ms |
| Fly.io Sprites | L5 | Cloud | — | 1–12s |
| Northflank | L5 | Cloud | — | Fast |
| Microsandbox | L5 | Linux | — | Varies |
| NVIDIA OpenShell | L7 | Linux (k3s) | Yes | Seconds |
Choose by Threat Model
Don't pick a sandbox by vibes. Pick it by what you're defending against. Here's the decision map.
| Concern | What it looks like |
|---|---|
| Local filesystem access | Agent reads ~/.ssh, .env, credentials |
| Network exfiltration | Agent sends data to unauthorized endpoints |
| Destructive commands | Agent runs rm -rf, drops tables |
| Privilege escalation | Agent gains root, installs packages |
| Kernel exploit | Agent escapes via a kernel vulnerability |
| API abuse | Agent calls the wrong endpoints or methods |
| State persistence | Agent needs to survive restarts |
| Latency-sensitive workflows | Interactive agent that can't wait for a boot |
| GPU inference | Agent needs local model inference |
Considerations
Nine things worth thinking through before you pick your sandbox strategy.
Performance vs Security
L2 is instant but shares the kernel. L5 is isolated but adds cold start overhead. Pick based on your actual threat model — most agents don't need MicroVM protection.
OS Portability
Bubblewrap, Firejail, and Landlock are Linux-only. If your agents run on macOS or Windows dev machines, containers are your minimum viable sandbox.
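If the same agent tooling has to run on mixed dev machines, the portability rule above can be encoded as a startup check. This is a minimal sketch. The function name `pick_local_sandbox` and its return labels are hypothetical. It assumes Docker or Podman is the container runtime on macOS and Windows, and it refuses to fall back to running the agent unsandboxed.

```python
import shutil
import sys


def pick_local_sandbox(platform=sys.platform, which=shutil.which):
    """Pick the lightest sandbox backend available on this machine."""
    if platform.startswith("linux") and which("bwrap"):
        return "bwrap"       # namespace isolation, near-zero overhead
    if which("docker") or which("podman"):
        return "container"   # minimum viable sandbox off Linux (or without bwrap)
    # Fail closed: no sandbox means no agent run.
    raise RuntimeError("no supported sandbox backend found; refusing to run unsandboxed")


if __name__ == "__main__":
    try:
        print(pick_local_sandbox())
    except RuntimeError as exc:
        print(exc)
```

The `platform` and `which` parameters exist so the decision can be tested without the tools installed.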
Policy Maintenance
L7 policy engines require per-endpoint maintenance. Is your team prepared to define and update network policies per agent binary? It's powerful but carries operational cost.
Escape Surface
Containers share the host kernel, and container-escape CVEs (runc, containerd) have a documented history. MicroVMs isolate with hardware virtualization, so each guest runs its own kernel. Know what you're trusting.
Debugging Complexity
More isolation means harder debugging. Audit logs help but add overhead. Plan your observability strategy before locking down the sandbox.
GPU Passthrough
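GPU support is uneven across the spectrum. In the comparison table above, none of the namespace wrappers or MicroVM platforms list GPU access. The tools that do are container- and gVisor-based platforms (Docker Sandboxes with limited support, Daytona, Modal, Google Agent Sandbox, k8s-sigs/agent-sandbox) and NVIDIA OpenShell. If your agent needs local inference, shortlist from that set, for example a GPU-capable sandbox like Modal or OpenShell.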
Statefulness
Ephemeral sandboxes (E2B, Firecracker) are great for short-lived tasks. Persistent agents (long-running sessions, file editing) need snapshot support or persistent volumes.
Cold Start
Instant (bwrap) → ~150ms (E2B) → sub-second (Docker) → 1–12s (Fly.io Sprites) → minutes (full VM). Match your cold-start budget to the UX you need.
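Published cold-start figures depend heavily on image size, host, and caching, so it is worth measuring on your own hardware. The sketch below times complete start-to-exit cycles of a trivial command and reports the median. The helper name `measure_cold_start` is hypothetical. The commands you compare, for example `["docker", "run", "--rm", "alpine", "true"]` (which assumes Docker and a pulled `alpine` image), are up to you.

```python
import statistics
import subprocess
import sys
import time


def measure_cold_start(argv, runs=5):
    """Run `argv` to completion `runs` times and return the median wall time in ms.
    Use a trivial payload (e.g. `true`) so the sandbox's startup dominates."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.median(samples)


if __name__ == "__main__":
    baseline = measure_cold_start([sys.executable, "-c", "pass"])
    print(f"bare process: {baseline:.1f} ms")
```

Comparing a sandboxed run against the bare-process baseline isolates the overhead the sandbox itself adds.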
DNS in Cluster Sandboxes
K8s-based sandboxes can have DNS issues in child namespaces. Test your agent's DNS resolution behavior early — silent DNS failures are a common gotcha.
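A quick way to surface those failures is a resolution probe with a hard timeout, run from inside the sandbox before the agent starts. This is a minimal sketch. The function name `probe_dns` and the 3-second default are assumptions. Because `socket.getaddrinfo` has no timeout parameter, it runs the lookups in worker threads and stops waiting after the deadline.

```python
import socket
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def probe_dns(hosts, timeout_s=3.0):
    """Resolve each host with a deadline; report addresses, 'timeout', or the error."""
    results = {}
    pool = ThreadPoolExecutor(max_workers=max(1, len(hosts)))
    try:
        futures = {host: pool.submit(socket.getaddrinfo, host, 443) for host in hosts}
        for host, fut in futures.items():
            try:
                infos = fut.result(timeout=timeout_s)
                # Each sockaddr tuple starts with the IP address string.
                results[host] = sorted({info[4][0] for info in infos})
            except FutureTimeout:
                results[host] = "timeout"     # a hung resolver: the "silent" failure
            except OSError as exc:            # includes socket.gaierror
                results[host] = f"error: {exc}"
    finally:
        pool.shutdown(wait=False)             # don't block on still-hung lookups
    return results
```

Call it with the hostnames your agent actually depends on, for example `probe_dns(["pypi.org", "api.github.com"])`. Treat any non-list result as a blocker.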
How Sandboxing Fits the Stack
Sandboxing is Layer 0 — the execution boundary beneath everything else. It's necessary but not sufficient. Above it sit instructions, hooks, and CI/CD gates that together make agentic safety structural.
| Layer | What | Where |
|---|---|---|
| 0: Sandbox | Execution boundary | This page ← you are here |
| 1: Instructions | Context engineering | Agent-Proof Architecture |
| 2: Hooks | Tool-call interception | Copilot CLI Hooks |
| 3: Gates | CI/CD validation | Agentic Workflows |
Sandboxing is Layer 0, not the whole solution
A sandbox defines the execution boundary — what the agent can and cannot touch. But it doesn't tell the agent what to do. Instructions (Layer 1) set context and intent. Hooks (Layer 2) enforce rules in real time. CI/CD gates (Layer 3) catch what slips through. The full Agentic DevOps stack uses all four layers together.
Further Reading
The articles that go deeper on specific tools and the philosophy behind agent sandboxing.
The Sandbox Your AI Agents Should Be Running In
NVIDIA's OpenShell is the first policy-governed, multi-layer sandbox purpose-built for GPU-accelerated agent workloads.
Infrastructure: NVIDIA OpenShell and the Rise of Agent Sandboxes
NVIDIA shipped OpenShell, the first policy-driven sandbox for AI agents. Here's why sandboxes are Layer 0 of agentic DevOps.
Agentic DevOps: Agentic DevOps Hub
The full Agentic DevOps philosophy, stack, and tooling — sandboxing is Layer 0, but see how all five layers work together.
Architecture: Building Agent-Proof Architecture
Layered enforcement that makes agents structurally incapable of shipping untested code — the full five-layer model.
Ready to Sandbox Your Agents?
I help engineering teams design and implement agent execution boundaries — from namespace-level isolation for local dev to policy-governed MicroVM environments for production. Let's talk.