// agent-sandboxing

Your Agent Has Access. The Question Is: How Much?

Agent sandboxing isn't one thing: it's a spectrum from lightweight process isolation to full policy-governed runtimes. This page maps every level and every tool, and helps you choose what fits your threat model.

// the big picture

The Isolation Spectrum

Before diving into the full taxonomy, here's the landscape. Every sandbox sits somewhere on this spectrum — the tradeoff is always isolation strength vs. overhead.

Lightest → Strongest

  • 🔓 L1 Process: seccomp, Landlock. Near-zero overhead.
  • 🧱 L2 Namespace: bubblewrap, Firejail. Near-zero overhead.
  • 📦 L3–L4 Container: Docker, Podman · Incus. Low overhead.
  • 🖥️ L5 MicroVM: Firecracker, E2B · Cloud Hypervisor. Medium-High overhead.
  • 🛡️ L7 Policy Engine: OpenShell, OPA + Landlock. Medium overhead.

🏠

Protect Your Machine

Keep agents out of your files, secrets, and home directory. Stop accidental reads of ~/.ssh and .env.

Bubblewrap, Firejail

Near-zero overhead. Instant. Linux-only.

📦

Isolate Workloads

Disposable environments that spin up fast and get thrown away. Agents get a fresh sandbox for each task, so state doesn't bleed between runs.

Docker, E2B, Fly.io

Ephemeral. Cross-platform.

🔐

Govern Access

Control exactly which APIs, methods, and endpoints an agent can reach. Policy-as-code enforcement at the network level.

NVIDIA OpenShell

Policy-as-code. Enterprise.

// isolation taxonomy

Isolation Levels — The Full Taxonomy

Seven levels, each wrapping the ones below. The further out you go, the stronger the boundary — and the higher the operational cost.

Outermost to innermost:

  • L7: Policy Engine (OpenShell). OSI layer 7 network inspection + multi-level enforcement + GPU policy
  • L6: Full VM (QEMU/KVM, Hyper-V). Complete OS isolation, dedicated everything
  • L5: MicroVM (Firecracker, Cloud Hypervisor). Dedicated kernel, hardware virtualization
  • L4: Userspace Kernel (gVisor). Syscall interception in userspace
  • L3: Container (Docker, Podman, Incus). cgroups + namespaces + layered filesystem
  • L2: Namespace (Bubblewrap, Firejail). PID, mount, net, user namespaces
  • L1: Process (seccomp, Landlock). Syscall filtering, filesystem restrictions
  • 🤖 Agent. Executes here

Level | Tech | Kernel Shared? | Escape Risk | Overhead
L1: Process | seccomp-bpf, Landlock | Yes | Medium | Near-zero
L2: Namespace | Bubblewrap, Firejail | Yes | Low-Medium | Near-zero
L3: Container | Docker, Podman, Incus | Yes | Low (CVEs exist) | Low
L4: Userspace Kernel | gVisor | Partial | Low | Medium
L5: MicroVM | Firecracker, Cloud Hypervisor | No | Very Low | Medium-High
L6: Full VM | QEMU/KVM, Hyper-V | No | Minimal | High
L7: Policy Engine | NVIDIA OpenShell | Varies | Very Low | Medium

L1: Process seccomp-bpf, Landlock

Protects

  • Filesystem paths
  • Syscall surface

When to Use

Low-trust local scripts, helper processes, tight latency requirements

Trade-off

Shared kernel — kernel exploit can escape the sandbox
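
To make "syscall filtering" concrete, here is a minimal sketch that builds a deny-list profile in the JSON shape Docker accepts via --security-opt seccomp=<file>. The helper name and the choice of blocked syscalls are assumptions made for this example; a hardened profile would more likely start from an allowlist.

```python
import json

# Hypothetical helper: the name and the default deny-list are illustrative choices,
# not a vetted production profile.
def build_seccomp_denylist(blocked=("ptrace", "mount", "umount2", "kexec_load")):
    """Return a seccomp profile dict that allows everything except `blocked`."""
    return {
        # Any syscall not matched by a rule below is allowed.
        "defaultAction": "SCMP_ACT_ALLOW",
        "syscalls": [
            # Matched syscalls fail with an errno instead of executing.
            {"names": sorted(blocked), "action": "SCMP_ACT_ERRNO"},
        ],
    }

# Serialize so the profile can be written to a file and handed to the runtime.
profile_json = json.dumps(build_seccomp_denylist(), indent=2)
```

A deny-list keeps ordinary programs working but only blocks what you thought of in advance, which is why it is the weaker of the two approaches.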

L2: Namespace Bubblewrap, Firejail

Protects

  • Filesystem
  • Network
  • Process tree
  • User IDs

When to Use

Local agent execution, dev machines, Linux-native workflows

Trade-off

Linux-only; still shares kernel with host
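
A namespace sandbox is mostly a question of what you choose to mount. The sketch below assembles a bwrap command line that exposes system directories read-only plus one writable project directory, so the rest of the home directory (including ~/.ssh) never appears inside the sandbox. The function name and the list of system paths are assumptions for illustration; adjust them for your distribution.

```python
import os

# Hypothetical helper: builds the argument list only, it does not launch anything.
def bwrap_argv(project_dir, command):
    """Build a bwrap command exposing read-only system dirs and one writable project dir."""
    project = os.path.abspath(project_dir)
    # Fresh namespaces for everything (including network); kill the child if we exit.
    argv = ["bwrap", "--unshare-all", "--die-with-parent"]
    # System paths are an assumption; --ro-bind-try skips any that do not exist.
    for path in ("/usr", "/bin", "/lib", "/lib64", "/etc"):
        argv += ["--ro-bind-try", path, path]
    # Minimal /proc, /dev and a private /tmp inside the sandbox.
    argv += ["--proc", "/proc", "--dev", "/dev", "--tmpfs", "/tmp"]
    # The only writable host path is the project directory.
    argv += ["--bind", project, project, "--chdir", project]
    return argv + list(command)
```

Pass the result to subprocess.run to launch the agent. Note that a .env file inside the project directory is still visible; keep secrets outside the path you bind.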

L3: Container Docker, Podman, Incus

Protects

  • Filesystem
  • Network
  • Process tree
  • Resource limits

When to Use

Cross-platform agent workloads, ephemeral environments, most cloud deployments

Trade-off

Container escape CVEs (runc and containerd both have a history); shares the host kernel, so no hardware-level isolation
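
For containers, most of the protection comes from the flags you pass at launch. This is a sketch of a locked-down docker run command line, under the assumption that the agent needs no network and only a scratch /tmp. The function name and the specific limits (256 PIDs, 512 MB) are illustrative choices, not recommendations from this page.

```python
# Hypothetical helper: returns the argument list without running Docker.
def docker_run_argv(image, command, workdir_mount=None, runtime=None):
    """Build a hardened `docker run` command line for a throwaway agent task."""
    argv = [
        "docker", "run", "--rm",                  # delete the container on exit
        "--network", "none",                      # no network access at all
        "--read-only",                            # read-only root filesystem
        "--cap-drop", "ALL",                      # drop every Linux capability
        "--security-opt", "no-new-privileges",    # block setuid-style escalation
        "--pids-limit", "256",                    # illustrative limit
        "--memory", "512m",                       # illustrative limit
        "--tmpfs", "/tmp",                        # writable scratch space only
    ]
    if runtime:
        # e.g. "runsc" to run under gVisor, assuming that runtime is installed.
        argv += ["--runtime", runtime]
    if workdir_mount:
        argv += ["-v", f"{workdir_mount}:/work", "-w", "/work"]
    return argv + [image] + list(command)
```

Passing runtime="runsc" is one way to move the same workload from L3 toward L4 without changing the image.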

L4: Userspace Kernel gVisor

Protects

  • Filesystem
  • Network
  • Syscall surface
  • Host kernel

When to Use

Untrusted code execution in cloud, GKE workloads, moderate performance requirement

Trade-off

Performance overhead on syscall-heavy workloads; not all syscalls supported

L5: MicroVM Firecracker, Cloud Hypervisor

Protects

  • Filesystem
  • Network
  • Kernel
  • Memory isolation

When to Use

Cloud sandboxes (E2B, Fly.io), multi-tenant agent platforms, high-security requirements

Trade-off

Requires KVM; cold start 100ms–1s; harder local dev

L6: Full VM QEMU/KVM, Hyper-V

Protects

  • Full OS stack
  • Hardware abstraction
  • Network
  • Storage

When to Use

Maximum security, legacy workloads, compliance-mandated isolation

Trade-off

High overhead; slow cold start (minutes); operational complexity

L7: Policy Engine NVIDIA OpenShell

Protects

  • Filesystem
  • Network (method+path)
  • Processes
  • API endpoints
  • GPU access

When to Use

Enterprise agent governance, GPU inference, multi-tenant, policy-as-code requirement

Trade-off

Per-endpoint policy maintenance; k3s cluster required; seconds cold start
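
To show what method-and-path enforcement means in practice, here is a toy default-deny check. The policy format, the host api.example.com, and the function name are all hypothetical and are not OpenShell's actual policy language; the sketch only illustrates the kind of rule a policy engine evaluates per request.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# Hypothetical policy: (host, HTTP method, path pattern). Anything unlisted is denied.
POLICY = [
    ("api.example.com", "GET", "/v1/models"),
    ("api.example.com", "POST", "/v1/chat/*"),
]

def is_allowed(method, url, policy=POLICY):
    """Return True only if host, method and path all match one policy rule."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    path = parts.path or "/"
    return any(
        host == rule_host
        and method.upper() == rule_method
        and fnmatchcase(path, rule_path)
        for rule_host, rule_method, rule_path in policy
    )
```

Every new endpoint an agent needs becomes a new rule, which is where the per-endpoint maintenance cost comes from. A real enforcer would also normalize paths before matching.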

// tool comparison

Every Tool Compared

19 tools across the entire spectrum — from Linux-native namespace wrappers to managed cloud MicroVM platforms.

Tool | Level | OS | GPU | Cold Start
Bubblewrap (bwrap) | L2 | Linux | | Instant
Firejail | L2 | Linux | | Instant
Docker Sandboxes | L3 | Lin/Mac/Win | Limited | Sub-second
Clampdown | L3 | Linux/macOS | | Fast
code-on-incus | L3 | Linux | | Fast
Daytona | L3 | Cross-platform | Yes | <90ms
BoxLite | L3+ | Linux/macOS | | Fast
Sandbox0 | L3 | Linux | | Fast
ContainAI | L3 | Linux | | Fast
Cagent | L3+ | Linux | | Fast
Modal | L4 | Cloud | Best | Sub-second
Google Agent Sandbox | L4+ | GKE | Yes | Varies
k8s-sigs/agent-sandbox | L4+ | Any K8s | Yes | Varies
E2B | L5 | Cloud | | ~150ms
Fly.io Sprites | L5 | Cloud | | 1–12s
Northflank | L5 | Cloud | | Fast
Alibaba OpenSandbox | L3 | Linux | | Varies
Microsandbox | L5 | Linux | | Varies
NVIDIA OpenShell | L7 | Linux (k3s) | Yes | Seconds

// choose your sandbox

Choose by Threat Model

Don't pick a sandbox by vibes. Pick it by what you're defending against. Here's the decision map.

🏠

Local filesystem access

Agent reads ~/.ssh, .env, credentials

L2 Bubblewrap, Firejail
🌐

Network exfiltration

Agent sends data to unauthorized endpoints

L3–L7 Docker (port-level), OpenShell (method-level)
💣

Destructive commands

Agent runs rm -rf, drops tables

L2–L3 Bubblewrap (read-only mounts), Docker
🔑

Privilege escalation

Agent gains root, installs packages

L1–L3 seccomp, containers
🧬

Kernel exploit

Agent escapes via kernel vulnerability

L5–L6 Firecracker, QEMU (separate kernel)
📡

API abuse

Agent calls wrong endpoints or methods

L7 OpenShell (HTTP path + method inspection)
🔄

State persistence

Agent needs to survive restarts

L3+ BoxLite (snapshots), OpenShell, Sandbox0

Latency-sensitive workflows

Interactive agent, can't wait for boot

L1–L2 seccomp (zero), Bubblewrap (instant)
🎮

GPU inference

Agent needs local model inference

L5–L7 OpenShell (DGX/RTX), Modal, K8s
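
If you are defending against several threats at once, the decision map can be read as a lookup. The sketch below copies the lowest listed level for each of the six threats and returns the highest of them. Treating that lower bound as a floor is an assumption based on each level wrapping the ones below; the function name is hypothetical, and requirements like latency or persistence are deliberately left out.

```python
# Lowest listed level and example tools per threat, copied from the decision map.
DECISION_MAP = {
    "local filesystem access": (2, "Bubblewrap, Firejail"),
    "network exfiltration": (3, "Docker (port-level), OpenShell (method-level)"),
    "destructive commands": (2, "Bubblewrap (read-only mounts), Docker"),
    "privilege escalation": (1, "seccomp, containers"),
    "kernel exploit": (5, "Firecracker, QEMU (separate kernel)"),
    "api abuse": (7, "OpenShell (HTTP path + method inspection)"),
}

# Hypothetical helper: assumes a higher level also covers lower-level threats.
def minimum_level(threats):
    """Return the highest lower-bound level across the given threats."""
    threats = list(threats)
    if not threats:
        raise ValueError("name at least one threat")
    unknown = [t for t in threats if t not in DECISION_MAP]
    if unknown:
        raise KeyError(f"not in the decision map: {unknown}")
    return max(DECISION_MAP[t][0] for t in threats)
```

For example, worrying about both local filesystem access and kernel exploits lands you at L5, even though filesystem access alone only calls for L2.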
// before you decide

Considerations

Nine things worth thinking through before you pick your sandbox strategy.

⚖️

Performance vs Security

L2 is instant but shares the kernel. L5 is isolated but adds cold start overhead. Pick based on your actual threat model — most agents don't need MicroVM protection.

🖥️

OS Portability

Bubblewrap, Firejail, and Landlock are Linux-only. If your agents run on macOS or Windows dev machines, containers are your minimum viable sandbox.

📋

Policy Maintenance

L7 policy engines require per-endpoint maintenance. Is your team prepared to define and update network policies per agent binary? It's powerful but carries operational cost.

🚪

Escape Surface

Containers share the host kernel, and container escape CVEs (runc, containerd) have a documented history. MicroVMs add a hardware virtualization boundary with a dedicated guest kernel. Know what you're trusting.

🔍

Debugging Complexity

More isolation means harder debugging. Audit logs help but add overhead. Plan your observability strategy before locking down the sandbox.

🎮

GPU Passthrough

Most lightweight sandboxes can't pass a GPU through to the sandboxed workload. If your agent needs local inference, plan for L5+ or a managed GPU sandbox like Modal or OpenShell.

💾

Statefulness

Ephemeral sandboxes (E2B, Firecracker) are great for short-lived tasks. Persistent agents (long-running sessions, file editing) need snapshot support or persistent volumes.

🕐

Cold Start

Instant (bwrap) → sub-second (Docker) → ~150ms (E2B) → 1–12s (Fly.io Sprites) → minutes (full VM). Match cold start budget to the UX you need.
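
One way to apply a cold start budget is to filter options by their worst case. The figures below are rough readings of the ranges above (for example, "sub-second" taken as 1 s and "minutes" taken as at least 60 s); those conversions and the function name are assumptions, so measure your own setup before relying on them.

```python
# Worst-case cold start in seconds, read loosely from the ranges above.
COLD_START_SECONDS = {
    "bwrap": 0.0,            # "instant"
    "E2B": 0.15,             # "~150ms"
    "Docker": 1.0,           # "sub-second", taken as an upper bound
    "Fly.io Sprites": 12.0,  # top of the "1-12s" range
    "full VM": 60.0,         # "minutes", taken as a lower bound
}

# Hypothetical helper for matching options to a latency budget.
def within_budget(budget_seconds, table=COLD_START_SECONDS):
    """Return option names whose worst-case cold start fits the budget, fastest first."""
    fits = [(secs, name) for name, secs in table.items() if secs <= budget_seconds]
    return [name for secs, name in sorted(fits)]
```

An interactive agent with a half-second budget is left with bwrap and E2B from this list; a batch job that can wait 15 seconds can use anything short of a full VM.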

🌐

DNS in Cluster Sandboxes

K8s-based sandboxes can have DNS issues in child namespaces. Test your agent's DNS resolution behavior early — silent DNS failures are a common gotcha.
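
A cheap way to catch this early is a startup check that resolves every hostname the agent depends on and reports failures instead of letting them surface later as timeouts. This is a minimal sketch; the function name and the injectable resolver parameter are assumptions made so the check can be exercised without a network.

```python
import socket

# Hypothetical startup check: resolve each hostname and collect the failures.
def check_dns(hostnames, resolver=socket.getaddrinfo):
    """Return a dict mapping each unresolvable hostname to its error message."""
    failures = {}
    for name in hostnames:
        try:
            # Port 443 is arbitrary here; only the name lookup matters.
            resolver(name, 443)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            failures[name] = str(exc)
    return failures
```

Run it inside the sandbox, not on the host: the point is to test the resolver configuration the agent will actually see. An empty result means every lookup succeeded.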

// the bigger picture

How Sandboxing Fits the Stack

Sandboxing is Layer 0 — the execution boundary beneath everything else. It's necessary but not sufficient. Above it sit instructions, hooks, and CI/CD gates that together make agentic safety structural.

Layer | What | Where
0: Sandbox | Execution boundary | This page ← you are here
1: Instructions | Context engineering | Agent-Proof Architecture
2: Hooks | Tool-call interception | Copilot CLI Hooks
3: Gates | CI/CD validation | Agentic Workflows

💡

Sandboxing is Layer 0, not the whole solution

A sandbox defines the execution boundary — what the agent can and cannot touch. But it doesn't tell the agent what to do. Instructions (Layer 1) set context and intent. Hooks (Layer 2) enforce rules in real time. CI/CD gates (Layer 3) catch what slips through. The full Agentic DevOps stack uses all four layers together.

// let's build

Ready to Sandbox Your Agents?

I help engineering teams design and implement agent execution boundaries — from namespace-level isolation for local dev to policy-governed MicroVM environments for production. Let's talk.
