Sandbox Lifecycle
Secure Cloud Sandboxes for AI Agents — E2B
E2B evals — Sandbox Lifecycle (relift v3 InfraRed)
About E2B
E2B provides secure cloud sandboxes for AI agents and AI-generated code. Each sandbox is an isolated Firecracker microVM with its own filesystem, processes, and network, driven from SDKs, including the Code Interpreter SDK for running model-generated code with a stateful kernel and rich results. The core sandbox infrastructure is open source and self-hostable. Employee count, headquarters location, and exact founding details are unverified [REQUIRES-VERIFICATION].
Employees
[REQUIRES-VERIFICATION]
Industry
AI Infrastructure / Code Sandboxes
Headquarters
San Francisco, CA [REQUIRES-VERIFICATION]
Website
e2b.dev
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check | Category | Severity |
|---|---|---|---|---|---|
| 01 | Agent calls Sandbox.create() (or await Sandbox.create() in async code) to spin up a fresh microVM. The call returns a sandbox handle carrying a sandbox_id before any code runs. | Persist the sandbox_id with the operator's task key BEFORE returning control to the caller, so a crash between create and first use does not orphan a running, billed microVM. Treat create as an allocation event: it consumes quota and starts the timeout clock. Always pair create with a guaranteed ki… | Pass / Fail | AI Platform | critical |
| 02 | A long-running agent session persisted a sandbox_id. A later worker process calls Sandbox.connect(sandbox_id) to resume control of the same microVM rather than creating a new one. | Reconnect to a still-alive sandbox by its sandbox_id instead of creating a new one, preserving in-sandbox filesystem and process state. Handle the case where the sandbox has already timed out or been killed: connect should surface a not-found / expired error, and the agent must recreate rather than… | Pass / Fail | AI Platform | high |
| 03 | An interactive agent UI creates a sandbox on the user's first message. The create call adds noticeable startup latency before the first code runs. | Treat sandbox creation as a measurable cost on the critical path: create the sandbox ahead of the first interaction (pre-warm) when the workflow allows, or surface a loading state. Do not pin a hardcoded startup-time number into an SLA. [REQUIRES-VERIFICATION] for current microVM cold-start latency… | Pass / Fail | AI Platform | medium |
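
The create, persist, reconnect, and recreate behaviors described in tests 01 and 02 can be sketched as below. This is a minimal illustration, not the E2B SDK: `FakeSandbox`, `SandboxNotFound`, `acquire`, `run_once`, and the dict-backed `store` are all hypothetical stand-ins, and the `kill()` method name is an assumption. Only `create()`, `connect(sandbox_id)`, and `sandbox_id` are named in the tests above.

```python
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


class SandboxNotFound(Exception):
    """Hypothetical error for a sandbox that has expired or been killed."""


class FakeSandbox:
    """In-memory stand-in for an SDK sandbox class (not the real SDK)."""

    _alive: Dict[str, "FakeSandbox"] = {}
    _counter = 0

    def __init__(self, sandbox_id: str) -> None:
        self.sandbox_id = sandbox_id

    @classmethod
    def create(cls) -> "FakeSandbox":
        # Allocation event: in a real system this consumes quota.
        cls._counter += 1
        sbx = cls(f"sbx-{cls._counter}")
        cls._alive[sbx.sandbox_id] = sbx
        return sbx

    @classmethod
    def connect(cls, sandbox_id: str) -> "FakeSandbox":
        try:
            return cls._alive[sandbox_id]
        except KeyError:
            raise SandboxNotFound(sandbox_id) from None

    def kill(self) -> None:
        # Assumed method name; idempotent so cleanup can run more than once.
        FakeSandbox._alive.pop(self.sandbox_id, None)


def acquire(task_key: str, store: Dict[str, str]) -> FakeSandbox:
    """Resume the sandbox recorded for task_key, or create and record one."""
    known_id = store.get(task_key)
    if known_id is not None:
        try:
            return FakeSandbox.connect(known_id)
        except SandboxNotFound:
            # Stale ID: the sandbox timed out or was killed, so recreate.
            del store[task_key]
    sbx = FakeSandbox.create()
    # Persist the ID before control returns to the caller.
    store[task_key] = sbx.sandbox_id
    return sbx


def run_once(task_key: str, store: Dict[str, str],
             work: Callable[[FakeSandbox], T]) -> T:
    """Single-shot task: cleanup runs even if work() raises."""
    sbx = acquire(task_key, store)
    try:
        return work(sbx)
    finally:
        sbx.kill()
        store.pop(task_key, None)
```

For the pre-warm case in test 03, one option under the same assumptions is to call `acquire` when the session opens rather than on the user's first message, so the creation cost is paid before the first code run. A production `store` would need to be durable (a database or key-value service) for the ID to survive a worker crash.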
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
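
As a rough illustration of the pass rule, the check below assumes the mean is taken across the per-criterion scores of a single graded run and that both thresholds are inclusive. The function name `passes` and the example scores are hypothetical, and the scoring scale is not stated in the source.

```python
from statistics import mean
from typing import Mapping


def passes(scores: Mapping[str, float],
           min_mean: float = 4.0, floor: float = 3.0) -> bool:
    """True when the mean score is >= min_mean and no score is below floor."""
    if not scores:
        return False  # Nothing graded: treat as a fail (an assumption).
    values = list(scores.values())
    return mean(values) >= min_mean and min(values) >= floor


# Hypothetical scores keyed by the rubric criteria listed for this eval.
example = {"E2B": 5, "AI Platform": 4, "Sandbox Lifecycle": 3}
result = passes(example)
```

Under these assumptions a run with scores 5, 5, and 2 fails even though its mean is 4.0, because one criterion falls below the floor of 3.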
Rubric criteria
- E2B
- AI Platform
- Sandbox Lifecycle