For E2B · AI Platform

Sandbox Lifecycle



E2B evals — Sandbox Lifecycle (relift v3 InfraRed)

About E2B

E2B provides secure cloud sandboxes for AI agents and AI-generated code. Each sandbox is an isolated Firecracker microVM with its own filesystem, processes, and network. Sandboxes are driven from SDKs, including the Code Interpreter SDK for running model-generated code with a stateful kernel and rich results. The core sandbox infrastructure is open source and self-hostable. Employee count, headquarters location, and exact founding details are unverified. [REQUIRES-VERIFICATION]

Employees: [REQUIRES-VERIFICATION]
Industry: AI Infrastructure / Code Sandboxes
Headquarters: San Francisco, CA [REQUIRES-VERIFICATION]
Website: e2b.dev

Sample tests · showing 3 of 9

# · Input · Expected behavior · Check
01

Input: An agent calls Sandbox.create() (e.g. await Sandbox.create()) to spin up a fresh microVM. The call returns a sandbox handle carrying a sandbox_id before any code runs.

Expected behavior: Persist the sandbox_id alongside the operator's task key BEFORE returning control to the caller, so a crash between create and first use does not orphan a running, billed microVM. Treat create as an allocation event: it consumes quota and starts the timeout clock. Always pair create with a guaranteed ki…

Check: Pass / Fail · AI Platform · critical
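The create-then-persist ordering in test 01 can be sketched as a context manager. This is a minimal illustration, not E2B's own implementation: `tracked_sandbox`, `SandboxLike`, and the dict-style `registry` are hypothetical names introduced here, and the sketch assumes the handle exposes a `sandbox_id` attribute and a `kill()` method. The factory is injected so the block runs without the SDK installed.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, MutableMapping, Protocol


class SandboxLike(Protocol):
    """Minimal surface this sketch relies on (an assumption, not the full SDK)."""

    sandbox_id: str

    def kill(self) -> object: ...


@contextmanager
def tracked_sandbox(
    create: Callable[[], SandboxLike],
    registry: MutableMapping[str, str],
    task_key: str,
) -> Iterator[SandboxLike]:
    """Create a sandbox, record its id under task_key, and always tear it down."""
    sandbox = create()  # allocation event: quota is consumed, timeout clock starts
    # Record the id before handing control to the caller, so a crash after
    # this point still leaves a trace that a cleanup job can act on.
    registry[task_key] = sandbox.sandbox_id
    try:
        yield sandbox
    finally:
        # Kill first; the record is dropped only once teardown succeeded,
        # so a failed kill leaves the id behind for later cleanup.
        sandbox.kill()
        registry.pop(task_key, None)
```

In real use the `create` argument would be the SDK's `Sandbox.create`, and `registry` would be a durable store rather than an in-memory dict, since an in-memory record does not survive the crash it is meant to guard against.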
02

Input: A long-running agent session persisted a sandbox_id. A later worker process calls Sandbox.connect(sandbox_id) to resume control of the same microVM rather than creating a new one.

Expected behavior: Reconnect to a still-alive sandbox by its sandbox_id instead of creating a new one, preserving in-sandbox filesystem and process state. Handle the case where the sandbox has already timed out or been killed: connect should surface a not-found / expired error, and the agent must recreate rather than…

Check: Pass / Fail · AI Platform · high
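The reconnect-or-recreate decision in test 02 might look like the sketch below. `resume_or_recreate` and `gone_errors` are hypothetical names; the exact exception type the SDK raises for a missing or expired sandbox is not stated in this page, so it is passed in by the caller rather than assumed.

```python
from typing import Callable, Optional, Tuple, Type, TypeVar

S = TypeVar("S")


def resume_or_recreate(
    sandbox_id: Optional[str],
    connect: Callable[[str], S],
    create: Callable[[], S],
    gone_errors: Tuple[Type[BaseException], ...],
) -> Tuple[S, bool]:
    """Return (sandbox, resumed).

    resumed is True when the existing sandbox was reattached, and False
    when a fresh one had to be created.
    """
    if sandbox_id is not None:
        try:
            return connect(sandbox_id), True
        except gone_errors:
            # The sandbox timed out or was killed: fall through and recreate.
            pass
    return create(), False
```

When `resumed` is False, the caller would need to persist the new sandbox_id and rebuild any state it expected to find, because a recreated sandbox starts with a fresh filesystem and no running processes. Errors outside `gone_errors` (for example network failures) propagate unchanged, so they are not mistaken for an expired sandbox.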
03

Input: An interactive agent UI creates a sandbox on the user's first message. The create call adds noticeable startup latency before the first code runs.

Expected behavior: Treat sandbox creation as a measurable cost on the critical path: create the sandbox ahead of the first interaction (pre-warm) when the workflow allows, or surface a loading state. Do not pin a hardcoded startup-time number into an SLA. [REQUIRES-VERIFICATION] for current microVM cold-start latency…

Check: Pass / Fail · AI Platform · medium
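One way to approach test 03 is to start creation in the background and measure how long it actually takes, instead of assuming a fixed figure. The sketch below uses only the standard library; `timed_create` and `prewarm` are hypothetical helper names, and the factory is injected so that no particular SDK signature is assumed.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Tuple, TypeVar

S = TypeVar("S")


def timed_create(create: Callable[[], S]) -> Tuple[S, float]:
    """Run the factory and return (sandbox, elapsed_seconds)."""
    start = time.perf_counter()
    sandbox = create()
    return sandbox, time.perf_counter() - start


def prewarm(
    create: Callable[[], S], executor: ThreadPoolExecutor
) -> "Future[Tuple[S, float]]":
    """Begin creating a sandbox in the background, before the user needs it."""
    return executor.submit(timed_create, create)
```

A UI could call `prewarm` when the page loads, show a loading state while the future is pending, and call `.result()` on the first user message. The measured duration can feed a latency dashboard, which keeps any SLA tied to observed values rather than a hardcoded number.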

How this eval is graded

Grade each response against expected.ideal_behavior and expected.rubric. A pass requires a mean criterion score >= 4.0 and no individual criterion below 3.
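The pass rule can be written as a small check. This sketch reads the rule as "the mean over all rubric criteria is at least 4.0, and every criterion is at least 3"; the score scale is not stated on this page, and `passes` and its parameter names are hypothetical.

```python
from statistics import mean
from typing import Mapping


def passes(
    scores: Mapping[str, float], min_mean: float = 4.0, floor: float = 3.0
) -> bool:
    """Apply the pass rule: mean >= min_mean and no criterion below floor."""
    if not scores:
        return False  # nothing was graded, so there is nothing to pass
    values = list(scores.values())
    return mean(values) >= min_mean and min(values) >= floor
```

For example, scores of 5, 4, and 3 pass (mean 4.0, lowest 3), while 5, 5, and 2 fail despite the same mean, because one criterion falls below the floor.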

Rubric criteria

  • E2B
  • AI Platform
  • Sandbox Lifecycle

Recommended for

E2B · E2B customers
