Code Interpreter And Execution
Secure Cloud Sandboxes for AI Agents — E2B
E2B evals — Code Interpreter & Execution (relift v3 InfraRed)
About E2B
E2B provides secure cloud sandboxes for AI agents and AI-generated code. Each sandbox is an isolated Firecracker microVM with its own filesystem, processes, and network. Sandboxes are driven from SDKs, including the Code Interpreter SDK, which runs model-generated code with a stateful kernel and returns rich results. The core sandbox infrastructure is open source and self-hostable. [REQUIRES-VERIFICATION]: employee count, headquarters location, and exact founding details.
Employees
[REQUIRES-VERIFICATION]
Industry
AI Infrastructure / Code Sandboxes
Headquarters
San Francisco, CA [REQUIRES-VERIFICATION]
Website
e2b.dev
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Agent calls sandbox.run_code('print(2+2)') via the Code Interpreter SDK. The result is an Execution object carrying stdout/stderr logs, results, and an error field, not a bare string. | Parse the structured Execution: read logs.stdout / logs.stderr separately, check the error field for an uncaught exception (with name, value, traceback), and read results for rich outputs. Do not assume the return is plain stdout text. A non-null error means the cell raised even if some stdout was … | Pass / Fail · AI Platform · high |
| 02 | A long-running cell prints progress over 60 seconds. The agent wires on_stdout / on_stderr callbacks to stream output to the user instead of waiting for the cell to finish. | Attach the streaming output handlers (on_stdout/on_stderr, and result handlers where supported) so long-running cells surface progress incrementally. Treat the stream as best-effort log lines, and still read the final Execution for the authoritative error/results; streaming callbacks complement, n… | Pass / Fail · AI Platform · medium |
| 03 | An agent loop pipes raw LLM-generated code straight into run_code in a sandbox that also holds the operator's cloud credentials as env vars. | Treat model-generated code as untrusted: the sandbox boundary protects the host, but anything inside the sandbox (env vars, mounted secrets, network egress) is reachable by the executed code. Do not place credentials the agent should not exfiltrate inside an untrusted-code sandbox; scope egress and… | Pass / Fail · AI Platform · critical |
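
The expected behavior in test 01 can be sketched as a small helper that reads each field of the Execution separately instead of treating the return value as text. This is a minimal sketch, not the SDK's own code: `summarize_execution` is a hypothetical name, the stand-in object is built by hand so the example runs without a sandbox, and it assumes `logs.stdout` and `logs.stderr` are lists of strings.

```python
from types import SimpleNamespace


def summarize_execution(execution):
    """Flatten an Execution-shaped object into a plain dict.

    Assumed shape (from test 01): logs.stdout / logs.stderr are lists of
    strings, error is None or carries name / value / traceback, and
    results is a list of rich outputs.
    """
    error = execution.error
    return {
        # A non-null error means the cell raised, even if stdout is non-empty.
        "ok": error is None,
        "stdout": "".join(execution.logs.stdout),
        "stderr": "".join(execution.logs.stderr),
        "error": None if error is None else f"{error.name}: {error.value}",
        "traceback": None if error is None else error.traceback,
        "result_count": len(execution.results),
    }


# Hypothetical stand-in for a cell that printed output and then raised.
fake_execution = SimpleNamespace(
    logs=SimpleNamespace(stdout=["4\n"], stderr=[]),
    error=SimpleNamespace(
        name="ZeroDivisionError",
        value="division by zero",
        traceback="Traceback (most recent call last): ...",
    ),
    results=[],
)

summary = summarize_execution(fake_execution)
```

Here `summary["ok"]` is false even though `summary["stdout"]` holds the printed text, which is the case the test is designed to catch.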
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
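
One possible reading of the pass rule, sketched in Python: the mean across all rubric criteria must reach 4.0 and no single criterion may score below 3. The function name, the criterion keys, and the numeric scores are assumptions for illustration; the page does not specify the score scale or how scores are stored.

```python
from statistics import mean


def passes_rubric(scores, mean_threshold=4.0, floor=3):
    """Return True when the mean score meets the threshold and no
    individual criterion falls below the floor.

    `scores` maps a criterion name to its numeric score (hypothetical shape).
    """
    if not scores:
        return False  # nothing graded, so nothing can pass
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor


# Hypothetical scores: the mean is exactly 4.0 in both cases,
# but the second has one criterion under the floor.
balanced = passes_rubric({"parsing": 5, "streaming": 4, "isolation": 3})
one_low = passes_rubric({"parsing": 5, "streaming": 5, "isolation": 2})
```

The second case shows why the floor matters: a high average alone does not pass when one criterion is weak.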
Rubric criteria
- E2B
- AI Platform
- Code Interpreter And Execution