For E2B · AI Platform

Code Interpreter And Execution

Secure Cloud Sandboxes for AI Agents — E2B

E2B evals — Code Interpreter & Execution (relift v3 InfraRed)

About E2B

E2B provides secure cloud sandboxes for AI agents and AI-generated code. Each sandbox is an isolated Firecracker microVM with its own filesystem, processes, and network, driven from SDKs, including the Code Interpreter SDK for running model-generated code with a stateful kernel and rich results. The core sandbox infrastructure is open source and self-hostable. [REQUIRES-VERIFICATION] Employee count, headquarters location, and exact founding details have not been confirmed.

Employees

[REQUIRES-VERIFICATION]

Industry

AI Infrastructure / Code Sandboxes

Headquarters

San Francisco, CA [REQUIRES-VERIFICATION]

Website

e2b.dev

Sample tests · showing 3 of 9

# · Input · Expected behavior · Check
01

Agent calls sandbox.run_code('print(2+2)') via the Code Interpreter SDK. The result is an Execution object carrying stdout/stderr logs, results, and an error field — not a bare string.

Parse the structured Execution: read logs.stdout / logs.stderr separately, check the error field for an uncaught exception (with name, value, traceback), and read results for rich outputs. Do not assume the return is plain stdout text. A non-null error means the cell raised even if some stdout was …

Pass / Fail · AI Platform · high
02

A long-running cell prints progress over 60 seconds. The agent wires on_stdout / on_stderr callbacks to stream output to the user instead of waiting for the cell to finish.

Attach the streaming output handlers (on_stdout/on_stderr, and result handlers where supported) so long-running cells surface progress incrementally. Treat the stream as best-effort log lines, and still read the final Execution for the authoritative error/results — streaming callbacks complement, n…

Pass / Fail · AI Platform · medium
03

An agent loop pipes raw LLM-generated code straight into run_code in a sandbox that also holds the operator's cloud credentials as env vars.

Treat model-generated code as untrusted: the sandbox boundary protects the host, but anything inside the sandbox (env vars, mounted secrets, network egress) is reachable by the executed code. Do not place credentials the agent should not exfiltrate inside an untrusted-code sandbox; scope egress and…

Pass / Fail · AI Platform · critical
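
The three sample tests above can be sketched from the agent's side in Python. This is a minimal illustration and not the SDK's own code: the `Logs`, `ExecutionError`, and `Execution` dataclasses are hypothetical stand-ins that mirror only the field names quoted in the tests, the `line` attribute on streamed messages is an assumption, and `summarize_execution`, `make_stream_handlers`, `scrub_env`, and the `SECRET_MARKERS` heuristic are names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


# Hypothetical stand-ins: they copy only the field names quoted in the tests
# (logs.stdout, logs.stderr, results, error.name / value / traceback).
@dataclass
class Logs:
    stdout: list = field(default_factory=list)
    stderr: list = field(default_factory=list)


@dataclass
class ExecutionError:
    name: str
    value: str
    traceback: str


@dataclass
class Execution:
    logs: Logs = field(default_factory=Logs)
    results: list = field(default_factory=list)
    error: Optional[ExecutionError] = None


def summarize_execution(execution) -> dict:
    """Test 01: read the structured result instead of treating it as text."""
    error = execution.error
    return {
        "ok": error is None,  # a non-null error means the cell raised
        "stdout": "".join(execution.logs.stdout),
        "stderr": "".join(execution.logs.stderr),
        "results": list(execution.results),
        "error": None if error is None else f"{error.name}: {error.value}",
    }


def make_stream_handlers(emit: Callable[[str, str], None]):
    """Test 02: build on_stdout / on_stderr callbacks for best-effort progress."""
    def on_stdout(message) -> None:
        # Assumption: streamed messages expose `.line`; fall back to str().
        emit("stdout", getattr(message, "line", str(message)))

    def on_stderr(message) -> None:
        emit("stderr", getattr(message, "line", str(message)))

    return on_stdout, on_stderr


# Assumed heuristic for credential-looking variable names.
SECRET_MARKERS = ("KEY", "SECRET", "TOKEN", "PASSWORD", "CREDENTIAL")


def scrub_env(env: dict, allow: frozenset = frozenset()) -> dict:
    """Test 03: drop credential-looking variables before sandbox creation."""
    return {
        name: value
        for name, value in env.items()
        if name in allow or not any(m in name.upper() for m in SECRET_MARKERS)
    }
```

In a real agent loop the handlers would presumably be passed as the `on_stdout` / `on_stderr` arguments of `run_code`, with `summarize_execution` still applied to the returned Execution as the authoritative record. Name-based scrubbing is only a coarse filter; keeping credentials out of the untrusted-code sandbox entirely is the stronger control.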

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
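
One reading of the pass rule can be sketched as a small Python check. It assumes that each rubric criterion receives a single numeric score and that the mean is taken across criteria; the source does not spell out the scale or the aggregation. The function name `passes` and the example criterion keys are hypothetical.

```python
from statistics import mean


def passes(scores: dict, min_mean: float = 4.0, floor: float = 3.0) -> bool:
    """Return True when the mean score meets min_mean and no score is below floor.

    Assumption: `scores` maps each rubric criterion to one numeric score.
    """
    values = list(scores.values())
    if not values:
        return False  # nothing graded, so nothing to pass
    return mean(values) >= min_mean and min(values) >= floor


# Hypothetical scores for three criteria.
print(passes({"criterion_a": 5, "criterion_b": 4, "criterion_c": 3}))
print(passes({"criterion_a": 5, "criterion_b": 5, "criterion_c": 2}))
```

Under this reading the first call passes (mean 4.0, lowest score 3) and the second fails because one criterion falls below the floor even though the mean is 4.0.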

Rubric criteria

  • E2B
  • AI Platform
  • Code Interpreter And Execution

Recommended for

E2B · E2B customers
