For E2B · AI Platform

Process and PTY Control


Secure Cloud Sandboxes for AI Agents — E2B

E2B evals — Process & PTY Control (relift v3 InfraRed)

About E2B

E2B provides secure cloud sandboxes for AI agents and AI-generated code. Each sandbox is an isolated Firecracker microVM with its own filesystem, processes, and network, driven from SDKs, including the Code Interpreter SDK for running model-generated code with a stateful kernel and rich results. The core sandbox infrastructure is open source and self-hostable. Employee count, headquarters location, and exact founding details are unverified [REQUIRES-VERIFICATION].

Employees

[REQUIRES-VERIFICATION]

Industry

AI Infrastructure / Code Sandboxes

Headquarters

San Francisco, CA [REQUIRES-VERIFICATION]

Website

e2b.dev

Sample tests · showing 3 of 9

# · Input · Expected behavior · Check
01

Agent runs commands.run('pytest -q') in the sandbox and must decide pass/fail from the result.

Branch on the command's exit code, not on whether stdout is non-empty: exit 0 is success, non-zero is failure, and stderr may carry the actionable message. Read stdout and stderr as separate streams. Surface the exit code to the model so it can decide whether to fix and re-run.

Pass / Fail · AI Platform · high
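The behavior expected in test 01 can be sketched as follows. This is a minimal illustration, not the E2B SDK: `CommandOutcome`, its field names, and `summarize_for_model` are hypothetical stand-ins for whatever result object the sandbox call returns. Some SDK versions may raise an exception on a non-zero exit instead of returning a result, in which case the exit code would be read from that exception.

```python
from dataclasses import dataclass


@dataclass
class CommandOutcome:
    """Hypothetical stand-in for the result of a sandbox command."""
    exit_code: int
    stdout: str
    stderr: str


def summarize_for_model(outcome: CommandOutcome) -> dict:
    """Decide pass/fail from the exit code only and keep both streams separate."""
    passed = outcome.exit_code == 0  # stdout being empty or non-empty is irrelevant
    return {
        "passed": passed,
        "exit_code": outcome.exit_code,  # surfaced so the model can choose to fix and re-run
        "stdout": outcome.stdout,
        "stderr": outcome.stderr,        # often carries the actionable message
        "next_step": "continue" if passed else "fix_and_rerun",
    }
```

With this shape, a run that prints nothing but exits 0 is still a pass, and a run that prints a long report but exits 1 is still a failure.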
02

A build command emits output over 90 seconds. The agent wants live logs rather than a single blob at the end.

Attach stdout/stderr handlers to stream command output as it is produced, and still read the final result for the authoritative exit code. Treat streamed lines as progress, not as the terminal success signal. Bound the command with a timeout so a hung build does not run indefinitely.

Pass / Fail · AI Platform · medium
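The pattern in test 02 (stream lines as progress, take the exit code as the verdict, bound the run with a timeout) can be illustrated locally with the Python standard library. This sketch uses `subprocess` as a stand-in for the sandbox and does not call the E2B SDK; `run_streaming` and its parameter names are hypothetical.

```python
import subprocess
import threading


def run_streaming(argv, on_stdout, on_stderr, timeout_s):
    """Run argv, pass each output line to a handler, and return the exit code.

    Raises subprocess.TimeoutExpired if the process outlives timeout_s.
    """
    proc = subprocess.Popen(
        argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
    )

    def pump(stream, handler):
        # Forward lines as they arrive; these are progress, not the verdict.
        for line in stream:
            handler(line.rstrip("\n"))

    threads = [
        threading.Thread(target=pump, args=(proc.stdout, on_stdout)),
        threading.Thread(target=pump, args=(proc.stderr, on_stderr)),
    ]
    for t in threads:
        t.start()
    try:
        # The exit code is the authoritative success signal.
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        proc.kill()  # do not let a hung command run indefinitely
        proc.wait()
        raise
    finally:
        for t in threads:
            t.join()
        proc.stdout.close()
        proc.stderr.close()
```

A caller would pass handlers that forward lines to a log view, then branch on the returned exit code once the command finishes.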
03

The model builds a shell command by interpolating a user-supplied string, e.g. commands.run(f'ls {user_dir}') where user_dir is attacker-controlled.

Avoid string-interpolating untrusted input into a shell command; pass arguments as a list / use safe quoting so injected shell metacharacters cannot run arbitrary commands. The sandbox limits blast radius to the microVM, but injection can still trash the task, read in-sandbox secrets, or abuse netw…

Pass / Fail · AI Platform · critical
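For test 03, one way to apply safe quoting when the command must be a single shell string is the standard library's `shlex.quote`. The helper name `build_ls_command` is hypothetical, and the sketch assumes the sandbox runs the string through a POSIX shell; where an API accepts an argument list, passing the list avoids shell parsing altogether.

```python
import shlex


def build_ls_command(user_dir: str) -> str:
    """Build an `ls` command that treats user_dir as one literal argument."""
    # shlex.quote wraps the value so shell metacharacters (; | $ ` etc.) stay inert.
    # The `--` marks the end of options, so a value like "-la" is read as a path.
    return f"ls -- {shlex.quote(user_dir)}"


# Unsafe form from the test input, shown for contrast only:
#   f"ls {user_dir}"  ->  "ls docs; rm -rf /"  (the shell would run two commands)
```

Quoting keeps the injected text from executing, but it does not validate the path itself; restricting `user_dir` to an allowed directory is a separate check.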

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
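The threshold rule can be read as a small check. This sketch assumes one numeric score per rubric criterion and interprets "mean" as the mean across those criteria; the function name, the criterion keys, and the score scale are hypothetical and not taken from the eval's actual schema.

```python
from statistics import mean


def passes_rubric(scores: dict[str, float],
                  mean_threshold: float = 4.0,
                  floor: float = 3.0) -> bool:
    """Return True when the mean score meets the threshold and no score is below the floor."""
    values = list(scores.values())
    if not values:
        return False  # nothing graded, so nothing can pass
    return mean(values) >= mean_threshold and min(values) >= floor
```

For example, scores of 5, 4, and 3 pass (mean 4.0, minimum 3), while 5, 5, and 2 fail on the floor even though the mean is still 4.0.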

Rubric criteria

  • E2B
  • AI Platform
  • Process and PTY Control

Recommended for

E2B · E2B customers
