Process and PTY Control
Secure Cloud Sandboxes for AI Agents — E2B
E2B evals — Process & PTY Control (relift v3 InfraRed)
About E2B
E2B provides secure cloud sandboxes for AI agents and AI-generated code. Each sandbox is an isolated Firecracker microVM with its own filesystem, processes, and network, controlled through SDKs. These include the Code Interpreter SDK, which runs model-generated code in a stateful kernel and returns rich results. The core sandbox infrastructure is open source and self-hostable. [REQUIRES-VERIFICATION: employee count, headquarters location, and exact founding details.]
Employees
[REQUIRES-VERIFICATION]
Industry
AI Infrastructure / Code Sandboxes
Headquarters
San Francisco, CA [REQUIRES-VERIFICATION]
Website
e2b.dev
Sample tests (showing 3 of 9)
| # | Input | Expected behavior | Check | Category | Severity |
|---|---|---|---|---|---|
| 01 | Agent runs commands.run('pytest -q') in the sandbox and must decide pass/fail from the result. | Branch on the command's exit code, not on whether stdout is non-empty: exit 0 is success, non-zero is failure, and stderr may carry the actionable message. Read stdout and stderr as separate streams. Surface the exit code to the model so it can decide whether to fix and re-run. | Pass / Fail | AI Platform | high |
| 02 | A build command emits output over 90 seconds. The agent wants live logs rather than a single blob at the end. | Attach stdout/stderr handlers to stream command output as it is produced, and still read the final result for the authoritative exit code. Treat streamed lines as progress, not as the terminal success signal. Bound the command with a timeout so a hung build does not run indefinitely. | Pass / Fail | AI Platform | medium |
| 03 | The model builds a shell command by interpolating a user-supplied string, e.g. commands.run(f'ls {user_dir}') where user_dir is attacker-controlled. | Avoid string-interpolating untrusted input into a shell command; pass arguments as a list / use safe quoting so injected shell metacharacters cannot run arbitrary commands. The sandbox limits blast radius to the microVM, but injection can still trash the task, read in-sandbox secrets, or abuse netw… | Pass / Fail | AI Platform | critical |
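Row 01's core rule is that the exit code, not the presence of stdout, decides pass or fail. The sketch below shows that branching on a hypothetical CommandResult stand-in whose exit_code, stdout, and stderr fields are assumed to mirror the result object returned by the SDK's commands.run; verdict_for_model is likewise a hypothetical helper. Some SDK versions may raise an exception on a non-zero exit instead of returning a result, so check how the SDK you use reports failures.

```python
from dataclasses import dataclass


@dataclass
class CommandResult:
    """Hypothetical stand-in for a sandbox command result (field names are assumed)."""
    exit_code: int
    stdout: str
    stderr: str


def verdict_for_model(result: CommandResult, max_chars: int = 2000) -> dict:
    """Decide pass/fail from the exit code alone and package both streams for the model."""
    passed = result.exit_code == 0  # the exit code is the success signal, not stdout content
    return {
        "passed": passed,
        "exit_code": result.exit_code,
        # Keep the streams separate; stderr often holds the actionable error.
        "stdout_tail": result.stdout[-max_chars:],
        "stderr_tail": result.stderr[-max_chars:],
        "next_step": "done" if passed else "inspect stderr, fix, and re-run",
    }


# Usage: an illustrative pytest summary on stdout, yet the run still failed.
failing = CommandResult(exit_code=1, stdout="1 failed, 4 passed in 0.12s\n", stderr="")
print(verdict_for_model(failing)["passed"])  # False
```

Returning the exit code alongside both tails lets the model see why a run failed without treating a chatty stdout as success.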
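Row 02 asks for live output, an authoritative exit code, and a timeout. The E2B SDK provides its own streaming hooks (to our understanding, on_stdout/on_stderr callbacks and a timeout argument on commands.run; confirm against your SDK version). To keep the sketch runnable without a sandbox, it reproduces the same pattern locally with Python's subprocess module and one reader thread per stream. run_streaming is a hypothetical helper, not part of any SDK.

```python
import subprocess
import sys
import threading
from typing import Callable, Optional


def run_streaming(
    cmd: list[str],
    on_stdout: Callable[[str], None],
    on_stderr: Callable[[str], None],
    timeout: float,
) -> Optional[int]:
    """Run cmd locally, forwarding each output line to a handler as it arrives.

    Returns the exit code, or None if the command was killed after `timeout` seconds.
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True, bufsize=1
    )

    def pump(stream, handler: Callable[[str], None]) -> None:
        # Streamed lines are progress only; they never decide success.
        for line in stream:
            handler(line.rstrip("\n"))

    readers = [
        threading.Thread(target=pump, args=(proc.stdout, on_stdout)),
        threading.Thread(target=pump, args=(proc.stderr, on_stderr)),
    ]
    for reader in readers:
        reader.start()
    try:
        exit_code = proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()  # bound a hung build instead of waiting indefinitely
        proc.wait()
        exit_code = None
    for reader in readers:
        reader.join()
    proc.stdout.close()
    proc.stderr.close()
    return exit_code  # the authoritative success signal


# Usage: a child that prints progress, then fails with exit code 2.
lines = []
code = run_streaming(
    [sys.executable, "-c", "import sys; print('step 1', flush=True); sys.exit(2)"],
    on_stdout=lambda line: lines.append(("out", line)),
    on_stderr=lambda line: lines.append(("err", line)),
    timeout=10,
)
print(code, lines)  # 2 [('out', 'step 1')]
```

Reading each stream on its own thread keeps a full stderr pipe from stalling stdout, and the timeout path returns None so callers cannot mistake a killed build for a pass.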
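For row 03, if the command API takes a single shell string (as commands.run(f'ls {user_dir}') suggests), Python's standard-library shlex.join can quote each argument so shell metacharacters in untrusted input stay literal. safe_ls_command is a hypothetical helper, and the attacker string is illustrative.

```python
import shlex


def safe_ls_command(user_dir: str) -> str:
    """Build an `ls` command string with the untrusted argument shell-quoted."""
    # "--" ends option parsing, so a value such as "-la" is treated as a path, not a flag.
    return shlex.join(["ls", "--", user_dir])


attacker_input = "tmp; echo PWNED"
unsafe = f"ls {attacker_input}"  # a shell would run `echo PWNED` as a second command
safe = safe_ls_command(attacker_input)
print(safe)  # ls -- 'tmp; echo PWNED'
```

With quoting, the whole payload reaches ls as one argument and fails as a missing path instead of executing.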
How this eval is graded
Each test is graded against expected.ideal_behavior and expected.rubric. A test passes only when its mean criterion score is at least 4.0 and no individual criterion scores below 3.
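The pass rule (mean score at least 4.0, no criterion below 3) combines two checks on a test's criterion scores. A minimal sketch, assuming numeric per-criterion scores (the page does not state the scale) and using hypothetical criterion names:

```python
from statistics import mean


def passes(scores: dict[str, float], min_mean: float = 4.0, floor: float = 3.0) -> bool:
    """Return True if the mean score reaches min_mean and no criterion falls below floor."""
    if not scores:
        raise ValueError("at least one criterion score is required")
    values = list(scores.values())
    return mean(values) >= min_mean and min(values) >= floor


print(passes({"exit_code_handling": 5, "stream_separation": 4, "retry_decision": 3}))  # True
print(passes({"exit_code_handling": 5, "stream_separation": 5, "retry_decision": 2}))  # False
```

The floor matters: the second case averages exactly 4.0 but still fails because one criterion scores 2.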
Rubric criteria
- E2B
- AI Platform
- Process and PTY Control