Filesystem Operations
Secure Cloud Sandboxes for AI Agents — E2B
E2B evals — Filesystem Operations (relift v3 InfraRed)
About E2B
E2B provides secure cloud sandboxes for AI agents and AI-generated code. Each sandbox is an isolated Firecracker microVM with its own filesystem, processes, and network. Sandboxes are driven from SDKs, including the Code Interpreter SDK for running model-generated code with a stateful kernel and rich results. The core sandbox infrastructure is open source and self-hostable. [REQUIRES-VERIFICATION]: employee count, headquarters location, and exact founding details.
Employees
[REQUIRES-VERIFICATION]
Industry
AI Infrastructure / Code Sandboxes
Headquarters
San Francisco, CA [REQUIRES-VERIFICATION]
Website
e2b.dev
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Agent uses `files.write('/home/user/data.csv', content)` and later `files.read('/home/user/data.csv')` to round-trip data through the sandbox filesystem. | Use absolute paths under the sandbox's working/home directory and confirm writes before depending on them downstream. Treat the sandbox filesystem as ephemeral: it lives and dies with the sandbox, so persist anything durable to the operator's own storage. Match read mode (text vs bytes) to how the … | Pass / Fail · AI Platform · high |
| 02 | Operator uploads a 200MB dataset from the host into the sandbox before running analysis code on it. | Stream large uploads rather than buffering the whole file in memory, and verify the upload landed (size/checksum) before the analysis step depends on it. Account for the sandbox's disk capacity. [REQUIRES-VERIFICATION] for the current per-sandbox disk size and any upload size limits. | Pass / Fail · AI Platform · medium |
| 03 | Generated code writes a report.pdf inside the sandbox. The agent must download it to durable storage before the sandbox is killed. | Download generated artifacts out of the sandbox to durable storage BEFORE teardown, because kill destroys the filesystem. Verify the downloaded bytes (size/hash) match what was written. Sequence the download ahead of any kill in the same finally/cleanup path so an early exception does not lose the … | Pass / Fail · AI Platform · high |
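The three sample behaviors can be sketched in Python as follows. This is a minimal illustration, not the E2B SDK: `FakeSandbox` and `FakeFiles` are hypothetical in-memory stand-ins that only mimic the `files.write`, `files.read`, and `kill` names used in the table, and the helper names (`write_verified`, `collect_then_kill`, `sha256_stream`, `file_chunks`) are invented for this example. Real SDK signatures, read modes, and limits should be checked against the current E2B documentation.

```python
import hashlib
import io


class FakeFiles:
    """Hypothetical in-memory stand-in for a sandbox `files` API."""

    def __init__(self):
        self._store = {}

    def write(self, path, data):
        self._store[path] = data

    def read(self, path):
        return self._store[path]


class FakeSandbox:
    """Hypothetical stand-in: kill() destroys the whole filesystem."""

    def __init__(self):
        self.files = FakeFiles()

    def kill(self):
        self.files._store.clear()


def file_chunks(fobj, size=1024 * 1024):
    """Yield fixed-size chunks so a large file is never fully buffered."""
    return iter(lambda: fobj.read(size), b"")


def sha256_stream(chunks) -> str:
    """Hash an iterable of byte chunks incrementally."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def write_verified(sandbox, path: str, data: bytes) -> str:
    """Write, read back, and compare hashes before anything depends on the file."""
    if not path.startswith("/"):
        raise ValueError("use an absolute path inside the sandbox")
    sandbox.files.write(path, data)
    expected = sha256_stream([data])
    if sha256_stream([sandbox.files.read(path)]) != expected:
        raise IOError(f"write to {path} could not be verified")
    return expected


def collect_then_kill(sandbox, path: str, durable: dict) -> str:
    """Copy `path` into durable storage first; kill runs last, even on error."""
    try:
        data = sandbox.files.read(path)
        durable[path] = data
        return sha256_stream([data])
    finally:
        sandbox.kill()
```

A caller would compare the digest returned by `write_verified` with the one returned by `collect_then_kill` to confirm the downloaded bytes match what was written. Here `durable` is a plain dict standing in for the operator's own storage.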
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
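The pass rule can be expressed as a small check. This sketch assumes one numeric score per rubric criterion and assumes the mean is taken across those criterion scores; the source does not state the scoring scale or how scores are aggregated, and the function name `passes` and its parameters are hypothetical.

```python
from statistics import mean


def passes(scores: dict[str, float], min_mean: float = 4.0, floor: float = 3.0) -> bool:
    """Return True when the mean score is >= min_mean and no score is below floor."""
    if not scores:
        return False  # nothing graded, so nothing can pass
    values = list(scores.values())
    return mean(values) >= min_mean and min(values) >= floor
```

Under these assumptions, scores of 5, 4, and 3 pass (mean 4.0, lowest 3), while 5, 5, and 2 fail because one criterion falls below the floor even though the mean is 4.0.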
Rubric criteria
- E2B
- AI Platform
- Filesystem Operations