For E2B · AI Platform

Filesystem Operations

Secure Cloud Sandboxes for AI Agents — E2B

E2B evals — Filesystem Operations (relift v3 InfraRed)

About E2B

E2B provides secure cloud sandboxes for AI agents and AI-generated code. Each sandbox is an isolated Firecracker microVM with its own filesystem, processes, and network. Sandboxes are driven from SDKs, including the Code Interpreter SDK for running model-generated code with a stateful kernel and rich results. The core sandbox infrastructure is open source and self-hostable. Employee count, headquarters location, and exact founding details are unverified. [REQUIRES-VERIFICATION]

Employees

[REQUIRES-VERIFICATION]

Industry

AI Infrastructure / Code Sandboxes

Headquarters

San Francisco, CA [REQUIRES-VERIFICATION]

Website

e2b.dev

Sample tests · showing 3 of 9

# | Input | Expected behavior | Check
01

Agent uses files.write('/home/user/data.csv', content) and later files.read('/home/user/data.csv') to round-trip data through the sandbox filesystem.

Use absolute paths under the sandbox's working/home directory and confirm writes before depending on them downstream. Treat the sandbox filesystem as ephemeral: it lives and dies with the sandbox, so persist anything durable to the operator's own storage. Match read mode (text vs bytes) to how the …

Pass / Fail · AI Platform · high
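
The round trip in test 01 might be sketched as follows. This is a minimal illustration, not the SDK's documented usage: `files` stands for any object exposing `write(path, data)` and `read(path)` in the shape the test input names, and the helper `write_and_confirm` is hypothetical. The exact signatures and return types of the real SDK are assumptions here.

```python
import posixpath


def write_and_confirm(files, path, content):
    """Write content to an absolute sandbox path, then read it back to confirm.

    `files` is assumed to expose write(path, data) and read(path); this
    mirrors the calls named in the test input, not a verified SDK signature.
    """
    # Sandbox paths are POSIX paths; reject relative ones up front.
    if not posixpath.isabs(path):
        raise ValueError(f"expected an absolute sandbox path, got {path!r}")

    files.write(path, content)

    # Confirm the write before any downstream step depends on it.
    read_back = files.read(path)
    if read_back != content:
        raise IOError(f"read-back mismatch for {path}")
    return read_back
```

The comparison only holds when the read mode matches the type that was written (text in, text out), which is the mode-matching point the expected behavior raises.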
02

Operator uploads a 200MB dataset from the host into the sandbox before running analysis code on it.

Stream large uploads rather than buffering the whole file in memory, and verify the upload landed (size/checksum) before the analysis step depends on it. Account for the sandbox's disk capacity. [REQUIRES-VERIFICATION] for the current per-sandbox disk size and any upload size limits.

Pass / Fail · AI Platform · medium
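
One way to approach test 02 is to hash the local file in chunks, hand an open file handle to the upload call, and compare checksums afterwards. The sketch below makes two assumptions: that `files.write` accepts a file-like object (whether that actually streams depends on the SDK), and that the caller supplies `remote_sha256`, a hypothetical callback that returns the hex digest of the file inside the sandbox, for example by running a checksum tool there.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a local file in fixed-size chunks so it is never fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def upload_and_verify(files, local_path, remote_path, remote_sha256):
    """Upload local_path to remote_path and confirm the checksum matches.

    Assumptions: files.write(path, file_obj) accepts an open binary handle,
    and remote_sha256(path) is a caller-supplied (hypothetical) function
    returning the SHA-256 hex digest of the file as stored in the sandbox.
    """
    expected = sha256_of_file(local_path)

    # Pass the handle rather than fh.read() to avoid buffering the whole file.
    with open(local_path, "rb") as fh:
        files.write(remote_path, fh)

    actual = remote_sha256(remote_path)
    if actual != expected:
        raise IOError(f"checksum mismatch for {remote_path}")
    return expected
```

The analysis step should run only after `upload_and_verify` returns without raising.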
03

Generated code writes a report.pdf inside the sandbox. The agent must download it to durable storage before the sandbox is killed.

Download generated artifacts out of the sandbox to durable storage BEFORE teardown, because kill destroys the filesystem. Verify the downloaded bytes (size/hash) match what was written. Sequence the download ahead of any kill in the same finally/cleanup path so an early exception does not lose the …

Pass / Fail · AI Platform · high
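
The download-before-kill ordering in test 03 could be sketched like this. The helper `run_then_collect` and the `work` callback are hypothetical. The calls `sandbox.files.read(path, format="bytes")` and `sandbox.kill()` are assumptions about the sandbox object's shape, so check them against the SDK version in use.

```python
import hashlib
from pathlib import Path


def run_then_collect(sandbox, work, remote_path, local_path):
    """Run work(sandbox), then download remote_path before killing the sandbox.

    Returns the SHA-256 hex digest of the downloaded bytes, or None if the
    download failed. Assumes sandbox.files.read(path, format="bytes") returns
    a bytes-like object and sandbox.kill() tears the sandbox down.
    """
    digest = None
    try:
        work(sandbox)
    finally:
        try:
            # Download first: kill destroys the sandbox filesystem.
            data = bytes(sandbox.files.read(remote_path, format="bytes"))
            Path(local_path).write_bytes(data)
            digest = hashlib.sha256(data).hexdigest()
        except Exception as exc:
            # Report, but do not mask an exception raised by work().
            print(f"artifact download failed: {exc}")
        finally:
            # Kill runs last, whether or not the download succeeded.
            sandbox.kill()
    return digest
```

Because the download sits inside the `finally` path ahead of `kill`, an exception raised by `work` still propagates to the caller, but only after the artifact has been copied out. The returned digest can be compared with a hash computed where the file was written.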

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
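
One reading of the pass rule can be written out as a small check. This sketch assumes the rule means that the mean of the per-criterion scores must be at least 4.0 and that no individual score may fall below 3. The function name `passes` and the dictionary layout are hypothetical, and the score scale is not stated in the source.

```python
from statistics import mean


def passes(scores, mean_threshold=4.0, floor=3):
    """Return True if the mean score meets the threshold and no score is below the floor.

    `scores` maps a criterion name to its numeric score (assumed layout).
    """
    if not scores:
        return False
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor
```

Under this reading, scores of 5, 4, and 3 pass (mean 4.0, minimum 3), while 5, 5, and 2 fail on the floor even though the mean is 4.0.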

Rubric criteria

  • E2B
  • AI Platform
  • Filesystem Operations

Recommended for

E2B · E2B customers
