For Daytona · AI Platform

Sandbox Lifecycle


Daytona evals — Sandbox Lifecycle (relift v3 InfraRed)

About Daytona

Daytona provides secure, elastic infrastructure for running AI-generated code: isolated sandboxes that spin up fast and are driven programmatically through the Daytona SDK (Python and TypeScript) to execute code and shell commands, manipulate the filesystem, and run git operations. It also offers snapshots and images for warm starts and has a declarative dev-environment lineage. It is positioned as the disposable, isolated runtime layer beneath AI coding agents. Employee count, exact HQ, and compliance posture are marked [REQUIRES-VERIFICATION].

Employees

[REQUIRES-VERIFICATION] (~30-50, unverified)

Industry

AI Sandbox Infrastructure

Headquarters

[REQUIRES-VERIFICATION]

Sample tests · showing 3 of 9

# · Input · Expected behavior · Check
01

Agent calls the Daytona SDK to create a sandbox for a single codegen task (e.g. Daytona().create(...)). The call returns a sandbox handle carrying an id. The agent must run code in it and tear it down afterward.

Persist the returned sandbox id to the agent's durable task record BEFORE doing any work inside the sandbox, so a crash between create and delete cannot orphan a running (billed) sandbox. Treat the create call as the resource-acquisition step of a try/finally whose finally deletes the sandbox. [REQ…

Pass / Fail · AI Platform · high
02

An agent loop creates one sandbox per task across thousands of tasks. Some tasks raise exceptions mid-run before the explicit delete call is reached.

Delete the sandbox in a finally / context-manager so it is reclaimed on both the success and the exception path. A leaked sandbox keeps consuming the org's concurrency quota and compute budget. Prefer the SDK's context-manager form if it exists; otherwise wrap create/delete yourself. [REQUIRES-VERI…

Pass / Fail · AI Platform · critical
03

Agent resumes a previously stopped sandbox by id and immediately calls a code-exec method without starting it.

A stopped sandbox must be start()ed (and reach a running state) before code/fs/git operations will succeed. Wait for the running state rather than firing operations optimistically, and surface a clear error if start fails (e.g. quota, deleted). [REQUIRES-VERIFICATION] for the exact state machine an…

Pass / Fail · AI Platform · medium
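
The three sample behaviors above (record the id before working, delete on every exit path, wait for the running state before executing) can be sketched together. This is a minimal illustration, not the Daytona SDK: `FakeSandboxClient`, `tracked_sandbox`, `ensure_running`, the state names, and the dictionary standing in for a durable task record are all hypothetical, since the page itself marks the real method names and state machine as requiring verification.

```python
import itertools
import time
from contextlib import contextmanager


class FakeSandboxClient:
    """Hypothetical in-memory stand-in for a sandbox service (not the Daytona SDK)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.states = {}  # sandbox id -> "running" | "stopped" | "starting"

    def create(self):
        sandbox_id = f"sbx-{next(self._ids)}"
        self.states[sandbox_id] = "running"
        return sandbox_id

    def stop(self, sandbox_id):
        self.states[sandbox_id] = "stopped"

    def start(self, sandbox_id):
        self.states[sandbox_id] = "starting"

    def get_state(self, sandbox_id):
        state = self.states.get(sandbox_id, "deleted")
        if state == "starting":
            # Simulate an asynchronous start: the next poll sees "running".
            self.states[sandbox_id] = "running"
        return state

    def run(self, sandbox_id, code):
        if self.states.get(sandbox_id) != "running":
            raise RuntimeError(f"sandbox {sandbox_id} is not running")
        return f"ran: {code}"

    def delete(self, sandbox_id):
        self.states.pop(sandbox_id, None)  # deleting twice is harmless


@contextmanager
def tracked_sandbox(client, task_record):
    """Create a sandbox, record its id before any work, always delete on exit."""
    sandbox_id = client.create()
    task_record["sandbox_id"] = sandbox_id  # stand-in for a durable write
    try:
        yield sandbox_id
    finally:
        # Runs on both the success path and the exception path.
        client.delete(sandbox_id)
        task_record["sandbox_id"] = None


def ensure_running(client, sandbox_id, attempts=10, delay=0.0):
    """Start a stopped sandbox and poll until it reports "running"."""
    if client.get_state(sandbox_id) == "stopped":
        client.start(sandbox_id)
    for _ in range(attempts):
        state = client.get_state(sandbox_id)
        if state == "running":
            return
        if state == "deleted":
            raise LookupError(f"sandbox {sandbox_id} was deleted")
        time.sleep(delay)
    raise TimeoutError(f"sandbox {sandbox_id} did not reach running state")
```

An agent loop would wrap each task in `with tracked_sandbox(client, record) as sandbox_id:` and call `ensure_running(client, sandbox_id)` before any operation on a resumed sandbox. With a real SDK, the fake client's methods would be swapped for the SDK's own calls once their names and states are verified.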

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
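
The pass rule can be read as a two-part check. The sketch below assumes one interpretation: each rubric criterion gets a numeric score, the mean is taken across criteria, and the floor applies to each individual score. The function name `passes` and the criterion keys are hypothetical.

```python
from statistics import mean


def passes(scores, mean_threshold=4.0, floor=3):
    """Return True when the mean meets the threshold and no score is below the floor."""
    if not scores:
        raise ValueError("at least one criterion score is required")
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor
```

Under this reading, scores of 5, 4, and 3 pass (mean 4.0, lowest 3), while scores of 5, 5, and 2 fail despite the same mean, because one criterion falls below 3.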

Rubric criteria

  • Daytona
  • AI Platform
  • Sandbox Lifecycle

Recommended for

Daytona customers
