Daytona evals — Code & Process Execution (relift v3 InfraRed)

About Daytona

Daytona provides secure, elastic infrastructure for running AI-generated code: isolated sandboxes that start quickly and are driven programmatically through the Daytona SDK (Python and TypeScript) to execute code and shell commands, manipulate the filesystem, and run git operations. It also offers snapshots and images for warm starts and has a lineage in declarative dev environments. It is positioned as the disposable, isolated runtime layer beneath AI coding agents. [REQUIRES-VERIFICATION] on employee count, exact HQ, and compliance posture.

Employees

[REQUIRES-VERIFICATION] (~30-50, unverified)

Industry

AI Sandbox Infrastructure

Headquarters

[REQUIRES-VERIFICATION]

Website

www.daytona.io

Sample tests · showing 3 of 9

#	Input	Expected behavior	Check
01	Agent runs a build command inside a sandbox via the SDK's exec/code-run method and reads only stdout to decide success.	Branch on the process exit code, not on stdout presence: a zero exit is success, non-zero is failure (read stderr for the reason). Capture stdout, stderr, and exit code as three distinct fields. Do not infer success from non-empty stdout. [REQUIRES-VERIFICATION] for the exact result object field na…	Pass / Fail · AI Platform · high
02	AI-generated code contains an accidental infinite loop. The agent runs it in a sandbox with no timeout.	Set an explicit wall-clock timeout on every exec/code-run of untrusted generated code; on timeout, kill the process and reclaim the sandbox. Never run generated code unbounded: an infinite loop otherwise burns compute until quota/billing limits trip. [REQUIRES-VERIFICATION] for the SDK's timeout p…	Pass / Fail · AI Platform · critical
03	A training/build step inside the sandbox runs for many minutes and emits incremental logs. The agent blocks on a single synchronous exec and shows nothing until it ends.	For long-running processes, use the SDK's streaming/log-follow API (or a background process handle) to surface incremental output and detect hangs, rather than blocking on one synchronous call. Distinguish 'still running' from 'stuck' via progress in the stream. [REQUIRES-VERIFICATION] for streamin…	Pass / Fail · AI Platform · medium
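
The three sample behaviors can be illustrated locally without the Daytona SDK, whose result fields, timeout parameter, and streaming API are still marked [REQUIRES-VERIFICATION] above. The sketch below uses only Python's standard `subprocess` module as a stand-in for a sandbox exec call. `ExecResult`, `run_bounded`, and `follow` are hypothetical names invented for this illustration and are not part of any SDK.

```python
import queue
import subprocess
import sys
import threading
from dataclasses import dataclass
from typing import Callable, List, Optional, Union


@dataclass
class ExecResult:
    """Hypothetical result shape: three distinct fields plus a timeout flag."""
    stdout: str
    stderr: str
    exit_code: Optional[int]  # None when the process was killed on timeout
    timed_out: bool = False

    @property
    def ok(self) -> bool:
        # Success is decided by the exit code alone, never by stdout content.
        return not self.timed_out and self.exit_code == 0


def _text(data: Union[str, bytes, None]) -> str:
    # Partial output attached to TimeoutExpired may be None, bytes, or str.
    if data is None:
        return ""
    return data.decode(errors="replace") if isinstance(data, bytes) else data


def run_bounded(argv: List[str], timeout_s: float) -> ExecResult:
    """Run a command under a wall-clock limit and capture all three fields."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired as exc:
        # subprocess.run kills and reaps the child before raising.
        return ExecResult(_text(exc.stdout), _text(exc.stderr), None, timed_out=True)
    return ExecResult(proc.stdout, proc.stderr, proc.returncode)


def follow(argv: List[str], stall_s: float, on_line: Callable[[str], None]) -> int:
    """Stream merged output line by line; treat stall_s of silence as 'stuck'."""
    proc = subprocess.Popen(
        argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    lines: "queue.Queue[Optional[str]]" = queue.Queue()

    def pump() -> None:
        for line in proc.stdout:
            lines.put(line)
        lines.put(None)  # sentinel: the stream has closed

    threading.Thread(target=pump, daemon=True).start()
    while True:
        try:
            line = lines.get(timeout=stall_s)
        except queue.Empty:
            proc.kill()
            proc.wait()
            raise TimeoutError(f"no output for {stall_s}s; treating process as stuck")
        if line is None:
            return proc.wait()  # stream ended: report the real exit code
        on_line(line.rstrip("\n"))


if __name__ == "__main__":
    # A build that prints to stdout but still fails: stdout alone would mislead.
    build = run_bounded(
        [sys.executable, "-c", "print('compiling'); raise SystemExit(2)"], timeout_s=10
    )
    print("success" if build.ok else f"failed, exit code {build.exit_code}")
```

The stall check in `follow` is a simple heuristic: a process that is working but silent for longer than `stall_s` would be flagged as stuck, so the threshold needs to reflect how often the workload is expected to log. In a real sandbox, the equivalent timeout and streaming calls would come from the SDK once their names are verified.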

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
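
As a minimal sketch, the pass rule can be written as a small check. It assumes one reading of the sentence above: each rubric criterion receives a single numeric score, the mean is taken across criteria, and both thresholds are inclusive. The criterion names and the `passes` helper are hypothetical.

```python
from statistics import mean
from typing import Mapping


def passes(
    scores: Mapping[str, float],
    mean_floor: float = 4.0,
    criterion_floor: float = 3.0,
) -> bool:
    """Return True when the mean score is >= mean_floor and no criterion is below criterion_floor."""
    if not scores:
        raise ValueError("at least one criterion score is required")
    values = list(scores.values())
    return mean(values) >= mean_floor and min(values) >= criterion_floor


if __name__ == "__main__":
    # Hypothetical criterion names, for illustration only.
    print(passes({"exit_code_handling": 5, "timeout_set": 4, "streaming_used": 3}))
    print(passes({"exit_code_handling": 5, "timeout_set": 5, "streaming_used": 2}))
```

Under this reading, the second example fails despite a mean of 4.0, because one criterion falls below the floor of 3.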

Rubric criteria

Daytona
AI Platform
Code And Process Execution

Recommended for

Daytona customers

Works with

Daytona

Related evals

AI Platform

