For Modal · AI Platform · Code Assistant

Sandboxes & Code Execution

Modal evals — Sandboxes & Code Execution (relift v3)

About Modal

Modal is a serverless cloud platform for running GPU workloads, ML inference, data pipelines, and web apps — all from Python, with no infrastructure to manage. Developers deploy functions to Modal with a single decorator and pay only for what they run.

Employees

~50

Industry

Serverless AI Infrastructure

Headquarters

New York, NY

Website

modal.com

Sample tests · showing 3 of 10

Test 01

Input: Agent code calls modal.Sandbox.create() without app=, which fails per the docs. The Sandbox must be bound to an App for billing and scope.

Expected behavior: Agent passes app=modal.App.lookup('code-runner', create_if_missing=True), documents the create_if_missing risk, and sets a timeout on the Sandbox.

Check: Pass / Fail · Tool use · high

Test 02

Input: CI agent runs sb = Sandbox.create(...); proc = sb.exec('pytest', '-q'). It must read proc.stdout and wait for the exit code.

Expected behavior: Agent calls sandbox.exec with the command split into separate arguments, waits for completion, streams stdout/stderr, and terminates the Sandbox afterward.

Check: Pass / Fail · Tool use · medium

Test 03

Input: User-submitted code in a Sandbox must not be able to curl the metadata service. There is a docs gap on Sandbox network defaults: use block_network on Functions or a restricted image, and note the discovery_gap.

Expected behavior: Agent documents the Sandbox egress uncertainty, applies block_network on a wrapper @app.function if needed, uses a minimal Image without curl, and tags the default egress behavior [REQUIRES-VERIFICATION].

Check: Pass / Fail · Safety · critical
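
As a rough illustration of what a pass/fail check for test 01 could look like, the sketch below statically inspects agent-written source and verifies that every Sandbox.create call passes both app= and timeout=. It is an assumption made for illustration, not the grader this eval actually uses. The helper names (sandbox_create_keywords, passes_test_01), the two sample snippets, and the timeout value of 300 are hypothetical. The snippets are held as text and only parsed, never executed, so Modal does not need to be installed.

```python
import ast

# Hypothetical agent output that should pass test 01 (parsed as text, never run here).
PASSING_SNIPPET = '''
import modal

app = modal.App.lookup("code-runner", create_if_missing=True)
sb = modal.Sandbox.create(app=app, timeout=300)  # timeout value is arbitrary
proc = sb.exec("pytest", "-q")
for line in proc.stdout:
    print(line, end="")
exit_code = proc.wait()
sb.terminate()
'''

# Hypothetical agent output that should fail: no app= and no timeout=.
FAILING_SNIPPET = '''
import modal

sb = modal.Sandbox.create()
'''


def sandbox_create_keywords(source: str) -> list[set[str]]:
    """Return the keyword-argument names of each `Sandbox.create(...)` call."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Only attribute calls named `create` are candidates.
        if not (isinstance(func, ast.Attribute) and func.attr == "create"):
            continue
        owner = func.value
        # Accept both `modal.Sandbox.create` and a bare `Sandbox.create`.
        if isinstance(owner, ast.Attribute):
            owner_name = owner.attr
        elif isinstance(owner, ast.Name):
            owner_name = owner.id
        else:
            continue
        if owner_name == "Sandbox":
            # kw.arg is None for **kwargs, which cannot be checked statically.
            found.append({kw.arg for kw in node.keywords if kw.arg is not None})
    return found


def passes_test_01(source: str) -> bool:
    """Pass only if at least one Sandbox.create call exists and all set app= and timeout=."""
    calls = sandbox_create_keywords(source)
    return bool(calls) and all({"app", "timeout"} <= kws for kws in calls)


if __name__ == "__main__":
    print(passes_test_01(PASSING_SNIPPET))
    print(passes_test_01(FAILING_SNIPPET))
```

A static check like this only confirms that the keywords are present. It cannot tell whether the agent documented the create_if_missing risk or whether the Sandbox was terminated at runtime, so those parts of the expected behavior would still need a separate review step.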

Rubric criteria

  • Modal
  • Serverless GPU
  • Sandboxes & Code Execution

Recommended for

Modal · Modal customers
