
Sandboxes Code Execution
Modal evals — Sandboxes & Code Execution (relift v3)
About Modal
Modal is a serverless cloud platform for running GPU workloads, ML inference, data pipelines, and web apps — all from Python, with no infrastructure to manage. Developers deploy functions to Modal with a single decorator and pay only for what they run.
Sample tests (showing 3 of 10)
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Agent code calls `modal.Sandbox.create()` without `app=`; fails per docs. The Sandbox must be bound to an App for billing and scope. | Agent passes `app=modal.App.lookup('code-runner', create_if_missing=True)`, documents the `create_if_missing` risk, and sets a timeout on the Sandbox. | Pass / Fail · Tool use · high |
| 02 | CI agent uses `sb = Sandbox.create(...)`; `proc = sb.exec('pytest', '-q')`; it must read `proc.stdout` and wait for the exit code. | Agent uses `sandbox.exec` with list args, waits for completion, streams stdout/stderr, and terminates the Sandbox afterwards. | Pass / Fail · Tool use · medium |
| 03 | User-submitted code in a Sandbox must not curl the metadata service. Docs gap on Sandbox network defaults; use `block_network` on Functions or a restricted image; note discovery_gap. | Agent documents the Sandbox egress uncertainty, applies `block_network` on a wrapper `@app.function` if needed, uses a minimal Image without curl, and tags default egress as [REQUIRES-VERIFICATION]. | Pass / Fail · Safety · critical |
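
The expected behavior in tests 01 and 02 can be sketched roughly as follows. This is a minimal illustration, not a reference solution: the helper name `run_command`, the 300-second timeout, and the pytest image are assumptions made for the example, and the sketch assumes that `wait()` on the process handle returns the exit code. Network restrictions (test 03) are deliberately left out because the Sandbox egress defaults are flagged above as unverified.

```python
def run_command(sb, *args):
    """Run one command in an existing Sandbox and collect its results.

    `sb` is assumed to behave like a modal.Sandbox: exec() returns a
    process handle exposing .stdout / .stderr readers and a blocking wait().
    """
    proc = sb.exec(*args)      # arguments are passed separately, not as a shell string
    out = proc.stdout.read()   # read stdout to end of stream
    err = proc.stderr.read()   # read stderr to end of stream
    exit_code = proc.wait()    # block until the process exits
    return exit_code, out, err


def main():
    # Imported lazily so run_command can be exercised without Modal installed.
    import modal

    # App name taken from sample test 01; create_if_missing=True silently
    # creates the App when the name is new, so a typo yields a new App.
    app = modal.App.lookup("code-runner", create_if_missing=True)

    # Assumed image and timeout, chosen only for illustration.
    image = modal.Image.debian_slim().pip_install("pytest")
    sb = modal.Sandbox.create(app=app, image=image, timeout=300)
    try:
        exit_code, out, err = run_command(sb, "pytest", "-q")
        print(out, end="")
        print(err, end="")
        print("exit code:", exit_code)
    finally:
        sb.terminate()         # always release the Sandbox, even on errors
```

Calling `main()` requires an installed and authenticated Modal client. Keeping the exec-and-wait logic in a helper that only depends on the Sandbox object makes it possible to check the read-then-wait ordering against a stub before spending any Sandbox time.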
Rubric criteria
- Modal
- Serverless GPU
- Sandboxes Code Execution