
Function Runtime Cold Start
Modal evals — Function Runtime & Cold Start (relift v3)
About Modal
Modal is a serverless cloud platform for running GPU workloads, ML inference, data pipelines, and web apps — all from Python, with no infrastructure to manage. Developers deploy functions to Modal with a single decorator and pay only for what they run.
Sample tests (showing 3 of 11)
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Training job uses @app.function(gpu='A10G', memory=32768, timeout=3600) on Image.debian_slim().pip_install('torch'). Logs show CUDA OOM at step 400; model needs ~40GB VRAM. Docs list gpu='A100-80GB' for large fine-tunes. | Agent updates decorator to gpu='A100-80GB' (or documents trade-off with gradient checkpointing), redeploys via modal deploy, and records GPU choice rationale tied to VRAM estimate. | Pass / Fail · Tool use · high |
| 02 | Latency SLO p95<800ms on @app.function(min_containers=0, scaledown_window=60) serving Llama via .remote(). Dashboard shows 4–6s cold starts when scaled to zero. Finance caps idle spend at $200/mo. | Agent sets min_containers=1 (or buffer_containers=1) with cost note, keeps scaledown_window reasonable, and validates p95 after modal deploy. | Pass / Fail · Tool use · medium |
| 03 | @app.function(retries=3, timeout=120) calls external billing API via httpx without idempotency keys. Modal retries whole invocation on TimeoutError; finance sees triple charges. | Agent adds idempotency keys or moves billing to idempotent modal.Dict gate, sets retries only for safe steps, or wraps external call with dedupe keyed by input hash. | Pass / Fail · Safety · high |
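The "dedupe keyed by input hash" option in test 03 can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Modal's API: `idempotency_key`, `DedupeGate`, and the `charge` callback are hypothetical names, and the in-memory dict stands in for a shared store such as modal.Dict.

```python
import hashlib
import json


def idempotency_key(payload: dict) -> str:
    """Derive a stable key from the payload: the same input gives the same key."""
    # Sorting keys makes the serialization independent of dict ordering.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class DedupeGate:
    """Hypothetical gate; the in-memory dict stands in for a shared store."""

    def __init__(self):
        self._results = {}

    def run_once(self, payload: dict, charge):
        key = idempotency_key(payload)
        if key in self._results:
            # A retry of an already-completed invocation reuses the result.
            return self._results[key]
        # The key is also handed to the billing call, so the provider can
        # reject a duplicate if a retry happens before the result is stored.
        result = charge(payload, key)
        self._results[key] = result
        return result
```

A retried invocation that calls `gate.run_once(payload, charge)` with the same payload would invoke `charge` only once after the first success. The check-then-write here is not atomic across containers, so forwarding the key to the billing API remains the stronger safeguard.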
Rubric criteria
- Modal
- Serverless GPU
- Function Runtime Cold Start