For Modal · AI Platform · Code Assistant

Function Runtime Cold Start

Q: How is the Function Runtime Cold Start eval scored?

An LLM judge grades each test case against its expected.ideal_behavior and expected.rubric.

Q: How many test cases does this eval pack include?

The Function Runtime Cold Start pack for Modal contains 11 test cases. 3 sample cases are shown free on this page; the full set runs in a Corsac workspace.

Q: How do I run this eval?

Sign up for Corsac, connect your model or agent endpoint, and run the Function Runtime Cold Start pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.
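As a rough illustration of gating a release on per-case scores, the sketch below turns a list of results into a CI exit code. The REST endpoints and response schema are not described on this page, so the sketch starts from results that have already been fetched; the `passed` field, the helper names, and the 0.9 threshold are all assumptions made for the example.

```python
def pass_rate(results):
    """Fraction of cases whose 'passed' flag is true; 0.0 for an empty run."""
    if not results:
        return 0.0
    return sum(1 for r in results if r["passed"]) / len(results)


def gate_exit_code(results, min_pass_rate=0.9):
    """Return 0 (allow the release) or 1 (block it) for a CI step.

    `results` is a hypothetical list of per-case dicts such as
    {"case_id": "01", "passed": True}; the real payload may differ.
    """
    return 0 if pass_rate(results) >= min_pass_rate else 1


if __name__ == "__main__":
    # 11 cases, one failure: 10/11 is above the assumed 0.9 threshold.
    demo = [{"case_id": f"{i:02d}", "passed": i != 3} for i in range(1, 12)]
    raise SystemExit(gate_exit_code(demo))
```

An empty run is treated as a failure here so that a misconfigured job cannot pass the gate by returning no cases.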

AI infrastructure — Modal

Evaluates Modal's Function Runtime & Cold Start across 11 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's AI infrastructure eval coverage.

About Modal

Modal is a serverless cloud platform for running GPU workloads, ML inference, data pipelines, and web apps — all from Python, with no infrastructure to manage. Developers deploy functions to Modal with a single decorator and pay only for what they run.

Employees: ~50

Industry: Serverless AI Infrastructure

Headquarters: New York, NY

Website: modal.com

Sample tests · showing 3 of 11

#	Input	Expected behavior	Check
01	Training job uses @app.function(gpu='A10G', memory=32768, timeout=3600) on Image.debian_slim().pip_install('torch'). Logs show CUDA OOM at step 400; model needs ~40GB VRAM. Docs list gpu='A100-80GB' for large fine-tunes.	Agent updates decorator to gpu='A100-80GB' (or documents trade-off with gradient checkpointing), redeploys via modal deploy, and records GPU choice rationale tied to VRAM estimate.	Pass / Fail · Tool use · high
02	Latency SLO p95<800ms on @app.function(min_containers=0, scaledown_window=60) serving Llama via .remote(). Dashboard shows 4–6s cold starts when scaled to zero. Finance caps idle spend at $200/mo.	Agent sets min_containers=1 (or buffer_containers=1) with cost note, keeps scaledown_window reasonable, and validates p95 after modal deploy.	Pass / Fail · Tool use · medium
03	Heavy @app.cls loads 12GB weights in @modal.enter(). Decorator lacks enable_memory_snapshot=True; each new container repeats full load. Docs recommend snapshot for large enter() work.	Agent adds enable_memory_snapshot=True on @app.cls, verifies snapshot restore in logs after deploy, and documents snapshot invalidation when Image or enter() code changes.	Pass / Fail · Tool use · medium
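The reasoning expected in sample cases 01 and 02 is largely arithmetic, and can be sketched in plain Python. This is an illustration only, not part of the eval pack or of Modal's SDK: the helper names, the 20% headroom factor, the 730-hour month, and the hourly rate used in the demo are assumptions. The VRAM figures are the nominal capacities of each card, and the dictionary keys mirror the gpu= strings from the table.

```python
# Nominal VRAM per GPU type, in GB. Keys follow the gpu= strings in the
# sample cases; consult Modal's docs for the names it currently accepts.
GPU_VRAM_GB = {"A10G": 24, "A100-40GB": 40, "A100-80GB": 80}


def pick_gpu(required_gb, headroom=1.2):
    """Return the smallest GPU whose VRAM covers required_gb * headroom.

    Returns None when no listed GPU is large enough. The headroom factor
    is an assumed safety margin, not a Modal recommendation.
    """
    needed = required_gb * headroom
    fitting = [(vram, name) for name, vram in GPU_VRAM_GB.items() if vram >= needed]
    return min(fitting)[1] if fitting else None


def monthly_idle_cost(rate_per_hour, min_containers, hours_per_month=730):
    """Worst-case monthly spend for keeping min_containers warm all month."""
    return rate_per_hour * min_containers * hours_per_month


def within_idle_budget(rate_per_hour, min_containers, cap_usd):
    """True when the warm-container cost stays at or under the spend cap."""
    return monthly_idle_cost(rate_per_hour, min_containers) <= cap_usd


if __name__ == "__main__":
    # Case 01: ~40 GB needed, so a 24 GB A10G cannot fit the model.
    print(pick_gpu(40))
    # Case 02: hypothetical $0.25/hour rate checked against the $200/mo cap.
    print(within_idle_budget(0.25, min_containers=1, cap_usd=200))
```

Substituting the real per-hour price of the chosen GPU into `within_idle_budget` gives the "cost note" that case 02 asks the agent to record alongside min_containers=1.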

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric.
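To make the grading instruction concrete, here is one way a test case and its judge prompt might be represented. Only the field names expected.ideal_behavior and expected.rubric come from this page; the class names, the prompt wording, and the way the expected behavior is split into rubric items are hypothetical, and Corsac's actual schema may look different.

```python
from dataclasses import dataclass, field


@dataclass
class Expected:
    ideal_behavior: str
    rubric: list[str] = field(default_factory=list)


@dataclass
class EvalCase:
    case_id: str
    input: str
    expected: Expected


def build_judge_prompt(case, agent_output):
    """Assemble the text an LLM judge would grade (illustrative wording)."""
    criteria = "\n".join(f"- {item}" for item in case.expected.rubric)
    return (
        "Grade the agent output as Pass or Fail.\n\n"
        f"Input:\n{case.input}\n\n"
        f"Ideal behavior:\n{case.expected.ideal_behavior}\n\n"
        f"Rubric:\n{criteria}\n\n"
        f"Agent output:\n{agent_output}\n"
    )


# Sample case 02 from this page, with an assumed rubric breakdown.
case_02 = EvalCase(
    case_id="02",
    input="p95<800ms SLO; min_containers=0 gives 4-6s cold starts; $200/mo idle cap.",
    expected=Expected(
        ideal_behavior="Agent sets min_containers=1 with cost note and validates p95.",
        rubric=["Sets min_containers=1 or buffer_containers=1", "Includes a cost note"],
    ),
)

if __name__ == "__main__":
    print(build_judge_prompt(case_02, "Set min_containers=1; est. idle cost noted."))
```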

Rubric criteria

Modal
Serverless GPU
Function Runtime Cold Start

Recommended for

Modal customers

Works with

Modal

Related evals

AI Platform

Claude API

Evaluates Anthropic's Batch API across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

AI Platform

Claude API

Evaluates Anthropic's Extended Thinking across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

AI Platform

Claude API

Evaluates Anthropic's Files API & Citations across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

