For Modal · AI Platform · Code Assistant

Volumes & Image Build Cache


AI infrastructure — Modal

Evaluates Modal's Volumes & Image Build Cache across 10 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's AI infrastructure eval coverage.

About Modal

Modal is a serverless cloud platform for running GPU workloads, ML inference, data pipelines, and web apps — all from Python, with no infrastructure to manage. Developers deploy functions to Modal with a single decorator and pay only for what they run.

Employees

~50

Industry

Serverless AI Infrastructure

Headquarters

New York, NY

Website

modal.com

Sample tests · showing 3 of 10

#	Input	Expected behavior	Check
01	A trainer function writes checkpoints to a modal.Volume mounted at /checkpoints via @app.function(volumes={...}). An evaluator in a second function cannot find latest.pt because the writer never called volume.commit().	The agent adds volume.commit() after each checkpoint write, documents that the reader must call volume.reload() before reading, and verifies that checkpoints are visible across functions.	Pass / Fail · Tool use · critical
02	Parallel map workers write metrics.jsonl to the same Volume path without coordination, and the file is corrupted. The docs warn about concurrent writers; the recommended pattern uses modal.Queue or per-worker paths.	The agent serializes writes through a Queue, gives each worker its own path prefix, or routes all writes through a single writer function, and calls commit() after each atomic write batch.	Pass / Fail · Safety · high
03	A serving @app.function reads /models/weights from a Volume mounted read-only. A training job commits new weights, but the serving container never calls volume.reload(), so inference runs on stale weights.	The agent adds volume.reload() at the start of the serving handler or on a schedule, documents that long-lived containers go stale, and redeploys if an enter() hook caches paths.	Pass / Fail · Tool use · medium
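Case 02's per-worker-path fix can be sketched in plain Python. This is a minimal local sketch, assuming a temporary directory stands in for the Volume mount and a thread pool stands in for a parallel map; the `commit` callback only marks where a Modal function would call `volume.commit()`. The helper names (`write_worker_metrics`, `merge_metrics`, `demo`) and the `metrics/worker-NNNN.jsonl` layout are hypothetical.

```python
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable


def write_worker_metrics(root: Path, worker_id: int, rows: list[dict],
                         commit: Callable[[], None]) -> Path:
    """Write one worker's rows to its own file so no two workers share a path."""
    out_dir = root / "metrics"
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"worker-{worker_id:04d}.jsonl"
    # Write to a temporary name first, then rename into place; on a local POSIX
    # filesystem the rename is atomic, so the merge never reads a partial file.
    tmp = path.with_name(path.name + ".tmp")
    with tmp.open("w") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")
    tmp.replace(path)
    # Assumption: inside a Modal function this is where volume.commit() would go,
    # once per completed write batch.
    commit()
    return path


def merge_metrics(root: Path) -> list[dict]:
    """Single reader: combine per-worker files after every writer has finished."""
    rows: list[dict] = []
    for path in sorted((root / "metrics").glob("worker-*.jsonl")):
        with path.open() as f:
            rows.extend(json.loads(line) for line in f if line.strip())
    return rows


def demo(num_workers: int = 4, steps: int = 3) -> tuple[int, int]:
    """Run the pattern locally; returns (merged row count, commit count)."""
    commits: list[int] = []
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)  # stands in for the Volume mount point
        batches = {
            i: [{"worker": i, "step": s} for s in range(steps)]
            for i in range(num_workers)
        }
        with ThreadPoolExecutor(max_workers=num_workers) as pool:
            # Each task gets its own worker_id, hence its own output path.
            list(pool.map(
                lambda i: write_worker_metrics(root, i, batches[i],
                                               lambda: commits.append(i)),
                batches,
            ))
        merged = merge_metrics(root)
    return len(merged), len(commits)


if __name__ == "__main__":
    print(demo())  # (12, 4)
```

Mapping the thread pool onto a Modal parallel map and the temporary directory onto a Volume mount is the intended correspondence, but that wiring is not shown or verified here. The other option in the expected behavior, funneling rows through a `modal.Queue` to one writer, avoids per-worker files at the cost of a single serialization point.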

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric.
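The dotted field names suggest that each case carries an `expected` object with `ideal_behavior` and `rubric` fields. The sketch below shows how a judge prompt could be assembled from those two fields and how a Pass / Fail verdict could be read back. It assumes a plain dict record and a rubric stored as a list of criteria. The `id` and `input` keys, the `build_judge_prompt` and `parse_verdict` helpers, and the PASS/FAIL reply convention are hypothetical; they are not Corsac's actual schema or judge prompt.

```python
def build_judge_prompt(case: dict, agent_output: str) -> str:
    """Assemble a grading prompt from expected.ideal_behavior and expected.rubric."""
    expected = case["expected"]
    rubric = "\n".join(f"- {item}" for item in expected["rubric"])
    return (
        f"Task input:\n{case['input']}\n\n"
        f"Ideal behavior:\n{expected['ideal_behavior']}\n\n"
        f"Rubric:\n{rubric}\n\n"
        f"Agent response:\n{agent_output}\n\n"
        "Answer PASS or FAIL on the first line, then justify briefly."
    )


def parse_verdict(judge_reply: str) -> bool:
    """Map the judge's first line to a Pass / Fail result (assumed convention)."""
    lines = judge_reply.strip().splitlines()
    first = lines[0].strip().upper() if lines else ""
    if first.startswith("PASS"):
        return True
    if first.startswith("FAIL"):
        return False
    raise ValueError(f"unrecognized verdict line: {first!r}")


# Sample case 01, expressed in the assumed (hypothetical) record shape.
CASE_01 = {
    "id": "01",
    "input": "Trainer writes checkpoints to a mounted modal.Volume at /checkpoints; "
             "the evaluator cannot find latest.pt because the writer never "
             "called volume.commit().",
    "expected": {
        "ideal_behavior": "Add volume.commit() after each checkpoint write and "
                          "call volume.reload() in the reader before reading.",
        "rubric": [
            "Calls volume.commit() after each checkpoint write",
            "Documents that the reader must call volume.reload() before reading",
            "Verifies cross-function visibility",
        ],
    },
}

if __name__ == "__main__":
    print(build_judge_prompt(CASE_01, "Added volume.commit() after each save."))
    print(parse_verdict("PASS\nCommit and reload are both handled."))  # True
```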

Rubric criteria

Modal
Serverless GPU
Volumes & Image Build Cache

Recommended for

Modal customers

Works with

Modal

Related evals

AI Platform

Claude API

Evaluates Anthropic's Batch API across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

AI Platform

Claude API

Evaluates Anthropic's Extended Thinking across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

AI Platform

Claude API

Evaluates Anthropic's Files API & Citations across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.


Frequently asked questions

What does the Volumes & Image Build Cache eval for Modal test?

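It tests how an agent handles Modal Volumes and the image build cache across 10 scenario-based cases, such as committing and reloading Volume data between functions and avoiding uncoordinated concurrent writes to one Volume path. An LLM judge grades each case against an expected-behavior rubric; the pack is part of Corsac's AI infrastructure eval coverage.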

How is the Volumes & Image Build Cache eval scored?

An LLM judge grades each case against that case's expected.ideal_behavior and expected.rubric fields. The sample cases above are scored Pass / Fail.

How many test cases does this eval pack include?

The Volumes & Image Build Cache pack for Modal contains 10 test cases. Three sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?

Sign up for Corsac, connect your model or agent endpoint, and run the Volumes & Image Build Cache pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.
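Once per-case results are available, gating a release reduces to comparing them against a threshold. The sketch below assumes the per-case Pass / Fail results have already been fetched from the workspace; the REST endpoint and its response shape are not shown and are not assumed. The `CaseResult` type, the `gate` function, the `min_pass_rate` parameter, and the policy that any failed `critical` case blocks the release are all hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseResult:
    case_id: str
    passed: bool
    severity: str  # e.g. "critical", "high", "medium", as in the sample table


def gate(results: list[CaseResult], min_pass_rate: float = 0.9) -> tuple[bool, str]:
    """Return (release allowed, reason) under a hypothetical gating policy."""
    if not results:
        return False, "no results to gate on"
    # Policy assumption: any failed critical case blocks regardless of pass rate.
    failed_critical = [r.case_id for r in results
                       if r.severity == "critical" and not r.passed]
    if failed_critical:
        return False, f"critical case(s) failed: {', '.join(failed_critical)}"
    pass_rate = sum(r.passed for r in results) / len(results)
    if pass_rate < min_pass_rate:
        return False, f"pass rate {pass_rate:.0%} below {min_pass_rate:.0%}"
    return True, f"pass rate {pass_rate:.0%}"


if __name__ == "__main__":
    sample = [
        CaseResult("01", True, "critical"),
        CaseResult("02", False, "high"),
        CaseResult("03", True, "medium"),
    ]
    print(gate(sample, min_pass_rate=0.6))  # (True, 'pass rate 67%')
```

In a CI step, a `False` first element would typically become a nonzero exit code so the pipeline stops before release.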

Run this eval in your workspace

Connect your data, configure thresholds, and review results with your team.