
Autoscaling And Resources

Baseten

AI Model Serving — Baseten

Evaluates Baseten's Autoscaling & Resources across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's AI Model Serving eval coverage.

About Baseten

Baseten is a model serving platform that lets ML teams deploy, scale, and monitor any model — including custom fine-tunes and private weights — with production-grade autoscaling and GPU infrastructure. It supports both synchronous and asynchronous inference patterns.

Employees: ~100
Industry: Model Serving
Headquarters: San Francisco, CA
Website: baseten.co

Sample tests · showing 3 of 9

#	Input	Expected behavior	Check	Category	Severity
01	Operator sets concurrency_target=1 on a high-throughput embedding model. Latency is fine; cost is 8x what it should be.	concurrency_target is the per-replica in-flight request ceiling that triggers scale-up. For embedding / small-payload models, a value > 1 (e.g., 8-32) lets each replica batch multiple requests. Tune empirically against the model's per-request compute cost vs. queueing latency tolerance.	Pass / Fail	AI Platform	high
02	Operator sets min_replicas=0, max_replicas=100. Traffic spikes from 0 to 50 RPS in 5 seconds.	Scale-up is bounded by cold-start time and per-class replica spawn rate. Set min_replicas to cover known steady-state load and max_replicas to cap the blast radius of a runaway autoscale (cost control). Pre-warm before known spikes; do not rely on reactive scale-up to absorb a 5-second flood.	Pass / Fail	AI Platform	critical
03	Traffic drops to zero at end of business hours. scale_down_delay=30 seconds. Next morning's first request hits cold-start.	scale_down_delay holds replicas warm after the last request to absorb bursty follow-up traffic. Set it longer for bursty workloads (10+ min); shorter for predictable nightly drain. Trade-off: longer delay → idle GPU spend; shorter delay → more cold-starts.	Pass / Fail	AI Platform	medium
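The trade-offs in the three sample cases can be approximated with back-of-the-envelope arithmetic. The sketch below is illustrative only and is not Baseten's autoscaler: the function names are hypothetical, the sizing uses Little's law (in-flight requests = arrival rate × latency), and it assumes per-request latency stays roughly constant as per-replica concurrency rises, which real models may not satisfy.

```python
import math


def replicas_needed(rps: float, latency_ms: float, concurrency_target: int) -> int:
    """Estimate replicas via Little's law: in-flight = arrival rate x latency."""
    if concurrency_target < 1:
        raise ValueError("concurrency_target must be >= 1")
    in_flight = rps * latency_ms / 1000.0
    return max(1, math.ceil(in_flight / concurrency_target))


def clamp(desired: int, min_replicas: int, max_replicas: int) -> int:
    """Bound a desired replica count by the configured min/max limits."""
    return max(min_replicas, min(desired, max_replicas))


def cold_start_backlog(rps: float, cold_start_s: float) -> float:
    """Requests arriving while a scaled-to-zero deployment is still starting."""
    return rps * cold_start_s


def idle_replica_seconds(replicas: int, scale_down_delay_s: float) -> float:
    """Replica-seconds held warm but idle after the last request."""
    return replicas * scale_down_delay_s


# Case 01 (assumed load: 400 RPS at 40 ms per request).
print(replicas_needed(400, 40, concurrency_target=1))  # 16 replicas
print(replicas_needed(400, 40, concurrency_target=8))  # 2 replicas, i.e. 8x fewer

# Case 02 (assumed 60 s cold start): requests queued before any replica is up.
print(cold_start_backlog(50, 60))

# Case 03: idle cost of a 10-minute delay on 4 replicas, in replica-seconds.
print(idle_replica_seconds(4, 600))
```

The load figures, the 60-second cold start, and the replica counts are made-up inputs chosen to mirror the scenarios above; measured latency under batching should replace them before drawing any sizing conclusions.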

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
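One plausible reading of this rule, sketched in code: a case passes when the mean of its per-criterion scores is at least 4.0 and no single criterion scores below 3. The function name, the criterion names, and the score scale are assumptions for illustration; the page does not show the judge's actual implementation.

```python
from statistics import mean


def case_passes(scores: dict[str, float],
                mean_threshold: float = 4.0,
                floor: float = 3.0) -> bool:
    """Apply the pass rule: mean >= mean_threshold and every score >= floor."""
    if not scores:
        return False  # nothing graded, so nothing to pass
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor


# Hypothetical criterion names and scores.
print(case_passes({"accuracy": 5, "completeness": 4, "safety": 3}))  # True
print(case_passes({"accuracy": 5, "completeness": 5, "safety": 2}))  # False: one score below 3
```

The second example shows why the floor matters: its mean is still 4.0, but a single weak criterion fails the case.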

Rubric criteria

Baseten
AI Platform
Autoscaling And Resources

Recommended for

Baseten customers

Works with

Baseten

Related evals

AI Platform

Claude API

Evaluates Anthropic's Batch API across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

AI Platform

Claude API

Evaluates Anthropic's Extended Thinking across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

AI Platform

Claude API

Evaluates Anthropic's Files API & Citations across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.


Frequently asked questions

What does the Autoscaling And Resources eval for Baseten test?

Evaluates Baseten's Autoscaling & Resources across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's AI Model Serving eval coverage.

How is the Autoscaling And Resources eval scored?

The judge rubric: Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

How many test cases does this eval pack include?

The Autoscaling And Resources pack for Baseten contains 9 test cases. 3 sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?

Sign up for Corsac, connect your model or agent endpoint, and run the Autoscaling And Resources pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.
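A release gate of this kind usually reduces to turning per-case results into a process exit code. The sketch below assumes the results have already been fetched and parsed into a list of dictionaries with a boolean "passed" field; that shape and the function name are hypothetical, since Corsac's REST API is not described on this page.

```python
def gate(results: list[dict], min_pass_rate: float = 1.0) -> int:
    """Return a CI exit code: 0 if enough cases passed, 1 otherwise."""
    if not results:
        return 1  # treat an empty run as a failure rather than a silent pass
    passed = sum(1 for r in results if r.get("passed") is True)
    return 0 if passed / len(results) >= min_pass_rate else 1


# Hypothetical parsed results for a 9-case pack with one failing case.
sample = [{"case": f"{i:02d}", "passed": i != 2} for i in range(1, 10)]
print(gate(sample))                     # 1: strict gate, one case failed
print(gate(sample, min_pass_rate=0.8))  # 0: 8 of 9 passed
```

In a CI job, the return value would typically be passed to `sys.exit` so that a failing pack blocks the release.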
