
Autoscaling And Resources
AI Model Serving — Baseten
Baseten evals — Autoscaling & Resources (relift v3 InfraRed)
About Baseten
Baseten is a model serving platform that lets ML teams deploy, scale, and monitor any model — including custom fine-tunes and private weights — with production-grade autoscaling and GPU infrastructure. It supports both synchronous and asynchronous inference patterns.
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Operator sets concurrency_target=1 on a high-throughput embedding model. Latency is fine; cost is 8x what it should be. | concurrency_target is the per-replica in-flight request threshold that triggers scale-up. For embedding / small-payload models, a value > 1 (e.g., 8-32) lets each replica batch multiple requests, so fewer replicas are needed for the same load. Tune empirically against the model's per-request compute cost vs queueing latency tolerance. | Pass / Fail · AI Platform · high |
| 02 | Operator sets min_replicas=0, max_replicas=100. Traffic spikes from 0 to 50 RPS in 5 seconds. | Scale-up is bounded by cold-start time and per-class replica spawn rate. Set min_replicas to cover known steady-state and max_replicas to cap blast-radius of a runaway autoscale (cost control). Pre-warm before known spikes; do not rely on reactive scale-up to absorb a 5-second flood. | Pass / Fail · AI Platform · critical |
| 03 | Traffic drops to zero at end of business hours. scale_down_delay=30 seconds. Next morning's first request hits cold-start. | scale_down_delay holds replicas warm after the last request to absorb bursty follow-up traffic. Set it longer for bursty workloads (10+ min); shorter for predictable nightly drain. Trade-off: longer delay → idle GPU spend; shorter delay → more cold-starts. | Pass / Fail · AI Platform · medium |
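
The trade-offs in the three sample tests can be made concrete with a back-of-envelope model. The sketch below is an illustration only, not Baseten's autoscaler logic: it assumes replica demand follows Little's law (in-flight requests = arrival rate × time per request), and the function names, the 0.25 s latency, and the traffic figures are all hypothetical.

```python
import math


def replicas_needed(rps, latency_s, concurrency_target, min_replicas, max_replicas):
    """Rough replica estimate; an assumption, not Baseten's actual algorithm."""
    # Little's law: average in-flight requests = arrival rate * time per request.
    in_flight = rps * latency_s
    wanted = math.ceil(in_flight / concurrency_target)
    # The result is clamped to the configured replica bounds.
    return max(min_replicas, min(max_replicas, wanted))


def idle_replica_seconds(replicas, scale_down_delay_s):
    """Replica-seconds kept warm with no traffic after the last request."""
    return replicas * scale_down_delay_s


# Test 01: hypothetical embedding load of 160 RPS at 0.25 s per request.
print(replicas_needed(160, 0.25, 1, 0, 100))  # 40
print(replicas_needed(160, 0.25, 8, 0, 100))  # 5

# Test 02: a 50 RPS spike from zero needs this many replicas to cold-start.
print(replicas_needed(50, 0.25, 1, 0, 100))  # 13

# Test 03: idle cost of a 30 s versus a 10 min scale_down_delay on 5 replicas.
print(idle_replica_seconds(5, 30), idle_replica_seconds(5, 600))  # 150 3000
```

Under these assumed numbers, raising concurrency_target from 1 to 8 cuts the estimate from 40 replicas to 5, which matches the 8x cost gap described in test 01. Real values depend on measured latency at each concurrency level, so treat the output as a starting point for empirical tuning.
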
How this eval is graded
Grade each response against expected.ideal_behavior and expected.rubric. A pass requires a mean criterion score of at least 4.0 and no criterion scoring below 3.
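
The pass rule above could be checked as follows. This is a minimal sketch under one reading of the rule: the mean is taken across rubric criteria for a single response. The criterion names and the `passes` helper are hypothetical and not part of any Baseten or eval-platform API.

```python
from statistics import mean


def passes(scores, mean_threshold=4.0, floor=3.0):
    """Return True when the mean score meets the threshold and no score is below the floor."""
    if not scores:
        raise ValueError("at least one criterion score is required")
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor


# Hypothetical criterion names, for illustration only.
print(passes({"correctness": 5, "completeness": 4, "actionability": 3}))  # True
print(passes({"correctness": 5, "completeness": 5, "actionability": 2}))  # False
print(passes({"correctness": 4, "completeness": 4, "actionability": 3}))  # False
```

The second case shows why the floor matters: the mean is still 4.0, but one criterion at 2 fails the response.
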
Rubric criteria
- Baseten
- AI Platform
- Autoscaling And Resources