Autoscaling And Resources

AI Model Serving — Baseten

Baseten evals — Autoscaling & Resources (relift v3 InfraRed)

About Baseten

Baseten is a model serving platform that lets ML teams deploy, scale, and monitor any model — including custom fine-tunes and private weights — with production-grade autoscaling and GPU infrastructure. It supports both synchronous and asynchronous inference patterns.

Employees

~100

Industry

Model Serving

Headquarters

San Francisco, CA

Website

baseten.co

Sample tests · showing 3 of 9

# · Input · Expected behavior · Check
01

Operator sets concurrency_target=1 on a high-throughput embedding model. Latency is fine; cost is 8x what it should be.

concurrency_target is the per-replica in-flight request ceiling that triggers scale-up. For embedding / small-payload models, a value > 1 (e.g., 8-32) lets each replica batch multiple requests. Tune empirically against the model's per-request compute cost vs queueing latency tolerance.

Pass / Fail · AI Platform · high
02

Operator sets min_replicas=0, max_replicas=100. Traffic spikes from 0 to 50 RPS in 5 seconds.

Scale-up is bounded by cold-start time and per-class replica spawn rate. Set min_replicas to cover known steady-state traffic and max_replicas to cap the blast radius of a runaway autoscale (cost control). Pre-warm before known spikes; do not rely on reactive scale-up to absorb a 5-second flood.

Pass / Fail · AI Platform · critical
03

Traffic drops to zero at end of business hours. scale_down_delay=30 seconds. Next morning's first request hits cold-start.

scale_down_delay holds replicas warm after the last request to absorb bursty follow-up traffic. Set it longer for bursty workloads (10+ min); shorter for predictable nightly drain. Trade-off: longer delay → idle GPU spend; shorter delay → more cold-starts.

Pass / Fail · AI Platform · medium
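
The trade-offs in the three sample tests can be made concrete with a little arithmetic. The sketch below is a toy model, not Baseten's autoscaler: the function names, the 64 in-flight requests, the 60-second cold start, the 10 RPS per warm replica, and the $4/hour GPU price are all assumptions chosen for illustration. Only the parameter names concurrency_target, min_replicas, max_replicas, and scale_down_delay come from the tests above.

```python
import math


def desired_replicas(in_flight: int, concurrency_target: int,
                     min_replicas: int, max_replicas: int) -> int:
    """Toy model: enough replicas that none exceeds concurrency_target
    in-flight requests, clamped to [min_replicas, max_replicas]."""
    if concurrency_target < 1:
        raise ValueError("concurrency_target must be >= 1")
    if min_replicas > max_replicas:
        raise ValueError("min_replicas must not exceed max_replicas")
    needed = math.ceil(in_flight / concurrency_target)
    return max(min_replicas, min(max_replicas, needed))


def backlog_during_cold_start(rps: float, cold_start_s: float,
                              warm_replicas: int,
                              per_replica_rps: float) -> float:
    """Requests that queue while new replicas are still starting.
    Warm replicas absorb what they can; the rest waits out the cold start."""
    shortfall_rps = max(0.0, rps - warm_replicas * per_replica_rps)
    return shortfall_rps * cold_start_s


def idle_gpu_cost(scale_down_delay_s: float, replicas: int,
                  gpu_price_per_hour: float) -> float:
    """Upper bound on spend for holding replicas warm after the last request."""
    return scale_down_delay_s / 3600 * replicas * gpu_price_per_hour


if __name__ == "__main__":
    # Test 01: assumed 64 requests in flight.
    print(desired_replicas(64, 1, 0, 100))   # 64 replicas
    print(desired_replicas(64, 8, 0, 100))   # 8 replicas, one eighth the fleet

    # Test 02: 50 RPS spike, assumed 60 s cold start, 10 RPS per warm replica.
    print(backlog_during_cold_start(50, 60, 0, 10))  # nothing warm
    print(backlog_during_cold_start(50, 60, 5, 10))  # steady state pre-covered

    # Test 03: assumed $4/hour GPU, one replica held warm.
    print(idle_gpu_cost(30, 1, 4.0))    # 30-second delay
    print(idle_gpu_cost(600, 1, 4.0))   # 10-minute delay
```

Under these assumed numbers, raising concurrency_target from 1 to 8 cuts the replica count by the same factor of eight described in test 01, and a min_replicas of 0 leaves the whole spike in test 02 queued for the length of the cold start. Real values depend on the model, the hardware, and measured latency, so treat the output as a way to reason about direction, not as a sizing recommendation.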

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. A pass requires a mean score across criteria of at least 4.0, with no individual criterion scoring below 3.
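
A minimal sketch of the pass rule above, assuming each rubric criterion receives a numeric score. The function name, the criterion names, and the dictionary shape are hypothetical, and the score scale is not stated in the source.

```python
from statistics import mean


def passes(scores: dict[str, float],
           mean_threshold: float = 4.0,
           floor: float = 3.0) -> bool:
    """Return True when the mean score meets the threshold and
    no single criterion falls below the floor."""
    if not scores:
        raise ValueError("at least one criterion score is required")
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor


if __name__ == "__main__":
    # Hypothetical criterion names, for illustration only.
    print(passes({"accuracy": 5, "tradeoffs": 4, "clarity": 3}))  # True
    print(passes({"accuracy": 5, "tradeoffs": 5, "clarity": 2}))  # False
    print(passes({"accuracy": 4, "tradeoffs": 4, "clarity": 3}))  # False
```

The second case shows why both conditions matter: its mean is exactly 4.0, yet one criterion at 2 fails the run.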

Rubric criteria

  • Baseten
  • AI Platform
  • Autoscaling And Resources

Recommended for

Baseten customers
