Baseten
For Baseten · AI Platform

Deployments And Environments

AI Model Serving — Baseten

Baseten evals — Deployments & Environments (relift v3 InfraRed)

About Baseten

Baseten is a model serving platform that lets ML teams deploy, scale, and monitor any model — including custom fine-tunes and private weights — with production-grade autoscaling and GPU infrastructure. It supports both synchronous and asynchronous inference patterns.

Employees

~100

Industry

Model Serving

Headquarters

San Francisco, CA

Website

baseten.co

Sample tests · showing 3 of 9

01
Input: Operator validates v2 of a model on /development/predict and clicks 'Promote to Production' in the workspace UI.
Expected behavior: Promotion atomically points the production URL at the dev-tested image. In-flight production requests against the old image complete on their replicas; new requests are routed to the new image as replicas come up. Treat promotion as a release event and tag it in the operator's deploy log.
Check: Pass / Fail · AI Platform · high

02
Input: v2 fails in production with an elevated 5xx rate. Operator clicks 'Rollback' to v1 in the deployments UI.
Expected behavior: Rollback re-routes production URL traffic to the prior image. v2 replicas drain their in-flight requests and terminate. Confirm the rollback by checking that replica counts and the error rate return to their pre-incident levels. The rollback action does NOT delete v2; keep it for forensics.
Check: Pass / Fail · AI Platform · critical

03
Input: Two engineers push to the same model's development deployment (served at /development/predict) at the same time. Each clobbers the other's dev image.
Expected behavior: Development is a single shared environment per model, so concurrent pushes race. Choose one of three options: coordinate via the operator's own task queue, create per-engineer model names for parallel iteration, or use a feature-branch model. Do not assume dev preserves multiple in-flight versions.
Check: Pass / Fail · AI Platform · medium
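The three sample behaviors can be sketched as a toy in-memory model. This is only an illustration of the semantics described above, not Baseten's API or implementation: the `Model` class, its fields, and the image and model names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Model:
    """Hypothetical stand-in for one model: a single dev slot plus a production pointer."""
    name: str
    images: set = field(default_factory=set)    # every image ever promoted; never pruned here
    development: Optional[str] = None            # the one shared development image
    production: Optional[str] = None             # image the production URL routes to
    previous: Optional[str] = None               # rollback target
    deploy_log: list = field(default_factory=list)

    def push(self, image: str) -> None:
        # Development holds one image, so a later push overwrites an earlier one.
        self.development = image

    def promote(self) -> None:
        # One pointer swap; a request routed earlier keeps the image it was given.
        self.previous, self.production = self.production, self.development
        self.images.add(self.production)
        self.deploy_log.append(("promote", self.previous, self.production))

    def rollback(self) -> None:
        if self.previous is None:
            raise RuntimeError("no prior image to roll back to")
        # Re-route to the prior image; the failed image stays in `images` for forensics.
        failed = self.production
        self.production = self.previous
        self.deploy_log.append(("rollback", failed, self.production))


model = Model(name="scorer")
model.push("v1")
model.promote()

# Test 01: a request routed before promotion stays on v1; later requests get v2.
model.push("v2")
in_flight = model.production
model.promote()
after_promote = model.production

# Test 02: rollback re-routes to v1 and keeps v2 around.
model.rollback()

# Test 03: two pushes to the same dev slot race; the last writer wins.
model.push("alice-build")
model.push("bob-build")

# Per-engineer model names avoid the race because each model has its own dev slot.
alice, bob = Model(name="scorer-alice"), Model(name="scorer-bob")
alice.push("alice-build")
bob.push("bob-build")
```

In this sketch every promotion and rollback appends a tuple to `deploy_log`, which stands in for tagging the release event in the operator's own deploy log.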

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. A test passes when the mean score across the rubric criteria is at least 4.0 and no single criterion scores below 3.
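The pass rule can be written as a short check. This is a minimal sketch that assumes scores arrive as a mapping from criterion name to a numeric score; the function name `eval_passes`, the input shape, and the example scores are assumptions, and the scoring scale is not stated in the source.

```python
from statistics import mean


def eval_passes(scores: dict, min_mean: float = 4.0, min_score: float = 3.0) -> bool:
    """Return True when the mean is at least min_mean and no criterion is below min_score."""
    if not scores:
        raise ValueError("at least one rubric criterion score is required")
    values = list(scores.values())
    return mean(values) >= min_mean and min(values) >= min_score


# Mean is exactly 4.0 and the lowest score is 3, so this passes.
borderline = eval_passes({"Baseten": 5, "AI Platform": 4, "Deployments And Environments": 3})

# Mean is 4.0, but one criterion is below 3, so this fails.
one_low = eval_passes({"Baseten": 5, "AI Platform": 5, "Deployments And Environments": 2})
```

Both conditions are needed: the floor on individual criteria stops one strong score from masking a weak one.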

Rubric criteria

  • Baseten
  • AI Platform
  • Deployments And Environments

Recommended for

Baseten · Baseten customers
