
Deployments And Environments
Baseten evals — Deployments & Environments (relift v3 InfraRed)
About Baseten
Baseten is a model serving platform that lets ML teams deploy, scale, and monitor any model — including custom fine-tunes and private weights — with production-grade autoscaling and GPU infrastructure. It supports both synchronous and asynchronous inference patterns.
Sample tests (showing 3 of 9)
| # | Input | Expected behavior | Check | Category | Severity |
|---|---|---|---|---|---|
| 01 | Operator validates v2 of a model on /development/predict and clicks 'Promote to Production' in the workspace UI. | Promotion atomically points the production URL at the dev-tested image. In-flight production requests against the old image complete on their replicas; new requests are routed to the new image as replicas come up. Treat promotion as a release event and tag it in the operator's deploy log. | Pass / Fail | AI Platform | high |
| 02 | v2 fails in production with elevated 5xx. Operator clicks 'Rollback' to v1 in the deployments UI. | Rollback re-routes production URL traffic to the prior image. v2 replicas drain their in-flight requests and terminate. Confirm the rollback by observing that replica counts and error rate recover. The rollback action does NOT delete v2; keep it for forensics. | Pass / Fail | AI Platform | critical |
| 03 | Two engineers push to /development/predict at the same time. Each clobbers the other's dev image. | Development is a single shared environment per model, so concurrent pushes race. Either coordinate via the operator's own task queue, create per-engineer model names for parallel iteration, or use a feature-branch model. Do not assume dev preserves multiple in-flight versions. | Pass / Fail | AI Platform | medium |
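The three sample behaviors can be illustrated with a toy in-memory model. This is only a sketch for reasoning about the expected semantics: `ModelEnvironments` and all of its fields and methods are hypothetical names, not part of any Baseten SDK or API, and replica draining is not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class ModelEnvironments:
    """Toy model of one model's environments (hypothetical, not a Baseten API)."""

    development: Optional[str] = None  # single shared dev slot per model
    production: Optional[str] = None   # image the production URL points at
    history: List[str] = field(default_factory=list)     # prior production images
    retained: Set[str] = field(default_factory=set)      # rolled-back images kept
    deploy_log: List[str] = field(default_factory=list)  # release events

    def push_dev(self, image: str) -> None:
        # Dev holds exactly one image, so a later push overwrites an earlier one.
        self.development = image

    def promote(self) -> None:
        # Point production at the dev-tested image and record a release event.
        if self.development is None:
            raise ValueError("nothing in development to promote")
        if self.production is not None:
            self.history.append(self.production)
        self.production = self.development
        self.deploy_log.append(f"promote:{self.production}")

    def rollback(self) -> None:
        # Re-point production at the prior image; keep the failed one for forensics.
        if not self.history:
            raise ValueError("no prior production image to roll back to")
        failed = self.production
        self.production = self.history.pop()
        self.retained.add(failed)
        self.deploy_log.append(f"rollback:{failed}->{self.production}")


env = ModelEnvironments()
env.push_dev("v1")
env.promote()
env.push_dev("v2-alice")
env.push_dev("v2-bob")  # clobbers v2-alice: dev keeps only the last push
env.promote()
env.rollback()          # production returns to v1; v2-bob is retained
```

The single `development` field is the design point for test 03: there is nowhere to store a second in-flight dev image, which is why parallel iteration needs separate model names.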
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. A test passes when the mean score across rubric criteria is at least 4.0 and no single criterion scores below 3.
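One plausible reading of the pass rule can be written as a small check. This sketch assumes the mean is taken across per-criterion scores for a single test; the function name `passes` and the criterion names in the usage lines are hypothetical, and the scoring scale is not stated in the source.

```python
from statistics import mean
from typing import Mapping


def passes(
    scores: Mapping[str, float],
    mean_threshold: float = 4.0,
    floor: float = 3.0,
) -> bool:
    """Return True if the mean score meets the threshold and no score is below the floor."""
    if not scores:
        return False  # assumption: an ungraded test does not pass
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor


# Hypothetical criterion names, for illustration only.
ok = passes({"routing": 5, "draining": 4, "logging": 3})
blocked = passes({"routing": 5, "draining": 5, "logging": 2})
```

The second call shows why the floor matters: a mean of 4.0 alone would pass, but one criterion scoring 2 fails the test.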
Rubric criteria
- Baseten
- AI Platform
- Deployments And Environments