
Auth Workspaces And Cost
Baseten evals — Auth, Workspaces & Cost (relift v3 InfraRed)
About Baseten
Baseten is a model serving platform that lets ML teams deploy, scale, and monitor any model — including custom fine-tunes and private weights — with production-grade autoscaling and GPU infrastructure. It supports both synchronous and asynchronous inference patterns.
Sample tests (showing 3 of 9)
| # | Input | Expected behavior | Check | Category | Severity |
|---|---|---|---|---|---|
| 01 | Operator wants a CI key that can only call /predict, not modify deployments. | Create a scoped API key with read-only deployment and invoke-model permissions. Never share workspace-admin keys with CI. If a key is compromised, revoke it in the workspace UI and issue a replacement. | Pass / Fail | Ai Platform | critical |
| 02 | The workspace usage dashboard shows 1240 GPU-seconds on A100 for the prior hour; the operator's internal cost system shows 980. | Reconcile workspace usage against operator-side telemetry per deployment and per GPU class. Discrepancies typically come from (a) replica spin-up time that is billed but not attributed to any request and (b) drain time. Treat the workspace number as the source of truth for billing; use operator-side per-request metrics … | Pass / Fail | Ai Platform | medium |
| 03 | GPU-seconds suddenly spike to 5x normal at 02:00 with no expected traffic; no workspace alert was configured. | Configure usage anomaly alerts at the workspace level (e.g., usage above 2x the prior-day hourly average pages the on-call). Without alerts, runaway autoscaling burns budget unattended. Pair the alert with min/max replica bounds on each deployment; the max-replica cap is the belt-and-suspenders cost control. | Pass / Fail | Ai Platform | high |
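The reconciliation in test 02 and the alert rule in test 03 both reduce to simple arithmetic. The sketch below assumes you have already exported per-deployment, per-GPU-class GPU-seconds from the workspace dashboard and from your own telemetry; `GpuUsage`, `unexplained_gap`, and `is_usage_anomaly` are hypothetical names, not part of any Baseten SDK, and the 2x default threshold simply mirrors the example in test 03.

```python
from dataclasses import dataclass


@dataclass
class GpuUsage:
    """Hypothetical per-deployment, per-GPU-class usage record, in GPU-seconds."""
    deployment: str
    gpu_class: str
    billed_seconds: float          # from the workspace usage dashboard
    request_seconds: float         # from operator-side per-request telemetry
    spin_up_seconds: float = 0.0   # replica startup: billed, not request-attributed
    drain_seconds: float = 0.0     # replica drain before scale-down completes


def unexplained_gap(u: GpuUsage) -> float:
    """GPU-seconds billed that neither requests nor spin-up/drain account for."""
    return u.billed_seconds - u.request_seconds - u.spin_up_seconds - u.drain_seconds


def is_usage_anomaly(current_hour: float, prior_day_hourly: list[float],
                     factor: float = 2.0) -> bool:
    """True if this hour's GPU-seconds exceed `factor` x the prior day's hourly average."""
    if not prior_day_hourly:
        return False  # no baseline yet, so do not page
    baseline = sum(prior_day_hourly) / len(prior_day_hourly)
    return current_hour > factor * baseline
```

For test 02, `GpuUsage("my-model", "A100", 1240, 980)` leaves a 260 GPU-second gap before spin-up and drain are measured; only what remains after subtracting them warrants investigation, and the workspace figure stays the billing source of truth.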
How this eval is graded
Grade each response against expected.ideal_behavior and expected.rubric. A response passes only if the mean score across rubric criteria is at least 4.0 and no individual criterion scores below 3.
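A minimal sketch of the pass rule, assuming each rubric criterion receives a single numeric score from the grader. The `passes` helper and the criterion names in the example are illustrative, not part of the eval's published tooling.

```python
from statistics import mean


def passes(criterion_scores: dict[str, float],
           min_mean: float = 4.0, floor: float = 3.0) -> bool:
    """Pass only if the mean across criteria is >= min_mean and none is below floor."""
    if not criterion_scores:
        raise ValueError("at least one criterion score is required")
    scores = list(criterion_scores.values())
    return mean(scores) >= min_mean and min(scores) >= floor
```

For example, scores of 5, 4, and 3 pass (mean exactly 4.0, minimum 3), while 5, 5, and 2 fail despite the same 4.0 mean, because one criterion falls below the floor.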
Rubric criteria
- Baseten
- Ai Platform
- Auth Workspaces And Cost