Baseten evals — Auth, Workspaces & Cost (relift v3 InfraRed)

About Baseten

Baseten is a model serving platform that lets ML teams deploy, scale, and monitor any model — including custom fine-tunes and private weights — with production-grade autoscaling and GPU infrastructure. It supports both synchronous and asynchronous inference patterns.

Employees: ~100

Industry: Model Serving

Headquarters: San Francisco, CA

Website: baseten.co

Sample tests · showing 3 of 9

# | Input | Expected behavior | Check
01

Input: Operator wants a CI key that can only call /predict, not modify deployments.

Expected behavior: Create a scoped API key limited to read-only deployment and invoke-model permissions. Never share workspace-admin keys with CI. If a key is compromised, rotate it in the workspace UI; the prior key is revoked at the same moment the new key is issued.

Check: Pass / Fail · AI Platform · critical
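As a rough illustration of test 01, the sketch below shows how a CI job might read a narrowly scoped key from its environment and build an invoke-only request. The environment variable name `BASETEN_CI_API_KEY`, the model ID, and the exact URL and `Api-Key` header format are assumptions for illustration and should be checked against Baseten's API reference. The request is only constructed here, not sent.

```python
import json
import os
import urllib.request


def load_ci_key(env=os.environ) -> str:
    """Read the scoped CI key; BASETEN_CI_API_KEY is a hypothetical variable name."""
    key = env.get("BASETEN_CI_API_KEY", "")
    if not key:
        raise RuntimeError("Scoped CI key is not set; refusing to fall back to an admin key.")
    return key


def build_predict_request(model_id: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the model's predict endpoint.

    The URL pattern and header format are assumptions; verify them in the docs.
    """
    url = f"https://model-{model_id}.api.baseten.co/production/predict"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Api-Key {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Keeping the key in a CI secret and failing loudly when it is missing avoids the temptation to paste in a workspace-admin key as a fallback.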
02

Input: Workspace usage dashboard shows 1240 GPU-seconds on A100 for the prior hour. Operator's internal cost system shows 980.

Expected behavior: Reconcile workspace usage against operator-side telemetry per deployment and per GPU class. Discrepancies typically come from (a) replica spin-up time that is billed but not attributed to any request and (b) drain time. Treat the workspace number as the source of truth for billing; use operator-side per-request metrics …

Check: Pass / Fail · AI Platform · medium
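The reconciliation in test 02 can be sketched as a per-(deployment, GPU class) comparison of billed versus request-attributed GPU-seconds. The data shapes and the deployment name `demo-deployment` are hypothetical; real inputs would come from the workspace usage export and the operator's own telemetry.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, str]  # (deployment, gpu_class)


def reconcile(billed: Dict[Key, float], attributed: Dict[Key, float]) -> Dict[Key, dict]:
    """Compare billed GPU-seconds with request-attributed GPU-seconds per key."""
    report = {}
    for key in sorted(set(billed) | set(attributed)):
        b = billed.get(key, 0.0)
        a = attributed.get(key, 0.0)
        gap = b - a  # time billed but not tied to a request (e.g. spin-up, drain)
        fraction: Optional[float] = gap / b if b else None
        report[key] = {"billed": b, "attributed": a, "gap": gap, "gap_fraction": fraction}
    return report


# Figures from the test input: 1240 billed vs 980 attributed on A100.
report = reconcile(
    {("demo-deployment", "A100"): 1240.0},
    {("demo-deployment", "A100"): 980.0},
)
```

For the numbers in the test input, the gap is 260 GPU-seconds, roughly 21% of the billed total, which is the portion to explain through spin-up and drain time.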
03

Input: GPU-seconds suddenly spike to 5x normal at 02:00, when no traffic is expected. No workspace alert was configured.

Expected behavior: Configure usage anomaly alerts at the workspace level (e.g., page the on-call when hourly usage exceeds 2x the prior-day hourly average). Without alerts, runaway autoscaling burns budget unattended. Pair the alert with min/max replica caps on each deployment as a belt-and-suspenders cost control.

Check: Pass / Fail · AI Platform · high
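The alert rule and replica cap from test 03 might look like the following minimal sketch. The function names, the handling of a zero or missing baseline, and the idea of evaluating the rule outside the platform are all assumptions; in practice the threshold would be configured in whatever alerting the workspace provides.

```python
from statistics import mean
from typing import Sequence


def should_page(current_hour: float, prior_day_hourly: Sequence[float], factor: float = 2.0) -> bool:
    """Page when the current hour exceeds `factor` times the prior-day hourly average."""
    if not prior_day_hourly:
        return False  # no baseline yet; an assumption, some teams may prefer to page here
    baseline = mean(prior_day_hourly)
    if baseline == 0:
        return current_hour > 0  # any usage is anomalous against an idle baseline
    return current_hour > factor * baseline


def clamp_replicas(requested: int, min_replicas: int, max_replicas: int) -> int:
    """Keep an autoscaler's requested replica count inside the configured cap."""
    return max(min_replicas, min(requested, max_replicas))
```

With a prior-day average of 100 GPU-seconds per hour, a 500 GPU-second hour (the 5x spike in the test input) would page, while an hour at exactly 200 would not because the rule is a strict "greater than".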

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. A pass requires a mean score of at least 4.0 across criteria, with no single criterion scoring below 3.
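One reading of this pass rule, sketched under the assumption that each criterion receives a single numeric score (the scoring scale itself is not stated here, and the criterion names are placeholders):

```python
from statistics import mean
from typing import Mapping


def passes(scores: Mapping[str, float], min_mean: float = 4.0, floor: float = 3.0) -> bool:
    """Pass only if the mean is at least `min_mean` and no criterion is below `floor`."""
    if not scores:
        return False  # nothing graded; treated as a fail in this sketch
    values = list(scores.values())
    return mean(values) >= min_mean and min(values) >= floor
```

Note that the floor matters independently of the mean: scores of 5, 5, and 2 average 4.0 but would still fail because one criterion falls below 3.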

Rubric criteria

  • Baseten
  • AI Platform
  • Auth Workspaces And Cost

Recommended for

Baseten, Baseten customers
