For Baseten · AI Platform

Auth Workspaces And Cost

Baseten

AI Model Serving — Baseten

Evaluates Baseten's Auth, Workspaces & Cost across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's AI Model Serving eval coverage.

About Baseten

Baseten is a model serving platform that lets ML teams deploy, scale, and monitor any model — including custom fine-tunes and private weights — with production-grade autoscaling and GPU infrastructure. It supports both synchronous and asynchronous inference patterns.

Employees: ~100

Industry: Model Serving

Headquarters: San Francisco, CA

Website: baseten.co

Sample tests · showing 3 of 9

#	Input	Expected behavior	Check	Category	Severity
01	Operator wants a CI key that can only call /predict, not modify deployments.	Create a per-scope API key with read-only deployment + invoke-model permissions. Never share workspace-admin keys with CI. Rotate keys on compromise via the workspace UI; the prior key is revoked at the same moment the new key is issued.	Pass / Fail	AI Platform	critical
02	Workspace usage dashboard shows 1240 GPU-seconds on A100 for the prior hour. Operator's internal cost system shows 980.	Reconcile workspace usage against operator-side telemetry per deployment per GPU class. Discrepancies typically come from (a) replica spin-up time billed but not request-attributed, (b) drain time. Treat the workspace number as the source of truth for billing; use operator-side per-request metrics …	Pass / Fail	AI Platform	medium
03	Operator runs two product lines (prod / experimental) and wants traffic isolated.	Provision two separate workspaces. API keys, deployments, secrets, and usage metering are workspace-scoped: a key for workspace A cannot invoke a model in workspace B. Cross-workspace 'leaks' are infrastructure-level concerns; verify via SOC 2 report rather than asserting from product UI.	Pass / Fail	AI Platform	critical
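
The reconciliation described in test case 02 can be sketched in a few lines. This is a minimal illustration and not part of the eval pack: the `UsageRow` type, the deployment name, and the 25% alert threshold are all hypothetical, and the only figures taken from the page are the 1240 billed and 980 request-attributed GPU-seconds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRow:
    """One hour of usage for a single deployment and GPU class (hypothetical shape)."""
    deployment: str
    gpu_class: str
    billed_gpu_seconds: float      # workspace dashboard: source of truth for billing
    attributed_gpu_seconds: float  # operator-side sum of per-request GPU time


def unattributed_gap(row: UsageRow) -> tuple[float, float]:
    """Return (gap in GPU-seconds, gap as a share of billed time).

    The gap is time that was billed but not tied to any request,
    for example replica spin-up or drain time.
    """
    gap = row.billed_gpu_seconds - row.attributed_gpu_seconds
    share = gap / row.billed_gpu_seconds if row.billed_gpu_seconds else 0.0
    return gap, share


def rows_to_investigate(rows: list[UsageRow], max_share: float = 0.25) -> list[UsageRow]:
    """Keep rows whose unattributed share exceeds an assumed alert threshold."""
    return [r for r in rows if unattributed_gap(r)[1] > max_share]


# Figures from test case 02; the deployment name is made up for the example.
row = UsageRow("example-deployment", "A100", 1240.0, 980.0)
gap, share = unattributed_gap(row)
print(f"{gap:.0f} GPU-seconds unattributed ({share:.1%} of billed)")
```

Keeping the comparison per deployment and per GPU class, as the expected behavior asks, makes it easier to see whether the gap tracks replica churn on one deployment rather than a metering disagreement across the workspace.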

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
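
One way to read the pass rule is sketched below. It assumes the judge returns one numeric score per rubric criterion on a scale where 4.0 and 3 are meaningful cut-offs (the page does not state the scale), and that a case passes when the mean of its criterion scores is at least 4.0 and no single criterion scores below 3. The function and criterion names are hypothetical.

```python
from statistics import mean

MEAN_THRESHOLD = 4.0   # from the grading text: mean >= 4.0
CRITERION_FLOOR = 3    # from the grading text: no criterion below 3


def case_passes(criterion_scores: dict[str, float]) -> bool:
    """Apply the stated rule to one test case's per-criterion judge scores."""
    if not criterion_scores:
        raise ValueError("a case needs at least one scored criterion")
    values = list(criterion_scores.values())
    return mean(values) >= MEAN_THRESHOLD and min(values) >= CRITERION_FLOOR


# Hypothetical criterion names, for illustration only.
print(case_passes({"scoping": 5, "rotation": 4, "isolation": 3}))  # mean 4.0, floor met
print(case_passes({"scoping": 5, "rotation": 5, "isolation": 2}))  # one criterion below 3
```

The floor matters because a high mean can otherwise hide a single badly failed criterion, as the second call shows.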

Rubric criteria

Baseten
AI Platform
Auth Workspaces And Cost

Recommended for

Baseten customers

Works with

Baseten

Related evals

AI Platform

Claude API

Evaluates Anthropic's Batch API across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

AI Platform

Claude API

Evaluates Anthropic's Extended Thinking across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

AI Platform

Claude API

Evaluates Anthropic's Files API & Citations across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.


Frequently asked questions

What does the Auth Workspaces And Cost eval for Baseten test?

Evaluates Baseten's Auth, Workspaces & Cost across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's AI Model Serving eval coverage.

How is the Auth Workspaces And Cost eval scored?

The judge rubric: Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

How many test cases does this eval pack include?

The Auth Workspaces And Cost pack for Baseten contains 9 test cases. 3 sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?

Sign up for Corsac, connect your model or agent endpoint, and run the Auth Workspaces And Cost pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.
