
Baseten evals — Chains (relift v3 InfraRed)

About Baseten

Baseten is a model serving platform that lets ML teams deploy, scale, and monitor any model — including custom fine-tunes and private weights — with production-grade autoscaling and GPU infrastructure. It supports both synchronous and asynchronous inference patterns.

Employees: ~100

Industry: Model Serving

Headquarters: San Francisco, CA

Website: baseten.co

Sample tests (showing 3 of 9)

01

Input: A chain composes three chainlets: transcribe (audio→text), summarize (text→summary), and translate (summary→localized). The operator returns a dict from transcribe instead of the declared TranscribeOutput pydantic model.

Expected behavior: Each chainlet declares typed inputs and outputs (pydantic models). The Chains runtime validates at the hop boundary; returning an untyped dict triggers a schema-mismatch failure at the next hop. Declare the model and import it from a shared package consumed by both chainlets.

Check: Pass / Fail · AI Platform · high

02

Input: Finance asks how much of the chain's spend comes from the LLM hop versus the embedding hop.

Expected behavior: Chains usage reporting breaks GPU-seconds out per chainlet. Aggregate over the billing window per chainlet name to attribute cost. Do not divide chain-level spend by chainlet count, because costs are not uniform across hops.

Check: Pass / Fail · AI Platform · medium

03

Input: The operator changes the TranscribeOutput schema (adds a `language` field). The summarize chainlet still references the old shape.

Expected behavior: Treat the input/output schemas as a shared contract. Bump both chainlets in lockstep, push them as a single chain deployment, and avoid in-place schema edits to a production chain. Use additive changes (an optional field with a default) when full lockstep is not feasible.

Check: Pass / Fail · AI Platform · critical
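
The shared-contract idea behind tests 01 and 03 can be sketched with plain pydantic. This is a minimal illustration, not the Chains runtime: the field names (`text`, `summary`), the stand-in `transcribe` and `summarize` functions, and the `check_hop` helper are all hypothetical, and the validation here only approximates what a hop boundary might do.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


# Hypothetical shared contract. In practice this would live in one package
# imported by both the producing and the consuming chainlet.
class TranscribeOutput(BaseModel):
    text: str
    # Additive change: optional with a default, so payloads produced
    # before the field existed still validate.
    language: Optional[str] = None


class SummarizeOutput(BaseModel):
    summary: str


def transcribe(audio: bytes) -> TranscribeOutput:
    # Stand-in for a real transcription step: returns the declared
    # model rather than an untyped dict.
    return TranscribeOutput(text=audio.decode("utf-8"), language="en")


def summarize(transcript: TranscribeOutput) -> SummarizeOutput:
    # Stand-in summary: the first 40 characters of the transcript.
    return SummarizeOutput(summary=transcript.text[:40])


def check_hop(payload: dict) -> Optional[TranscribeOutput]:
    """Validate a raw payload against the declared model.

    Returns the parsed model, or None if the payload does not match.
    """
    try:
        return TranscribeOutput(**payload)
    except ValidationError:
        return None
```

With this shape, a payload in the old form (`{"text": "hi"}`) still validates and gets `language=None`, while a dict that lacks the required `text` field is rejected instead of flowing into the next step.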

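The per-chainlet attribution described in test 02 might look like the following sketch. The record layout (`chainlet`, `gpu_seconds`, `usd_per_gpu_second`) and the rates are made up for illustration; they are not the format of any Baseten usage export.

```python
from collections import defaultdict


def attribute_cost(records):
    """Sum cost per chainlet name and compute each chainlet's share of spend.

    Each record is a dict with hypothetical keys:
    'chainlet', 'gpu_seconds', 'usd_per_gpu_second'.
    """
    totals = defaultdict(float)
    for record in records:
        totals[record["chainlet"]] += (
            record["gpu_seconds"] * record["usd_per_gpu_second"]
        )
    grand_total = sum(totals.values())
    return {
        name: {
            "usd": usd,
            "share": usd / grand_total if grand_total else 0.0,
        }
        for name, usd in totals.items()
    }


# Illustrative records for one billing window (invented numbers).
usage = [
    {"chainlet": "llm", "gpu_seconds": 300.0, "usd_per_gpu_second": 0.002},
    {"chainlet": "embed", "gpu_seconds": 100.0, "usd_per_gpu_second": 0.001},
    {"chainlet": "llm", "gpu_seconds": 100.0, "usd_per_gpu_second": 0.002},
]
report = attribute_cost(usage)
```

In this invented data the LLM hop accounts for most of the spend, which an even split across two chainlets would hide.
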
How this eval is graded

Grade each response against expected.ideal_behavior and expected.rubric. A pass requires a mean criterion score of at least 4.0, with no individual criterion scoring below 3.
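
One way to read the pass rule is sketched below, assuming it means the mean across criterion scores must reach 4.0 and no single criterion may fall below 3. The criterion names in the usage lines are hypothetical, and the scoring scale is not stated in the source.

```python
from statistics import mean


def passes(scores, mean_threshold=4.0, floor=3):
    """Return True if the criterion scores meet both thresholds.

    `scores` maps criterion name to a numeric score. An empty
    mapping is treated as a fail.
    """
    values = list(scores.values())
    if not values:
        return False
    return mean(values) >= mean_threshold and min(values) >= floor


# Hypothetical criterion names and scores.
ok = passes({"correctness": 5, "grounding": 4, "clarity": 3})
too_low = passes({"correctness": 5, "grounding": 5, "clarity": 2})
```

The second case fails despite a mean of 4.0, because one criterion falls below the floor.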

Rubric criteria

  • Baseten
  • AI Platform
  • Chains

Recommended for

Baseten customers
