Predict Sync And Async

AI Model Serving — Baseten

Evaluates Baseten's Predict (Sync + Async) across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's AI Model Serving eval coverage.

About Baseten

Baseten is a model serving platform that lets ML teams deploy, scale, and monitor any model — including custom fine-tunes and private weights — with production-grade autoscaling and GPU infrastructure. It supports both synchronous and asynchronous inference patterns.

Employees

~100

Industry

Model Serving

Headquarters

San Francisco, CA

Website

baseten.co

Sample tests (showing 3 of 9)

Case 01
Input: Client POSTs to https://model-<id>.api.baseten.co/production/predict with the model's documented input JSON. The deployment is warm. Response is the model's raw output JSON (not wrapped in {data:...}).
Expected behavior: Parse the response as the model's output schema directly; do not assume a Baseten-injected envelope. Status 200 means inference succeeded. A non-2xx status means the request failed at the platform layer (queueing, autoscaler, replica crash), and the response body carries an error object with code and message.
Check: Pass / Fail
Tags: AI Platform, high

Case 02
Input: Client POSTs to /production/async_predict with a body containing the model input and webhook_endpoint. The response is immediate, with request_id and status=QUEUED.
Expected behavior: Persist request_id with the operator's job-tracking key BEFORE 200 is returned to the caller, so a crash between POST and webhook delivery does not orphan the request. Use either webhook delivery or GET /async_request/{request_id} polling as the result channel; pick one and document it.
Check: Pass / Fail
Tags: AI Platform, critical

Case 03
Input: Baseten POSTs the async-predict result to the operator's webhook_endpoint with an HMAC-SHA256 signature header. The operator's handler stores results without checking the signature.
Expected behavior: Verify the HMAC signature against the shared webhook secret BEFORE parsing the body or persisting results. Reject mismatched signatures with 401 and log them for security review. The webhook endpoint must be idempotent because Baseten retries on non-2xx.
Check: Pass / Fail
Tags: AI Platform, critical
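
The expected behavior in cases 02 and 03 can be sketched roughly as follows. This is a minimal illustration, not Baseten's client or a reference handler. The in-memory stores, the function names, the `post` callable standing in for the HTTP request, the `model_input` body key, the hex-encoded signature, and a `request_id` field in the webhook payload are all assumptions made for the example; check the header name and signature encoding against Baseten's documentation.

```python
import hashlib
import hmac
import json
import logging
from typing import Callable, Dict, Tuple

# Hypothetical in-memory stores; a real service would use a durable database.
JOBS: Dict[str, str] = {}      # request_id -> operator job-tracking key
RESULTS: Dict[str, dict] = {}  # request_id -> stored webhook payload


def submit_async(job_key: str, model_input: dict, webhook_endpoint: str,
                 post: Callable[[dict], dict]) -> str:
    """Case 02: persist request_id before acknowledging the caller.

    `post` is a stand-in for the HTTP call to /production/async_predict and
    is assumed to return the parsed JSON response.
    """
    # The "model_input" key is an assumption about the request body shape.
    body = {"model_input": model_input, "webhook_endpoint": webhook_endpoint}
    response = post(body)
    request_id = response["request_id"]
    JOBS[request_id] = job_key  # persist first, only then return
    return request_id


def sign(secret: bytes, raw_body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw body (the encoding is an assumption)."""
    return hmac.new(secret, raw_body, hashlib.sha256).hexdigest()


def handle_webhook(secret: bytes, raw_body: bytes,
                   signature: str) -> Tuple[int, str]:
    """Case 03: verify the signature, then parse, then store idempotently."""
    expected = sign(secret, raw_body).encode("utf-8")
    # Constant-time comparison on bytes; runs before any JSON parsing.
    if not hmac.compare_digest(expected, signature.encode("utf-8")):
        logging.warning("webhook signature mismatch; flag for security review")
        return 401, "signature mismatch"
    payload = json.loads(raw_body)
    request_id = payload["request_id"]  # assumed payload field
    if request_id in RESULTS:
        # A retried delivery must not create a second record.
        return 200, "duplicate ignored"
    RESULTS[request_id] = payload
    return 200, "stored"
```

Calling `handle_webhook` twice with the same valid body stores the result once and still answers 200 the second time, which is what makes retries on non-2xx safe.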

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
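
One plausible reading of this threshold, sketched in Python: a case passes when the mean of its per-criterion judge scores is at least 4.0 and no single criterion scores below 3. The function name and the criterion names are hypothetical, and the page does not state the scoring scale, so treat this as an interpretation rather than Corsac's implementation.

```python
from statistics import mean
from typing import Mapping


def case_passes(scores: Mapping[str, float],
                mean_threshold: float = 4.0,
                floor: float = 3.0) -> bool:
    """Return True if the mean score meets the threshold and no
    individual criterion falls below the floor."""
    if not scores:
        return False  # nothing was graded, so nothing can pass
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor


# Hypothetical criterion names, for illustration only.
print(case_passes({"correctness": 5, "safety": 4, "clarity": 3}))
print(case_passes({"correctness": 5, "safety": 5, "clarity": 2}))
```

The second call shows why the floor matters: the mean is still 4.0, but one criterion at 2 fails the case.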

Recommended for

Baseten customers

Works with

Baseten

Related evals

Claude API (AI Platform)

Evaluates Anthropic's Batch API across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

Claude API (AI Platform)

Evaluates Anthropic's Extended Thinking across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

Claude API (AI Platform)

Evaluates Anthropic's Files API & Citations across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

Frequently asked questions

What does the Predict Sync And Async eval for Baseten test?

Evaluates Baseten's Predict (Sync + Async) across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's AI Model Serving eval coverage.

How is the Predict Sync And Async eval scored?

The judge rubric: Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

How many test cases does this eval pack include?

The Predict Sync And Async pack for Baseten contains 9 test cases. 3 sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?

Sign up for Corsac, connect your model or agent endpoint, and run the Predict Sync And Async pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.
