
Structured Outputs

Vercel AI SDK (`ai` npm package)

Evaluates Vercel AI SDK's Structured Outputs (generateObject / streamObject) across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's AI SDK eval coverage.

About Vercel AI SDK

Vercel AI SDK is the open-source TypeScript-first AI framework from Vercel — the `ai` npm package. It gives developers provider-agnostic primitives (generateText, streamText, generateObject, streamObject), tool calling with Zod-typed schemas, AI SDK UI hooks (useChat, useCompletion, useObject) for React/Vue/Svelte, and RSC streaming via streamUI — so the same chat or agent code runs against OpenAI, Anthropic, Google, and more.

Employees: ~500
Industry: AI Framework / SDK
Headquarters: San Francisco, CA
Website: ai-sdk.dev

Sample tests · showing 3 of 9

#	Input	Expected behavior	Check
01	Operator calls generateObject with mode:'auto' against a provider that does not support JSON mode; output silently degrades to free-form text wrapped in {}.	mode:'auto' picks the best-supported strategy per provider (JSON mode where available, tool-call otherwise). For deterministic enforcement, pin mode:'json' (requires provider JSON-mode support) or mode:'tool' (uses a hidden tool whose parameters are the schema). Verify the chosen mode in provider d…	Pass / Fail · AI Platform · high
02	generateObject with a Zod schema produces wrong-shape output more than 10% of the time. Operator pinned mode:'tool' but omitted schemaName and schemaDescription.	Set `schemaName: 'invoice'` and `schemaDescription: 'Structured invoice extracted from the user message'`. These values propagate to the underlying tool/JSON-mode call so the model has a concrete label; without them the SDK uses generic defaults that hurt accuracy.	Pass / Fail · AI Platform · medium
03	Operator needs an array of invoice line items as the root. Uses schema = z.array(LineItem) with default output.	Default output is 'object', which requires the root to be a Zod object. For array roots, pass `output:'array'` AND `schema: LineItem` (the element schema, not the array). The SDK wraps it and returns result.object as the array. For enums, use output:'enum' with `enum:['a','b','c']`.	Pass / Fail · AI Platform · medium
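The three sample rows above translate into option shapes roughly like the following. This is a dependency-free sketch, not the SDK's own typings: `SchemaLike` is a hypothetical stand-in for a Zod schema, the type and variable names are illustrative, and the actual `generateObject` call is left as a comment because it needs a model and provider. The `mode` option is taken from the rows above and may not be available in every SDK version.

```typescript
// Hypothetical stand-in for a Zod schema so the sketch runs without dependencies.
type SchemaLike = { readonly kind: string };

type Mode = "auto" | "json" | "tool";

// Illustrative option shapes, one per output strategy named in the sample tests.
type ObjectOptions = {
  output?: "object";
  schema: SchemaLike;
  mode?: Mode;
  schemaName?: string;
  schemaDescription?: string;
};
type ArrayOptions = { output: "array"; schema: SchemaLike };
type EnumOptions = { output: "enum"; enum: string[] };

const Invoice: SchemaLike = { kind: "Invoice" };
const LineItem: SchemaLike = { kind: "LineItem" };

// Cases 01 and 02: pin the mode and label the schema instead of relying on defaults.
const invoiceOptions: ObjectOptions = {
  schema: Invoice,
  mode: "tool",
  schemaName: "invoice",
  schemaDescription: "Structured invoice extracted from the user message",
};

// Case 03: array root. Pass the element schema, not an array schema.
const lineItemsOptions: ArrayOptions = { output: "array", schema: LineItem };

// Case 03, enum variant: no schema, just the allowed values.
const labelOptions: EnumOptions = { output: "enum", enum: ["a", "b", "c"] };

// In a real project these would be spread into the SDK call, for example:
//   const { object } = await generateObject({ model, prompt, ...invoiceOptions });
```

Keeping the options as separate typed values makes it easy to check in review that an array root uses the element schema and that a pinned mode always travels with a schema name and description.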

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
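The pass rule can be sketched as a small function. This is one reading of the rule, under stated assumptions: the judge emits one numeric score per rubric criterion, and the mean is taken across those criteria. The `CriterionScore` shape and the `passes` name are hypothetical, not part of Corsac's API.

```typescript
// Hypothetical shape: one judge score per rubric criterion.
type CriterionScore = { criterion: string; score: number };

// Passes when the mean score is at least `meanThreshold`
// and no single criterion scores below `floor`.
function passes(
  scores: CriterionScore[],
  meanThreshold = 4.0,
  floor = 3,
): boolean {
  if (scores.length === 0) return false; // nothing graded, so nothing passes
  const total = scores.reduce((sum, s) => sum + s.score, 0);
  const mean = total / scores.length;
  const lowest = Math.min(...scores.map((s) => s.score));
  return mean >= meanThreshold && lowest >= floor;
}

const strong: CriterionScore[] = [
  { criterion: "correct mode guidance", score: 5 },
  { criterion: "schema naming", score: 4 },
  { criterion: "output strategy", score: 3 },
];
const oneWeak: CriterionScore[] = [
  { criterion: "correct mode guidance", score: 5 },
  { criterion: "schema naming", score: 5 },
  { criterion: "output strategy", score: 2 },
];
```

Under this reading, `strong` passes (mean 4.0, lowest 3) while `oneWeak` fails despite the same mean, because one criterion falls below the floor.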

Rubric criteria

Vercel AI SDK
AI Platform
Structured Outputs

Recommended for

Vercel AI SDK (`ai` npm package) customers

Works with

Vercel AI SDK

Related evals

Code Assistant

Browserbase

Evaluates Browserbase's Captcha Handling across scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Browser infrastructure eval coverage.

Code Assistant

Browserbase

Evaluates Browserbase's Concurrency & Rate Limits across scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Browser infrastructure eval coverage.

Code Assistant

Browserbase

Evaluates Browserbase's Live Debugging & Session Inspector across scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Browser infrastructure eval coverage.


Frequently asked questions

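What does the Structured Outputs eval for Vercel AI SDK (`ai` npm package) test?

It evaluates Vercel AI SDK's Structured Outputs (generateObject / streamObject) across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge.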

How is the Structured Outputs eval scored?

The judge rubric: Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

How many test cases does this eval pack include?

The Structured Outputs pack for Vercel AI SDK (`ai` npm package) contains 9 test cases. 3 sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?

Sign up for Corsac, connect your model or agent endpoint, and run the Structured Outputs pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.
