Middleware Telemetry Safety

Vercel AI SDK (`ai` npm package)

Evaluates Vercel AI SDK's Middleware, Telemetry & Safety across 10 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's AI SDK eval coverage.

About Vercel AI SDK

Vercel AI SDK is Vercel's open-source, TypeScript-first AI framework, published as the `ai` npm package. It gives developers provider-agnostic primitives (`generateText`, `streamText`, `generateObject`, `streamObject`), tool calling with Zod-typed schemas, AI SDK UI hooks (`useChat`, `useCompletion`, `useObject`) for React/Vue/Svelte, and RSC streaming via `streamUI`, so the same chat or agent code runs against OpenAI, Anthropic, Google, and other providers.

Employees: ~500

Industry: AI Framework / SDK

Headquarters: San Francisco, CA

Website: ai-sdk.dev

Sample tests (showing 3 of 10)

#	Input	Expected behavior	Check
01	Operator wants every call to enforce maxTokens <= 2000 regardless of caller. They mutate the params argument in place inside transformParams.	Middleware transformParams({params}) must return a new params object (immutable transform). Returning undefined or mutating in place leads to inconsistent provider state across concurrent calls. Use `return { ...params, maxTokens: Math.min(params.maxTokens ?? 2000, 2000) }`.	Pass / Fail · AI Platform · high
02	Cache middleware keys requests by JSON.stringify(messages); a small change in the system prompt does not invalidate cached responses.	Cache key must include every field the provider sees: model id, system, messages, tools (schemas), tool_choice, temperature, maxTokens, and any providerOptions that affect output (cache_control, response_format). Hash the post-transformParams payload, not the caller-supplied one.	Pass / Fail · AI Platform · high
03	wrapStream middleware buffers all stream parts to log them, then emits the buffered parts at the end. UI shows nothing until completion.	wrapStream must forward parts as they arrive: implement it as an async iterable that yields each upstream part after side effects. Buffering destroys the streaming UX. If you must log, log on the fly to a side channel and pass the parts through unchanged.	Pass / Fail · AI Platform · high
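
The expected behaviors in the three sample cases can be sketched in plain TypeScript. This is a minimal illustration, not the SDK's own API: the `CallParams` type and the helper names (`clampMaxTokens`, `stableStringify`, `cacheKey`, `tapStream`) are assumptions made for this example, and the real middleware hooks may use different parameter and stream types.

```typescript
// Simplified stand-in for the call parameters a middleware sees.
// NOTE: this type is an assumption for illustration, not the SDK's own type.
interface CallParams {
  modelId: string;
  system?: string;
  messages: { role: string; content: string }[];
  maxTokens?: number;
  temperature?: number;
  tools?: Record<string, unknown>;
  toolChoice?: unknown;
  providerOptions?: Record<string, unknown>;
}

// Case 01: return a new object instead of mutating the caller's params.
function clampMaxTokens(params: CallParams, limit = 2000): CallParams {
  return { ...params, maxTokens: Math.min(params.maxTokens ?? limit, limit) };
}

// Serialize with sorted keys so equal payloads always produce the same string.
function stableStringify(value: unknown): string {
  if (value === undefined) return "null";
  if (value === null || typeof value !== "object") {
    return JSON.stringify(value) ?? "null";
  }
  if (Array.isArray(value)) {
    return "[" + value.map(stableStringify).join(",") + "]";
  }
  const record = value as Record<string, unknown>;
  const keys = Object.keys(record)
    .filter((k) => record[k] !== undefined)
    .sort();
  return (
    "{" +
    keys.map((k) => JSON.stringify(k) + ":" + stableStringify(record[k])).join(",") +
    "}"
  );
}

// Case 02: derive the key from the whole post-transform payload,
// so a change to system, tools, or temperature changes the key.
// In production you would likely hash this string (for example with SHA-256).
function cacheKey(params: CallParams): string {
  return stableStringify(clampMaxTokens(params));
}

// Case 03: run the side effect, then forward each part immediately.
async function* tapStream<T>(
  upstream: AsyncIterable<T>,
  onPart: (part: T) => void
): AsyncGenerator<T> {
  for await (const part of upstream) {
    onPart(part); // e.g. write to a log or telemetry side channel
    yield part; // pass through unchanged, without buffering
  }
}
```

Keying on the canonical form of the transformed payload means two requests share a cache entry only if the provider would see identical input, which is the property case 02 asks for.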

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
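
One possible reading of the pass rule above, sketched in TypeScript. The interpretation is an assumption: it treats a case as passing when the mean of its per-criterion scores is at least 4.0 and no single criterion scores below 3. The function name `passesRubric` and the handling of an empty score list are hypothetical, and the scoring scale is not stated on this page.

```typescript
// Returns true when the mean score meets the threshold and
// no individual criterion falls below the floor.
// ASSUMPTION: an empty score list is treated as a failure.
function passesRubric(
  scores: number[],
  minMean = 4.0,
  minScore = 3
): boolean {
  if (scores.length === 0) return false;
  const mean = scores.reduce((sum, s) => sum + s, 0) / scores.length;
  return mean >= minMean && scores.every((s) => s >= minScore);
}
```

Under this reading, scores of 5, 5, and 2 fail despite a mean of 4.0, because one criterion is below 3.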

Rubric criteria

Vercel AI SDK
AI Platform
Middleware Telemetry Safety

Recommended for

Vercel AI SDK (`ai` npm package) customers

Works with

Vercel AI SDK

Related evals

Browserbase (Code Assistant)

Evaluates Browserbase's Captcha Handling across scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Browser infrastructure eval coverage.

Browserbase (Code Assistant)

Evaluates Browserbase's Concurrency & Rate Limits across scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Browser infrastructure eval coverage.

Browserbase (Code Assistant)

Evaluates Browserbase's Live Debugging & Session Inspector across scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Browser infrastructure eval coverage.

Frequently asked questions

What does the Middleware Telemetry Safety eval for Vercel AI SDK (`ai` npm package) test?

Evaluates Vercel AI SDK's Middleware, Telemetry & Safety across 10 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's AI SDK eval coverage.

How is the Middleware Telemetry Safety eval scored?

The judge rubric: Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

How many test cases does this eval pack include?

The Middleware Telemetry Safety pack for Vercel AI SDK (`ai` npm package) contains 10 test cases. 3 sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?

Sign up for Corsac, connect your model or agent endpoint, and run the Middleware Telemetry Safety pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.
