
Tool Calling

Vercel AI SDK (`ai` npm package)

Evaluates Vercel AI SDK's Tool Calling across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's AI SDK eval coverage.

About Vercel AI SDK

Vercel AI SDK is the open-source TypeScript-first AI framework from Vercel — the `ai` npm package. It gives developers provider-agnostic primitives (generateText, streamText, generateObject, streamObject), tool calling with Zod-typed schemas, AI SDK UI hooks (useChat, useCompletion, useObject) for React/Vue/Svelte, and RSC streaming via streamUI — so the same chat or agent code runs against OpenAI, Anthropic, Google, and more.

Employees: ~500
Industry: AI Framework / SDK
Headquarters: San Francisco, CA
Website: ai-sdk.dev

Sample tests · showing 3 of 9

#	Input	Expected behavior	Check
01	Operator defines a tool with `parameters: { city: 'string' }` (plain object) instead of `parameters: z.object({ city: z.string() })`.	Each tool's parameters must be a Zod schema (or a jsonSchema()-wrapped JSON Schema); the SDK converts it to a provider tool spec at call time. Plain objects are rejected with InvalidArgumentError. Call .describe('...') on each field (for example, z.string().describe('...')) to give it a description the model can see.	Pass / Fail · Ai Platform · critical
02	Tool's execute is `(args) => fetch(...)` returning a Response object directly (not awaited, not serialized).	execute(input, { toolCallId, messages }) must be async and return a JSON-serializable value; that value becomes the tool result passed back to the model. Await any IO inside execute and resolve to a plain object (or string). Throwing inside execute surfaces as a tool-error part on the fullStream.	Pass / Fail · Ai Platform · critical
03	generateText with two tools and no maxSteps; the model keeps calling tools in a loop, exhausting the operator's token budget.	Default maxSteps is 1: generateText returns after a single step (which may include tool calls but does not auto-resume). Set maxSteps to a sane cap (e.g., 5) for agentic loops. For finer control, use stopWhen with a custom predicate. Always cap; never set maxSteps=Infinity in production.	Pass / Fail · Ai Platform · critical
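
The ideas behind cases 02 and 03 can be illustrated without the SDK itself. The following is a minimal, dependency-free sketch, not the `ai` package's implementation: `Model`, `Tool`, `isPlainJson`, and `runWithStepCap` are hypothetical names invented for illustration. It shows a tool loop that awaits each tool, refuses results that are not plain JSON values (a `Response`, `Date`, or `undefined` would be rejected), and stops after a fixed number of steps.

```typescript
// Hypothetical types for illustration only; they do not mirror the `ai` package.
type ToolCall = { toolName: string; input: unknown };
type ModelTurn = { text: string; toolCalls: ToolCall[] };
type Tool = { execute: (input: unknown) => Promise<unknown> };
type Message = { role: "tool"; content: string };

// Stand-in for a model call: receives the transcript, returns text and/or tool calls.
type Model = (messages: Message[]) => Promise<ModelTurn>;

// True only for values built from null, strings, booleans, finite numbers,
// arrays, and plain objects. Does not detect circular references.
function isPlainJson(value: unknown): boolean {
  if (value === null) return true;
  if (typeof value === "string" || typeof value === "boolean") return true;
  if (typeof value === "number") return Number.isFinite(value);
  if (Array.isArray(value)) return value.every((item) => isPlainJson(item));
  if (typeof value === "object") {
    const proto = Object.getPrototypeOf(value);
    // Class instances such as Response, Date, or Map are not plain objects.
    if (proto !== Object.prototype && proto !== null) return false;
    const record = value as Record<string, unknown>;
    return Object.keys(record).every((key) => isPlainJson(record[key]));
  }
  return false; // undefined, function, symbol, bigint
}

// Runs model/tool rounds until the model stops calling tools or the cap is hit.
async function runWithStepCap(
  model: Model,
  tools: Record<string, Tool>,
  maxSteps: number,
): Promise<{ text: string; steps: number }> {
  const messages: Message[] = [];
  let text = "";
  for (let step = 1; step <= maxSteps; step++) {
    const turn = await model(messages);
    text = turn.text;
    if (turn.toolCalls.length === 0) return { text, steps: step }; // model is done
    for (const call of turn.toolCalls) {
      const tool = tools[call.toolName];
      if (!tool) throw new Error(`unknown tool: ${call.toolName}`);
      const result = await tool.execute(call.input); // await IO before serializing
      if (!isPlainJson(result)) {
        throw new TypeError(`${call.toolName} returned a non-JSON value`);
      }
      messages.push({ role: "tool", content: JSON.stringify(result) });
    }
  }
  return { text, steps: maxSteps }; // cap reached; stop even if more tools were requested
}
```

With a stand-in model that requests a tool on every turn, `runWithStepCap(model, tools, 5)` makes at most five model calls, which is the budget-protecting behavior case 03 asks for.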

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
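
One reading of the pass rule above can be sketched as a small function. This assumes the mean is taken across the rubric criteria of a single test case and that scores are numeric; the page states neither, so treat the `CriterionScore` shape and the `passes` name as hypothetical.

```typescript
// Hypothetical shape: one numeric judge score per rubric criterion.
type CriterionScore = { criterion: string; score: number };

// Assumed interpretation: pass when the mean score is at least `minMean`
// and no individual criterion falls below `minEach`.
function passes(scores: CriterionScore[], minMean = 4.0, minEach = 3): boolean {
  if (scores.length === 0) return false; // nothing graded, so nothing passed
  const total = scores.reduce((sum, s) => sum + s.score, 0);
  const mean = total / scores.length;
  return mean >= minMean && scores.every((s) => s.score >= minEach);
}
```

Under this reading, scores of 5, 5, and 2 fail despite a mean of 4.0, because one criterion is below the floor of 3.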

Rubric criteria

Vercel Ai Sdk
Ai Platform
Tool Calling

Recommended for

Vercel AI SDK (`ai` npm package) customers

Works with

Vercel AI SDK

Related evals

Code Assistant

Browserbase

Evaluates Browserbase's Captcha Handling across scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Browser infrastructure eval coverage.

Code Assistant

Browserbase

Evaluates Browserbase's Concurrency & Rate Limits across scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Browser infrastructure eval coverage.

Code Assistant

Browserbase

Evaluates Browserbase's Live Debugging & Session Inspector across scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Browser infrastructure eval coverage.

Frequently asked questions

What does the Tool Calling eval for Vercel AI SDK (`ai` npm package) test?

Evaluates Vercel AI SDK's Tool Calling across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's AI SDK eval coverage.

How is the Tool Calling eval scored?

The judge rubric: Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

How many test cases does this eval pack include?

The Tool Calling pack for Vercel AI SDK (`ai` npm package) contains 9 test cases. 3 sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?

Sign up for Corsac, connect your model or agent endpoint, and run the Tool Calling pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.
