For Vercel AI SDK · Code Assistant · AI Platform

Vercel AI SDK evals — Tool Calling (relift v3 InfraRed)

About Vercel AI SDK

Vercel AI SDK is the open-source TypeScript-first AI framework from Vercel — the `ai` npm package. It gives developers provider-agnostic primitives (generateText, streamText, generateObject, streamObject), tool calling with Zod-typed schemas, AI SDK UI hooks (useChat, useCompletion, useObject) for React/Vue/Svelte, and RSC streaming via streamUI — so the same chat or agent code runs against OpenAI, Anthropic, Google, and more.

Employees

~500

Industry

AI Framework / SDK

Headquarters

San Francisco, CA

Website

ai-sdk.dev

Sample tests · showing 3 of 9

# · Input · Expected behavior · Check
01

Operator defines a tool with `parameters: { city: 'string' }` (plain object) instead of `parameters: z.object({ city: z.string() })`.

Each tool's parameters (named inputSchema in AI SDK 5) must be a Zod schema or a JSON Schema wrapped in jsonSchema(); the SDK converts it to the provider's tool spec at call time. Plain objects are rejected with InvalidArgumentError. Attach .describe('...') to each field, for example z.object({ city: z.string().describe('City name') }), so the model sees a description for every field.

Pass / Fail · AI Platform · critical
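
One way a check for test 01 could be approximated is a duck-typing pre-flight guard, sketched below. It does not call the SDK. `classifyParameters` and `assertSchemaLike` are hypothetical helpers, and the sketch assumes that Zod schemas expose a `safeParse` method and that a jsonSchema() wrapper carries the raw schema under a `jsonSchema` property.

```typescript
// Hypothetical pre-flight check; not part of the `ai` package.
type SchemaKind = "zod-like" | "json-schema-wrapper" | "plain-object" | "invalid";

function classifyParameters(value: unknown): SchemaKind {
  if (typeof value !== "object" || value === null) return "invalid";
  const candidate = value as Record<string, unknown>;
  // Assumption: Zod schemas expose a safeParse method.
  if (typeof candidate.safeParse === "function") return "zod-like";
  // Assumption: jsonSchema() returns a wrapper holding the raw schema under `jsonSchema`.
  if (typeof candidate.jsonSchema === "object" && candidate.jsonSchema !== null) {
    return "json-schema-wrapper";
  }
  // Anything else, such as { city: 'string' }, is treated as a plain object.
  return "plain-object";
}

function assertSchemaLike(toolName: string, parameters: unknown): void {
  const kind = classifyParameters(parameters);
  if (kind === "plain-object" || kind === "invalid") {
    throw new TypeError(`Tool "${toolName}": parameters must be a schema, got ${kind}`);
  }
}
```

Running such a guard over every tool definition before the first model call would surface the mistake at startup rather than at call time.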
02

Operator wants to stop the agentic loop after the `submit_invoice` tool runs successfully, regardless of remaining maxSteps budget.

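Pass `stopWhen: hasToolCall('submit_invoice')` (or a custom predicate over the steps array). The loop ends after the step in which the predicate matches, so no further model call is made. hasToolCall matches on the call itself, not its outcome; to require a successful run, a custom predicate can inspect the last step's tool results instead. Compose it with a step cap as a safety limit: in AI SDK 5, stopWhen accepts an array of conditions and stepCountIs(n) replaces maxSteps.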

Pass / Fail · AI Platform · high
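
The custom-predicate option in test 02 can be sketched without the SDK as follows. `StepLike` is a simplified local stand-in for the SDK's step type that keeps only the fields used here, and `toolSucceeded`, `maxStepCount`, and `anyOf` are hypothetical helpers. The sketch assumes a step lists a tool result only when that tool's execute completed without throwing.

```typescript
// Simplified stand-ins for the SDK's step shape (assumption: only these fields matter here).
interface ToolCallLike { toolName: string; }
interface ToolResultLike { toolName: string; }
interface StepLike { toolCalls: ToolCallLike[]; toolResults: ToolResultLike[]; }
type StopPredicate = (state: { steps: StepLike[] }) => boolean;

// True when the most recent step contains a result for the named tool.
function toolSucceeded(toolName: string): StopPredicate {
  return ({ steps }) => {
    const last = steps[steps.length - 1];
    return last !== undefined && last.toolResults.some((r) => r.toolName === toolName);
  };
}

// Safety cap: stop once the loop has run `limit` steps.
function maxStepCount(limit: number): StopPredicate {
  return ({ steps }) => steps.length >= limit;
}

// Stop as soon as any one of the given predicates matches.
function anyOf(...predicates: StopPredicate[]): StopPredicate {
  return (state) => predicates.some((p) => p(state));
}

const stopAfterInvoice = anyOf(toolSucceeded("submit_invoice"), maxStepCount(10));
```

With the real SDK, the equivalent composition would likely be an array passed to stopWhen, pairing a predicate like `toolSucceeded('submit_invoice')` with `stepCountIs(10)`.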
03

Tool's execute is `(args) => fetch(...)` returning a Response object directly (not awaited, not serialized).

execute(input, { toolCallId, messages }) must resolve to a JSON-serializable value; that value becomes the tool result passed back to the model. Await any I/O inside execute and return a plain object or string (for example, the parsed body from await res.json()), not a Response. An error thrown inside execute surfaces as a tool-error part on the fullStream.

Pass / Fail · AI Platform · critical
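
A minimal sketch of the execute shape that test 03 expects, written as a standalone function so that it runs without the SDK. The weather endpoint on example.com, the `WeatherResult` shape, and the `makeWeatherExecute` factory are all hypothetical; fetch is injected so the function can be exercised with a stub.

```typescript
// Minimal shape of the fetch response this sketch relies on.
interface ResponseLike {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}
type FetchLike = (url: string) => Promise<ResponseLike>;

// Plain, JSON-serializable result handed back to the model.
interface WeatherResult {
  city: string;
  status: number;
  data: unknown;
}

// Hypothetical factory; the returned function is what a tool's execute would be.
function makeWeatherExecute(fetchImpl: FetchLike) {
  return async function execute(input: { city: string }): Promise<WeatherResult> {
    // Hypothetical endpoint, used for illustration only.
    const url = `https://example.com/weather?city=${encodeURIComponent(input.city)}`;
    const res = await fetchImpl(url); // await the I/O instead of returning the pending fetch
    if (!res.ok) {
      // Throwing lets the failure surface as a tool error rather than a bogus result.
      throw new Error(`Weather lookup failed with status ${res.status}`);
    }
    const data = await res.json(); // parse the body; never return the Response itself
    return { city: input.city, status: res.status, data };
  };
}
```

Because the result is a plain object, it survives a JSON.stringify round trip unchanged. A raw Response, by contrast, serializes to an empty object.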

How this eval is graded

Grade each response against expected.ideal_behavior and expected.rubric. A pass requires a mean score of at least 4.0 across the rubric criteria, with no single criterion scoring below 3.
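
The pass rule can be sketched as a small function, reading it as a mean taken over the rubric criteria for one response. `passesRubric` is a hypothetical helper, and the score scale is an assumption because the page does not state it.

```typescript
// Hypothetical helper: applies the stated thresholds to one response's criterion scores.
function passesRubric(scores: number[], minMean = 4.0, minEach = 3): boolean {
  if (scores.length === 0) return false; // nothing graded, nothing passes
  const mean = scores.reduce((sum, s) => sum + s, 0) / scores.length;
  const lowest = Math.min(...scores);
  return mean >= minMean && lowest >= minEach;
}

// Three criteria scored 5, 4, 3: mean is 4.0 and the lowest is 3, so this passes.
const borderlinePass = passesRubric([5, 4, 3]); // true
// Three criteria scored 5, 5, 2: mean is 4.0 but one criterion is below 3, so this fails.
const failsOnFloor = passesRubric([5, 5, 2]); // false
```

Under this reading the two conditions are independent: a high mean cannot compensate for a single criterion that falls under the floor.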

Rubric criteria

  • Vercel AI SDK
  • AI Platform
  • Tool Calling

Recommended for

Vercel AI SDK (`ai` npm package) · Vercel AI SDK customers
