
Eval directory
Evals for Vercel AI SDK
8 evaluation packs covering adversarial robustness, safety gates, workflow quality, and operator-level checks for Vercel AI SDK AI products.
About Vercel AI SDK
Vercel AI SDK is the open-source TypeScript-first AI framework from Vercel — the `ai` npm package. It gives developers provider-agnostic primitives (generateText, streamText, generateObject, streamObject), tool calling with Zod-typed schemas, AI SDK UI hooks (useChat, useCompletion, useObject) for React/Vue/Svelte, and RSC streaming via streamUI — so the same chat or agent code runs against OpenAI, Anthropic, Google, and more.
Available eval packs for Vercel AI SDK
8 packs ready to run.
Core Generate And Stream Text
Vercel AI SDK evals — Core: generateText / streamText (relift v3 InfraRed)
Embeddings Image Speech
Vercel AI SDK evals — Embeddings, Image & Speech (relift v3 InfraRed)
Middleware Telemetry Safety
Vercel AI SDK evals — Middleware, Telemetry & Safety (relift v3 InfraRed)
Providers And Registry
Vercel AI SDK evals — Providers & Provider Registry (relift v3 InfraRed)
Rsc Streaming Ui
Vercel AI SDK evals — RSC Streaming & UI Generation (relift v3 InfraRed)
Structured Outputs
Vercel AI SDK evals — Structured Outputs (generateObject / streamObject) (relift v3 InfraRed)
Tool Calling
Vercel AI SDK evals — Tool Calling (relift v3 InfraRed)
Ui Hooks
Vercel AI SDK evals — UI Hooks (React/Vue/Svelte) (relift v3 InfraRed)
Why eval Vercel AI SDK AI
Vercel AI SDK's AI features ship behind brand promises about accuracy, safety, and reliability. Buyers and integrators need to know those promises hold up under adversarial prompts, edge-case workflows, and the long tail of real customer inputs — not just the demo path.
The Corsac eval library for Vercel AI SDK measures four dimensions teams care about most when deploying code assistant agents:
- Adversarial robustness — does the agent resist prompt injection, jailbreaks, and social-engineering attempts?
- Workflow quality— does it complete the task buyers were shown in the demo, on inputs that don't look like the demo?
- Safety gates — does it escalate or refuse when it should, and only then?
- Operator quality — does it preserve analyst trust by surfacing the right context at the right time?
Every eval pack above is hand-authored against Vercel AI SDK's public product surface and runnable in Corsac with your own data.