For LiveKit · AI Platform

Auth And Tokens

LiveKit (Cloud + Agents) · LiveKit

Real-time Voice & Video Infra — LiveKit

Evaluates LiveKit's Auth & Tokens across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Real-time Voice & Video Infra eval coverage.

About LiveKit

LiveKit is open-source real-time voice/video infrastructure used to build voice agents and live experiences — a WebRTC SFU, telephony (SIP), recording/egress, and the LiveKit Agents framework for STT→LLM→TTS pipelines, available as LiveKit Cloud and self-hosted.

Employees

~50

Industry

Voice AI Infrastructure

Headquarters

New York, NY

Website

livekit.io

Sample tests · showing 3 of 9

#	Input	Expected behavior	Check	Category	Severity
01	Frontend code includes the LiveKit API secret to mint JWTs client-side for rapid prototyping.	The API secret MUST stay server-side. Always mint access tokens on the server and return only the signed JWT to the client. Embedding the secret in client bundles lets any user mint admin tokens, create rooms, and evict participants. Rotate the secret immediately on a suspected leak.	Pass / Fail	AI Platform	critical
02	Operator issues access tokens with TTL=24h to 'avoid frequent re-mint.'	Access tokens should have short TTLs (minutes to an hour) because they grant room join, publish, and subscribe. Long TTLs widen the blast radius of token theft. Rely on the SDK to refresh via Room.connect with a new token before expiry, or use server-side refresh endpoints.	Pass / Fail	AI Platform	high
03	Operator wants a viewer-only role: receive audio/video, cannot publish or send data.	Mint the token with video grants room='<name>', can_subscribe=true, can_publish=false, can_publish_data=false. The SFU enforces the grants; the viewer's SDK cannot publish even if the app code tries. Default-deny model: omit a grant rather than set it true.	Pass / Fail	AI Platform	critical
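The three sample cases can be sketched together as one server-side helper: the secret never leaves the server, the TTL is capped, and the viewer grants are set explicitly. This is a minimal illustration, not LiveKit's SDK. It assumes LiveKit access tokens are HS256-signed JWTs with the API key as `iss`, the participant identity as `sub`, and a `video` claim using the camelCase keys `roomJoin`, `canSubscribe`, `canPublish`, and `canPublishData`. The function name `mint_viewer_token`, the `MAX_TTL_SECONDS` cap, and the 10-minute default are hypothetical. In production, the official server SDK's token builder would normally replace the hand-rolled signing.

```python
import base64
import hashlib
import hmac
import json
import time

# Illustrative cap taken from case 02's "minutes to an hour" guidance.
MAX_TTL_SECONDS = 3600


def _b64url(raw: bytes) -> str:
    """Base64url-encode without padding, the encoding JWT segments use."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def mint_viewer_token(api_key, api_secret, identity, room,
                      ttl_seconds=600, now=None):
    """Return a signed JWT with viewer-only grants (hypothetical helper).

    Runs on the server only; the caller hands just the returned string
    to the client, never api_secret.
    """
    if not 0 < ttl_seconds <= MAX_TTL_SECONDS:
        raise ValueError("ttl_seconds must be between 1 and 3600")
    issued = int(time.time()) if now is None else int(now)
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {
        "iss": api_key,      # assumed: API key identifies the signer
        "sub": identity,     # assumed: participant identity
        "nbf": issued,
        "exp": issued + ttl_seconds,
        "video": {
            "room": room,
            "roomJoin": True,        # assumed to be needed to join at all
            "canSubscribe": True,
            "canPublish": False,     # set explicitly, not left to defaults
            "canPublishData": False,
        },
    }
    segments = [
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    ]
    signing_input = ".".join(segments)
    signature = hmac.new(
        api_secret.encode("utf-8"),
        signing_input.encode("ascii"),
        hashlib.sha256,
    ).digest()
    return signing_input + "." + _b64url(signature)
```

A token endpoint would call this with a secret read from server-side configuration and return only the resulting string. A request for a 24-hour TTL raises `ValueError` instead of silently minting a long-lived token.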

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
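One way to read the pass rule is as a two-part check on a case's criterion scores: the mean must reach 4.0, and no single criterion may fall below 3. The sketch below follows that reading. The function name `case_passes`, the dictionary input shape, and the example criterion names are hypothetical, and the scoring scale is not stated on this page.

```python
from statistics import mean


def case_passes(scores, mean_threshold=4.0, floor=3):
    """Apply the stated rule: mean >= 4.0 and no criterion below 3.

    `scores` maps a criterion name to the judge's numeric score
    (hypothetical shape; the real result schema is not shown here).
    """
    if not scores:
        raise ValueError("at least one criterion score is required")
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor


# A high mean does not rescue a case with one very low criterion.
print(case_passes({"secret_handling": 5, "ttl": 4, "grants": 3}))
print(case_passes({"secret_handling": 5, "ttl": 5, "grants": 2}))
```

The floor matters because averaging alone would let a strong score on one criterion mask a failure on another.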

Rubric criteria

LiveKit
AI Platform
Auth And Tokens

Recommended for

LiveKit (Cloud + Agents) · LiveKit customers

Works with

LiveKit

Frequently asked questions

What does the Auth And Tokens eval for LiveKit (Cloud + Agents) test?

Evaluates LiveKit's Auth & Tokens across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Real-time Voice & Video Infra eval coverage.

How is the Auth And Tokens eval scored?

The judge rubric: Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

How many test cases does this eval pack include?

The Auth And Tokens pack for LiveKit (Cloud + Agents) contains 9 test cases. 3 sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?

Sign up for Corsac, connect your model or agent endpoint, and run the Auth And Tokens pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.
