For LiveKit · AI Platform

SFU and Media Transport

LiveKit (Cloud + Agents) · LiveKit

Real-time Voice & Video Infra — LiveKit

Evaluates LiveKit's SFU & Media Transport across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Real-time Voice & Video Infra eval coverage.

About LiveKit

LiveKit is open-source real-time voice/video infrastructure used to build voice agents and live experiences — a WebRTC SFU, telephony (SIP), recording/egress, and the LiveKit Agents framework for STT→LLM→TTS pipelines, available as LiveKit Cloud and self-hosted.

Employees

~50

Industry

Voice AI Infrastructure

Headquarters

New York, NY

Website

livekit.io

Sample tests · showing 3 of 9

#	Input	Expected behavior	Check	Category	Severity
01	Participant on a corporate network with strict UDP egress fails to establish direct UDP to the LiveKit SFU. WebRTC stays in 'checking' for 30 seconds.	LiveKit Cloud advertises TURN/TCP and TURN/TLS fallback in ICE candidates. Client should iterate candidates and complete connectivity check via TURN/TLS 443. Operator should verify client SDK is not configured to filter out relay candidates. Self-hosted deployments must run their own TURN server.	Pass / Fail	AI Platform	critical
02	Operator wants AV1 video where supported, fall back to H.264. Publisher SDK negotiates codecs per peer connection.	Set publication video codec preference (AV1 first, then H.264). SFU forwards only what subscribers can decode; subscribers without AV1 receive H.264. Verify both sides report the same codec via getStats. Do not force AV1 with no fallback: Safari clients will fail.	Pass / Fail	AI Platform	high
03	EU users join a room hosted on LiveKit Cloud with project region=us-east. Voice latency is high.	LiveKit Cloud is a global edge mesh: clients connect to the nearest edge, which forwards via the mesh to the room's home region. Specific p50/p95 RTT envelopes are [REQUIRES-VERIFICATION] and depend on the user-edge distance. For lowest latency, pin room creation to a region close to the majority …	Pass / Fail	AI Platform	medium
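For the relay check in case 01, one way to confirm that relay candidates are actually present is to classify the gathered ICE candidates by type. The following is a minimal sketch, assuming the candidates are available as text lines in the standard SDP `candidate:` attribute format. The function names and the sample addresses are hypothetical and are not part of any LiveKit SDK.

```python
from collections import Counter


def parse_candidate(line: str) -> dict:
    """Parse one ICE candidate attribute line into its basic fields.

    Expected layout (standard SDP candidate attribute):
    candidate:<foundation> <component> <transport> <priority> <address> <port> typ <type> ...
    """
    body = line.strip()
    # Accept both "a=candidate:..." (raw SDP) and "candidate:..." (candidate string).
    for prefix in ("a=", "candidate:"):
        if body.startswith(prefix):
            body = body[len(prefix):]
    parts = body.split()
    if len(parts) < 8 or parts[6] != "typ":
        raise ValueError(f"not a candidate line: {line!r}")
    return {
        "transport": parts[2].lower(),
        "address": parts[4],
        "port": int(parts[5]),
        "type": parts[7],  # host, srflx, prflx, or relay
    }


def count_by_type(lines: list[str]) -> dict:
    """Count candidates per type, e.g. {'host': 1, 'relay': 1}."""
    return dict(Counter(parse_candidate(line)["type"] for line in lines))


def has_relay(lines: list[str]) -> bool:
    """True if at least one relay (TURN) candidate was gathered."""
    return count_by_type(lines).get("relay", 0) > 0


# Hypothetical sample candidates using documentation-only IP ranges.
SAMPLE = [
    "candidate:1 1 udp 2122260223 192.0.2.10 54321 typ host",
    "candidate:2 1 udp 1686052607 203.0.113.7 46154 typ srflx raddr 192.0.2.10 rport 54321",
    "a=candidate:3 1 udp 41885439 198.51.100.5 50000 typ relay raddr 203.0.113.7 rport 46154",
]

if __name__ == "__main__":
    print(count_by_type(SAMPLE))
    print("relay available:", has_relay(SAMPLE))
```

If `has_relay` returns false on a restricted network, the client is likely filtering relay candidates or never received TURN servers. Note that the transport field of a relay candidate describes the relayed address, not the client-to-TURN leg, so this sketch cannot tell whether the relay was reached over TURN/TLS on port 443.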

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
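The pass rule above can be sketched in a few lines. This is an illustrative reading, assuming each case receives one numeric judge score per rubric criterion and that the mean is taken across those criteria. The criterion names and the function name are hypothetical, and the scoring scale is not stated on this page.

```python
from statistics import mean


def case_passes(scores: dict, mean_threshold: float = 4.0, floor: float = 3.0) -> bool:
    """Apply the stated rule: mean >= 4.0 and no criterion below 3.

    `scores` maps a criterion name to the judge's numeric score.
    Assumption: the mean is computed across criteria for a single case.
    """
    if not scores:
        raise ValueError("at least one criterion score is required")
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor


if __name__ == "__main__":
    # Hypothetical criterion names, for illustration only.
    print(case_passes({"accuracy": 5, "completeness": 4, "safety": 3}))
    print(case_passes({"accuracy": 5, "completeness": 5, "safety": 2}))
```

The second example fails even though its mean is 4.0, because one criterion falls below the floor of 3.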

Rubric criteria

LiveKit
AI Platform
SFU and Media Transport

Recommended for

LiveKit (Cloud + Agents) · LiveKit customers

Works with

LiveKit

Related evals

AI Platform

Claude API

Evaluates Anthropic's Batch API across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

AI Platform

Claude API

Evaluates Anthropic's Extended Thinking across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

AI Platform

Claude API

Evaluates Anthropic's Files API & Citations across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.


Frequently asked questions

What does the SFU and Media Transport eval for LiveKit (Cloud + Agents) test?

How is the SFU and Media Transport eval scored?

The judge rubric: Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

How many test cases does this eval pack include?

The SFU and Media Transport pack for LiveKit (Cloud + Agents) contains 9 test cases. 3 sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?

Sign up for Corsac, connect your model or agent endpoint, and run the SFU and Media Transport pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.

Run this eval in your workspace

Connect your data, configure thresholds, and review results with your team.