For LiveKit · AI Platform

Recording And Egress

LiveKit (Cloud + Agents) · LiveKit

Real-time Voice & Video Infra — LiveKit

Evaluates LiveKit's Recording & Egress across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Real-time Voice & Video Infra eval coverage.

About LiveKit

LiveKit is open-source real-time voice/video infrastructure used to build voice agents and live experiences — a WebRTC SFU, telephony (SIP), recording/egress, and the LiveKit Agents framework for STT→LLM→TTS pipelines, available as LiveKit Cloud and self-hosted.

Employees

~50

Industry

Voice AI Infrastructure

Headquarters

New York, NY

Website

livekit.io

Sample tests · showing 3 of 9

#	Input	Expected behavior	Check
01	Compliance needs the full call as a single MP4 with both speakers. Operator starts a TrackEgress per participant.	RoomCompositeEgress produces one MP4/HLS output with the composed layout: a single file per call, ready for review. TrackEgress produces raw per-track files (useful for ML, not for compliance review). Choose RoomCompositeEgress for compliance and TrackEgress for downstream processing.	Pass / Fail · AI Platform · high
02	Operator uses RoomCompositeEgress with HLS output and 6-second segments. After egress_ended, the playlist references 980 segments.	All segments are uploaded to the sink, with the .m3u8 playlist as the index. Serve the playlist and segments through a CDN; do not link directly to the sink, which is private. Verify that the playlist's EXT-X-ENDLIST tag is present (final, not live). Retain segments per compliance policy.	Pass / Fail · AI Platform · medium
03	Egress request specifies AWS credentials inline. Six months later the key is rotated; existing egress jobs continue uploading mid-flight.	Inline credentials are scoped to the egress request and are not re-fetched. After rotation, new egress requests must use the new key, but an already-running egress completes with the old key (assuming AWS has not invalidated it). Use IAM role assumption with short-lived STS tokens for production; rotate…	Pass / Fail · AI Platform · high
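The playlist check described in sample test 02 can be sketched in a few lines of Python. This is a minimal illustration, not LiveKit tooling: it assumes the .m3u8 file has already been downloaded as text, and the function name, the dataclass, and the segment file names are hypothetical. It counts segment URIs, sums the EXTINF durations, and reports whether the EXT-X-ENDLIST tag is present.

```python
from dataclasses import dataclass


@dataclass
class PlaylistSummary:
    segment_count: int
    total_seconds: float
    is_final: bool


def summarize_playlist(text: str) -> PlaylistSummary:
    """Summarize an HLS media playlist (.m3u8) supplied as text."""
    segment_count = 0
    total_seconds = 0.0
    is_final = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#EXTINF:"):
            # Format is "#EXTINF:<duration>,[<title>]"; duration precedes the comma.
            total_seconds += float(line[len("#EXTINF:"):].split(",", 1)[0])
        elif line == "#EXT-X-ENDLIST":
            # Present only once the playlist is complete (not live).
            is_final = True
        elif not line.startswith("#"):
            # Any non-tag, non-empty line is a segment URI.
            segment_count += 1
    return PlaylistSummary(segment_count, total_seconds, is_final)


# Synthetic playlist matching the scenario: 980 segments of 6 seconds each.
# The segment names are made up for this example.
sample = "\n".join(
    ["#EXTM3U", "#EXT-X-VERSION:3", "#EXT-X-TARGETDURATION:6"]
    + [f"#EXTINF:6.0,\nseg_{i:05d}.ts" for i in range(980)]
    + ["#EXT-X-ENDLIST"]
)

summary = summarize_playlist(sample)
print(summary.segment_count, summary.total_seconds / 60, summary.is_final)
# prints: 980 98.0 True
```

At 6 seconds per segment, 980 segments correspond to 5,880 seconds, or 98 minutes of recording, which gives a quick sanity check against the expected call length.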

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
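One reading of this pass rule can be sketched as follows. The interpretation is an assumption: each criterion receives a numeric score from the judge (presumably on a 1 to 5 scale, which the page does not state), and a case passes when the mean across criteria is at least 4.0 and no single criterion scores below 3. The function name and the criterion names are hypothetical.

```python
from statistics import mean


def case_passes(scores: dict[str, float],
                mean_threshold: float = 4.0,
                floor: float = 3.0) -> bool:
    """Return True when the mean score meets the threshold and no score is below the floor.

    Assumed interpretation of the grading rule; not taken from Corsac's implementation.
    """
    if not scores:
        return False  # nothing graded, so treat as not passing
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor


# Hypothetical criterion names, for illustration only.
print(case_passes({"correctness": 5, "completeness": 4, "safety": 3}))  # True: mean 4.0, min 3
print(case_passes({"correctness": 5, "completeness": 5, "safety": 2}))  # False: one score below 3
print(case_passes({"correctness": 4, "completeness": 4, "safety": 3}))  # False: mean below 4.0
```

The floor matters because a high average can otherwise hide a single badly failed criterion, as the second call shows.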

Rubric criteria

LiveKit
AI Platform
Recording And Egress

Recommended for

LiveKit (Cloud + Agents) · LiveKit customers

Works with

LiveKit

Related evals

AI Platform

Claude API

Evaluates Anthropic's Batch API across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

AI Platform

Claude API

Evaluates Anthropic's Extended Thinking across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

AI Platform

Claude API

Evaluates Anthropic's Files API & Citations across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.


Frequently asked questions

What does the Recording And Egress eval for LiveKit (Cloud + Agents) test?

Evaluates LiveKit's Recording & Egress across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Real-time Voice & Video Infra eval coverage.

How is the Recording And Egress eval scored?

The judge rubric: Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

How many test cases does this eval pack include?

The Recording And Egress pack for LiveKit (Cloud + Agents) contains 9 test cases. Three sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?

Sign up for Corsac, connect your model or agent endpoint, and run the Recording And Egress pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.
