For LiveKit · AI Platform

Cloud Vs Self Host And Scaling

LiveKit (Cloud + Agents) · LiveKit

Real-time Voice & Video Infra — LiveKit

Evaluates LiveKit's Cloud vs Self-host & Scaling across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Real-time Voice & Video Infra eval coverage.

About LiveKit

LiveKit is open-source real-time voice/video infrastructure used to build voice agents and live experiences. It comprises a WebRTC SFU, telephony (SIP), recording/egress, and the LiveKit Agents framework for STT→LLM→TTS pipelines, and is available as LiveKit Cloud or self-hosted.

Employees

~50

Industry

Voice AI Infrastructure

Headquarters

New York, NY

Website

livekit.io

Sample tests (showing 3 of 9)

#	Input	Expected behavior	Check
01	Operator wants the cheapest 30-participant beta with minimal ops; later expects to scale to 5000-room production.	LiveKit Cloud provides the SFU, edge mesh, SIP, and egress as a managed service with regions; self-hosting gives full control over infrastructure cost but requires k8s, Redis, egress workers, TURN/TLS, observability, and capacity planning. Recommend Cloud for the beta and revisit only if cost or data-res…	Pass / Fail · AI Platform · medium
02	Operator runs livekit-server on two nodes behind a load balancer with no shared state. Participants on different nodes can't see each other.	Multi-node self-hosting requires Redis (or a compatible store) for room/participant coordination, so that signaling on any node sees the global room state. Configure redis: {address, password} in the server config on every node. Single-node deployments can skip Redis but are capped at that node's capacity.	Pass / Fail · AI Platform · critical
03	Self-host: operator runs livekit-server but skips egress worker deployment, then triggers StartRoomCompositeEgress.	Egress runs in a separate worker process (livekit-egress) that consumes egress jobs from Redis. Without an egress worker running, egress requests queue but never execute. Deploy livekit-egress alongside the server and scale it independently, because it is CPU/GPU heavy (browser + encoder).	Pass / Fail · AI Platform · critical
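
Test case 02 turns on every node carrying the same Redis settings. As a rough illustration only, the check below flags nodes in a multi-node deployment that lack a `redis.address`. The function name `redis_problems` and the idea of passing already-parsed configs as dictionaries are assumptions made for this sketch and are not part of LiveKit or the eval pack; it also assumes a single shared Redis address rather than a sentinel or cluster setup.

```python
def redis_problems(node_configs):
    """Report Redis misconfigurations across self-hosted nodes.

    node_configs maps a node name to its parsed server config (a dict).
    Hypothetical helper: only the `redis.address` key named in the
    test case is inspected.
    """
    # A single-node deployment can skip Redis entirely.
    if len(node_configs) < 2:
        return []

    problems = []
    addresses = set()
    for name, cfg in sorted(node_configs.items()):
        address = (cfg.get("redis") or {}).get("address")
        if not address:
            problems.append(f"{name}: no redis.address configured")
        else:
            addresses.add(address)

    # Assumption: all nodes should share one Redis address.
    if len(addresses) > 1:
        problems.append(
            "nodes point at different Redis addresses: "
            + ", ".join(sorted(addresses))
        )
    return problems
```

Run against the scenario in case 02 (two nodes, no shared state), this would report both nodes as missing Redis, which matches the symptom of participants on different nodes not seeing each other.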

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
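
One plausible reading of this rule is that a test case passes when the mean of its rubric-criterion scores is at least 4.0 and no single criterion scores below 3. The sketch below encodes that reading; the function name `case_passes` and the shape of the `scores` mapping are illustrative assumptions, not Corsac's implementation.

```python
from statistics import mean


def case_passes(scores, mean_threshold=4.0, floor=3):
    """Apply the stated pass rule to one test case.

    scores maps a rubric criterion name to the judge's score.
    Assumption: the mean is taken over all criteria of the case.
    """
    if not scores:
        raise ValueError("at least one criterion score is required")
    values = list(scores.values())
    # Both conditions must hold: a high enough mean and no low outlier.
    return mean(values) >= mean_threshold and min(values) >= floor
```

Under this reading, scores of 5, 5, and 2 fail despite a mean of 4.0, because one criterion falls below the floor of 3.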

Rubric criteria

LiveKit
AI Platform
Cloud Vs Self Host And Scaling

Recommended for

LiveKit (Cloud + Agents) · LiveKit customers

Works with

LiveKit

Related evals

AI Platform

Claude API

Evaluates Anthropic's Batch API across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

AI Platform

Claude API

Evaluates Anthropic's Extended Thinking across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

AI Platform

Claude API

Evaluates Anthropic's Files API & Citations across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.


Frequently asked questions

What does the Cloud Vs Self Host And Scaling eval for LiveKit (Cloud + Agents) test?

How is the Cloud Vs Self Host And Scaling eval scored?

The judge rubric: Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

How many test cases does this eval pack include?

The Cloud Vs Self Host And Scaling pack for LiveKit (Cloud + Agents) contains 9 test cases. Three sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?

Sign up for Corsac, connect your model or agent endpoint, and run the Cloud Vs Self Host And Scaling pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.
