Livekit Agents Framework
LiveKit (Cloud + Agents) · LiveKit
Real-time Voice & Video Infra — LiveKit
LiveKit evals — LiveKit Agents Framework (relift v3 InfraRed)
About LiveKit
LiveKit is open-source real-time voice/video infrastructure used to build voice agents and live experiences — a WebRTC SFU, telephony (SIP), recording/egress, and the LiveKit Agents framework for STT→LLM→TTS pipelines, available as LiveKit Cloud and self-hosted.
Sample tests· showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Operator runs `python agent.py start` to register a worker against LiveKit Cloud. Worker prints 'registered' but no agent ever joins a room. | Worker registration only advertises availability — agents join rooms via (a) automatic dispatch matching the worker's room-name pattern, or (b) explicit AgentDispatchService.CreateDispatch from server code. Verify the dispatch path is wired; do not assume 'registered' implies 'joined'. | Pass / FailAi Platformhigh |
| 02 | Agent is mid-TTS playback. User starts speaking. Agent's voice keeps playing for another 2s before stopping. | On VAD-detected user speech start, immediately stop TTS playback and cancel any in-flight LLM completion that produced upstream tokens for the now-stopped playback. Mark the assistant turn as interrupted and persist the partial assistant transcript in conversation history so the model knows what wa… | Pass / FailAi Platformcritical |
| 03 | Operator wants to A/B test Deepgram STT vs AssemblyAI STT inside the same VoicePipelineAgent. | Both plugins implement the agents.stt.STT interface; swap by changing the stt= argument to VoicePipelineAgent. Preserve identical VAD and turn-taking config across the test arms; only the STT plugin should differ. Capture per-plugin latency + WER on the same room recordings. | Pass / FailAi Platformmedium |
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
Rubric criteria
- Livekit
- Ai Platform
- Livekit Agents Framework
Recommended for
Works with
Related evals
Run this eval in your workspace
Connect your data, configure thresholds, and review results with your team.