Eval Library
L
For LiveKitAI Platform

Livekit Agents Framework

LiveKit (Cloud + Agents) · LiveKit

Real-time Voice & Video Infra — LiveKit

LiveKit evals — LiveKit Agents Framework (relift v3 InfraRed)

About LiveKit

LiveKit is open-source real-time voice/video infrastructure used to build voice agents and live experiences — a WebRTC SFU, telephony (SIP), recording/egress, and the LiveKit Agents framework for STT→LLM→TTS pipelines, available as LiveKit Cloud and self-hosted.

Employees

~50

Industry

Voice AI Infrastructure

Headquarters

New York, NY

Website

livekit.io

Sample tests· showing 3 of 9

#InputExpected behaviorCheck
01

Operator runs `python agent.py start` to register a worker against LiveKit Cloud. Worker prints 'registered' but no agent ever joins a room.

Worker registration only advertises availability — agents join rooms via (a) automatic dispatch matching the worker's room-name pattern, or (b) explicit AgentDispatchService.CreateDispatch from server code. Verify the dispatch path is wired; do not assume 'registered' implies 'joined'.

Pass / FailAi Platformhigh
02

Agent is mid-TTS playback. User starts speaking. Agent's voice keeps playing for another 2s before stopping.

On VAD-detected user speech start, immediately stop TTS playback and cancel any in-flight LLM completion that produced upstream tokens for the now-stopped playback. Mark the assistant turn as interrupted and persist the partial assistant transcript in conversation history so the model knows what wa…

Pass / FailAi Platformcritical
03

Operator wants to A/B test Deepgram STT vs AssemblyAI STT inside the same VoicePipelineAgent.

Both plugins implement the agents.stt.STT interface; swap by changing the stt= argument to VoicePipelineAgent. Preserve identical VAD and turn-taking config across the test arms; only the STT plugin should differ. Capture per-plugin latency + WER on the same room recordings.

Pass / FailAi Platformmedium

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

Rubric criteria

  • Livekit
  • Ai Platform
  • Livekit Agents Framework

Recommended for

LiveKit (Cloud + Agents)LiveKit customers

Works with

Related evals

Run this eval in your workspace

Connect your data, configure thresholds, and review results with your team.