Telephony (SIP)
LiveKit evals — Telephony (SIP) (relift v3 InfraRed)
About LiveKit
LiveKit is open-source real-time voice/video infrastructure used to build voice agents and live experiences — a WebRTC SFU, telephony (SIP), recording/egress, and the LiveKit Agents framework for STT→LLM→TTS pipelines, available as LiveKit Cloud and self-hosted.
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Two dispatch rules match an inbound number: Rule A (direct to room 'support') and Rule B (individual rooms per call). | Per LiveKit SIP docs, dispatch rules are evaluated in order; the first match wins. Inspect the configured ordering and make precedence explicit. Do not rely on creation-time order across deployments; specify it in the IaC. | Pass / Fail · AI Platform · high |
| 02 | Server initiates an outbound call: CreateSIPParticipant({trunk_id, sip_call_to:'+15551234567', room_name:'call-xyz', participant_identity:'callee'}). | Verify that the trunk is configured for outbound (CreateSIPOutboundTrunk done), that the trunk auth (username, password, address) matches the carrier, and that the from-number is an authorized caller ID with the carrier. Surface the carrier's SIP response (486 busy, 480 unavailable, 603 declined) back to the operator — d… | Pass / Fail · AI Platform · critical |
| 03 | SIP call has packet loss on the carrier leg only (LiveKit ↔ SFU is healthy). | LiveKit SIP service bridges RTP between the carrier and the room; loss on the carrier leg shows up as RTP loss on the SIP participant's track. Use the SIP participant's stats (jitter, packet_loss) to isolate carrier-side issues from room-side issues. Escalate to the trunk provider with their call S… | Pass / Fail · AI Platform · medium |
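
Test 02 asks that the carrier's SIP response be surfaced to the operator. A minimal sketch of that mapping is shown below. It does not call the LiveKit SDK; `describe_sip_failure` and `OperatorMessage` are hypothetical names introduced for illustration, the reason phrases follow RFC 3261, and the `retryable` flag is an assumed policy rather than anything LiveKit prescribes.

```python
from dataclasses import dataclass

# Standard RFC 3261 reason phrases for the codes named in test 02.
SIP_REASONS = {
    480: "Temporarily Unavailable",
    486: "Busy Here",
    603: "Decline",
}


@dataclass(frozen=True)
class OperatorMessage:
    """Hypothetical container for what the operator is shown."""
    code: int
    reason: str
    retryable: bool


def describe_sip_failure(code: int) -> OperatorMessage:
    """Turn a carrier SIP status code into an operator-facing message."""
    reason = SIP_REASONS.get(code, "Unknown SIP response")
    # Assumption: busy/unavailable may succeed on a later attempt,
    # while an explicit decline should not be retried automatically.
    retryable = code in (480, 486)
    return OperatorMessage(code=code, reason=reason, retryable=retryable)
```

For example, `describe_sip_failure(486)` yields a message with reason `"Busy Here"` marked as retryable, while `describe_sip_failure(603)` yields `"Decline"` marked as not retryable.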
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
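
The pass rule can be sketched as a small function. This is one reading of the rule, assuming each rubric criterion receives a single numeric score, the mean is taken across criteria, and the floor of 3 applies to every individual criterion; the function name and the example scores are hypothetical, and the scoring scale is not stated in the source.

```python
from statistics import mean


def eval_passes(scores: dict[str, float],
                mean_threshold: float = 4.0,
                floor: float = 3.0) -> bool:
    """Return True when the mean score meets the threshold and
    no single criterion falls below the floor."""
    if not scores:
        raise ValueError("at least one criterion score is required")
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor


# Hypothetical scores keyed by the rubric criteria.
example = {"LiveKit": 5, "AI Platform": 4, "Telephony (SIP)": 3}
result = eval_passes(example)  # mean is 4.0 and the minimum is 3
```

Under this reading, scores of 5, 5, and 2 fail even though their mean is 4.0, because one criterion is below the floor.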
Rubric criteria
- LiveKit
- AI Platform
- Telephony (SIP)