Realtime Agents Integration
Cartesia evals: Realtime / Agents Integration (relift v3 InfraRed)
About Cartesia
Cartesia builds real-time generative voice models. Its Sonic model delivers ultra-low-latency, high-fidelity text-to-speech with streaming, voice cloning, and prosody control for production voice agents and interactive audio experiences.
Sample tests (showing 3 of 9)
| # | Input | Expected behavior | Check | Category | Severity |
|---|---|---|---|---|---|
| 01 | In a voice agent, the caller starts speaking while Sonic is mid-utterance, but the stack keeps playing the synthesized audio over the caller. | On detected user speech (barge-in), immediately stop local playback AND cancel the in-flight TTS context so the agent yields the floor, then resume the dialog from the point of interruption. Treat interruption as a first-class control path, not an afterthought. | Pass / Fail | Ai Platform | critical |
| 02 | The agent pipes STT→LLM→Cartesia TTS but blocks on the full LLM completion before starting synthesis, adding seconds of dead air. | Run the pipeline in a streaming (duplex) fashion: stream partial LLM text into /tts/websocket as chunked transcripts on a single context_id so audio starts before the LLM finishes, while still capturing inbound audio for barge-in. Overlap the stages to keep latency within the voice-agent budget. | Pass / Fail | Ai Platform | high |
| 03 | Network jitter causes audio underruns, and the player produces clicks or gaps during the agent's speech. | Maintain a small adaptive jitter buffer sized to the latency budget; on underrun, degrade gracefully (briefly conceal the gap) rather than emitting clicks. Balance buffer depth against the latency it adds for a real-time agent. | Pass / Fail | Ai Platform | medium |
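For test 01, the essential property is that barge-in triggers two actions together: local playback stops, and the server-side TTS context is cancelled. The Python sketch below simulates this with asyncio. The `AgentTurn` class, the `speak` and `fake_tts` helpers, and the `sent` list standing in for a WebSocket are hypothetical, and the cancel message shape (`context_id` plus `cancel: true`) is an assumption to verify against Cartesia's WebSocket documentation.

```python
import asyncio
import json
from dataclasses import dataclass, field


@dataclass
class AgentTurn:
    """Hypothetical per-utterance state: one TTS context per agent turn."""
    context_id: str
    played: list = field(default_factory=list)  # audio chunks the caller actually heard
    cancelled: bool = False


async def speak(turn, tts_chunks, barge_in, sent):
    """Play streamed TTS audio until the utterance ends or the caller barges in."""
    async for chunk in tts_chunks:
        if barge_in.is_set():
            # Barge-in path: drop this chunk (stop local playback) and cancel the
            # in-flight TTS context so the server stops generating audio.
            # The message shape is an assumption; verify it against Cartesia's docs.
            sent.append(json.dumps({"context_id": turn.context_id, "cancel": True}))
            turn.cancelled = True
            return
        turn.played.append(chunk)  # stand-in for writing to the audio device
        await asyncio.sleep(0)     # yield so VAD/STT tasks get scheduled


async def fake_tts(barge_in, n=5, barge_at=2):
    """Hypothetical TTS stream; simulates VAD firing just before chunk `barge_at`."""
    for i in range(n):
        if i == barge_at:
            barge_in.set()
        yield f"chunk{i}".encode()


async def main():
    barge_in = asyncio.Event()
    turn = AgentTurn(context_id="turn-1")
    sent = []  # stand-in for messages written to the TTS WebSocket
    await speak(turn, fake_tts(barge_in), barge_in, sent)
    return turn, sent


turn, sent = asyncio.run(main())
print(len(turn.played), turn.cancelled, sent[0])
```

In a real agent, `barge_in` would be set by a VAD or STT task running concurrently, and `turn.played` tells the dialog manager how much of the reply the caller heard before interrupting, which is the information needed to resume from the interrupted point.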
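For test 02, the fix is to forward LLM text to synthesis as it is produced rather than after the full completion. The sketch below assumes Cartesia's input-streaming convention of sending several messages on one `context_id` with `continue: true` and closing with `continue: false`; confirm the field names in Cartesia's WebSocket docs. Required fields such as `model_id`, `voice`, and `output_format` are omitted for brevity, and `fake_llm_tokens`, `stream_to_tts`, and the `min_chars` flush rule are hypothetical.

```python
import asyncio


async def fake_llm_tokens():
    """Hypothetical LLM token stream; replace with your LLM client's streaming API."""
    for tok in ["Sure", ",", " your", " order", " ships", " today", "."]:
        await asyncio.sleep(0)  # tokens arrive over time
        yield tok


async def stream_to_tts(tokens, context_id, send, min_chars=12):
    """Forward partial LLM text to TTS as it arrives instead of awaiting the full reply.

    Text is flushed at punctuation or once `min_chars` characters accumulate, so
    synthesis starts early without sending single-token fragments.
    """
    buf = ""
    async for tok in tokens:
        buf += tok
        if buf.endswith((".", ",", "?", "!")) or len(buf) >= min_chars:
            await send({"context_id": context_id, "transcript": buf, "continue": True})
            buf = ""
    # Final message closes the context; any leftover text (possibly empty) rides along.
    await send({"context_id": context_id, "transcript": buf, "continue": False})


async def main():
    sent = []

    async def send(msg):  # stand-in for ws.send(json.dumps(msg)) on /tts/websocket
        sent.append(msg)

    await stream_to_tts(fake_llm_tokens(), "turn-1", send)
    return sent


for msg in asyncio.run(main()):
    print(msg)
```

In a full agent, this coroutine would run alongside the audio-capture and VAD task (for example via `asyncio.gather`), so the caller can still barge in while the reply is being synthesized.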
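For test 03, one simple adaptive jitter buffer for fixed-size PCM frames is sketched below. It prebuffers `target` frames before playing, raises `target` after each underrun (capped at `max_target`, which should come from the latency budget; for example, 10 frames of 20 ms caps the added delay at 200 ms), lowers it again after a long run of healthy pulls, and conceals gaps by replaying the last frame at half amplitude instead of cutting abruptly to silence. The `JitterBuffer` class and all of its parameters are hypothetical, and the concealment strategy is deliberately minimal.

```python
from collections import deque


class JitterBuffer:
    """Minimal adaptive jitter buffer for fixed-size PCM frames (lists of int16 samples)."""

    def __init__(self, frame_len, target=3, min_target=2, max_target=10, shrink_after=200):
        self.frame_len = frame_len
        self.target, self.min_target, self.max_target = target, min_target, max_target
        self.shrink_after = shrink_after  # healthy pulls before shrinking the target
        self.frames = deque()
        self.playing = False              # False while (re)filling up to `target`
        self.last = [0] * frame_len       # last frame handed to the device
        self.healthy = 0
        self.underruns = 0

    def push(self, frame):
        """Called by the network side whenever a frame arrives."""
        self.frames.append(frame)

    def pull(self):
        """Called by the audio device every frame period; always returns a frame."""
        if not self.playing and len(self.frames) >= self.target:
            self.playing = True
        if self.playing and self.frames:
            self.last = self.frames.popleft()
            self.healthy += 1
            if self.healthy >= self.shrink_after and self.target > self.min_target:
                self.target -= 1  # network looks stable: trade depth back for latency
                self.healthy = 0
            return self.last
        if self.playing:
            # Underrun: grow the buffer (within the latency cap) and rebuffer.
            self.underruns += 1
            self.healthy = 0
            self.target = min(self.target + 1, self.max_target)
            self.playing = False
        # Conceal: replay the last frame at half amplitude; int() truncates toward
        # zero, so repeated concealment decays to silence instead of clicking.
        self.last = [int(s * 0.5) for s in self.last]
        return self.last


jb = JitterBuffer(frame_len=4, target=2)
out = []
jb.push([100] * 4)
out.append(jb.pull())  # still prebuffering: silence
jb.push([200] * 4)
out.append(jb.pull())  # target reached: first real frame
out.append(jb.pull())
out.append(jb.pull())  # network stalled: concealed (faded) frame, target grows
print(out, jb.target, jb.underruns)
```

Rebuffering after an underrun adds a little delay but avoids a train of back-to-back micro-gaps, which is usually the more audible failure for a speaking agent.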
How this eval is graded
Grade each response against expected.ideal_behavior and expected.rubric. A response passes when its mean score across the rubric criteria is at least 4.0 and no individual criterion scores below 3.
Rubric criteria
- Cartesia
- Ai Platform
- Realtime Agents Integration
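Read as a rule over one score per rubric criterion, the pass condition stated above can be checked mechanically. This Python sketch assumes scores on a 1-5 scale (the scale is an assumption; the source states only the 4.0 mean and the floor of 3), and `passes` is a hypothetical helper.

```python
def passes(scores, mean_threshold=4.0, floor=3):
    """Return True if the mean criterion score is >= mean_threshold and none is below floor."""
    if not scores:
        raise ValueError("need at least one criterion score")
    values = list(scores.values())
    mean = sum(values) / len(values)
    return mean >= mean_threshold and min(values) >= floor


# Hypothetical scores, one per rubric criterion.
print(passes({"Cartesia": 5, "Ai Platform": 4, "Realtime Agents Integration": 4}))  # True: mean 4.33
print(passes({"Cartesia": 5, "Ai Platform": 5, "Realtime Agents Integration": 2}))  # False: one score < 3
```

The second case has a mean of exactly 4.0 yet still fails, which is why the floor exists: a strong score on one criterion cannot mask a poor one on another.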