Audio Formats And Encoding
Cartesia evals — Audio Formats & Encoding (relift v3 InfraRed)
About Cartesia
Cartesia builds real-time generative voice models. Its Sonic model delivers low-latency, high-fidelity text-to-speech with streaming, voice cloning, and prosody control for production voice agents and interactive audio experiences.
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Agent requests output_format=raw PCM but writes the bytes to a '.wav' file and serves it as audio/wav. | Match the container to how the bytes will be consumed: request a wav container (with header) when serving a .wav file, or raw PCM only when the consumer knows the encoding/sample_rate out of band. Do not label raw PCM as wav; players will misread the missing header. | Pass / Fail · AI Platform · high |
| 02 | A low-latency streaming agent picks mp3 output for everything, including the real-time leg where decode/framing adds latency. | Choose encoding by use case: prefer raw/PCM (or the documented low-latency encoding) for real-time streaming where decode overhead matters; use mp3/compressed for storage/download where size matters. Do not blanket-apply one container. | Pass / Fail · AI Platform · medium |
| 03 | Agent requests one sample_rate from /tts but configures the audio player at a different rate, producing pitch-shifted audio. | Use the same sample_rate for synthesis request and playback/decode. For raw PCM, carry the sample_rate alongside the bytes so the consumer decodes correctly. Verify the played rate equals the requested rate in tests. | Pass / Fail · AI Platform · high |
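The behaviors expected in tests 01 and 03 can be illustrated with a minimal sketch that uses only the Python standard library. It assumes the synthesized audio is signed 16-bit little-endian mono PCM; the `PcmChunk`, `pcm_to_wav`, and `wav_sample_rate` names are hypothetical helpers for illustration and are not part of any Cartesia SDK. The idea is to keep the sample rate attached to the raw bytes, and to add a real WAV header before anything is written to a .wav file or served as audio/wav.

```python
import io
import wave
from dataclasses import dataclass


@dataclass(frozen=True)
class PcmChunk:
    """Raw PCM bytes plus the parameters a consumer needs to decode them.

    Hypothetical container: it carries the sample rate alongside the bytes
    so playback can use the same rate that was requested from synthesis.
    """
    data: bytes
    sample_rate: int       # Hz, must equal the rate requested from synthesis
    channels: int = 1      # assumption: mono
    sample_width: int = 2  # bytes per sample; assumption: signed 16-bit PCM


def pcm_to_wav(chunk: PcmChunk) -> bytes:
    """Wrap raw PCM in a WAV container so it can honestly be served as audio/wav."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as writer:
        writer.setnchannels(chunk.channels)
        writer.setsampwidth(chunk.sample_width)
        writer.setframerate(chunk.sample_rate)
        writer.writeframes(chunk.data)
    return buf.getvalue()


def wav_sample_rate(wav_bytes: bytes) -> int:
    """Read the sample rate back from a WAV header, e.g. for a test assertion."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as reader:
        return reader.getframerate()


# Usage: 0.1 s of silence at a requested rate of 44100 Hz.
requested_rate = 44100
chunk = PcmChunk(data=b"\x00\x00" * 4410, sample_rate=requested_rate)
wav_bytes = pcm_to_wav(chunk)
```

A test in the spirit of case 03 would then compare `wav_sample_rate(wav_bytes)` with the rate sent in the synthesis request and fail on any mismatch. Other PCM encodings, such as 32-bit float, are not covered by Python's `wave` module and would need a different writer.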
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
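The pass rule above can be sketched as a small check. This reading is an assumption: it treats the rule as "the mean of all criterion scores is at least 4.0, and no single criterion scores below 3". The `passes` function name and the 1 to 5 score scale are hypothetical and not confirmed by the eval definition.

```python
from statistics import mean


def passes(scores, mean_threshold=4.0, floor=3):
    """Return True when the scores meet both conditions of the pass rule.

    scores: one numeric score per rubric criterion (assumed 1 to 5 scale).
    Raises ValueError for an empty list, since no rule can be applied.
    """
    if not scores:
        raise ValueError("at least one criterion score is required")
    return mean(scores) >= mean_threshold and min(scores) >= floor


# Usage with three rubric criteria.
ok = passes([5, 4, 4])           # mean above 4.0, lowest score is 4
low_floor = passes([5, 5, 2])    # mean is exactly 4.0, but one score is below 3
low_mean = passes([4, 4, 3])     # no score below 3, but mean is under 4.0
```

Both conditions matter independently: a strong average does not rescue a run with one failing criterion, and a run with no failing criterion can still miss the mean threshold.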
Rubric criteria
- Cartesia
- AI Platform
- Audio Formats And Encoding