Batch Transcription Universal 2
AssemblyAI evals — Batch Transcription (Universal-2) (relift v3 InfraRed)
About AssemblyAI
AssemblyAI is a speech-AI platform with Universal-2 speech-to-text, real-time streaming, Speaker Diarization, Audio Intelligence (summarization, sentiment, content moderation), and LeMUR — an LLM framework that runs over transcripts (task, summary, question-answer, action items).
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Agent submits POST /v2/transcript with audio_url pointing to a presigned S3 URL that expires in 60 seconds. Universal-2 processing starts 3 minutes later. | Either (a) upload the audio bytes via POST /v2/upload first and use the returned upload_url as audio_url (single-use; AssemblyAI fetches synchronously before returning), or (b) ensure audio_url remains fetchable for the full queue+processing window. Do not assume Universal-2 dereferences audio_url … | Pass / Fail · AI Platform · high |
| 02 | Agent submits a noisy call-center recording and sets speech_model='nano' to save cost. WER on the transcript is markedly worse than expected. | speech_model='universal' is the Universal-2 default with the best accuracy envelope; 'nano' trades accuracy for cost/latency. For call-center or domain-critical workloads, default to 'universal'. Document the model choice per workload and reconcile WER expectations against the chosen tier [REQUIRES-… | Pass / Fail · AI Platform · medium |
| 03 | Agent ingests a mixed multilingual corpus and sets language_detection=true on every job to avoid having to pre-classify language. | Use language_detection=true when the language is unknown; use language_code (e.g., 'es', 'fr') when it is known, to skip detection cost/latency and avoid mislabeling. Read response.language_code to confirm what was used. Do not pass both with the expectation that language_code wins silently. | Pass / Fail · AI Platform · medium |
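The expected behaviors in tests 01 and 03 can be sketched as a small request builder. This is a minimal sketch, not AssemblyAI's SDK: `build_transcript_payload`, `upload_then_transcribe`, the injected `post` callable, and `BASE_URL` are hypothetical names introduced for illustration, and the field names (`audio_url`, `upload_url`, `speech_model`, `language_code`, `language_detection`) are taken from the test descriptions above. Check them against the current API reference before relying on them.

```python
from typing import Any, Callable, Dict, Optional

# Assumed base URL; verify against the API reference.
BASE_URL = "https://api.assemblyai.com"

# Hypothetical transport: post(url, body) -> parsed JSON response.
# Injecting it keeps the sketch free of any specific HTTP client.
Post = Callable[[str, Any], Dict[str, Any]]


def build_transcript_payload(
    audio_url: str,
    language_code: Optional[str] = None,
    speech_model: str = "universal",
) -> Dict[str, Any]:
    """Build the JSON body for POST /v2/transcript.

    Sends language_code when the language is known and
    language_detection=True otherwise, never both (test 03).
    """
    payload: Dict[str, Any] = {
        "audio_url": audio_url,
        "speech_model": speech_model,
    }
    if language_code is not None:
        payload["language_code"] = language_code
    else:
        payload["language_detection"] = True
    return payload


def upload_then_transcribe(
    audio_bytes: bytes,
    post: Post,
    language_code: Optional[str] = None,
) -> Dict[str, Any]:
    """Option (a) from test 01: upload the bytes, then submit upload_url."""
    upload = post(f"{BASE_URL}/v2/upload", audio_bytes)
    payload = build_transcript_payload(upload["upload_url"], language_code)
    return post(f"{BASE_URL}/v2/transcript", payload)
```

Uploading first removes the dependency on a short-lived presigned URL still being valid when processing begins. The builder defaults `speech_model` to `'universal'`, so choosing `'nano'` has to be an explicit, documented decision per workload (test 02).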
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
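One reading of the pass rule can be sketched as follows. The source does not state the score scale or exactly how the mean is taken, so this sketch assumes one numeric score per rubric criterion, with the mean computed across criteria. The function name `passes` and its parameters are hypothetical.

```python
from statistics import mean
from typing import Mapping


def passes(
    scores: Mapping[str, float],
    mean_floor: float = 4.0,
    min_floor: float = 3.0,
) -> bool:
    """Return True when the mean score is at least mean_floor
    and no single criterion scores below min_floor."""
    if not scores:
        return False  # assumption: an ungraded run does not pass
    values = list(scores.values())
    return mean(values) >= mean_floor and min(values) >= min_floor
```

Under this reading, a single criterion scored below 3 fails the run even when the other scores pull the mean above 4.0.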
Rubric criteria
- AssemblyAI
- AI Platform
- Batch Transcription (Universal-2)