Batch Transcription Universal 2

AssemblyAI (Universal-2 + LeMUR) · AssemblyAI

Speech AI Platform — AssemblyAI

AssemblyAI evals — Batch Transcription (Universal-2) (relift v3 InfraRed)

About AssemblyAI

AssemblyAI is a speech-AI platform with Universal-2 speech-to-text, real-time streaming, Speaker Diarization, Audio Intelligence (summarization, sentiment, content moderation), and LeMUR — an LLM framework that runs over transcripts (task, summary, question-answer, action items).

Employees: ~150

Industry: Speech AI

Headquarters: San Francisco, CA

Sample tests · showing 3 of 9

# | Input | Expected behavior | Check
01

Input: Agent submits POST /v2/transcript with audio_url pointing to a presigned S3 URL that expires in 60 seconds. Universal-2 processing starts 3 minutes later.

Expected behavior: Either (a) upload the audio bytes via POST /v2/upload first and use the returned upload_url as audio_url (single-use; AssemblyAI fetches synchronously before returning), or (b) ensure audio_url remains fetchable for the full queue+processing window. Do not assume Universal-2 dereferences audio_url …

Check: Pass / Fail | AI Platform | high
02

Input: Agent submits a noisy call-center recording and sets speech_model='nano' to save cost. WER on the transcript is markedly worse than expected.

Expected behavior: speech_model='universal' is the Universal-2 default with the best accuracy envelope; 'nano' trades accuracy for cost/latency. For call-center or domain-critical workloads default to 'universal'. Document the model choice per workload and reconcile WER expectations against the chosen tier [REQUIRES-…

Check: Pass / Fail | AI Platform | medium
03

Input: Agent ingests a mixed multilingual corpus and sets language_detection=true on every job to avoid having to pre-classify language.

Expected behavior: Use language_detection=true when the language is unknown; use language_code (e.g., 'es', 'fr') when it is known, to skip detection cost/latency and avoid mislabeling. Read response.language_code to confirm what was used. Do not pass both and expect language_code to win silently.

Check: Pass / Fail | AI Platform | medium
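
The three sample tests above all come down to choosing request parameters before the job is submitted. A minimal sketch in Python, assuming a hypothetical helper `build_transcript_request` that only assembles the JSON body for POST /v2/transcript and makes no network call. The arguments `url_ttl_seconds`, `expected_wait_seconds`, and `accuracy_critical` are illustrative assumptions, not AssemblyAI API fields; `audio_url`, `speech_model`, `language_code`, and `language_detection` are the request fields named in the tests.

```python
from typing import Optional


def build_transcript_request(
    audio_url: str,
    *,
    url_ttl_seconds: Optional[float] = None,
    expected_wait_seconds: float = 0.0,
    accuracy_critical: bool = True,
    language_code: Optional[str] = None,
) -> dict:
    """Assemble a JSON body for POST /v2/transcript (hypothetical helper)."""
    # Test 01: a presigned URL has to stay fetchable for the whole
    # queue + processing window; otherwise upload via POST /v2/upload first.
    if url_ttl_seconds is not None and url_ttl_seconds <= expected_wait_seconds:
        raise ValueError(
            "audio_url may expire before it is fetched; "
            "upload the bytes first or use a longer-lived URL"
        )

    body = {"audio_url": audio_url}

    # Test 02: default to 'universal'; pick 'nano' only when the workload
    # tolerates lower accuracy in exchange for cost/latency.
    body["speech_model"] = "universal" if accuracy_critical else "nano"

    # Test 03: send exactly one of language_code / language_detection,
    # so nothing depends on which field wins when both are present.
    if language_code is not None:
        body["language_code"] = language_code
    else:
        body["language_detection"] = True

    return body
```

With the numbers from test 01, `build_transcript_request(url, url_ttl_seconds=60, expected_wait_seconds=180)` raises `ValueError` instead of producing a request that is likely to fail later.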

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
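
One reading of the pass rule above, sketched in Python: a test passes when the mean of its criterion scores is at least 4.0 and no single criterion scores below 3. The function name `passes_rubric`, the score mapping, and the criterion names in the usage note are hypothetical, and the score scale is an assumption because the source does not state one.

```python
from statistics import mean
from typing import Mapping


def passes_rubric(
    scores: Mapping[str, float],
    mean_threshold: float = 4.0,
    floor: float = 3.0,
) -> bool:
    """Return True when the mean meets the threshold and no score is below the floor."""
    if not scores:
        raise ValueError("at least one criterion score is required")
    values = list(scores.values())
    # Both conditions must hold: a high mean cannot offset one failing criterion.
    return mean(values) >= mean_threshold and min(values) >= floor
```

For instance, `passes_rubric({"correctness": 5, "safety": 5, "clarity": 2})` returns `False`: the mean is 4.0, but one criterion falls below the floor of 3.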

Rubric criteria

  • AssemblyAI
  • AI Platform
  • Batch Transcription Universal 2

Recommended for

AssemblyAI (Universal-2 + LeMUR) · AssemblyAI customers
