Audio Intelligence

AssemblyAI (Universal-2 + LeMUR) · AssemblyAI

Speech AI Platform — AssemblyAI

Evaluates AssemblyAI's Audio Intelligence across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Speech AI Platform eval coverage.

About AssemblyAI

AssemblyAI is a speech-AI platform with Universal-2 speech-to-text, real-time streaming, Speaker Diarization, Audio Intelligence (summarization, sentiment, content moderation), and LeMUR — an LLM framework that runs over transcripts (task, summary, question-answer, action items).

Employees

~150

Industry

Speech AI

Headquarters

San Francisco, CA

Website

www.assemblyai.com

Sample tests · showing 3 of 9

#	Input	Expected behavior	Check
01	Agent sets summarization=true with summary_model='catchy' and summary_type='bullets_verbose' on a 90-minute legal deposition.	summary_model controls voice (informative | conversational | catchy); summary_type controls shape (bullets | bullets_verbose | gist | headline | paragraph). Match both to the use case: a 'catchy' marketing tone is wrong for a legal deposition. Verify that the combination is documented as compatible; not all combinations are.	Pass / Fail · AI Platform · medium
02	Healthcare pipeline ingests entities[] with entity_type ranging from PERSON to MEDICAL_CONDITION to DRUG to BLOOD_TYPE.	Iterate entities[], reading entity_type, text, and start/end (ms). Document which entity_types you treat as PHI for redaction. Pin the taxonomy version where possible; do not assume new entity_types added by AssemblyAI become available automatically without code changes [REQUIRES-VERIFICATION on taxonomy …	Pass / Fail · AI Platform · high
03	Support-ops dashboard ingests sentiment_analysis_results[] and rolls up POSITIVE/NEUTRAL/NEGATIVE counts per call.	sentiment_analysis_results[] carries a per-utterance POSITIVE | NEUTRAL | NEGATIVE label with confidence and start/end (ms). Weight roll-ups by utterance duration or confidence rather than by raw count. With speaker_labels enabled, group by speaker before reporting.	Pass / Fail · AI Platform · medium
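
The expected behavior in case 02 might look roughly like the sketch below. It is an illustration under stated assumptions, not AssemblyAI's SDK: entities are plain dicts with the fields named in the table (entity_type, text, start, end), the upper-case labels are the ones used in the table, and `partition_entities`, `PHI_ENTITY_TYPES`, and `KNOWN_NON_PHI_TYPES` are hypothetical names standing in for your own documented policy.

```python
from __future__ import annotations

# Assumption: which entity types count as PHI is a policy you document yourself.
# The labels below are the ones named in case 02, not an official taxonomy.
PHI_ENTITY_TYPES = frozenset({"PERSON", "MEDICAL_CONDITION", "DRUG", "BLOOD_TYPE"})
# Hypothetical: types you have reviewed and decided not to redact.
KNOWN_NON_PHI_TYPES = frozenset({"ORGANIZATION", "LOCATION"})


def partition_entities(entities: list[dict]) -> tuple[list[tuple[int, int, str]], set[str]]:
    """Return (phi_spans, unknown_types).

    phi_spans holds (start_ms, end_ms, text) for entities to redact.
    unknown_types collects entity types the code has never reviewed, so a
    taxonomy change is surfaced instead of being silently ignored.
    """
    phi_spans = []
    unknown_types = set()
    for entity in entities:
        entity_type = str(entity["entity_type"]).upper()
        if entity_type in PHI_ENTITY_TYPES:
            phi_spans.append((entity["start"], entity["end"], entity["text"]))
        elif entity_type not in KNOWN_NON_PHI_TYPES:
            unknown_types.add(entity_type)
    return phi_spans, unknown_types
```

Returning the unreviewed types separately is one way to honor the "do not assume new entity_types are handled automatically" point: a pipeline can alert or fail closed whenever that set is non-empty.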

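The weighted roll-up described in case 03 can be sketched as follows. This is a minimal example, assuming each result is a dict with sentiment, confidence, start, end, and an optional speaker field; the helper name `weighted_sentiment` is hypothetical, and weighting by duration multiplied by confidence is only one of several reasonable choices.

```python
from collections import defaultdict


def weighted_sentiment(results):
    """Return {speaker: {label: share}} weighted by duration * confidence.

    Assumes each result dict has "sentiment", "start", "end" (ms), and
    optionally "confidence" and "speaker". Results without a speaker are
    grouped under the key "all".
    """
    totals = defaultdict(lambda: defaultdict(float))
    for result in results:
        speaker = result.get("speaker") or "all"
        duration_ms = max(result["end"] - result["start"], 0)
        weight = duration_ms * result.get("confidence", 1.0)
        totals[speaker][result["sentiment"]] += weight

    shares = {}
    for speaker, by_label in totals.items():
        total = sum(by_label.values())
        shares[speaker] = {
            label: (weight / total if total else 0.0)
            for label, weight in by_label.items()
        }
    return shares
```

With this weighting, a three-second negative utterance outweighs a one-second positive one from the same speaker, which a raw count would treat as a tie.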
How this eval is graded

Responses are graded against expected.ideal_behavior and expected.rubric. A case passes when the mean score across rubric criteria is at least 4.0 and no single criterion scores below 3.
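
One reading of this pass rule, written out as a small sketch. The page does not state the scoring scale or the criterion names, so the numeric scores and the keys in the usage note are assumptions, and `case_passes` is a hypothetical helper rather than part of any Corsac API.

```python
from statistics import mean

MEAN_THRESHOLD = 4.0  # from the rule: mean >= 4.0
CRITERION_FLOOR = 3   # from the rule: no criterion below 3


def case_passes(criterion_scores):
    """Return True if the per-criterion scores satisfy both conditions.

    criterion_scores maps a criterion name to its numeric judge score.
    An empty mapping is treated as a failure (an assumption of this sketch).
    """
    scores = list(criterion_scores.values())
    if not scores:
        return False
    return mean(scores) >= MEAN_THRESHOLD and min(scores) >= CRITERION_FLOOR
```

For example, scores of 5, 5, and 2 have a mean of 4.0 but still fail, because one criterion falls below the floor of 3.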

Rubric criteria

AssemblyAI
AI Platform
Audio Intelligence

Recommended for

AssemblyAI (Universal-2 + LeMUR) · AssemblyAI customers

Works with

AssemblyAI

Related evals

AI Platform

Claude API

Evaluates Anthropic's Batch API across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

AI Platform

Claude API

Evaluates Anthropic's Extended Thinking across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

AI Platform

Claude API

Evaluates Anthropic's Files API & Citations across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

Frequently asked questions

What does the Audio Intelligence eval for AssemblyAI (Universal-2 + LeMUR) test?

Evaluates AssemblyAI's Audio Intelligence across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Speech AI Platform eval coverage.

How is the Audio Intelligence eval scored?

The LLM judge grades each response against expected.ideal_behavior and expected.rubric. A case passes when the mean score across rubric criteria is at least 4.0 and no single criterion scores below 3.

How many test cases does this eval pack include?

The Audio Intelligence pack for AssemblyAI (Universal-2 + LeMUR) contains 9 test cases. Three sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?

Sign up for Corsac, connect your model or agent endpoint, and run the Audio Intelligence pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.

Run this eval in your workspace

Connect your data, configure thresholds, and review results with your team.