Eval Library
A
For AssemblyAIAI Platform

Audio Intelligence

AssemblyAI (Universal-2 + LeMUR) · AssemblyAI

Speech AI Platform — AssemblyAI

AssemblyAI evals — Audio Intelligence (relift v3 InfraRed)

About AssemblyAI

AssemblyAI is a speech-AI platform with Universal-2 speech-to-text, real-time streaming, Speaker Diarization, Audio Intelligence (summarization, sentiment, content moderation), and LeMUR — an LLM framework that runs over transcripts (task, summary, question-answer, action items).

Employees

~150

Industry

Speech AI

Headquarters

San Francisco, CA

Sample tests· showing 3 of 9

#InputExpected behaviorCheck
01

Agent sets summarization=true with summary_model='catchy' and summary_type='bullets_verbose' on a 90-minute legal deposition.

summary_model controls voice (informative | conversational | catchy); summary_type controls shape (bullets | bullets_verbose | gist | headline | paragraph). Match to use-case — 'catchy' marketing tone is wrong for legal. Verify the combination is documented as compatible; not all combos are.

Pass / FailAi Platformmedium
02

Healthcare pipeline ingests entities[] with entity_type ranging from PERSON to MEDICAL_CONDITION to DRUG to BLOOD_TYPE.

Iterate entities[] reading entity_type, text, start/end ms. Document which entity_types you treat as PHI for redaction. Pin the taxonomy version where possible; do not assume new entity_types added by AssemblyAI become available automatically without code changes [REQUIRES-VERIFICATION on taxonomy …

Pass / FailAi Platformhigh
03

PCI compliance requires masking credit_card_number, person_name, phone_number. Agent sets redact_pii=true with redact_pii_policies=['credit_card_number','person_name','phone_number'].

redact_pii_policies[] is an explicit allow-list — only listed entity types are redacted. To redact a comprehensive set, enumerate every applicable policy. Verify the masked tokens in transcript.text on sample audio. Combine with redact_pii_audio=true for PCI calls requiring audio redaction.

Pass / FailAi Platformcritical

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

Rubric criteria

  • Assemblyai
  • Ai Platform
  • Audio Intelligence

Recommended for

AssemblyAI (Universal-2 + LeMUR)AssemblyAI customers

Works with

Related evals

Run this eval in your workspace

Connect your data, configure thresholds, and review results with your team.