Eval Library
For AssemblyAI · AI Platform

Speaker Labels And Diarization

AssemblyAI (Universal-2 + LeMUR) · AssemblyAI

Speech AI Platform — AssemblyAI

AssemblyAI evals — Speaker Labels & Diarization (relift v3 InfraRed)

About AssemblyAI

AssemblyAI is a speech-AI platform with Universal-2 speech-to-text, real-time streaming, Speaker Diarization, Audio Intelligence (summarization, sentiment, content moderation), and LeMUR — an LLM framework that runs over transcripts (task, summary, question-answer, action items).

Employees

~150

Industry

Speech AI

Headquarters

San Francisco, CA

Sample tests · showing 3 of 9

# | Input | Expected behavior | Check
01

Voice-agent app wants per-speaker captions from the real-time WebSocket. Engineer sets speaker_labels=true in the streaming query params and waits.

Speaker Labels are only available on async POST /v2/transcript, not on the real-time WebSocket. For live diarization, either (a) split by channel (multichannel telephony with caller and agent on separate channels) or (b) buffer audio and post-process it via async transcription. Do not expect labeled streaming ou…

Pass / Fail · AI Platform · high
02

Agent transcribes a 4-person panel discussion and sets speakers_expected=4 with speaker_labels=true.

speakers_expected is a hint — the model may emit fewer or more labels if acoustic evidence diverges. Validate utterance.speaker labels against ground truth on a held-out subset. Do not assume exactly 4 distinct labels appear.

Pass / Fail · AI Platform · medium
03

HIPAA workflow enables speaker_labels=true and redact_pii=true with policies including medical_condition and person_name.

Redacted tokens appear in utterances[].text and words[].text; speaker labels are preserved across redaction. Verify the redacted output round-trips the speaker labels — labels must not collapse across a redaction boundary. Confirm BAA is in place for HIPAA workloads [REQUIRES-VERIFICATION].

Pass / Fail · AI Platform · critical
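The three sample tests above can be sketched as offline checks over request parameters and transcript JSON. This is a minimal illustration, not AssemblyAI's or this eval's actual harness: the function names are hypothetical, and the code assumes only that a completed transcript carries an `utterances` list whose items have a `speaker` field, as the expected behaviors describe. Comparing a redacted run against an unredacted baseline is also an assumption; a HIPAA workflow may not be allowed to keep such a baseline.

```python
def streaming_param_warnings(params):
    """Test 01: flag speaker_labels set on a streaming (WebSocket) session."""
    value = str(params.get("speaker_labels", "")).lower()
    if value in {"true", "1"}:
        return [
            "speaker_labels is set on a streaming session; per the expected "
            "behavior it applies only to async POST /v2/transcript"
        ]
    return []


def distinct_speakers(transcript):
    """Test 02: the set of speaker labels that actually appear."""
    return {u["speaker"] for u in transcript.get("utterances") or []}


def labels_preserved(baseline, redacted):
    """Test 03: the speaker sequence is identical before and after redaction."""
    before = [u["speaker"] for u in baseline.get("utterances") or []]
    after = [u["speaker"] for u in redacted.get("utterances") or []]
    return before == after


# Hypothetical transcripts, shaped like the fields named in the tests above.
panel = {"utterances": [
    {"speaker": "A", "text": "Welcome, everyone."},
    {"speaker": "B", "text": "Thanks, I am Dana."},
    {"speaker": "A", "text": "Let us begin."},
]}
redacted = {"utterances": [
    {"speaker": "A", "text": "Welcome, everyone."},
    {"speaker": "B", "text": "Thanks, I am [PERSON_NAME]."},
    {"speaker": "A", "text": "Let us begin."},
]}

speakers_expected = 4
found = distinct_speakers(panel)
if len(found) != speakers_expected:
    # speakers_expected is a hint, so a mismatch is a signal to review,
    # not an error by itself.
    print(f"expected {speakers_expected} speakers, found {len(found)}")
```

A mismatch between `speakers_expected` and the labels found is treated here as something to review rather than a failure, since the hint does not guarantee an exact count.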

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. A test passes when the mean score across criteria is >= 4.0 and no single criterion scores below 3.
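The pass rule can be written as a small function. This is a sketch under assumptions: the source does not state the scoring scale or how scores are stored, so the dict of per-criterion numeric scores and the name `passes` are hypothetical. An empty score set is treated as a fail.

```python
from statistics import mean


def passes(scores, mean_threshold=4.0, floor=3):
    """Return True when the mean is at least the threshold and no score is below the floor."""
    if not scores:
        return False
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor


# Hypothetical criterion names and scores, for illustration only.
print(passes({"accuracy": 5, "safety": 4, "clarity": 3}))  # mean 4.0, min 3
print(passes({"accuracy": 5, "safety": 5, "clarity": 2}))  # mean 4.0, min 2
```

The first call meets both conditions; the second has the same mean but fails because one criterion falls below the floor.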

Rubric criteria

  • AssemblyAI
  • AI Platform
  • Speaker Labels And Diarization

Recommended for

AssemblyAI (Universal-2 + LeMUR) · AssemblyAI customers
