Eval Library
For AssemblyAI · AI Platform · Transcription Accuracy

Streaming STT (Real-time)

AssemblyAI (Universal-2 + LeMUR) · AssemblyAI

Speech AI Platform — AssemblyAI

AssemblyAI evals — Streaming STT (Real-time) (relift v3 InfraRed)

About AssemblyAI

AssemblyAI is a speech-AI platform with Universal-2 speech-to-text, real-time streaming, Speaker Diarization, Audio Intelligence (summarization, sentiment, content moderation), and LeMUR — an LLM framework that runs over transcripts (task, summary, question-answer, action items).

Employees: ~150

Industry: Speech AI

Headquarters: San Francisco, CA

Sample tests · showing 3 of 9

# · Input · Expected behavior · Check
01

Browser captures audio at 48 kHz via getUserMedia but the WebSocket is opened to wss://api.assemblyai.com/v2/realtime/ws?sample_rate=16000 and frames stream without resampling.

The sample_rate query parameter must match the sample rate of the audio actually streamed. Either resample client-side to 16 kHz (Universal-streaming's preferred rate) before sending, or pass sample_rate=48000 and accept the higher bandwidth. A mismatch degrades accuracy silently: the server does not reject the stream.

Pass / Fail · AI Platform · high
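As an illustration of test 01, the sketch below downsamples 16-bit mono PCM from 48 kHz to 16 kHz and builds a connection URL whose sample_rate matches what is sent. It assumes native-endian (typically little-endian) int16 input. The helper names are hypothetical, and the three-sample average is a crude stand-in for a proper anti-aliasing resampler.

```python
import array
from urllib.parse import urlencode

# Endpoint as quoted in the test input above.
REALTIME_URL = "wss://api.assemblyai.com/v2/realtime/ws"


def downsample_48k_to_16k(pcm16: bytes) -> bytes:
    """Average each group of 3 samples to go from 48 kHz to 16 kHz.

    Assumes 16-bit mono PCM in native byte order. Trailing samples that
    do not fill a complete group of 3 are dropped.
    """
    samples = array.array("h")
    samples.frombytes(pcm16)
    usable = len(samples) - len(samples) % 3
    out = array.array(
        "h",
        (
            (samples[i] + samples[i + 1] + samples[i + 2]) // 3
            for i in range(0, usable, 3)
        ),
    )
    return out.tobytes()


def realtime_url(sample_rate: int) -> str:
    """Build the WebSocket URL; sample_rate must describe the bytes actually sent."""
    return f"{REALTIME_URL}?{urlencode({'sample_rate': sample_rate})}"


# Resample first, then declare the rate that now matches the stream.
frame_48k = array.array("h", [0, 3, 6, 300, 300, 300]).tobytes()
frame_16k = downsample_48k_to_16k(frame_48k)
url = realtime_url(16000)
```

The point of pairing the two helpers is that the declared rate and the audio are changed together, so they cannot drift apart.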
02

Voice-agent application sets end_utterance_silence_threshold=2000 ms. End-users complain the agent feels slow because turn-end is detected too late.

end_utterance_silence_threshold trades latency against premature endpointing. Lower (e.g., 500-800 ms) for snappier voice agents; raise for dictation where long pauses are expected. Tune per workload and verify FinalTranscript boundaries land where expected. Document the tradeoff.

Pass / Fail · AI Platform · medium
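For test 02, one way to keep the tradeoff explicit is a small per-workload table of thresholds. The sketch below assumes the threshold is applied by sending a JSON message with an end_utterance_silence_threshold field over the open WebSocket; confirm the exact message shape against the API reference. The preset values are hypothetical starting points, not vendor recommendations.

```python
import json

# Hypothetical presets in milliseconds; tune per workload and re-verify
# where FinalTranscript boundaries land after each change.
SILENCE_THRESHOLD_MS = {
    "voice_agent": 600,   # within the 500-800 ms range suggested for snappy turn-taking
    "dictation": 2000,    # tolerate long pauses at the cost of later endpointing
}


def endpointing_message(workload: str) -> str:
    """Return the JSON text to send over the socket for the given workload.

    Raises KeyError for an unknown workload so a typo cannot silently
    fall back to an untested threshold.
    """
    threshold = SILENCE_THRESHOLD_MS[workload]
    return json.dumps({"end_utterance_silence_threshold": threshold})


message = endpointing_message("voice_agent")
```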
03

Browser code opens wss://api.assemblyai.com/v2/realtime/ws?token=<account_key> using the long-lived AssemblyAI API key directly so 'it just works in dev.'

Mint a short-TTL streaming token server-side via POST /v2/realtime/token and pass it as the token query param. The account key must never appear in client JS. If the account key is revoked, update the token-minting service to use the replacement key. Treat any client-visible account key as a compromised secret.

Pass / Fail · AI Platform · critical
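The server-side mint described in test 03 might look like the following sketch, using only the Python standard library. It assumes the token endpoint accepts a JSON body with an expires_in field in seconds, authenticates with the account key in the Authorization header, and returns a JSON object with a token field; verify those details against the API reference. The function names and the 60-second default are illustrative.

```python
import json
import urllib.request

# Path taken from the expected behavior above; host assumed to match the WebSocket host.
TOKEN_URL = "https://api.assemblyai.com/v2/realtime/token"


def build_token_request(api_key: str, ttl_seconds: int = 60) -> urllib.request.Request:
    """Build the POST request that asks for a short-lived streaming token."""
    body = json.dumps({"expires_in": ttl_seconds}).encode("utf-8")
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        method="POST",
        headers={"Authorization": api_key, "Content-Type": "application/json"},
    )


def mint_streaming_token(api_key: str, ttl_seconds: int = 60) -> str:
    """Call the token endpoint from the server; only the returned token goes to the browser."""
    request = build_token_request(api_key, ttl_seconds)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)["token"]
```

A backend route would call mint_streaming_token with the key read from its own secret store and return only the temporary token, which the browser then places in the token query param.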

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. A pass requires a mean score of at least 4.0 across criteria, with no individual criterion scoring below 3.
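The pass rule can be read as a mean threshold combined with a per-criterion floor. The sketch below encodes that reading; the function name and the dictionary shape are hypothetical, and the scoring scale is not stated on this page.

```python
from statistics import mean


def passes(scores: dict[str, float], min_mean: float = 4.0, floor: float = 3.0) -> bool:
    """Return True when the mean score meets min_mean and no criterion is below floor."""
    if not scores:
        return False  # nothing graded, so nothing can pass
    values = list(scores.values())
    return mean(values) >= min_mean and min(values) >= floor


# A high mean does not rescue a single criterion that falls under the floor.
strong_but_uneven = {"accuracy": 5, "security": 5, "latency": 2}
result = passes(strong_but_uneven)
```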

Rubric criteria

  • AssemblyAI
  • AI Platform
  • Streaming STT (Real-time)

Recommended for

AssemblyAI (Universal-2 + LeMUR) · AssemblyAI customers
