Evals for AssemblyAI
Eight evaluation packs covering adversarial robustness, safety gates, workflow quality, and operator-level checks for AssemblyAI's AI products.
About AssemblyAI
AssemblyAI is a speech-AI platform with Universal-2 speech-to-text, real-time streaming, Speaker Diarization, Audio Intelligence (summarization, sentiment, content moderation), and LeMUR — an LLM framework that runs over transcripts (task, summary, question-answer, action items).
Available eval packs for AssemblyAI
8 packs ready to run.
Audio Intelligence
AssemblyAI evals — Audio Intelligence (relift v3 InfraRed)
Auth Rate Limits Concurrency Governance
AssemblyAI evals — Auth, Rate Limits, Concurrency & Governance (relift v3 InfraRed)
Batch Transcription Universal 2
Transcription Accuracy
AssemblyAI evals — Batch Transcription (Universal-2) (relift v3 InfraRed)
Lemur
AssemblyAI evals — LeMUR (relift v3 InfraRed)
Speaker Labels And Diarization
AssemblyAI evals — Speaker Labels & Diarization (relift v3 InfraRed)
Streaming Stt Realtime
Transcription Accuracy
AssemblyAI evals — Streaming STT (Real-time) (relift v3 InfraRed)
Transcript Features
AssemblyAI evals — Transcript Features (relift v3 InfraRed)
Webhooks And Async Delivery
AssemblyAI evals — Webhooks & Async Delivery (relift v3 InfraRed)
Why eval AssemblyAI's AI
AssemblyAI's AI features ship behind brand promises about accuracy, safety, and reliability. Buyers and integrators need to know those promises hold up under adversarial prompts, edge-case workflows, and the long tail of real customer inputs — not just the demo path.
The Corsac eval library for AssemblyAI measures the four dimensions teams care about most when deploying AI platform agents:
- Adversarial robustness — does the agent resist prompt injection, jailbreaks, and social-engineering attempts?
- Workflow quality — does it complete the task buyers were shown in the demo, on inputs that don't look like the demo?
- Safety gates — does it escalate or refuse when it should, and only then?
- Operator quality — does it preserve analyst trust by surfacing the right context at the right time?
Every eval pack above is hand-authored against AssemblyAI's public product surface and runnable in Corsac with your own data.