Model Selection Language Detection

Deepgram

Speech AI Platform — Deepgram

Evaluates Deepgram's Model Selection & Language Detection across 8 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Speech AI Platform eval coverage.

About Deepgram

Deepgram is a speech-AI platform offering streaming and batch speech-to-text (Nova), Aura text-to-speech, speaker diarization, redaction, and smart formatting across 30+ languages — used by voice-agent platforms, contact centers, and media teams.

Employees

~150

Industry

Speech AI

Headquarters

San Francisco, CA

Website

deepgram.com

Sample tests · showing 3 of 8

#	Input	Expected behavior	Check
01	New wss://api.deepgram.com/v1/listen integration; latency and accuracy tradeoffs; Nova-3 is the current flagship.	Default to model=nova-3 for an English streaming agent; document a fallback path to nova-2 if SKU constraints apply; measure WER and latency empirically per deployment.	Pass / Fail · AI Platform · high
02	Existing production URL wss://api.deepgram.com/v1/listen?model=nova-2; ops wants an upgrade checklist.	Document the parameter change to model=nova-3; run a shadow-traffic comparison; retain a rollback switch; note nova-2 for constrained runtimes if needed.	Pass / Fail · AI Platform · medium
03	Multilingual archive with a rare dialect; engineer proposes whisper for a batch-only offline pipeline.	Use whisper on batch POST /v1/listen when the docs indicate a language-coverage advantage; benchmark against nova-3; note whisper's streaming limitations if the agent needs live transcription.	Pass / Fail · AI Platform · medium
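The model switch described in cases 01 and 02 can be sketched as a small URL helper. This is a minimal illustration, not Deepgram's SDK: the function names `with_model` and `listen_url` and the `use_fallback` flag are hypothetical, and only the endpoint and the `model` values (`nova-3`, `nova-2`) come from the test cases above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Endpoint taken from the sample test cases.
LISTEN_URL = "wss://api.deepgram.com/v1/listen"


def with_model(url: str, model: str) -> str:
    """Return `url` with its `model` query parameter set to `model`.

    Other query parameters already present on the URL are preserved.
    """
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query["model"] = model
    return urlunsplit(parts._replace(query=urlencode(query)))


def listen_url(use_fallback: bool = False) -> str:
    """Build the streaming URL; `use_fallback` is a hypothetical rollback switch."""
    return with_model(LISTEN_URL, "nova-2" if use_fallback else "nova-3")


if __name__ == "__main__":
    # Upgrade an existing production URL from nova-2 to nova-3.
    print(with_model("wss://api.deepgram.com/v1/listen?model=nova-2", "nova-3"))
    # Rollback path kept available behind a flag.
    print(listen_url(use_fallback=True))
```

Keeping the model name behind a single flag means a rollback is a configuration change rather than a code change, which is the property case 02 asks for.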
How this eval is graded

Grade the agent's response against expected.ideal_behavior and expected.rubric. Score 1-5 per criterion (4+ passes; no criterion below 3). Penalize failure_modes. For safety outcomes, correct behavior is refuse/abstain or redact per policy.
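The pass rule above is terse, so the sketch below shows one possible reading of it: a case passes when the mean criterion score is at least 4 and no single criterion scores below 3. That interpretation, the function name `case_passes`, and the criterion names in the usage example are all assumptions, not Corsac's implementation. The `failure_modes` penalty is not modeled.

```python
from statistics import mean


def case_passes(scores: dict[str, int], pass_mean: float = 4.0, floor: int = 3) -> bool:
    """Apply one assumed reading of the rubric to per-criterion scores (1-5).

    Passes when the mean score is >= `pass_mean` and every score is >= `floor`.
    """
    if not scores:
        raise ValueError("at least one criterion score is required")
    for name, score in scores.items():
        if not 1 <= score <= 5:
            raise ValueError(f"score for {name!r} must be between 1 and 5")
    values = list(scores.values())
    return mean(values) >= pass_mean and min(values) >= floor


if __name__ == "__main__":
    # Hypothetical criterion names, for illustration only.
    print(case_passes({"model_choice": 5, "fallback_path": 4, "measurement": 3}))
    print(case_passes({"model_choice": 5, "fallback_path": 5, "measurement": 2}))
```

Under this reading, a single score of 2 fails the case even when the other criteria are perfect, which matches the "no criterion below 3" clause.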

Rubric criteria

Deepgram
AI Platform
Model Selection Language Detection

Recommended for

Deepgram customers

Works with

Deepgram

Related evals

AI Platform

Claude API

Evaluates Anthropic's Batch API across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

AI Platform

Claude API

Evaluates Anthropic's Extended Thinking across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

AI Platform

Claude API

Evaluates Anthropic's Files API & Citations across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

Frequently asked questions

What does the Model Selection Language Detection eval for Deepgram test?

How is the Model Selection Language Detection eval scored?

The judge rubric: Grade the agent's response against expected.ideal_behavior and expected.rubric. Score 1-5 per criterion (4+ passes; no criterion below 3). Penalize failure_modes. For safety outcomes, correct behavior is refuse/abstain or redact per policy.

How many test cases does this eval pack include?

The Model Selection Language Detection pack for Deepgram contains 8 test cases. Three sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?

Sign up for Corsac, connect your model or agent endpoint, and run the Model Selection Language Detection pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.

Run this eval in your workspace

Connect your data, configure thresholds, and review results with your team.