Batch STT & Async Callbacks

Deepgram

Speech AI Platform — Deepgram

Evaluates Deepgram's Batch STT & Async Callbacks across 8 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Speech AI Platform eval coverage.

About Deepgram

Deepgram is a speech-AI platform offering streaming and batch speech-to-text (Nova), Aura text-to-speech, speaker diarization, redaction, and smart formatting across 30+ languages — used by voice-agent platforms, contact centers, and media teams.

Employees

~150

Industry

Speech AI

Headquarters

San Francisco, CA

Website

deepgram.com

Sample tests · showing 3 of 8

#	Input	Expected behavior	Check	Category	Severity
01	Async transcription for 45-minute podcast; customer server flaky during deploy; Deepgram retries callback delivery.	Accept initial 200 with request_id; implement idempotent webhook handler keyed by request_id; expect up to 10 retries per docs; persist transcript once successfully.	Pass / Fail	AI Platform	high
02	POST /v1/listen with callback returns JSON containing request_id; later webhook POST includes same identifier.	Store request_id from immediate API response; join callback transcript payload to job row via request_id; surface in ops dashboard.	Pass / Fail	AI Platform	medium
03	Callback URL is public HTTPS; security wants authentication on inbound transcript delivery [REQUIRES-VERIFICATION on signed callback headers].	Terminate TLS; validate source IP or shared secret if documented; reject unsigned unexpected POSTs; never expose callback URL without auth layer.	Pass / Fail	AI Platform	critical
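
The expected behaviors in cases 01 to 03 can be sketched as one small handler: check a shared secret, then record the transcript at most once per request_id so that retried deliveries are harmless. This is a minimal sketch, not Deepgram's API. The `jobs` table, the function names, and the way the secret and request_id reach the handler are all hypothetical. Whether Deepgram signs callbacks is flagged above as requiring verification, so the secret here stands in for whatever auth layer you place in front of the callback URL.

```python
import hmac
import sqlite3

# Hypothetical secret enforced by your own auth layer (an assumption,
# not a documented Deepgram header or signature scheme).
SHARED_SECRET = "change-me"


def is_authorized(presented_token: str) -> bool:
    # Constant-time comparison avoids leaking the secret through timing.
    return hmac.compare_digest(presented_token.encode(), SHARED_SECRET.encode())


def init_db(conn: sqlite3.Connection) -> None:
    # request_id is the primary key, so each job has exactly one row.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "request_id TEXT PRIMARY KEY, status TEXT NOT NULL, transcript TEXT)"
    )
    conn.commit()


def register_job(conn: sqlite3.Connection, request_id: str) -> None:
    # Called with the request_id taken from the immediate API response.
    conn.execute(
        "INSERT OR IGNORE INTO jobs (request_id, status) VALUES (?, 'pending')",
        (request_id,),
    )
    conn.commit()


def handle_callback(
    conn: sqlite3.Connection, presented_token: str, request_id: str, transcript: str
) -> int:
    """Return the HTTP status code the webhook endpoint would send back."""
    if not is_authorized(presented_token):
        return 401
    # Only a pending row is updated, so a retried delivery changes nothing.
    cur = conn.execute(
        "UPDATE jobs SET status = 'done', transcript = ? "
        "WHERE request_id = ? AND status = 'pending'",
        (transcript, request_id),
    )
    conn.commit()
    if cur.rowcount == 1:
        return 200
    known = conn.execute(
        "SELECT 1 FROM jobs WHERE request_id = ?", (request_id,)
    ).fetchone()
    # A known, already-completed job is acknowledged so retries stop.
    return 200 if known else 404
```

Returning 200 for a duplicate delivery is a design choice in this sketch: the sender has no reason to keep retrying once the transcript is stored, and the first stored transcript is never overwritten.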

How this eval is graded

Grade the agent's response against expected.ideal_behavior and expected.rubric. Score 1-5 per criterion (4+ passes; no criterion below 3). Penalize failure_modes. For safety outcomes, correct behavior is refuse/abstain or redact per policy.
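
One way to read the pass rule is sketched below. The wording "4+ passes; no criterion below 3" does not say whether the 4+ threshold applies to the mean or to each criterion, so this sketch assumes a case passes when the mean score is at least 4 and no single criterion falls below 3. The function name and criterion names are hypothetical, and the failure_modes penalty is not modeled.

```python
from typing import Mapping


def case_passes(
    scores: Mapping[str, int], pass_mean: float = 4.0, floor: int = 3
) -> bool:
    """Apply the assumed rule: mean >= pass_mean and every score >= floor."""
    if not scores:
        raise ValueError("at least one criterion score is required")
    for name, score in scores.items():
        # The rubric scores each criterion on a 1-5 scale.
        if not 1 <= score <= 5:
            raise ValueError(f"score for {name!r} must be between 1 and 5")
    mean = sum(scores.values()) / len(scores)
    return mean >= pass_mean and min(scores.values()) >= floor
```

Under this reading, scores of 5, 4, and 3 pass (mean 4.0, minimum 3), while 5, 5, and 2 fail on the floor even though the mean is 4.0.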

Rubric criteria

Deepgram
AI Platform
Batch STT & Async Callbacks

Recommended for

Deepgram customers

Works with

Deepgram

Related evals

AI Platform

Claude API

Evaluates Anthropic's Batch API across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

AI Platform

Claude API

Evaluates Anthropic's Extended Thinking across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

AI Platform

Claude API

Evaluates Anthropic's Files API & Citations across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

Frequently asked questions

What does the Batch STT & Async Callbacks eval for Deepgram test?

Evaluates Deepgram's Batch STT & Async Callbacks across 8 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Speech AI Platform eval coverage.

How is the Batch STT & Async Callbacks eval scored?

The judge rubric: Grade the agent's response against expected.ideal_behavior and expected.rubric. Score 1-5 per criterion (4+ passes; no criterion below 3). Penalize failure_modes. For safety outcomes, correct behavior is refuse/abstain or redact per policy.

How many test cases does this eval pack include?

The Batch STT & Async Callbacks pack for Deepgram contains 8 test cases. 3 sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?

Sign up for Corsac, connect your model or agent endpoint, and run the Batch STT & Async Callbacks pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.
