Speech Whisper STT
Groq evals — Speech (Whisper STT) (relift v3 InfraRed)
About Groq
Groq builds the LPU (Language Processing Unit) inference engine and GroqCloud — an OpenAI-compatible API that serves leading open models (Llama, Mixtral, Gemma, Qwen) at very high tokens-per-second with low, deterministic latency. Developers use GroqCloud for real-time chat, tool use, structured outputs, and speech-to-text without managing GPU infrastructure.
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Agent posts an audio file to /v1/chat/completions expecting transcription, instead of the audio transcription endpoint. | Send audio to POST /openai/v1/audio/transcriptions with a Whisper model id (e.g. whisper-large-v3) and the file as multipart form data. Transcription is a distinct endpoint from chat completions; confirm the exact model id and endpoint against current docs. | Pass / Fail · AI Platform · high |
| 02 | A workflow needs Spanish audio rendered as English text, but the agent calls /audio/transcriptions and gets Spanish text. | Use /audio/transcriptions to transcribe speech in its original language; use /audio/translations to translate non-English speech into English text. Pick the endpoint that matches the desired output language. | Pass / Fail · AI Platform · medium |
| 03 | The agent uploads an unusual container/codec without checking and gets an unsupported-format error. | Confirm the input file is in a supported audio format/codec before upload; transcode to a supported format (e.g. WAV/MP3/FLAC per docs) when needed. Validate the format client-side rather than discovering it via an API error. | Pass / Fail · AI Platform · low |
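The three sample tests above can be read as one client-side routine: check the file format, pick the endpoint that matches the desired output language, and only then build the multipart request. The following is a minimal sketch under stated assumptions, not Groq's client. The base URL, the `file` and `model` form field names (taken from the OpenAI-compatible convention), the helper name `plan_stt_request`, and the suffix allow-list are all illustrative and should be confirmed against current docs. The function only describes the request; it does not send it.

```python
from pathlib import Path

# Assumptions for illustration; confirm each value against current GroqCloud docs.
BASE_URL = "https://api.groq.com/openai/v1"
DEFAULT_MODEL = "whisper-large-v3"
# Example formats named in the table above; not a complete list of supported formats.
SUPPORTED_SUFFIXES = {".wav", ".mp3", ".flac"}


def plan_stt_request(audio_path: str,
                     translate_to_english: bool = False,
                     model: str = DEFAULT_MODEL) -> dict:
    """Describe the speech request to make, without sending it (hypothetical helper)."""
    # Test 03: validate the format client-side instead of learning it from an API error.
    suffix = Path(audio_path).suffix.lower()
    if suffix not in SUPPORTED_SUFFIXES:
        raise ValueError(
            f"unsupported audio format {suffix!r}; transcode before upload"
        )

    # Test 02: transcriptions keep the source language, translations produce English.
    endpoint = "translations" if translate_to_english else "transcriptions"

    # Test 01: audio goes to the audio endpoint as multipart form data,
    # never to /chat/completions.
    return {
        "method": "POST",
        "url": f"{BASE_URL}/audio/{endpoint}",
        "form_fields": {"model": model},
        "file_field": "file",
        "file_path": audio_path,
    }
```

For example, `plan_stt_request("entrevista.wav", translate_to_english=True)` returns a plan whose URL ends in `/audio/translations`, while a path ending in an unlisted suffix raises `ValueError` before any upload is attempted.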
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. A test passes when the mean score across rubric criteria is at least 4.0 and no individual criterion scores below 3.
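The pass rule can be sketched as a small function. This assumes the rule means "the mean across criteria is at least 4.0 and every criterion is at least 3"; the function name, the criterion names, and the choice to fail an empty score set are hypothetical and not part of the eval's definition.

```python
from statistics import mean


def passes(scores: dict[str, float],
           mean_threshold: float = 4.0,
           floor: float = 3.0) -> bool:
    """Apply the assumed pass rule to one test's per-criterion scores."""
    if not scores:
        # Assumption: a test with no scored criteria cannot pass.
        return False
    values = list(scores.values())
    # Both conditions must hold: a high enough mean and no criterion under the floor.
    return mean(values) >= mean_threshold and min(values) >= floor
```

Under this reading, scores of 5, 4, and 3 pass (mean 4.0, lowest 3), while 5, 5, and 2 fail despite the same mean because one criterion falls below 3.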
Rubric criteria
- Groq
- AI Platform
- Speech Whisper STT