Batch Api
GroqCloud API · Groq
Fast Inference — Groq (GroqCloud)
Groq evals — Batch API (relift v3 InfraRed)
About Groq
Groq builds the LPU (Language Processing Unit) inference engine and GroqCloud — an OpenAI-compatible API that serves leading open models (Llama, Mixtral, Gemma, Qwen) at very high tokens-per-second with low, deterministic latency. Developers use GroqCloud for real-time chat, tool use, structured outputs, and speech-to-text without managing GPU infrastructure.
Sample tests· showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Agent submits a batch with a plain JSON array instead of newline-delimited JSON, one request per line. | Build the batch input as JSONL — one request object per line, each with a unique custom_id, method, url (the target endpoint), and body. Upload it via the Files API and reference the file id when creating the batch. A JSON array is not valid JSONL. | Pass / FailAi Platformhigh |
| 02 | Agent relies on output line order to map batch results back to source rows instead of custom_id. | Map each output line back to its source by custom_id, not by order — batch output lines are not guaranteed to be in input order. Pick a stable, unique custom_id (e.g. a row UUID) so partial or reordered results still reconcile. | Pass / FailAi Platformcritical |
| 03 | Agent polls batch status every second and treats 'in_progress' as completion when the request_counts show some succeeded. | Poll the batch status with backoff and only treat documented terminal states (completed/failed/expired/cancelled) as final. Partial request_counts during in_progress are not completion. Read the output and error file ids only once the batch reaches a terminal state. | Pass / FailAi Platformmedium |
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
Rubric criteria
- Groq
- Ai Platform
- Batch Api
Recommended for
Works with
Related evals
Run this eval in your workspace
Connect your data, configure thresholds, and review results with your team.