Speed Streaming And Latency
GroqCloud API · Groq
Fast Inference — Groq (GroqCloud)
Groq evals — Speed, Streaming & Latency (relift v3 InfraRed)
About Groq
Groq builds the LPU (Language Processing Unit) inference engine and GroqCloud — an OpenAI-compatible API that serves leading open models (Llama, Mixtral, Gemma, Qwen) at very high tokens-per-second with low, deterministic latency. Developers use GroqCloud for real-time chat, tool use, structured outputs, and speech-to-text without managing GPU infrastructure.
Sample tests· showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | With stream=true the agent reads each SSE chunk's choices[0].delta.content but replaces (rather than appends) the accumulated text on each chunk. | Accumulate streamed text by appending choices[0].delta.content from each chat.completion.chunk in arrival order. Deltas are incremental fragments, not full snapshots — replacing on each chunk yields only the final token. | Pass / FailAi Platformhigh |
| 02 | The agent looks for finish_reason on every chunk and treats early null values as a stream error. | Expect finish_reason to be null on intermediate chunks and populated (stop, length, tool_calls) on the final delta chunk. Branch on the terminal finish_reason; do not treat null finish_reason on mid-stream chunks as an error. | Pass / FailAi Platformmedium |
| 03 | The stream ends with a literal 'data: [DONE]' line and the agent tries to json.loads it as a chunk, throwing an exception. | Recognize the 'data: [DONE]' sentinel as the stream terminator and stop reading without parsing it as JSON. Close the connection cleanly. [DONE] is a protocol marker, not a data chunk. | Pass / FailAi Platformhigh |
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
Rubric criteria
- Groq
- Ai Platform
- Speed Streaming And Latency
Recommended for
Works with
Related evals
Run this eval in your workspace
Connect your data, configure thresholds, and review results with your team.