Streaming Predictions
Replicate evals — Streaming Predictions (relift v3 InfraRed)
About Replicate
Replicate is an AI model-hosting platform for running thousands of community and custom Cog-packaged models (FLUX, SDXL, Llama, Whisper, custom fine-tunes) through an HTTP API with predictions, webhooks, streaming, deployments, and per-second billing.
Sample tests (showing 3 of 9)
| # | Input | Expected behavior | Check | Category | Severity |
|---|---|---|---|---|---|
| 01 | Integrator opens an SSE connection to prediction.urls.stream for an SDXL image prediction and gets 404. | Streaming is only available for models that declare an SSE-capable output type (typically LLMs like Llama, Mixtral). For non-streaming models, urls.stream is absent from the prediction. Branch on urls.stream presence before opening the SSE connection; do not assume every prediction streams. | Pass / Fail | AI Platform | high |
| 02 | User closes a browser tab mid-Llama-3 stream; the operator's proxy closes the SSE connection upstream while the model is still generating. | Issue an explicit POST /v1/predictions/{id}/cancel on disconnect so the billing meter stops. Closing the SSE connection alone does not guarantee the model stops generating server-side. Record streamed-so-far output for the operator's audit. | Pass / Fail | AI Platform | critical |
| 03 | Llama-3 streams a JSON object character-by-character. The downstream JSON parser tries to JSON.parse() each delta. | Accumulate the full stream into a string buffer; only attempt structured parsing (JSON / Markdown) at event: done. For incremental UI, render the buffer as plain text. Do not eagerly parse partial JSON: every delta is a parse error until the closing brace arrives. | Pass / Fail | AI Platform | medium |
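The three expected behaviors in the sample tests can be sketched as small, network-free helpers. This is a minimal illustration, not Replicate's client library: the function names (stream_url, cancel_path, consume) are hypothetical, the events are assumed to arrive already decoded into (event name, data) pairs, and the name "output" for the delta event is an assumption. Only "done" and the cancel path come from the tests themselves.

```python
import json
from typing import Any, Iterable, Optional, Tuple


def stream_url(prediction: dict) -> Optional[str]:
    """Test 01: return the SSE URL if the prediction exposes one, else None."""
    return (prediction.get("urls") or {}).get("stream")


def cancel_path(prediction_id: str) -> str:
    """Test 02: path to POST to when the client disconnects mid-stream."""
    return f"/v1/predictions/{prediction_id}/cancel"


def consume(events: Iterable[Tuple[str, str]]) -> Tuple[str, Optional[Any]]:
    """Test 03: accumulate text deltas and parse JSON only at 'done'.

    Returns (buffer, parsed). parsed is None when the stream ended
    without a 'done' event or the final buffer is not valid JSON.
    """
    buffer = []
    for name, data in events:
        if name == "output":  # assumed name of the delta event
            buffer.append(data)
        elif name == "done":
            text = "".join(buffer)
            try:
                return text, json.loads(text)
            except json.JSONDecodeError:
                return text, None
    # Stream closed without 'done' (for example, a client disconnect):
    # keep the partial text for the audit record; the caller should cancel.
    return "".join(buffer), None
```

A caller would check stream_url(prediction) before opening any connection, render the growing buffer as plain text while events arrive, and on an early disconnect both POST to cancel_path(prediction_id) and store the partial text returned by consume.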
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
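One possible reading of the pass rule, sketched for illustration: the mean across all rubric criterion scores must be at least 4.0, and no individual criterion may score below 3. The function name passes and the dict-of-scores input shape are hypothetical, and the source does not say whether the mean is taken across criteria or across repeated gradings of one criterion.

```python
from statistics import mean
from typing import Mapping


def passes(scores: Mapping[str, float],
           min_mean: float = 4.0,
           floor: float = 3.0) -> bool:
    """Return True when the mean score meets min_mean and no score is below floor.

    `scores` maps a criterion name to its score; an empty mapping fails.
    """
    if not scores:
        return False
    values = list(scores.values())
    return mean(values) >= min_mean and min(values) >= floor
```

Under this reading, scores of 5, 5, and 2 fail despite a mean of 4.0, because one criterion falls below the floor.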
Rubric criteria
- Replicate
- AI Platform
- Streaming Predictions