Eval Library
For Replicate · AI Platform

Streaming Predictions



Replicate evals — Streaming Predictions (relift v3 InfraRed)

About Replicate

Replicate is an AI model-hosting platform for running thousands of community and custom Cog-packaged models (FLUX, SDXL, Llama, Whisper, custom fine-tunes) through a simple HTTP API with predictions, webhooks, streaming, deployments, and per-second billing.

Employees

~80

Industry

AI Inference Platform

Headquarters

San Francisco, CA

Sample tests · showing 3 of 9

# · Input · Expected behavior · Check
01

Integrator opens an SSE connection to prediction.urls.stream for an SDXL image prediction and gets 404.

Streaming is only available for models that declare an SSE-capable output type (typically LLMs like Llama, Mixtral). For non-streaming models, urls.stream is absent from the prediction. Branch on urls.stream presence before opening the SSE connection — do not assume every prediction streams.

Pass / Fail · AI Platform · high
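The branching described in test 01 could be sketched as follows. This is a minimal illustration, assuming the prediction has already been decoded into a dict with an optional `urls.stream` field; the helper names `get_stream_url` and `choose_transport` and the "poll" fallback label are hypothetical, not part of any Replicate client.

```python
from typing import Any, Mapping, Optional


def get_stream_url(prediction: Mapping[str, Any]) -> Optional[str]:
    """Return the SSE URL if the prediction exposes one, else None."""
    # `urls` may be missing or null on a partially populated response.
    urls = prediction.get("urls") or {}
    stream_url = urls.get("stream")
    # Treat anything other than a non-empty string as "no stream available".
    if isinstance(stream_url, str) and stream_url:
        return stream_url
    return None


def choose_transport(prediction: Mapping[str, Any]) -> str:
    """Pick "sse" when a stream URL is present, otherwise fall back to "poll".

    The transport labels are illustrative; a real integration would open the
    SSE connection or poll the prediction (or wait for a webhook) here.
    """
    return "sse" if get_stream_url(prediction) else "poll"
```

With this shape, an SDXL-style prediction whose `urls` object has no `stream` key is routed to the fallback path instead of attempting an SSE connection that cannot succeed.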
02

User closes a browser tab mid-Llama-3 stream; the operator's proxy closes the SSE connection upstream while the model is still generating.

Issue an explicit POST /v1/predictions/{id}/cancel on disconnect so the billing meter stops. Closing the SSE connection alone does not guarantee the model stops generating server-side. Record streamed-so-far output for the operator's audit.

Pass / Fail · AI Platform · critical
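One way to wire up the disconnect handling from test 02 is sketched below. It assumes the API is served from `https://api.replicate.com` with a bearer token in the `Authorization` header; the function names, the `send` callable, and the in-memory `audit_log` list are hypothetical stand-ins for whatever HTTP client and audit store the operator already uses.

```python
import urllib.request
from typing import Callable, Dict, Iterable, List

API_BASE = "https://api.replicate.com"  # assumed host for the cancel path


def build_cancel_request(prediction_id: str, token: str) -> urllib.request.Request:
    """Build (but do not send) POST /v1/predictions/{id}/cancel."""
    return urllib.request.Request(
        url=f"{API_BASE}/v1/predictions/{prediction_id}/cancel",
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


def on_client_disconnect(
    prediction_id: str,
    token: str,
    streamed_chunks: Iterable[str],
    send: Callable[[urllib.request.Request], object],
    audit_log: List[Dict[str, str]],
) -> str:
    """Record the partial output, then explicitly cancel the prediction."""
    partial = "".join(streamed_chunks)
    # Record first: this is a cheap local write and survives a failed cancel call.
    audit_log.append({"prediction_id": prediction_id, "partial_output": partial})
    # Closing the SSE socket is not enough; issue the cancel request as well.
    send(build_cancel_request(prediction_id, token))
    return partial
```

Passing `send` in as a parameter keeps the sketch testable without network access; in a live proxy it could be `urllib.request.urlopen` or the equivalent call in another HTTP client.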
03

Llama-3 streams a JSON object character-by-character. The downstream JSON parser tries to JSON.parse() each delta.

Accumulate the full stream into a string buffer; only attempt structured parsing (JSON / Markdown) at event: done. For incremental UI, render the buffer as plain text. Do not eagerly parse partial JSON — every delta is a parse error until the closing brace arrives.

Pass / Fail · AI Platform · medium
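The buffering rule in test 03 can be illustrated with a small consumer. This sketch assumes the SSE stream has already been decoded into `(event_type, data)` pairs and that text deltas arrive under an event named `output`; only the `done` event name comes from the test description, and `consume_stream` is a hypothetical helper.

```python
import json
from typing import Any, Iterable, List, Optional, Tuple


def consume_stream(events: Iterable[Tuple[str, str]]) -> Tuple[str, Optional[Any]]:
    """Accumulate deltas and parse JSON only once the stream reports done.

    Returns (full_text, parsed). `parsed` is None when the stream ended
    without a done event or the final text is not valid JSON.
    """
    buffer: List[str] = []
    for event_type, data in events:
        if event_type == "output":  # assumed name for delta events
            # Deltas are only appended; a UI could render "".join(buffer) as plain text.
            buffer.append(data)
        elif event_type == "done":
            text = "".join(buffer)
            try:
                return text, json.loads(text)
            except json.JSONDecodeError:
                return text, None
    # Stream ended without a done event: treat the output as incomplete.
    return "".join(buffer), None
```

For example, the deltas `{"a"`, `: 1`, and `}` followed by `done` yield the text `{"a": 1}` and the parsed object, whereas parsing any single delta on its own would raise an error.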

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. A pass requires a mean score of at least 4.0 across the rubric criteria and no single criterion below 3.
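Read as a mean of at least 4.0 across criteria with no criterion below 3, the pass rule could be expressed as the check below. The function name, the treatment of an empty score set as a fail, and the idea of passing scores as a name-to-number mapping are assumptions for illustration; the page does not specify the grader's implementation or the scoring scale.

```python
from statistics import mean
from typing import Mapping


def passes(
    scores: Mapping[str, float],
    min_mean: float = 4.0,
    floor: float = 3.0,
) -> bool:
    """Return True when the mean meets min_mean and every score meets floor."""
    if not scores:
        # Assumption: an ungraded case does not count as a pass.
        return False
    values = list(scores.values())
    return mean(values) >= min_mean and min(values) >= floor
```

Under this reading, illustrative scores of 5, 4, and 3 pass (mean 4.0, minimum 3), while 5, 5, and 2 fail on the floor even though the mean is still 4.0.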

Rubric criteria

  • Replicate
  • AI Platform
  • Streaming Predictions

Recommended for

Replicate · Replicate customers
