Eval Library

Replicate evals — Deployments (relift v3 InfraRed)

About Replicate

Replicate is an AI model-hosting platform. It runs thousands of community and custom Cog-packaged models (FLUX, SDXL, Llama, Whisper, custom fine-tunes) through a simple HTTP API with predictions, webhooks, streaming, deployments, and per-second billing.

Employees

~80

Industry

AI Inference Platform

Headquarters

San Francisco, CA

Sample tests · showing 3 of 9

# · Input · Expected behavior · Check
01

Operator runs a customer-facing FLUX deployment with min_replicas=0 to save cost. First request after 10 minutes idle takes 30 s instead of 2 s.

min_replicas=0 enables scale-to-zero, trading cold-start latency for idle cost. For low-latency UX, either set min_replicas>=1 (for example, on a schedule during business hours) or accept the cold-start budget. Per-tier cold-start latency is not stated here [REQUIRES-VERIFICATION]; measure it on the chosen hardware.

Pass / Fail · AI Platform · critical
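The scheduled option in test 01 can be sketched as a small policy function. This is a minimal sketch, not Replicate's API: the function name, the 08:00 to 18:00 weekday window, and the warm count of 1 are hypothetical, and applying the returned value to the deployment is left to whatever update call the platform exposes (the setting may be named differently from min_replicas there).

```python
from datetime import datetime, time


def scheduled_min_replicas(now: datetime,
                           start: time = time(8, 0),
                           end: time = time(18, 0),
                           warm: int = 1) -> int:
    """Return the min_replicas value to apply at `now`.

    Assumption: "business hours" means weekdays between `start` and `end`
    in the timezone of `now`. Outside that window the deployment is allowed
    to scale to zero and pay the cold-start cost on the next request.
    """
    is_weekday = now.weekday() < 5          # Monday=0 ... Friday=4
    in_hours = start <= now.time() < end
    return warm if (is_weekday and in_hours) else 0


# Hypothetical usage: a scheduler calls this periodically and pushes the
# result to the deployment's configuration.
target = scheduled_min_replicas(datetime.now())
```

Keeping the policy separate from the call that applies it makes the schedule easy to test without touching a live deployment.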
02

Deployment with max_replicas=4 receives a burst of 200 concurrent requests; the 5th through 200th requests queue.

Autoscaling adds replicas only up to max_replicas; excess concurrent requests queue on the deployment. Tune max_replicas to absorb the peak burst within an acceptable p95 latency. Once demand exceeds what max_replicas can serve, latency degrades, so monitor queue depth as the leading indicator.

Pass / Fail · AI Platform · high
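To make the max_replicas tuning in test 02 concrete, here is a rough back-of-the-envelope model. It assumes every replica is already warm, each replica serves one request at a time, and every request takes the same service time (2 s is borrowed from test 01 purely for illustration). Real deployments also pay scale-up time, so treat the numbers as a lower bound rather than a prediction.

```python
def simulated_latencies(n_requests: int, replicas: int, service_s: float):
    """Latency of each request in a burst that arrives all at once.

    Request i (0-indexed) is served in "wave" i // replicas and finishes
    when that wave ends. Assumes warm replicas and one request per replica.
    """
    return [(i // replicas + 1) * service_s for i in range(n_requests)]


def p95(values):
    """Nearest-rank 95th percentile, using integer arithmetic for the rank."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100   # ceil(0.95 * n)
    return ordered[rank - 1]


def min_replicas_for_p95(n_requests, service_s, target_s, limit=1000):
    """Smallest replica count whose simulated p95 meets the target, or None."""
    for r in range(1, limit + 1):
        if p95(simulated_latencies(n_requests, r, service_s)) <= target_s:
            return r
    return None


burst = simulated_latencies(200, replicas=4, service_s=2)
print(p95(burst))                          # 96 under these assumptions
print(min_replicas_for_p95(200, 2, 10))    # 38 under these assumptions
```

Under this simplified model, four replicas leave the p95 request waiting through 47 earlier waves, which is why queue depth is a useful leading indicator.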
03

Operator wants p95 latency and request count for a deployment to drive alerting.

Use Replicate's deployment metrics surface (or scrape per-prediction metrics) and roll up in the operator's observability stack. Tag by deployment name and version. Alert on p95 latency drift relative to a rolling baseline, not absolute thresholds (which drift with model upgrades).

Pass / Fail · AI Platform · medium
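The relative-baseline alert in test 03 might look like the sketch below. It assumes the operator has already collected per-prediction latencies and grouped them into time windows for one deployment name and version; the 1.5x ratio and the use of a median baseline are illustrative choices, not values from Replicate.

```python
from statistics import median


def p95(values):
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100   # ceil(0.95 * n)
    return ordered[rank - 1]


def drift_alert(baseline_p95s, current_p95, max_ratio=1.5):
    """True when the current p95 exceeds max_ratio times the rolling baseline.

    `baseline_p95s` holds the p95 of each recent window for the same
    deployment name and version (hypothetical inputs). Using the median
    keeps one bad window from inflating the baseline.
    """
    baseline = median(baseline_p95s)
    return current_p95 > max_ratio * baseline


recent = [2.0, 2.1, 1.9, 2.2]        # seconds, one p95 per past window
print(drift_alert(recent, 2.4))      # False: within 1.5x of the baseline
print(drift_alert(recent, 3.5))      # True: well above the baseline
```

Keeping a separate baseline per (deployment, version) pair means a model upgrade starts from fresh history instead of tripping an absolute threshold.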

How this eval is graded

Grade each response against expected.ideal_behavior and expected.rubric. A pass requires a mean criterion score >= 4.0 and no individual criterion below 3.
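One reading of the pass rule, written out as a sketch: it assumes each rubric criterion receives a single numeric score and that the mean is taken across criteria. The scoring scale is not stated in the source, and the function and argument names are hypothetical.

```python
from statistics import mean


def passes_rubric(scores, min_mean=4.0, min_score=3):
    """Apply the pass rule to a mapping of criterion name -> score.

    Assumed reading: the mean across criteria must be at least `min_mean`
    and no single criterion may fall below `min_score`.
    """
    values = list(scores.values())
    return mean(values) >= min_mean and min(values) >= min_score


print(passes_rubric({"Replicate": 5, "AI Platform": 4, "Deployments": 3}))  # True
print(passes_rubric({"Replicate": 5, "AI Platform": 5, "Deployments": 2}))  # False
```

The second case shows why both conditions matter: the mean is 4.0, but one criterion falls below the floor of 3.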

Rubric criteria

  • Replicate
  • AI Platform
  • Deployments

Recommended for

Replicate customers

