Deployments
Replicate evals — Deployments (relift v3 InfraRed)
About Replicate
Replicate is an AI model-hosting platform — run thousands of community and custom Cog-packaged models (FLUX, SDXL, Llama, Whisper, custom fine-tunes) via a simple HTTP API with predictions, webhooks, streaming, deployments, and per-second billing.
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check | Category | Severity |
|---|---|---|---|---|---|
| 01 | Operator runs a customer-facing FLUX deployment with min_replicas=0 to save cost. First request after 10 minutes idle takes 30 s instead of 2 s. | min_replicas=0 enables scale-to-zero, trading cold-start latency for idle cost. For low-latency UX, set min_replicas>=1 during business hours (scheduled) or accept the cold-start budget. Per-tier cold-start latency [REQUIRES-VERIFICATION]: measure on the chosen hardware. | Pass / Fail | AI Platform | critical |
| 02 | Deployment with max_replicas=4 receives a burst of 200 concurrent requests; requests 5 through 200 queue. | Autoscaling scales up to max_replicas. Excess concurrent requests queue on the deployment. Tune max_replicas to absorb peak burst within an acceptable p95 latency. Above max_replicas, latency degrades; monitor queue depth as the leading indicator. | Pass / Fail | AI Platform | high |
| 03 | Operator wants p95 latency and request count for a deployment to drive alerting. | Use Replicate's deployment metrics surface (or scrape per-prediction metrics) and roll up in the operator's observability stack. Tag by deployment name and version. Alert on p95 latency drift relative to a rolling baseline, not absolute thresholds (which drift with model upgrades). | Pass / Fail | AI Platform | medium |
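The reasoning behind tests 02 and 03 can be sketched in a few lines of Python. This is an illustrative back-of-envelope model, not Replicate's autoscaler or metrics API: it assumes each replica serves one prediction at a time, that all replicas are already warm, and that latency samples have already been collected by the operator. The function names and the 1.25 drift tolerance are hypothetical.

```python
import math
from statistics import quantiles


def burst_drain_seconds(burst: int, max_replicas: int,
                        seconds_per_prediction: float) -> float:
    """Rough time until the last request of a burst completes.

    Assumption: one prediction per replica at a time, replicas already warm,
    so the burst is served in ceil(burst / max_replicas) sequential waves.
    """
    waves = math.ceil(burst / max_replicas)
    return waves * seconds_per_prediction


def p95(latencies: list[float]) -> float:
    """95th percentile; quantiles(n=20) yields 19 cut points, the last is p95."""
    return quantiles(latencies, n=20)[18]


def p95_drifted(baseline: list[float], recent: list[float],
                tolerance: float = 1.25) -> bool:
    """True when recent p95 exceeds the rolling-baseline p95 by the tolerance.

    The comparison is relative, so it keeps working after a model upgrade
    shifts the absolute latency level (once the baseline window rolls over).
    """
    return p95(recent) > tolerance * p95(baseline)


if __name__ == "__main__":
    # Test 02 scenario: 200 requests, 4 replicas, 2 s per warm prediction.
    print(burst_drain_seconds(200, 4, 2.0))
    # Test 03 scenario: a tail of slow requests appears in the recent window.
    baseline = [2.0] * 100
    recent = [2.0] * 90 + [4.0] * 10
    print(p95_drifted(baseline, recent))
```

Under these assumptions, the burst in test 02 needs 50 waves of 2 s each, which is why queue depth is a useful leading indicator: it grows well before the tail latency shows up in completed-request metrics.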
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
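One possible reading of the pass rule, as a minimal sketch: the mean score across rubric criteria must be at least 4.0, and no single criterion may score below 3. The score scale, the `passes` helper, and the example criterion scores are assumptions for illustration; the actual grader may aggregate differently.

```python
from statistics import mean


def passes(scores: dict[str, float],
           min_mean: float = 4.0, floor: float = 3.0) -> bool:
    """Apply the two-part threshold to a mapping of criterion -> score.

    Assumption: 'mean' is taken across criteria for one graded response.
    """
    if not scores:
        raise ValueError("at least one criterion score is required")
    values = list(scores.values())
    return mean(values) >= min_mean and min(values) >= floor


if __name__ == "__main__":
    # Hypothetical scores for the three rubric criteria.
    print(passes({"Replicate": 5, "AI Platform": 4, "Deployments": 3}))
    print(passes({"Replicate": 5, "AI Platform": 5, "Deployments": 2}))
```

The second call fails even though its mean is 4.0, because one criterion falls below the floor of 3.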
Rubric criteria
- Replicate
- AI Platform
- Deployments