Models Versions And Schema
Replicate evals — Models, Versions & Schema (relift v3 InfraRed)
About Replicate
Replicate is an AI model-hosting platform for running thousands of community and custom Cog-packaged models (FLUX, SDXL, Llama, Whisper, custom fine-tunes) through a simple HTTP API with predictions, webhooks, streaming, deployments, and per-second billing.
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Integrator constructs GET /v1/models/stability-ai/sdxl from a user-typed string 'Stability-AI / SDXL '. | Normalize the slug to lowercase owner/name with no whitespace, and validate it against the documented charset before sending. Reject ambiguous inputs client-side instead of forwarding them and letting the API return a 404. Model slugs are case-sensitive in API URLs. | Pass / Fail · AI Platform · medium |
| 02 | FLUX-pro version 'a1b2c3...' is pinned in production. Replicate publishes version 'd4e5f6...' as latest_version on the model. | The pinned version id continues to serve identically because version ids are immutable. Surface 'newer version available' to the operator out-of-band (e.g., a weekly diff job), but do not auto-upgrade. To roll forward, fetch the new version's schema, run regression evals, then update the pin. | Pass / Fail · AI Platform · critical |
| 03 | GET /v1/models/{owner}/{name}/versions/{version_id} returns OpenAPI input schema with prompt:string, num_inference_steps:integer (1-50), guidance_scale:number (1.0-20.0). | Generate the operator's UI form fields from the OpenAPI schema (min/max/enum, defaults). Re-introspect on every version pin change. Treat the schema as the source of truth; do not hardcode field lists in frontend code. | Pass / Fail · AI Platform · high |
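
For sample test 01, the client-side normalization could look roughly like the sketch below. The function names `normalize_model_slug` and `model_path` are hypothetical, and the character pattern is an assumption for illustration only; the documented slug charset should be substituted before relying on it.

```python
import re

# ASSUMPTION: lowercase letters, digits, '.', '_' and '-' are allowed.
# Replace this pattern with the documented slug charset.
_SLUG_PART = re.compile(r"[a-z0-9][a-z0-9._-]*")


def normalize_model_slug(raw: str) -> str:
    """Return 'owner/name' in lowercase, or raise ValueError if ambiguous."""
    parts = [part.strip().lower() for part in raw.split("/")]
    if len(parts) != 2:
        raise ValueError(f"expected owner/name, got {raw!r}")
    for part in parts:
        # Internal whitespace or unexpected characters are rejected here
        # instead of being forwarded to the API.
        if not _SLUG_PART.fullmatch(part):
            raise ValueError(f"invalid slug component {part!r} in {raw!r}")
    return "/".join(parts)


def model_path(raw: str) -> str:
    """Build the request path for the model endpoint from user input."""
    return f"/v1/models/{normalize_model_slug(raw)}"
```

With this sketch, `'Stability-AI / SDXL '` normalizes to `stability-ai/sdxl`, while inputs such as `'stability ai/sdxl'` or a bare `'sdxl'` raise `ValueError` before any request is sent.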
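
For sample test 02, a periodic diff job might compare the pinned id with the model's `latest_version` and only report the difference. This is a minimal sketch: `PinReport` and `check_pin` are hypothetical names, and it assumes the model response carries `latest_version` as an object with an `id` field.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PinReport:
    pinned: str
    latest: Optional[str]
    newer_available: bool


def check_pin(pinned_version: str, model: dict) -> PinReport:
    """Report whether the model's latest version differs from the pin.

    The pin itself is never changed here; rolling forward stays a
    deliberate operator action after schema review and regression evals.
    """
    # ASSUMPTION: the model JSON exposes latest_version as {"id": ...} or null.
    latest = (model.get("latest_version") or {}).get("id")
    newer = latest is not None and latest != pinned_version
    return PinReport(pinned=pinned_version, latest=latest, newer_available=newer)
```

The report can be sent to the operator out-of-band; nothing in the serving path reads `latest_version`.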
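
For sample test 03, form fields can be derived from the input schema instead of being hardcoded. The sketch below assumes the input schema has already been extracted from the version response as a JSON Schema style object with `properties`; `form_fields` is a hypothetical helper, and the `required` list in the example is illustrative, not taken from a real model.

```python
from typing import Any


def form_fields(input_schema: dict[str, Any]) -> list[dict[str, Any]]:
    """Turn a JSON Schema style input object into UI field descriptors."""
    required = set(input_schema.get("required", []))
    fields = []
    for name, spec in input_schema.get("properties", {}).items():
        fields.append({
            "name": name,
            "type": spec.get("type", "string"),
            "min": spec.get("minimum"),      # None when unbounded
            "max": spec.get("maximum"),
            "choices": spec.get("enum"),     # None when not an enum
            "default": spec.get("default"),
            "required": name in required,
        })
    return fields


# Example shaped after the schema in test 03 (required list is assumed).
example_schema = {
    "type": "object",
    "required": ["prompt"],
    "properties": {
        "prompt": {"type": "string"},
        "num_inference_steps": {"type": "integer", "minimum": 1, "maximum": 50},
        "guidance_scale": {"type": "number", "minimum": 1.0, "maximum": 20.0},
    },
}
fields = form_fields(example_schema)
```

Calling `form_fields` again whenever the version pin changes keeps the form aligned with the schema of the version actually being served.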
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
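
One way to read the pass rule above is sketched below. It assumes each rubric criterion receives a single numeric score and that the mean is taken across criteria; the function name `passes` and the treatment of an empty score set as a failure are assumptions, not part of the stated grading rule.

```python
from statistics import mean


def passes(scores: dict[str, float],
           mean_threshold: float = 4.0,
           floor: float = 3.0) -> bool:
    """Pass when the mean is >= mean_threshold and no score is below floor."""
    if not scores:
        return False  # ASSUMPTION: nothing graded counts as a failure
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor
```

Under this reading, scores of 5, 4, and 3 pass (mean 4.0, minimum 3), while 5, 5, and 2 fail on the floor even though the mean is still 4.0.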
Rubric criteria
- Replicate
- AI Platform
- Models Versions And Schema