Auth Billing Safety And Governance
AI Model Hosting — Replicate
Replicate evals — Auth, Billing, Safety & Governance (relift v3 InfraRed)
About Replicate
Replicate is an AI model-hosting platform for running thousands of community and custom Cog-packaged models (FLUX, SDXL, Llama, Whisper, custom fine-tunes) through a simple HTTP API with predictions, webhooks, streaming, deployments, and per-second billing.
Sample tests (showing 3 of 10)
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Integrator sends `Authorization: Bearer <REPLICATE_API_TOKEN>` and gets 401. | Replicate's documented header form is `Authorization: Token <REPLICATE_API_TOKEN>` (or `Bearer`, depending on the docs revision; [REQUIRES-VERIFICATION] against the current reference). Use the form the SDK uses. Never log the header value. Surface 401 as 'check token scope and rotation', not as transie… | Pass / Fail · Ai Platform · critical |
| 02 | Regulator asks: 'for the image displayed to customer X on Y date, what model, version, prompt, and hardware produced it?' | Persist `prediction_id` alongside the rendered artifact in the operator's store. From `prediction_id`, `GET /v1/predictions/{id}` returns `version`, `input`, `output`, `metrics`, which is enough for a full audit. Retain the operator-side mapping for the regulatory window (Replicate's own retention [REQUIRES-VERIFICATION] … | Pass / Fail · Ai Platform · high |
| 03 | Finance asks 'how do we attribute compute spend per customer in our SaaS built on Replicate?' | Persist `prediction_id`, `model:version`, hardware tier, and `metrics.predict_time` per call. Multiply `predict_time` by the per-tier per-second rate to get the unit cost. Tag by `customer_id` at submit time. Reconcile against Replicate's billing page monthly. [REQUIRES-VERIFICATION] for current per-tier rat… | Pass / Fail · Ai Platform · critical |
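The cost attribution described in test 03 can be sketched as follows. This is a minimal illustration under stated assumptions, not Replicate's billing logic: the `PredictionRecord` type, the tier names `tier-a` and `tier-b`, and the rates in `RATE_PER_SECOND` are hypothetical placeholders, and real per-tier rates must be taken from the current pricing reference.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical per-second rates in USD, keyed by a made-up tier name.
# These are placeholders, NOT Replicate's actual prices.
RATE_PER_SECOND = {"tier-a": 0.001, "tier-b": 0.002}


@dataclass(frozen=True)
class PredictionRecord:
    """One row persisted by the operator at submit/completion time."""
    prediction_id: str   # kept so the row can be traced back to the prediction
    customer_id: str     # tagged at submit time
    model_version: str   # "model:version" string as stored by the operator
    hardware_tier: str   # must be a key of the rate table
    predict_time: float  # seconds, from the prediction's metrics


def unit_cost(record, rates=RATE_PER_SECOND):
    """Unit cost = predict_time * per-second rate for the record's tier.

    Raises KeyError if the tier is missing from the rate table, so an
    unknown tier is never silently billed at zero.
    """
    return record.predict_time * rates[record.hardware_tier]


def spend_by_customer(records, rates=RATE_PER_SECOND):
    """Sum unit costs per customer_id."""
    totals = defaultdict(float)
    for record in records:
        totals[record.customer_id] += unit_cost(record, rates)
    return dict(totals)
```

The per-customer totals produced this way are estimates for internal attribution; the monthly reconciliation against the billing page remains the source of truth. Keeping `prediction_id` on every row also preserves the audit trail that test 02 relies on.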
How this eval is graded
Grade each response against expected.ideal_behavior and expected.rubric. A pass requires a mean score of at least 4.0, with no individual criterion scoring below 3.
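The pass rule can be expressed as a small check. This sketch assumes one reading of the rule: the mean is taken across rubric criteria for a single graded response, and scores are numeric. The function name `passes` and the criterion names in the example are hypothetical; the score scale is not stated in the source.

```python
def passes(scores, mean_threshold=4.0, floor=3):
    """Return True if the mean score is >= mean_threshold and no
    individual criterion is below floor.

    scores: mapping of criterion name -> numeric score.
    Assumes the mean is computed across criteria (an interpretation,
    not confirmed by the grading text).
    """
    if not scores:
        raise ValueError("at least one criterion score is required")
    values = list(scores.values())
    mean = sum(values) / len(values)
    return mean >= mean_threshold and min(values) >= floor
```

For example, scores of 5, 4, and 3 average exactly 4.0 with no score below 3, so the response passes; scores of 5, 5, and 2 also average 4.0 but fail on the floor.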
Rubric criteria
- Replicate
- Ai Platform
- Auth Billing Safety And Governance