Cog And Custom Model Push
AI Model Hosting — Replicate
Replicate evals — Cog & Custom Model Push (relift v3 InfraRed)
About Replicate
Replicate is an AI model-hosting platform for running thousands of community and custom Cog-packaged models (FLUX, SDXL, Llama, Whisper, custom fine-tunes) via a simple HTTP API with predictions, webhooks, streaming, deployments, and per-second billing.
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check | Category | Severity |
|---|---|---|---|---|---|
| 01 | Integrator's cog.yaml declares build.gpu=true but does not specify a GPU class. Push succeeds but predictions OOM on the assigned tier. | Match the predict.py memory footprint to the documented per-tier VRAM (T4 16GB, A40 48GB, A100 40/80GB, H100 80GB). Declare the target hardware tier on the model in the Replicate UI (or via the API); build.gpu=true is necessary but not sufficient. Test with a representative input before shipping. | Pass / Fail | AI Platform | high |
| 02 | Custom Llama-7B model loads weights into GPU memory on every predict() call, adding 18 s per request. | Load weights once in setup() (called on container boot) and hold them in self.model. predict() should be a hot-path call against the warm model. The cold-start cost then amortizes across all subsequent predictions on that container. [REQUIRES-VERIFICATION] for current container-lifetime caps. | Pass / Fail | AI Platform | critical |
| 03 | Long-running custom model wants to emit progress (denoising step k of N) to the prediction.logs field. | Write progress lines with print() or the logging module; Cog captures them, so they appear in prediction.logs and stream over SSE/webhooks for clients that opt into the logs event filter. Do not stuff progress into output, which is reserved for the model's final payload. | Pass / Fail | AI Platform | medium |
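
The three sample tests above can be sketched in one small predictor. This is an illustrative sketch, not Replicate's or Cog's reference implementation. `load_weights`, `STATS`, `tiers_that_fit`, the tier labels, and the 10% headroom are all hypothetical names and values chosen for the example. The VRAM figures are the ones quoted in test 01. The `cog` import falls back to `object` so the sketch runs without Cog installed.

```python
try:
    from cog import BasePredictor  # real base class when Cog is installed
except ImportError:
    BasePredictor = object  # fallback so the sketch runs standalone

# Per-tier VRAM in GB, as quoted in test 01. Tier labels are illustrative.
TIER_VRAM_GB = {"T4": 16, "A40": 48, "A100-40GB": 40, "A100-80GB": 80, "H100": 80}

# Counts expensive loads so the "load once" behavior is observable.
STATS = {"loads": 0}


def tiers_that_fit(required_gb: float, headroom: float = 0.1) -> list[str]:
    """Return tiers whose VRAM covers the footprint plus an assumed safety margin."""
    needed = required_gb * (1 + headroom)
    return [name for name, vram in TIER_VRAM_GB.items() if vram >= needed]


def load_weights() -> dict:
    """Hypothetical stand-in for an expensive weight load (test 02)."""
    STATS["loads"] += 1
    return {"scale": 2}


class Predictor(BasePredictor):
    def setup(self) -> None:
        # Runs once when the container boots: pay the load cost here.
        self.model = load_weights()

    def predict(self, value: int = 1, steps: int = 3) -> int:
        for k in range(1, steps + 1):
            # Progress goes to stdout (test 03), never into the return value.
            print(f"step {k} of {steps}")
        # Hot path: reuse the already-loaded model.
        return value * self.model["scale"]
```

Calling `setup()` once and `predict()` several times leaves `STATS["loads"]` at 1, which is the behavior test 02 asks for. `tiers_that_fit(40)` excludes the 40 GB A100 because the assumed headroom pushes the requirement to 44 GB.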
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. A pass requires a mean score of at least 4.0 across the rubric criteria, with no single criterion scoring below 3.
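
One possible reading of this pass rule, as a minimal sketch: the function name `passes`, its parameters, and the criterion keys in the usage are hypothetical, and the source does not state the scoring scale, so none is assumed here.

```python
from statistics import mean


def passes(scores: dict[str, float], min_mean: float = 4.0, floor: float = 3.0) -> bool:
    """Pass when the mean is at least min_mean and no criterion is below floor."""
    values = list(scores.values())
    if not values:
        return False  # assumption: an ungraded response does not pass
    return mean(values) >= min_mean and min(values) >= floor
```

Under this reading, scores of 5, 4, and 3 pass (mean 4.0, lowest 3), while 5, 5, and 2 fail despite the same mean, because one criterion falls below the floor.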
Rubric criteria
- Replicate
- AI Platform
- Cog And Custom Model Push