Mistral Models Versioning And Deployment
Mistral AI evals — Models, Versioning & Deployment (relift v3 InfraRed)
About Mistral AI
Mistral AI is a European foundation-model company offering open-weight and commercial models (Mistral Large, Codestral, Pixtral) via La Plateforme, plus Le Chat, embeddings, fine-tuning, and agents — with a strong emphasis on EU data residency.
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check | Category | Severity |
|---|---|---|---|---|---|
| 01 | Production pins the model id to a floating alias (e.g., 'mistral-large-latest'); a silent version bump changes outputs and breaks a downstream parser. | Pin production to a dated/versioned model id for reproducibility; treat aliases like '-latest' as moving targets and upgrade deliberately behind an eval gate. Roll forward, don't drift. | Pass / Fail | Ai Platform | high |
| 02 | Operator uses a general chat model for fill-in-the-middle code completion and gets poor results, unaware that Codestral and the `/v1/fim/completions` endpoint exist. | Route code-completion and fill-in-the-middle tasks to Codestral via the dedicated FIM endpoint with `prompt` + `suffix`; use general chat models for conversational tasks. Match the model and endpoint to the task. | Pass / Fail | Ai Platform | medium |
| 03 | Code hard-codes a list of available model ids that goes stale as models are added or retired. | Query `GET /v1/models` for the authoritative list of available models and their ids/capabilities rather than hard-coding; refresh on deploy. Treat any embedded list as a cache, not the source of truth. | Pass / Fail | Ai Platform | low |
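For test 01, one way to enforce "pin, don't drift" is a startup guard that refuses a floating alias when the environment is production. This is a minimal sketch: it assumes floating aliases are recognizable by a `-latest` suffix (as in 'mistral-large-latest' above), and the helper names and the dated id 'mistral-large-2411' are hypothetical examples, not an official Mistral mechanism.

```python
# Assumption: floating aliases end in "-latest"; extend this tuple if your
# deployment uses other moving aliases.
FLOATING_SUFFIXES = ("-latest",)


def is_floating_alias(model_id: str) -> bool:
    """Return True if model_id looks like a moving alias, not a pinned version."""
    return model_id.endswith(FLOATING_SUFFIXES)


def check_production_model(model_id: str, env: str) -> str:
    """Fail fast if a production config points at a floating alias.

    Non-production environments may still use aliases to try new versions
    before an eval gate promotes a pinned id.
    """
    if env == "production" and is_floating_alias(model_id):
        raise ValueError(
            f"{model_id!r} is a floating alias; pin a versioned id in production"
        )
    return model_id


# Hypothetical usage: a pinned id passes, an alias is only allowed outside prod.
pinned = check_production_model("mistral-large-2411", "production")
staging = check_production_model("mistral-large-latest", "staging")
```

Raising at startup, rather than logging a warning, keeps a silent version bump from reaching the downstream parser at all.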
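For test 02, the expected behavior is to send code-completion requests to the FIM endpoint with the code before the cursor as `prompt` and the code after it as `suffix`. The sketch below only builds the request with the standard library. It assumes the public host `https://api.mistral.ai`, bearer-token auth, and a `max_tokens` field; `build_fim_request` and the pinned id 'codestral-2501' are illustrative choices, so check the current API reference before relying on them.

```python
import json
import urllib.request

# Endpoint path from the test case; the host is assumed to be La Plateforme's.
FIM_URL = "https://api.mistral.ai/v1/fim/completions"


def build_fim_request(prompt: str, suffix: str, api_key: str, model: str,
                      max_tokens: int = 64) -> urllib.request.Request:
    """Build (but do not send) a fill-in-the-middle request.

    The model is asked to generate the code that belongs between `prompt`
    (text before the cursor) and `suffix` (text after the cursor). `model`
    has no default so callers must choose a pinned Codestral id.
    """
    body = {
        "model": model,
        "prompt": prompt,
        "suffix": suffix,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        FIM_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_fim_request(
    prompt="def add(a, b):\n    return ",
    suffix="\n\nprint(add(1, 2))",
    api_key="YOUR_API_KEY",   # placeholder, never hard-code real keys
    model="codestral-2501",   # hypothetical pinned Codestral id
)
```

Sending it is a call to `urllib.request.urlopen(req)`; keeping construction separate makes the payload easy to inspect in an eval without network access.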
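For test 03, a deploy-time check can compare the ids a service is configured with against the live `GET /v1/models` list and pick models by capability instead of by a hard-coded name. This sketch assumes the response is shaped like `{"data": [{"id": ..., "capabilities": {...}}, ...]}` with boolean capability flags such as `completion_fim`; that shape and all helper names here are assumptions to verify against the API reference.

```python
import json
import urllib.request

MODELS_URL = "https://api.mistral.ai/v1/models"


def fetch_models(api_key: str) -> dict:
    """Fetch the live model list (network call; run at deploy time)."""
    req = urllib.request.Request(
        MODELS_URL, headers={"Authorization": f"Bearer {api_key}"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def available_ids(models_response: dict) -> set[str]:
    """Collect every model id in an (assumed) {"data": [...]} response."""
    return {m["id"] for m in models_response.get("data", [])}


def stale_ids(configured: list[str], models_response: dict) -> list[str]:
    """Return configured ids that the live list no longer contains."""
    live = available_ids(models_response)
    return [m for m in configured if m not in live]


def ids_with_capability(models_response: dict, capability: str) -> list[str]:
    """List ids whose (assumed) capabilities object has `capability` set true."""
    return [
        m["id"]
        for m in models_response.get("data", [])
        if m.get("capabilities", {}).get(capability)
    ]


# Hypothetical cached response, used in place of a live fetch_models() call.
sample = {
    "object": "list",
    "data": [
        {"id": "mistral-large-2411",
         "capabilities": {"completion_chat": True, "completion_fim": False}},
        {"id": "codestral-2501",
         "capabilities": {"completion_chat": True, "completion_fim": True}},
    ],
}
```

Failing the deploy when `stale_ids` is non-empty treats the embedded list as a cache that must agree with the source of truth.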
How this eval is graded
Grade against `expected.ideal_behavior` and `expected.rubric`. A response passes only if its mean rubric score is at least 4.0 and no individual criterion scores below 3.
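The pass rule combines an average threshold with a per-criterion floor, so a single weak criterion fails the response even when the mean is high. A minimal sketch, assuming each criterion receives a numeric score (for example on a 1 to 5 scale) and the mean is taken across criteria; `rubric_passes` is a hypothetical helper name.

```python
from statistics import mean


def rubric_passes(scores: dict[str, float],
                  mean_threshold: float = 4.0,
                  floor: float = 3.0) -> bool:
    """Pass if the mean score meets the threshold and no criterion is below the floor."""
    if not scores:
        raise ValueError("no rubric scores to grade")
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor


# Mean 4.0 and lowest score 3: passes.
ok = rubric_passes({"pinning": 5, "eval_gate": 4, "explanation": 3})
# Mean 4.0 but one criterion at 2: fails on the floor.
low_floor = rubric_passes({"pinning": 5, "eval_gate": 5, "explanation": 2})
# No criterion below 3, but mean about 3.67: fails on the threshold.
low_mean = rubric_passes({"pinning": 4, "eval_gate": 4, "explanation": 3})
```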
Rubric criteria
- Mistral
- Ai Platform
- Models Versioning And Deployment