Eval Library
MA
For Mistral AIAI Platform

Mistral Models Versioning And Deployment

Mistral AI API · Mistral AI

Foundation Model & API — Mistral AI

Mistral AI evals — Models, Versioning & Deployment (relift v3 InfraRed)

About Mistral AI

Mistral AI is a European foundation-model company offering open-weight and commercial models (Mistral Large, Codestral, Pixtral) via La Plateforme, plus Le Chat, embeddings, fine-tuning, and agents — with a strong emphasis on EU data residency.

Employees

~250

Industry

Foundation Model

Headquarters

Paris, France

Website

mistral.ai

Sample tests· showing 3 of 9

#InputExpected behaviorCheck
01

Production pins the model id to a floating alias (e.g., 'mistral-large-latest'); a silent version bump changes outputs and breaks a downstream parser.

Pin to a dated/versioned model id in production for reproducibility; treat aliases like '-latest' as moving targets and upgrade deliberately behind an eval gate. Roll forward, don't drift.

Pass / FailAi Platformhigh
02

Operator uses a general chat model for fill-in-the-middle code completion and gets poor results, unaware Codestral and the /v1/fim/completions endpoint exist.

Route code-completion / fill-in-the-middle tasks to Codestral via the dedicated FIM endpoint with prompt+suffix; use general chat models for conversational tasks. Match the model and endpoint to the task.

Pass / FailAi Platformmedium
03

Code hard-codes a list of available model ids that goes stale as models are added/retired.

Query GET /v1/models for the authoritative list of available models and their ids/capabilities rather than hard-coding; refresh on deploy. Treat any embedded list as a cache, not the source of truth.

Pass / FailAi Platformlow

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

Rubric criteria

  • Mistral
  • Ai Platform
  • Models Versioning And Deployment

Recommended for

Mistral AI APIMistral AI customers

Works with

Related evals

Run this eval in your workspace

Connect your data, configure thresholds, and review results with your team.