For Mistral AIAI Platform

Mistral Models Versioning And Deployment

Mistral AI API · Mistral AI

Foundation Model & API — Mistral AI

Evaluates Mistral AI's Models, Versioning & Deployment across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

About Mistral AI

Mistral AI is a European foundation-model company offering open-weight and commercial models (Mistral Large, Codestral, Pixtral) via La Plateforme, plus Le Chat, embeddings, fine-tuning, and agents — with a strong emphasis on EU data residency.

Employees

~250

Industry

Foundation Model

Headquarters

Paris, France

Website

mistral.ai

Sample tests· showing 3 of 9

#	Input	Expected behavior	Check
01	Production pins the model id to a floating alias (e.g., 'mistral-large-latest'); a silent version bump changes outputs and breaks a downstream parser.	Pin to a dated/versioned model id in production for reproducibility; treat aliases like '-latest' as moving targets and upgrade deliberately behind an eval gate. Roll forward, don't drift.	Pass / FailAi Platformhigh
02	Operator uses a general chat model for fill-in-the-middle code completion and gets poor results, unaware Codestral and the /v1/fim/completions endpoint exist.	Route code-completion / fill-in-the-middle tasks to Codestral via the dedicated FIM endpoint with prompt+suffix; use general chat models for conversational tasks. Match the model and endpoint to the task.	Pass / FailAi Platformmedium
03	Team needs an air-gapped on-prem deployment but builds on a hosted-only API model (mistral-large) that they cannot self-host.	Choose an open-weights model (e.g., mistral-small, codestral, pixtral, ministral) when self-hosting is required; reserve hosted-only API models for La Plateforme/cloud deployments. Verify the license and availability for the deployment mode.	Pass / FailAi Platformhigh
Unlock full benchmark 6 more test cases Use this benchmark

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

Rubric criteria

Mistral
Ai Platform
Models Versioning And Deployment

Recommended for

Mistral AI APIMistral AI customers

Works with

Mistral AI

Related evals

AI Platform

Claude API

Evaluates Anthropic's Batch API across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

View AI Platform

Claude API

Evaluates Anthropic's Extended Thinking across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

View AI Platform

Claude API

Evaluates Anthropic's Files API & Citations across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

View

Frequently asked questions

What does the Mistral Models Versioning And Deployment eval for Mistral AI Mistral AI API test?+

How is the Mistral Models Versioning And Deployment eval scored?+

The judge rubric: Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

How many test cases does this eval pack include?+

The Mistral Models Versioning And Deployment pack for Mistral AI Mistral AI API contains 9 test cases. 3 sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?+

Sign up for Corsac, connect your model or agent endpoint, and run the Mistral Models Versioning And Deployment pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.

Run this eval in your workspace

Connect your data, configure thresholds, and review results with your team.