For Mistral AIAI Platform

Mistral Fine Tuning And Model Customization

Mistral AI API · Mistral AI

Foundation Model & API — Mistral AI

Evaluates Mistral AI's Fine-tuning & Model Customization across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

About Mistral AI

Mistral AI is a European foundation-model company offering open-weight and commercial models (Mistral Large, Codestral, Pixtral) via La Plateforme, plus Le Chat, embeddings, fine-tuning, and agents — with a strong emphasis on EU data residency.

Employees

~250

Industry

Foundation Model

Headquarters

Paris, France

Website

mistral.ai

Sample tests· showing 3 of 9

#	Input	Expected behavior	Check
01	Operator uploads a JSONL training file for a fine-tuning job where ~8% of lines are malformed (missing the assistant turn).	Validate the training file format (one chat per line with the required roles) before creating the job; pre-checking avoids burning a failed job. Fix or drop malformed lines and re-validate.	Pass / FailAi Platformhigh
02	Operator sets a high training_steps value on a tiny dataset and the model memorizes/overfits.	Start from documented default hyperparameters and tune learning_rate and training_steps against validation loss; high steps on small data overfit. Watch the validation curve, not just training loss.	Pass / FailAi Platformmedium
03	Operator creates a fine-tuning job with no validation file and reports training loss as the quality signal.	Provide a held-out validation file to track validation loss; training loss alone overfits. Evaluate the resulting model on a separate eval set before production use.	Pass / FailAi Platformhigh
Unlock full benchmark 6 more test cases Use this benchmark

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

Rubric criteria

Mistral
Ai Platform
Fine Tuning And Model Customization

Recommended for

Mistral AI APIMistral AI customers

Works with

Mistral AI

Related evals

AI Platform

Claude API

Evaluates Anthropic's Batch API across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

View AI Platform

Claude API

Evaluates Anthropic's Extended Thinking across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

View AI Platform

Claude API

Evaluates Anthropic's Files API & Citations across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

View

Frequently asked questions

What does the Mistral Fine Tuning And Model Customization eval for Mistral AI Mistral AI API test?+

How is the Mistral Fine Tuning And Model Customization eval scored?+

The judge rubric: Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

How many test cases does this eval pack include?+

The Mistral Fine Tuning And Model Customization pack for Mistral AI Mistral AI API contains 9 test cases. 3 sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?+

Sign up for Corsac, connect your model or agent endpoint, and run the Mistral Fine Tuning And Model Customization pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.

Run this eval in your workspace

Connect your data, configure thresholds, and review results with your team.