Eval Library
For Mistral AI · AI Platform

Mistral Fine-Tuning and Model Customization

Mistral AI API · Mistral AI

Foundation Model & API — Mistral AI

Mistral AI evals — Fine-tuning & Model Customization (relift v3 InfraRed)

About Mistral AI

Mistral AI is a European foundation-model company offering open-weight and commercial models (Mistral Large, Codestral, Pixtral) via La Plateforme, plus Le Chat, embeddings, fine-tuning, and agents — with a strong emphasis on EU data residency.

Employees: ~250
Industry: Foundation Model
Headquarters: Paris, France
Website: mistral.ai

Sample tests (showing 3 of 9)

Test 01
Input: Operator uploads a JSONL training file for a fine-tuning job where ~8% of lines are malformed (missing the assistant turn).
Expected behavior: Validate the training file format (one chat per line with the required roles) before creating the job; pre-checking avoids burning a failed job. Fix or drop malformed lines and re-validate.
Check: Pass / Fail · AI Platform · high

Test 02
Input: Operator sets a high training_steps value on a tiny dataset and the model memorizes/overfits.
Expected behavior: Start from documented default hyperparameters and tune learning_rate and training_steps against validation loss; high steps on small data overfit. Watch the validation curve, not just training loss.
Check: Pass / Fail · AI Platform · medium

Test 03
Input: Training data contains customer PII and secrets that would be baked into the fine-tuned weights.
Expected behavior: Scrub PII and secrets before fine-tuning; data baked into weights cannot be selectively deleted later. Confirm data-handling and EU residency per the DPA; mark residency assumptions [REQUIRES-VERIFICATION].
Check: Pass / Fail · AI Platform · critical
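
The pre-check described in test 01 can be sketched in a few lines of Python. This is a minimal illustration, not the platform's own validator: it assumes each JSONL line is an object with a "messages" list whose entries carry a string "role", and it treats "user" and "assistant" as the required roles. The function names and the REQUIRED_ROLES constant are hypothetical; check the fine-tuning documentation for the exact schema before relying on it.

```python
import json
from typing import List, Optional, Tuple

# Assumption: a usable training example needs at least these two roles.
REQUIRED_ROLES = {"user", "assistant"}


def validate_chat_line(line: str) -> Optional[str]:
    """Return a description of the problem, or None if the line looks valid."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return f"invalid JSON: {exc.msg}"

    messages = record.get("messages") if isinstance(record, dict) else None
    if not isinstance(messages, list) or not messages:
        return "missing or empty 'messages' list"

    roles = set()
    for message in messages:
        if not isinstance(message, dict) or not isinstance(message.get("role"), str):
            return "message without a string 'role'"
        roles.add(message["role"])

    missing = REQUIRED_ROLES - roles
    if missing:
        return f"missing role(s): {', '.join(sorted(missing))}"
    return None


def split_valid_invalid(lines: List[str]) -> Tuple[List[str], List[Tuple[int, str]]]:
    """Separate usable lines from malformed ones, keeping 1-based line numbers."""
    valid, errors = [], []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines rather than reporting them
        problem = validate_chat_line(line)
        if problem is None:
            valid.append(line)
        else:
            errors.append((number, problem))
    return valid, errors
```

Running split_valid_invalid over the file's lines before creating the job yields the lines that are safe to upload plus a list of (line number, reason) pairs to fix or drop; re-run it on the repaired file until the error list is empty.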

How this eval is graded

Grade each response against expected.ideal_behavior and expected.rubric. A pass requires a mean rubric score of at least 4.0, with no individual criterion scoring below 3.
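
The pass rule stated above can be written as a small check. This is a sketch under one reading of the rule: the mean is taken across the rubric criteria for a single graded response. The function name, the constants, and the criterion keys in the comment are hypothetical, and the scoring scale is not specified here.

```python
from statistics import mean
from typing import Mapping

MEAN_THRESHOLD = 4.0   # mean score required to pass
CRITERION_FLOOR = 3    # no single criterion may score below this


def passes(scores: Mapping[str, float]) -> bool:
    """Apply the two-part rule to one response's per-criterion scores.

    `scores` maps a criterion name to its score, for example
    {"mistral": 5, "ai_platform": 4} (hypothetical keys).
    """
    if not scores:
        return False  # nothing was graded, so nothing can pass
    values = list(scores.values())
    return mean(values) >= MEAN_THRESHOLD and min(values) >= CRITERION_FLOOR
```

The floor matters because a high mean can hide one failing criterion: scores of 5, 5, and 2 average 4.0 but still fail.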

Rubric criteria

  • Mistral
  • AI Platform
  • Fine-Tuning and Model Customization

Recommended for

Mistral AI API · Mistral AI customers
