Mistral Embeddings And Retrieval

Mistral AI API · Mistral AI

Foundation Model & API — Mistral AI

Evaluates Mistral AI's Embeddings & Retrieval across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

About Mistral AI

Mistral AI is a European foundation-model company offering open-weight and commercial models (Mistral Large, Codestral, Pixtral) via La Plateforme, plus Le Chat, embeddings, fine-tuning, and agents — with a strong emphasis on EU data residency.

Employees

~250

Industry

Foundation Model

Headquarters

Paris, France

Website

mistral.ai

Sample tests · showing 3 of 9

#	Input	Expected behavior	Check	Category	Severity
01	Team builds a retrieval index with mistral-embed but later embeds new queries with a different embedding model, then compares vectors.	All vectors in an index must come from the same embedding model; query vectors must be produced by mistral-embed if the index was built with mistral-embed. Mixing models makes similarity meaningless.	Pass / Fail	AI Platform	critical
02	Engineer hard-codes a vector dimension for mistral-embed copied from a forum post and provisions the vector store to that size.	Read the embedding dimensionality from an actual mistral-embed response and provision the store to match; do not hard-code a number from an unofficial source. Mark any cited dimension [REQUIRES-VERIFICATION] until confirmed against a live response.	Pass / Fail	AI Platform	high
03	Retrieval ranks by raw dot product over embeddings without confirming normalization and gets inconsistent ordering versus cosine.	Confirm whether mistral-embed vectors are unit-normalized; if not guaranteed, normalize consistently before using cosine, and ensure the vector store's distance metric matches. Treat the normalization assumption as [REQUIRES-VERIFICATION].	Pass / Fail	AI Platform	medium
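The three expected behaviors above can be illustrated together with a toy in-memory index. This is a minimal sketch, not Mistral's or Corsac's implementation: the VectorIndex class, its method names, and the 3-dimensional vectors are all hypothetical stand-ins for a real vector store and real mistral-embed responses. It rejects vectors from a different model, learns the dimension from the first vector it sees instead of hard-coding one, and normalizes every vector so that dot product and cosine similarity agree.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


def l2_normalize(vec: List[float]) -> List[float]:
    """Return a unit-length copy of vec; reject zero vectors."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


@dataclass
class VectorIndex:
    """Toy index (hypothetical); stands in for a real vector store."""
    model: str                        # embedding model that built the index
    dim: Optional[int] = None         # learned from data, never hard-coded
    rows: List[Tuple[str, List[float]]] = field(default_factory=list)

    def _check(self, model: str, vec: List[float]) -> List[float]:
        # Case 01: every vector must come from the index's own model.
        if model != self.model:
            raise ValueError(
                f"index built with {self.model!r}, got vector from {model!r}"
            )
        # Case 02: take the dimension from an actual vector, then enforce it.
        if self.dim is None:
            self.dim = len(vec)
        elif len(vec) != self.dim:
            raise ValueError(f"expected {self.dim} dimensions, got {len(vec)}")
        # Case 03: normalize so dot product equals cosine similarity.
        return l2_normalize(vec)

    def add(self, doc_id: str, model: str, vec: List[float]) -> None:
        self.rows.append((doc_id, self._check(model, vec)))

    def search(self, model: str, query: List[float], k: int = 3) -> List[str]:
        if not self.rows:
            return []
        q = self._check(model, query)
        scored = [
            (sum(a * b for a, b in zip(q, v)), doc_id) for doc_id, v in self.rows
        ]
        scored.sort(reverse=True)
        return [doc_id for _, doc_id in scored[:k]]
```

In a real pipeline the vectors passed to add and search would come from live embedding responses, and the learned dim would be used to provision the store. Whether mistral-embed already returns unit-normalized vectors remains [REQUIRES-VERIFICATION]; normalizing again is harmless either way.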

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
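One reading of this pass rule can be sketched as a small gate function. The assumptions here are that each criterion receives a single numeric score from the judge, that the mean is taken across criteria, and that the criterion names in the usage example are invented for illustration; the page does not state the score scale or how Corsac aggregates scores internally.

```python
from statistics import mean
from typing import Dict

MEAN_THRESHOLD = 4.0  # from the page: mean >= 4.0
SCORE_FLOOR = 3       # from the page: no criterion below 3


def passes(scores: Dict[str, float]) -> bool:
    """Return True when the scores satisfy both conditions of the rule."""
    if not scores:
        raise ValueError("no criterion scores supplied")
    values = list(scores.values())
    return mean(values) >= MEAN_THRESHOLD and min(values) >= SCORE_FLOOR


if __name__ == "__main__":
    # Hypothetical criterion names; real rubric criteria may differ.
    print(passes({"same_model": 5, "dimension_check": 4, "normalization": 3}))
    print(passes({"same_model": 5, "dimension_check": 5, "normalization": 2}))
```

The second call fails even though its mean is 4.0, because one criterion falls below the floor of 3.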

Rubric criteria

Mistral
AI Platform
Embeddings And Retrieval

Recommended for

Mistral AI API · Mistral AI customers

Works with

Mistral AI

Frequently asked questions

What does the Mistral Embeddings And Retrieval eval for the Mistral AI API test?

Evaluates Mistral AI's Embeddings & Retrieval across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

How is the Mistral Embeddings And Retrieval eval scored?

The judge rubric: Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

How many test cases does this eval pack include?

The Mistral Embeddings And Retrieval pack for the Mistral AI API contains 9 test cases. 3 sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?

Sign up for Corsac, connect your model or agent endpoint, and run the Mistral Embeddings And Retrieval pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.
