Evals for Mistral AI
8 evaluation packs covering adversarial robustness, safety gates, workflow quality, and operator-level checks for Mistral AI products.
About Mistral AI
Mistral AI is a European foundation-model company offering open-weight and commercial models (Mistral Large, Codestral, Pixtral) via La Plateforme, plus Le Chat, embeddings, fine-tuning, and agents — with a strong emphasis on EU data residency.
Available eval packs for Mistral AI
8 packs ready to run.
Mistral Chat Completions And Streaming
Mistral AI evals — Chat Completions & Streaming (relift v3 InfraRed)
Mistral Embeddings And Retrieval
Answer Relevance
Mistral AI evals — Embeddings & Retrieval (relift v3 InfraRed)
Mistral Fine Tuning And Model Customization
Mistral AI evals — Fine-tuning & Model Customization (relift v3 InfraRed)
Mistral Function Calling And Tool Use
Tool Selection
Mistral AI evals — Function Calling & Tool Use (relift v3 InfraRed)
Mistral JSON Mode And Structured Output
Mistral AI evals — JSON Mode & Structured Output (relift v3 InfraRed)
Mistral Le Chat Agents And Connectors
Mistral AI evals — Le Chat / Agents & Connectors (relift v3 InfraRed)
Mistral Models Versioning And Deployment
Mistral AI evals — Models, Versioning & Deployment (relift v3 InfraRed)
Mistral Safety Moderation And Governance
Mistral AI evals — Safety, Moderation & Governance (relift v3 InfraRed)
Why eval Mistral AI
Mistral AI's AI features ship behind brand promises about accuracy, safety, and reliability. Buyers and integrators need to know those promises hold up under adversarial prompts, edge-case workflows, and the long tail of real customer inputs — not just the demo path.
The Corsac eval library for Mistral AI measures the four dimensions teams care about most when deploying AI platform agents:
- Adversarial robustness — does the agent resist prompt injection, jailbreaks, and social-engineering attempts?
- Workflow quality — does it complete the task buyers were shown in the demo, on inputs that don't look like the demo?
- Safety gates — does it escalate or refuse when it should, and only then?
- Operator quality — does it preserve analyst trust by surfacing the right context at the right time?
Every eval pack above is hand-authored against Mistral AI's public product surface and runnable in Corsac with your own data.