Mistral Chat Completions And Streaming
Mistral AI evals — Chat Completions & Streaming (relift v3 InfraRed)
About Mistral AI
Mistral AI is a European foundation-model company offering open-weight and commercial models (Mistral Large, Codestral, Pixtral) via La Plateforme, plus Le Chat, embeddings, fine-tuning, and agents — with a strong emphasis on EU data residency.
Sample tests · showing 3 of 10
| # | Input | Expected behavior | Check | Category | Severity |
|---|---|---|---|---|---|
| 01 | Agent streams /v1/chat/completions with stream=true and max_tokens=256 for a long answer; the final SSE chunk reports finish_reason='length'. | Detect finish_reason='length' on the terminal chunk and treat the answer as truncated: surface a partial completion to the caller, or continue by appending the assistant text and re-requesting. Never present truncated output as complete. | Pass / Fail | AI Platform | critical |
| 02 | Agent consumes the SSE stream from /v1/chat/completions and stops reading at the first empty data line instead of the terminal 'data: [DONE]' marker. | Parse the SSE stream until the explicit 'data: [DONE]' sentinel; accumulate delta.content across chunks and only finalize on the terminator. Do not treat keep-alive or blank lines as end-of-stream. | Pass / Fail | AI Platform | high |
| 03 | Operator sets stop=['\n\nUser:'] expecting generation to halt before the next turn marker; the returned text still contains the marker. | Configure stop sequences correctly and verify finish_reason='stop'; the stop string itself is not included in the output, so do not also strip it manually and corrupt the text. Confirm the marker matches exactly. | Pass / Fail | AI Platform | medium |
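The behaviors expected in tests 01 to 03 can be sketched in Python. This is a minimal sketch under stated assumptions, not Mistral's client library: it assumes each SSE `data:` payload is a JSON chunk with a `choices` list whose entries carry `delta.content` and `finish_reason` (the table names only those two fields and the `data: [DONE]` sentinel, so the surrounding chunk layout is an assumption). `StreamResult`, `consume_sse`, and `stop_respected` are hypothetical names, and the parser reads pre-split lines rather than a live HTTP response.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Optional, Sequence


@dataclass
class StreamResult:
    text: str                     # concatenated delta.content
    finish_reason: Optional[str]  # last non-null finish_reason seen
    saw_done: bool                # True only if "data: [DONE]" was read

    @property
    def truncated(self) -> bool:
        # finish_reason == "length" means max_tokens cut the answer short.
        return self.finish_reason == "length"

    @property
    def complete(self) -> bool:
        # Complete only if the stream reached [DONE] and was not cut off.
        return self.saw_done and self.finish_reason is not None and not self.truncated


def consume_sse(lines: Iterable[str]) -> StreamResult:
    """Read SSE lines until the explicit [DONE] sentinel (assumed chunk shape)."""
    parts = []
    finish_reason = None
    saw_done = False
    for raw in lines:
        line = raw.strip()
        # Blank lines separate events and ":" lines are comments or keep-alives;
        # neither one marks the end of the stream.
        if not line or line.startswith(":") or not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            saw_done = True
            break
        if not payload:
            continue
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            if choice.get("index", 0) != 0:
                continue  # assumption: only the first choice is consumed
            content = (choice.get("delta") or {}).get("content")
            if isinstance(content, str):
                parts.append(content)
            if choice.get("finish_reason") is not None:
                finish_reason = choice["finish_reason"]
    return StreamResult("".join(parts), finish_reason, saw_done)


def stop_respected(result: StreamResult, stop: Sequence[str]) -> bool:
    # Test 03: expect finish_reason == "stop" and no stop string left in the text.
    return result.finish_reason == "stop" and not any(s in result.text for s in stop)
```

A caller would check `result.truncated` before presenting the text, and treat a result with `saw_done` set to `False` as an interrupted stream rather than a finished answer.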
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. A pass requires a mean score >= 4.0 across criteria and no single criterion scoring below 3.
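The pass rule can be written as a short check. This is a sketch of one reading of the rule, assuming the mean is taken across rubric criteria and that scores are numeric; the function name `passes` and the criterion names in the usage line are hypothetical.

```python
from statistics import mean
from typing import Mapping


def passes(scores: Mapping[str, float],
           mean_threshold: float = 4.0,
           floor: float = 3.0) -> bool:
    """Return True if the mean score meets the threshold and no criterion is below the floor."""
    if not scores:
        return False  # assumption: an ungraded test does not pass
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor
```

For example, `passes({"detects_truncation": 5, "parses_until_done": 4, "stop_handling": 3})` meets both conditions, while replacing the 3 with a 2 fails on the floor even if the mean stays at or above 4.0.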
Rubric criteria
- Mistral
- AI Platform
- Chat Completions And Streaming