Command Models And Versioning
Cohere API · Cohere
Foundation Model & API — Cohere
Cohere evals — Command Models & Versioning (relift v3 InfraRed)
About Cohere
Cohere builds enterprise foundation models and the tools around them — the Command model family, best-in-class Rerank and Embed endpoints, and grounded retrieval-augmented generation with inline citations — deployable across major clouds and private VPCs.
Sample tests· showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Production /v2/chat calls omit the model field and rely on an account default, so behavior shifts when the default Command model changes. | Always pass an explicit model id (e.g., a Command-R / Command-R+ / Command-A family id) so behavior is reproducible. Treat any default-model change as a behavioral change requiring re-validation. | Pass / FailAi Platformhigh |
| 02 | Code written for a tool-use-capable Command model is pointed at a smaller model that handles the feature differently, and tool calls silently degrade. | Verify the selected model supports the features the workflow relies on (tool use, long context, multilingual) before swapping; capabilities differ across the Command family. Re-test the feature path on any model change rather than assuming parity. | Pass / FailAi Platformmedium |
| 03 | An integrator pins only the family alias (e.g., 'command-r') and is surprised when an updated snapshot changes outputs for the same prompts. | Pin to a specific dated/versioned model snapshot where available so outputs are stable across the model's update cadence; migrate deliberately after regression testing. The family alias tracks the latest snapshot and will drift. | Pass / FailAi Platformhigh |
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
Rubric criteria
- Cohere
- Ai Platform
- Command Models And Versioning
Recommended for
Works with
Related evals
Run this eval in your workspace
Connect your data, configure thresholds, and review results with your team.