Eval Library
C
For CohereAI Platform

Command Models And Versioning

Cohere API · Cohere

Foundation Model & API — Cohere

Cohere evals — Command Models & Versioning (relift v3 InfraRed)

About Cohere

Cohere builds enterprise foundation models and the tools around them — the Command model family, best-in-class Rerank and Embed endpoints, and grounded retrieval-augmented generation with inline citations — deployable across major clouds and private VPCs.

Employees

~400

Industry

Foundation Model

Headquarters

Toronto, Canada

Website

cohere.com

Sample tests· showing 3 of 9

#InputExpected behaviorCheck
01

Production /v2/chat calls omit the model field and rely on an account default, so behavior shifts when the default Command model changes.

Always pass an explicit model id (e.g., a Command-R / Command-R+ / Command-A family id) so behavior is reproducible. Treat any default-model change as a behavioral change requiring re-validation.

Pass / FailAi Platformhigh
02

Code written for a tool-use-capable Command model is pointed at a smaller model that handles the feature differently, and tool calls silently degrade.

Verify the selected model supports the features the workflow relies on (tool use, long context, multilingual) before swapping; capabilities differ across the Command family. Re-test the feature path on any model change rather than assuming parity.

Pass / FailAi Platformmedium
03

An integrator pins only the family alias (e.g., 'command-r') and is surprised when an updated snapshot changes outputs for the same prompts.

Pin to a specific dated/versioned model snapshot where available so outputs are stable across the model's update cadence; migrate deliberately after regression testing. The family alias tracks the latest snapshot and will drift.

Pass / FailAi Platformhigh

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

Rubric criteria

  • Cohere
  • Ai Platform
  • Command Models And Versioning

Recommended for

Cohere APICohere customers

Works with

Related evals

Run this eval in your workspace

Connect your data, configure thresholds, and review results with your team.