For Cohere · AI Platform

Rerank

Cohere API · Cohere

Foundation Model & API — Cohere

Evaluates Cohere's Rerank across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

About Cohere

Cohere builds enterprise foundation models and the tools around them — the Command model family, best-in-class Rerank and Embed endpoints, and grounded retrieval-augmented generation with inline citations — deployable across major clouds and private VPCs.

Employees

~400

Industry

Foundation Model

Headquarters

Toronto, Canada

Website

cohere.com

Sample tests · showing 3 of 9

#	Input	Expected behavior	Check
01	Agent calls POST /v2/rerank with a query and a list of candidate documents but omits top_n, then truncates the returned list client-side to 3.	Send query, documents, and top_n=3 so the API returns only the top-ranked subset (each result carries an index into the input documents and a relevance_score). Rely on the returned index→document mapping rather than re-slicing the original list by position.	Pass / Fail · AI Platform · high
02	A /v2/rerank response returns results with index and relevance_score. The agent assumes the results are already aligned to its original document array positions.	Map each result back to the source document using the result's index field, since results are returned in ranked (not input) order. Reorder/select source documents by index; never assume positional alignment.	Pass / Fail · AI Platform · critical
03	Operator applies a fixed threshold of 0.5 on relevance_score and treats it as a calibrated probability of relevance across all queries.	Treat relevance_score as a within-query ranking signal, not a cross-query calibrated probability. Use relative ordering / top_n selection rather than a single global cutoff; if a cutoff is needed, tune it per use case and validate empirically. Any '0.5 means relevant' claim is [REQUIRES-VERIFICATIO…	Pass / Fail · AI Platform · high
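The expected behavior in cases 01 and 02 can be sketched in a few lines of Python. This is a minimal illustration, not Cohere's SDK: it makes no network call, the helper names `build_rerank_payload` and `map_results_to_documents` are hypothetical, and the documents and scores are made up. It assumes only what the cases above state, namely that the request carries query, documents, and top_n, and that each result carries an index and a relevance_score. A real request would likely need more, such as a model name and authentication.

```python
from typing import Any


def build_rerank_payload(query: str, documents: list[str], top_n: int) -> dict[str, Any]:
    """Build a request body that asks the API for only the top_n results.

    Hypothetical helper: a real request may need extra fields (for example a
    model name), which are omitted here.
    """
    return {"query": query, "documents": documents, "top_n": top_n}


def map_results_to_documents(
    documents: list[str], results: list[dict[str, Any]]
) -> list[tuple[str, float]]:
    """Pair each result with its source document via the result's index field.

    Results are assumed to arrive in ranked order, so the position of a result
    in the list says nothing about where its document sat in the input.
    """
    ranked = []
    for result in results:
        i = result["index"]
        if not 0 <= i < len(documents):
            raise IndexError(f"result index {i} is outside the input documents")
        ranked.append((documents[i], result["relevance_score"]))
    return ranked


# Illustrative data only; these scores are invented for the example.
documents = ["doc A", "doc B", "doc C", "doc D"]
results = [
    {"index": 2, "relevance_score": 0.91},
    {"index": 0, "relevance_score": 0.40},
    {"index": 3, "relevance_score": 0.12},
]

payload = build_rerank_payload("example query", documents, top_n=3)
ranked = map_results_to_documents(documents, results)
```

Slicing `documents[:3]` here would return doc A, doc B, and doc C, which is not the ranked set. For case 03, the same ranked list supports selection by relative order within one query, without reading any single score as a probability.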

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
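One plausible reading of the pass rule can be written as a small check. This sketch assumes the judge emits one numeric score per rubric criterion for a test case, and that the case passes when the mean of those scores is at least 4.0 and none is below 3. The function name `case_passes` and the score lists are hypothetical, and the scoring scale itself is not stated on this page.

```python
from statistics import mean


def case_passes(
    criterion_scores: list[float],
    mean_threshold: float = 4.0,
    floor: float = 3.0,
) -> bool:
    """Return True when the mean score meets the threshold and no score is below the floor.

    Assumption: one numeric judge score per rubric criterion. An empty score
    list is treated as a failure rather than an error.
    """
    if not criterion_scores:
        return False
    return mean(criterion_scores) >= mean_threshold and min(criterion_scores) >= floor


# Illustrative scores only.
strong = case_passes([5, 4, 4])      # mean above 4.0, lowest score 4
low_outlier = case_passes([5, 5, 2])  # mean is 4.0 but one score is below 3
low_mean = case_passes([4, 4, 3])     # lowest score is 3 but mean is below 4.0
```

The two conditions are independent: a high average does not rescue a case with one weak criterion, and a case with no weak criterion still fails if the average falls short.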

Rubric criteria

Cohere
AI Platform
Rerank

Recommended for

Cohere API · Cohere customers

Works with

Cohere

Related evals

AI Platform

Claude API

Evaluates Anthropic's Batch API across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

AI Platform

Claude API

Evaluates Anthropic's Extended Thinking across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

AI Platform

Claude API

Evaluates Anthropic's Files API & Citations across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

Frequently asked questions

What does the Rerank eval for the Cohere API test?

Evaluates Cohere's Rerank across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

How is the Rerank eval scored?

The judge rubric: Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

How many test cases does this eval pack include?

The Rerank pack for the Cohere API contains 9 test cases. 3 sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?

Sign up for Corsac, connect your model or agent endpoint, and run the Rerank pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.