Rerank
Cohere evals — Rerank (relift v3 InfraRed)
About Cohere
Cohere builds enterprise foundation models and the tools around them — the Command model family, best-in-class Rerank and Embed endpoints, and grounded retrieval-augmented generation with inline citations — deployable across major clouds and private VPCs.
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Agent calls POST /v2/rerank with a query and a list of candidate documents but omits top_n, then truncates the returned list client-side to 3. | Send query, documents, and top_n=3 so the API returns only the top-ranked subset (each result carries an index into the input documents and a relevance_score). Rely on the returned index→document mapping rather than re-slicing the original list by position. | Pass / Fail · AI Platform · high |
| 02 | A /v2/rerank response returns results with index and relevance_score. The agent assumes the results are already aligned to its original document array positions. | Map each result back to the source document using the result's index field, since results are returned in ranked (not input) order. Reorder/select source documents by index; never assume positional alignment. | Pass / Fail · AI Platform · critical |
| 03 | Candidate documents exceed the reranker's max input length; the agent sends them whole and expects silent truncation to preserve the relevant passage. | Chunk long documents to fit the reranker's documented token limit before ranking, then aggregate chunk scores back to the parent document (e.g., max over chunks). Do not rely on silent truncation, which may drop the relevant section. Exact max token length is [REQUIRES-VERIFICATION] per model versi… | Pass / Fail · AI Platform · medium |
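
The expected behaviors in the sample tests can be sketched in Python as follows. This is a minimal illustration, not a reference client: it assumes only the response shape described in the table (a list of results, each with `index` and `relevance_score`), makes no network call, and uses hypothetical helper names, a placeholder `model` argument, and made-up sample data.

```python
from typing import Dict, List, Sequence, Tuple

# A result is assumed to look like {"index": int, "relevance_score": float},
# matching the fields named in the sample tests.
Result = Dict[str, float]


def build_rerank_payload(query: str, documents: Sequence[str],
                         top_n: int, model: str) -> dict:
    """Build a request body for POST /v2/rerank with top_n set explicitly.

    `model` is passed through untouched; the value is the caller's choice.
    """
    return {
        "model": model,
        "query": query,
        "documents": list(documents),
        "top_n": top_n,
    }


def select_by_index(documents: Sequence[str],
                    results: Sequence[Result]) -> List[Tuple[str, float]]:
    """Map ranked results back to source documents through `index`.

    Results arrive in ranked order, so position i in `results` does not
    correspond to position i in `documents`.
    """
    return [(documents[int(r["index"])], r["relevance_score"]) for r in results]


def aggregate_chunk_scores(chunk_parents: Sequence[str],
                           results: Sequence[Result]) -> Dict[str, float]:
    """Roll chunk-level scores up to parent documents with a max over chunks.

    `chunk_parents[i]` is the (hypothetical) parent id of the i-th chunk
    that was sent as a document in the rerank request.
    """
    best: Dict[str, float] = {}
    for r in results:
        parent = chunk_parents[int(r["index"])]
        score = r["relevance_score"]
        if parent not in best or score > best[parent]:
            best[parent] = score
    return best
```

A short usage example with hand-written results standing in for an API response:

```python
docs = ["alpha", "bravo", "charlie", "delta"]
ranked = [{"index": 2, "relevance_score": 0.9},
          {"index": 0, "relevance_score": 0.4}]
top_docs = select_by_index(docs, ranked)  # pairs each result with docs[index]
```

Taking the max over chunks is only one aggregation choice; it favors a document with a single highly relevant passage over one with many mildly relevant passages.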
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
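
One reading of the pass rule above, sketched in Python: the mean of the per-criterion scores must be at least 4.0, and no single criterion may score below 3. The function name, the criterion names in the usage line, and the choice to treat an empty score set as a fail are assumptions made for illustration; the grader's actual implementation is not shown on this page.

```python
from typing import Dict


def passes_rubric(scores: Dict[str, float],
                  mean_threshold: float = 4.0,
                  floor: float = 3.0) -> bool:
    """Return True when the mean score meets the threshold and no score is under the floor."""
    if not scores:
        # Assumption: nothing graded means nothing passed.
        return False
    values = list(scores.values())
    mean = sum(values) / len(values)
    return mean >= mean_threshold and min(values) >= floor


# Hypothetical criterion names and scores.
ok = passes_rubric({"uses_top_n": 5, "maps_by_index": 4, "chunks_long_docs": 3})
```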
Rubric criteria
- Cohere
- AI Platform
- Rerank