
RAG and Grounded Generation

Cohere API · Cohere

Foundation Model & API — Cohere

Evaluates Cohere's RAG & Grounded Generation across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

About Cohere

Cohere builds enterprise foundation models and the tools around them: the Command model family, the Rerank and Embed endpoints, and grounded retrieval-augmented generation with inline citations. These are deployable across major clouds and private VPCs.

Employees

~400

Industry

Foundation Model

Headquarters

Toronto, Canada

Website

cohere.com

Sample tests (showing 3 of 9)

#	Input	Expected behavior	Check	Category	Severity
01	Operator supplies retrieved passages to /v2/chat as documents[] and wants the answer grounded only in them, but provides each document as a bare string with no id.	Provide each document with a stable id and structured data so the response can cite document ids precisely. Stable ids let the integrator map citations back to source records; bare strings make citation attribution ambiguous.	Pass / Fail	AI Platform	high
02	User asks a question whose answer is absent from the supplied documents[]. The model is expected to ground its answer in those documents.	When the documents do not support an answer, the grounded response should abstain or state that the documents do not contain the answer, rather than fabricate. The integrator should treat an uncited confident claim as a grounding failure and route it for review.	Pass / Fail	AI Platform	critical
03	A grounded response returns citations with start/end text spans and source document ids. The UI renders the answer but ignores the spans, linking every claim to the first document.	For each citation, use its start/end character span and its listed source document ids to attach the correct source to the correct claim. Render each cited span as a deep link to its actual source(s); never blanket-attribute to one document.	Pass / Fail	AI Platform	high
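
The integrator-side behavior expected in the three sample cases can be sketched roughly as follows. This is a minimal sketch in plain Python, not the Cohere SDK: the record fields (`record_id`, `title`, `text`), the citation shape (`start`, `end`, `source_ids`), and the `abstained` flag are assumptions chosen to mirror the table, so check them against the actual request and response schema before relying on them.

```python
from typing import Any


def to_documents(records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Case 01: give every passage a stable id and structured data.

    The `record_id`, `title`, and `text` field names are hypothetical.
    """
    return [
        {"id": r["record_id"], "data": {"title": r["title"], "text": r["text"]}}
        for r in records
    ]


def needs_review(citations: list[dict[str, Any]], abstained: bool) -> bool:
    """Case 02: an answer that neither abstains nor cites is a grounding failure.

    How abstention is detected is left to the caller (assumption).
    """
    return not abstained and not citations


def attribute_spans(
    answer: str, citations: list[dict[str, Any]]
) -> list[tuple[str, list[str]]]:
    """Case 03: pair each cited span with its own source ids, in reading order."""
    ordered = sorted(citations, key=lambda c: c["start"])
    return [(answer[c["start"]:c["end"]], list(c["source_ids"])) for c in ordered]
```

A UI built on `attribute_spans` gets one (span text, source ids) pair per citation, so each claim can link to its own sources instead of defaulting to the first document.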

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
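
One way to read the pass rule is sketched below. It assumes the judge returns one numeric score per rubric criterion and that the mean is taken across those criteria; the scale and the criterion names are not stated on this page, so the names used in the example are hypothetical.

```python
from statistics import mean


def case_passes(
    scores: dict[str, float], mean_threshold: float = 4.0, floor: float = 3.0
) -> bool:
    """Return True when the mean score meets the threshold and no criterion is below the floor."""
    if not scores:
        return False  # nothing graded: treated as a fail (assumption)
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor
```

Under this reading, scores of 5, 4, and 3 pass (mean 4.0, lowest 3), while 5, 5, and 2 fail despite the same mean because one criterion falls below 3.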

Rubric criteria

Cohere
AI Platform
RAG and Grounded Generation

Recommended for

Cohere API, Cohere customers

Works with

Cohere

Frequently asked questions

What does the RAG and Grounded Generation eval for the Cohere API test?

Evaluates Cohere's RAG & Grounded Generation across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

How is the RAG and Grounded Generation eval scored?

The judge rubric: Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

How many test cases does this eval pack include?

The RAG and Grounded Generation pack for the Cohere API contains 9 test cases. Three sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?

Sign up for Corsac, connect your model or agent endpoint, and run the RAG and Grounded Generation pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.
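
A release gate in CI might look something like the sketch below. The REST API's endpoints and response schema are not described on this page, so the sketch starts from per-case results that have already been fetched; the `case_id` and `passed` fields are hypothetical stand-ins for whatever the API actually returns.

```python
import sys


def gate_exit_code(results: list[dict]) -> int:
    """Return 0 when every case passed, 1 otherwise.

    `results` is a hypothetical list of {"case_id": str, "passed": bool} records.
    """
    if not results:
        print("No results found; failing closed.", file=sys.stderr)
        return 1
    failed = [r["case_id"] for r in results if not r["passed"]]
    for case_id in failed:
        print(f"FAILED: {case_id}", file=sys.stderr)
    return 1 if failed else 0
```

A CI step could then call `sys.exit(gate_exit_code(results))` so that any failing case blocks the release.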
