RAG and Grounded Generation
Cohere evals — RAG & Grounded Generation (relift v3 InfraRed)
About Cohere
Cohere builds enterprise foundation models and the tools around them — the Command model family, best-in-class Rerank and Embed endpoints, and grounded retrieval-augmented generation with inline citations — deployable across major clouds and private VPCs.
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Operator supplies retrieved passages to /v2/chat as documents[] and wants the answer grounded only in them, but provides each document as a bare string with no id. | Provide each document with a stable id and structured data so the response can cite document ids precisely. Stable ids let the integrator map citations back to source records; bare strings make citation attribution ambiguous. | Pass / Fail · AI Platform · high |
| 02 | User asks a question whose answer is absent from the supplied documents[]. The model is expected to ground its answer in those documents. | When the documents do not support an answer, the grounded response should abstain or state that the documents do not contain the answer, rather than fabricate. The integrator should treat an uncited confident claim as a grounding failure and route it for review. | Pass / Fail · AI Platform · critical |
| 03 | A retrieval pipeline passes the top 100 vector hits straight into /v2/chat documents[] without reranking, blowing the context budget and diluting grounding. | Rerank candidate passages (e.g., via /v2/rerank) and pass only the top_n most relevant into documents[] for grounded generation. Tighter, higher-precision context improves citation quality and reduces token cost. | Pass / Fail · AI Platform · medium |
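The three sample tests can be sketched as small integrator-side helpers. This is a minimal illustration, not Cohere's implementation. The record fields (`record_id`, `title`, `text`) and all function names are hypothetical. The `{"id", "data"}` document shape and the `index` / `relevance_score` fields on rerank results are assumed to match the v2 API and should be checked against the current reference.

```python
from typing import Any


def build_documents(records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Test 01: wrap each source record as {"id", "data"} rather than a bare string."""
    docs = []
    for rec in records:
        docs.append({
            # Stable id taken from the source system (hypothetical field name).
            "id": str(rec["record_id"]),
            # Structured payload the model can ground on and cite.
            "data": {"title": rec["title"], "text": rec["text"]},
        })
    return docs


def select_top_n(candidates: list[Any],
                 rerank_results: list[dict[str, Any]],
                 top_n: int) -> list[Any]:
    """Test 03: keep only the top_n candidates, ordered by rerank relevance.

    Each result is assumed to carry the candidate's position ("index")
    and a "relevance_score".
    """
    ranked = sorted(rerank_results,
                    key=lambda r: r["relevance_score"],
                    reverse=True)
    return [candidates[r["index"]] for r in ranked[:top_n]]


def needs_review(answer_text: str, citations: list[Any]) -> bool:
    """Test 02: flag any non-empty answer that carries no citations."""
    return bool(answer_text.strip()) and not citations
```

Note that `needs_review` is deliberately coarse: an explicit abstention ("the documents do not contain the answer") usually has no citations either, so it would also be routed for review unless the integrator detects abstentions separately.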
How this eval is graded
Grade each response against expected.ideal_behavior and expected.rubric. A pass requires a mean criterion score of at least 4.0, with no single criterion scoring below 3.
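The threshold rule can be sketched as follows. This assumes one reading of it: the mean is taken across rubric criteria for a single response, and every criterion must score at least 3. The criterion names in the usage example and the function name are hypothetical.

```python
from statistics import mean


def passes(scores: dict[str, float],
           mean_floor: float = 4.0,
           criterion_floor: float = 3.0) -> bool:
    """Return True when the mean score meets mean_floor and no criterion
    falls below criterion_floor. An empty score set never passes."""
    values = list(scores.values())
    if not values:
        return False
    return mean(values) >= mean_floor and min(values) >= criterion_floor
```

For example, `passes({"grounding": 5, "citations": 4, "abstention": 3})` is true (mean 4.0, minimum 3), while scores of 5, 5, and 2 fail on the per-criterion floor despite the same mean.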
Rubric criteria
- Cohere
- AI Platform
- RAG and Grounded Generation