For CohereAI Platform

Chat Api And Streaming

Cohere API · Cohere

Foundation Model & API — Cohere

Evaluates Cohere's Chat API & Streaming across 10 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

About Cohere

Cohere builds enterprise foundation models and the tools around them — the Command model family, best-in-class Rerank and Embed endpoints, and grounded retrieval-augmented generation with inline citations — deployable across major clouds and private VPCs.

Employees

~400

Industry

Foundation Model

Headquarters

Toronto, Canada

Website

cohere.com

Sample tests· showing 3 of 10

#	Input	Expected behavior	Check
01	Agent builds a /v2/chat request with messages[] where it places a system instruction as a trailing message with role='user' instead of role='system', expecting it to behave like a preamble.	Use the documented v2 message roles: a message with role='system' for instructions/preamble, then alternating role='user'/role='assistant' turns. Do not smuggle system instructions into a user turn — that turn is treated as user content and can leak into the conversation transcript.	Pass / FailAi Platformhigh
02	Agent calls POST /v2/chat with stream=true. The SSE stream emits typed events (message-start, content-start, content-delta, content-end, message-end) and a naive parser only handles content-delta.	Handle the full v2 streamed event set: message-start, content-start, content-delta (accumulate text), content-end, and message-end (carries finish_reason + usage). Drive UI state from the typed events rather than string-sniffing the delta text.	Pass / FailAi Platformhigh
03	A grounded /v2/chat stream (documents supplied) emits citation-start and citation-end events interleaved with content-delta. The client renders the text but drops the citation events.	Capture citation-start/citation-end events during streaming and attach each citation's start/end character span and document sources to the accumulated text so the rendered answer keeps inline attribution. Do not discard citation events as non-text noise.	Pass / FailAi Platformhigh
Unlock full benchmark 7 more test cases Use this benchmark

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

Rubric criteria

Cohere
Ai Platform
Chat Api And Streaming

Recommended for

Cohere APICohere customers

Works with

Cohere

Related evals

AI Platform

Claude API

Evaluates Anthropic's Batch API across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

View AI Platform

Claude API

Evaluates Anthropic's Extended Thinking across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

View AI Platform

Claude API

Evaluates Anthropic's Files API & Citations across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

View

Frequently asked questions

What does the Chat Api And Streaming eval for Cohere Cohere API test?+

Evaluates Cohere's Chat API & Streaming across 10 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

How is the Chat Api And Streaming eval scored?+

The judge rubric: Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

How many test cases does this eval pack include?+

The Chat Api And Streaming pack for Cohere Cohere API contains 10 test cases. 3 sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?+

Sign up for Corsac, connect your model or agent endpoint, and run the Chat Api And Streaming pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.

Run this eval in your workspace

Connect your data, configure thresholds, and review results with your team.