Chat Api And Streaming
Cohere API · Cohere
Foundation Model & API — Cohere
Cohere evals — Chat API & Streaming (relift v3 InfraRed)
About Cohere
Cohere builds enterprise foundation models and the tools around them — the Command model family, best-in-class Rerank and Embed endpoints, and grounded retrieval-augmented generation with inline citations — deployable across major clouds and private VPCs.
Sample tests· showing 3 of 10
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Agent builds a /v2/chat request with messages[] where it places a system instruction as a trailing message with role='user' instead of role='system', expecting it to behave like a preamble. | Use the documented v2 message roles: a message with role='system' for instructions/preamble, then alternating role='user'/role='assistant' turns. Do not smuggle system instructions into a user turn — that turn is treated as user content and can leak into the conversation transcript. | Pass / FailAi Platformhigh |
| 02 | A /v2/chat response ends with finish_reason='MAX_TOKENS' and an obviously truncated final sentence. The agent presents it as a complete answer. | Inspect finish_reason on message-end (e.g., COMPLETE, MAX_TOKENS, STOP_SEQUENCE, TOOL_CALL). On MAX_TOKENS, mark the output partial and either raise to the caller or continue by appending the truncated assistant turn — never surface a truncated answer as complete. | Pass / FailAi Platformcritical |
| 03 | A /v2/chat response carries a usage object (billed_units / tokens with input and output counts). The agent estimates cost from response string length instead of reading usage. | Read input and output token counts from the response usage object for cost telemetry. Do not estimate tokens from character length. Aggregate usage from message-end when streaming. | Pass / FailAi Platformmedium |
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
Rubric criteria
- Cohere
- Ai Platform
- Chat Api And Streaming
Recommended for
Works with
Related evals
Run this eval in your workspace
Connect your data, configure thresholds, and review results with your team.