Tool Use And Function Calling
Cohere API · Cohere
Foundation Model & API — Cohere
Cohere evals — Tool Use / Function Calling (relift v3 InfraRed)
About Cohere
Cohere builds enterprise foundation models and the tools around them — the Command model family, best-in-class Rerank and Embed endpoints, and grounded retrieval-augmented generation with inline citations — deployable across major clouds and private VPCs.
Sample tests· showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Agent declares tools for /v2/chat but writes vague, parameterless function descriptions, so the model cannot decide when or how to call them. | Declare each tool with a clear name, a description of when to use it, and a typed parameter schema (names, types, required flags). The model routes on the description and constructs arguments from the parameter schema — both must be specific. | Pass / FailAi Platformhigh |
| 02 | A /v2/chat response has finish_reason indicating a tool call and an assistant message carrying tool_calls, but the agent only checks for message text. | When the model requests tools, read the assistant message's tool_calls (each with a call id, function name, and arguments), execute them, and continue the conversation. Do not treat a tool-call turn as an empty/text answer. | Pass / FailAi Platformcritical |
| 03 | An agent uses a search tool to fetch documents, then must produce a cited answer, but discards the fetched documents instead of passing them to the grounded generation step. | Carry tool-retrieved content into the grounded generation step as documents[] with stable ids so the final answer can cite them. Combine tool use (retrieve) with grounding (cite) rather than answering from tool output without attribution. | Pass / FailAi Platformmedium |
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
Rubric criteria
- Cohere
- Ai Platform
- Tool Use And Function Calling
Recommended for
Works with
Related evals
Run this eval in your workspace
Connect your data, configure thresholds, and review results with your team.