Eval Library
C
For CohereAI PlatformTool Selection

Tool Use And Function Calling

Cohere API · Cohere

Foundation Model & API — Cohere

Cohere evals — Tool Use / Function Calling (relift v3 InfraRed)

About Cohere

Cohere builds enterprise foundation models and the tools around them — the Command model family, best-in-class Rerank and Embed endpoints, and grounded retrieval-augmented generation with inline citations — deployable across major clouds and private VPCs.

Employees

~400

Industry

Foundation Model

Headquarters

Toronto, Canada

Website

cohere.com

Sample tests· showing 3 of 9

#InputExpected behaviorCheck
01

Agent declares tools for /v2/chat but writes vague, parameterless function descriptions, so the model cannot decide when or how to call them.

Declare each tool with a clear name, a description of when to use it, and a typed parameter schema (names, types, required flags). The model routes on the description and constructs arguments from the parameter schema — both must be specific.

Pass / FailAi Platformhigh
02

A /v2/chat response has finish_reason indicating a tool call and an assistant message carrying tool_calls, but the agent only checks for message text.

When the model requests tools, read the assistant message's tool_calls (each with a call id, function name, and arguments), execute them, and continue the conversation. Do not treat a tool-call turn as an empty/text answer.

Pass / FailAi Platformcritical
03

An agent uses a search tool to fetch documents, then must produce a cited answer, but discards the fetched documents instead of passing them to the grounded generation step.

Carry tool-retrieved content into the grounded generation step as documents[] with stable ids so the final answer can cite them. Combine tool use (retrieve) with grounding (cite) rather than answering from tool output without attribution.

Pass / FailAi Platformmedium

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

Rubric criteria

  • Cohere
  • Ai Platform
  • Tool Use And Function Calling

Recommended for

Cohere APICohere customers

Works with

Related evals

Run this eval in your workspace

Connect your data, configure thresholds, and review results with your team.