Function Calling And Tool Use
DeepSeek API · DeepSeek
Foundation Model & API — DeepSeek
DeepSeek evals — Function Calling & Tool Use (relift v3 InfraRed)
About DeepSeek
DeepSeek is an AI company shipping frontier open-weight models (DeepSeek-V3, DeepSeek-R1) and an OpenAI-compatible API with a separate reasoner model (deepseek-reasoner), automatic disk-based context caching, function calling, JSON output, and very low token pricing. The models are released under an MIT license alongside the hosted API.
Sample tests· showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | An agent declares tools using a bare {name, description} object without the OpenAI-compatible {type:'function', function:{...}} envelope and a JSON-Schema parameters block. | Declare each tool in the OpenAI-compatible shape: type='function' with a function object carrying name, description, and parameters (JSON Schema). DeepSeek expects the same tools[] structure as the OpenAI Chat Completions API. | Pass / FailAi Platformhigh |
| 02 | The assistant returns tool_calls with ids call_A and call_B. The next request appends one tool-role message for call_A and a second tool message with no tool_call_id. | Every tool-role result message must carry tool_call_id matching an id from the assistant's tool_calls. Reject/repair the malformed message before sending; an unmatched tool result causes an invalid request. | Pass / FailAi Platformcritical |
| 03 | An agent assumes function calling works identically on whatever DeepSeek model is configured and ships without checking per-model support. | Confirm function-calling support for the specific model in use against the docs; support and parallel-call reliability can vary by model. Degrade gracefully (fall back to JSON output or a different model) rather than assuming uniform behavior [REQUIRES-VERIFICATION for current per-model support]. | Pass / FailAi Platformmedium |
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
Rubric criteria
- Deepseek
- Ai Platform
- Function Calling And Tool Use
Recommended for
Works with
Related evals
Run this eval in your workspace
Connect your data, configure thresholds, and review results with your team.