Function Calling And Tool Use
xAI API (Grok) · xAI
Foundation Model & API — xAI (Grok)
xAI evals — Function Calling & Tool Use (relift v3 InfraRed)
About xAI
xAI builds the Grok foundation-model family and the xAI API — OpenAI-compatible chat completions, function calling, Live Search / DeepSearch real-time web grounding, Grok Vision multimodal inputs, reasoning models with a thinking-effort budget, and Grok / Aurora image generation.
Sample tests· showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Agent declares tools=[{type:'function', function:{name, description, parameters: <JSON Schema>}}] on /v1/chat/completions and expects Grok to decide whether to call get_weather. | Pass tools[] in the OpenAI-compatible shape (type='function', nested function object with name/description/parameters). parameters MUST be a valid JSON Schema object. Distinct, action-oriented descriptions enable correct routing. | Pass / FailAi Platformcritical |
| 02 | User asks 'what time is it in Tokyo?' with get_time and lookup_news tools available. Agent debates tool_choice='auto' vs tool_choice='required'. | tool_choice='auto' (default when tools[] is set) lets Grok decide. Use tool_choice='required' only when downstream MUST call a tool. Use tool_choice={type:'function', function:{name:'get_time'}} to pin one tool. Do not pre-route by string-matching the user query. | Pass / FailAi Platformmedium |
| 03 | After executing get_inventory, the agent constructs a role='tool' message to report results back to Grok. | Send {role:'tool', tool_call_id:'<id>', content:'<string-or-JSON-string>'}. content must be a STRING — JSON-encode dict results. Do not send content as a raw dict (most clients reject; some normalize incorrectly). Keep responses concise to stay in context window budget. | Pass / FailAi Platformhigh |
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
Rubric criteria
- Xai
- Ai Platform
- Function Calling And Tool Use
Recommended for
Works with
Related evals
Run this eval in your workspace
Connect your data, configure thresholds, and review results with your team.