Eval Library
OpenAI
For OpenAI · AI Platform

Function Calling And Tool Orchestration

OpenAI API · OpenAI

Foundation Model & API — OpenAI (GPT)

OpenAI evals — Function Calling & Tool Orchestration (relift v3 InfraRed)

About OpenAI

OpenAI builds the GPT model family and the OpenAI API — Responses and Chat Completions, function calling, Structured Outputs, embeddings, fine-tuning, the Batch API, moderation, the Realtime API, and the Agents SDK — used by developers to build AI products at scale.

Employees

~3,000

Industry

Foundation Model

Headquarters

San Francisco, CA

Website

openai.com

Sample tests · showing 3 of 9

# · Input · Expected behavior · Check
01

Workflow requires the next turn to call some tool (no plain text); operator wants to force a tool call.

Set tool_choice='required' so the model must call one of the provided tools. To force a specific tool, use tool_choice={type:'function',function:{name:...}}. Handle the returned tool_calls[].

Pass / Fail · AI Platform · high
02

Assistant turn returns two tool_calls (get_weather SF, get_weather NYC), each with a distinct id; agent executes both.

Return one role=tool message per tool_call_id, matched exactly by id, before the next model turn. Every tool_call must be answered.

Pass / Fail · AI Platform · critical
03

Two tools: search_orders (by order id) and search_customers (by email). User asks 'find the order for jane@acme.com'.

Descriptions must encode routing — search_orders should state it needs an order id, search_customers that it resolves a customer by email first. Verify the model can chain customer→order rather than guessing.

Pass / Fail · AI Platform · medium
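
The three sample tests can be sketched without any network call. This is a minimal illustration, assuming Chat Completions style payloads (tool_choice, tool_calls[], role=tool messages). The helper names build_tool_choice and answer_tool_calls, the description wording, the parameter schemas, and the get_weather stub are hypothetical and not part of the eval itself.

```python
import json

# Hypothetical tool schemas for test 03. The descriptions carry the routing
# hints; the exact wording and parameter names are assumptions.
TOOLS = [
    {"type": "function", "function": {
        "name": "search_orders",
        "description": "Look up one order. Requires an order id; "
                       "do not call this with an email address.",
        "parameters": {"type": "object",
                       "properties": {"order_id": {"type": "string"}},
                       "required": ["order_id"]}}},
    {"type": "function", "function": {
        "name": "search_customers",
        "description": "Resolve a customer by email. Call this first "
                       "when only an email is known.",
        "parameters": {"type": "object",
                       "properties": {"email": {"type": "string"}},
                       "required": ["email"]}}},
]


def build_tool_choice(tool_name=None):
    """Test 01: force some tool call, or one specific tool."""
    if tool_name is None:
        return "required"
    return {"type": "function", "function": {"name": tool_name}}


def answer_tool_calls(tool_calls, handlers):
    """Test 02: build exactly one role=tool message per tool call id."""
    replies = []
    for call in tool_calls:
        try:
            args = json.loads(call["function"]["arguments"])
            result = handlers[call["function"]["name"]](**args)
        except Exception as exc:
            # Even a failed or unknown call gets a reply, so no id is left
            # unanswered before the next model turn.
            result = {"error": str(exc)}
        replies.append({"role": "tool",
                        "tool_call_id": call["id"],
                        "content": json.dumps(result)})
    return replies


def get_weather(city):
    # Stand-in for a real weather lookup.
    return {"city": city, "forecast": "unknown"}


# Mock assistant turn with two tool calls, as in test 02.
tool_calls = [
    {"id": "call_sf", "type": "function",
     "function": {"name": "get_weather", "arguments": '{"city": "SF"}'}},
    {"id": "call_nyc", "type": "function",
     "function": {"name": "get_weather", "arguments": '{"city": "NYC"}'}},
]
replies = answer_tool_calls(tool_calls, {"get_weather": get_weather})
```

In a real agent loop, the replies would be appended to the message list after the assistant message that carried the tool calls. Catching handler errors and returning them as tool content is one design choice; it keeps every id answered even when a tool fails.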

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. A pass requires a mean score across criteria of at least 4.0, with no single criterion scoring below 3.
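
The pass rule can be read as a two-part check. Here is a minimal sketch, assuming each criterion receives a numeric score. The function name passes, the criterion names, and the choice to fail an empty score set are hypothetical.

```python
from statistics import mean


def passes(scores, mean_threshold=4.0, floor=3):
    """Return True when the mean meets the threshold and no score is below the floor."""
    if not scores:
        # Assumption: an eval with no scored criteria does not pass.
        return False
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor


# Mean is 4.0 and the lowest score is 3, so both conditions hold.
ok = passes({"correct_tool": 5, "id_matching": 4, "routing": 3})

# Mean is still 4.0, but one criterion falls below the floor of 3.
blocked_by_floor = passes({"correct_tool": 5, "id_matching": 5, "routing": 2})
```

The floor makes one weak criterion decisive: a high average cannot compensate for a score under 3.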

Rubric criteria

  • OpenAI
  • AI Platform
  • Function Calling and Tool Orchestration

Recommended for

OpenAI API · OpenAI customers
