For Retell AI · AI Platform

Retell Function Calling Custom Tools

Voice AI Agents — Retell AI

Retell AI evals — Function Calling / Custom Tools (relift v3 InfraRed)

About Retell AI

Retell AI is a platform for building production phone-call voice agents — pairing a conversation engine with telephony, low-latency turn-taking, interruption handling, mid-call functions, post-call analysis, and batch outbound dialing.

Employees: ~40

Industry: Voice AI Agents

Headquarters: San Francisco, CA

Sample tests · showing 3 of 9

# · Input · Expected behavior · Check
01

Agent defines a custom function check_appointment with a parameter schema, but omits a clear description, so the model invokes it with wrong/empty arguments.

Define the custom function with a precise name, description, and parameter schema (types + required fields) so the model knows when and how to call it. Validate arguments server-side before executing; ambiguous descriptions cause mis-invocation.

Pass / Fail · AI Platform · high
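
As a rough illustration of test 01, the sketch below pairs a function definition (name, description, typed parameters, required fields) with a server-side argument check. The dictionary layout is a generic JSON-Schema-style shape, not necessarily Retell AI's exact payload, and the parameter names phone_number and date and the helper validate_args are assumptions made for the example.

```python
from typing import Any

# Hypothetical tool definition: a generic JSON-Schema-style shape,
# not necessarily the exact payload Retell AI expects.
CHECK_APPOINTMENT = {
    "name": "check_appointment",
    "description": (
        "Look up an existing appointment. Call only after the caller has "
        "given a phone number and the appointment date."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "phone_number": {"type": "string"},  # assumed parameter
            "date": {"type": "string"},          # assumed parameter
        },
        "required": ["phone_number", "date"],
    },
}

_PY_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def validate_args(tool: dict[str, Any], args: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the arguments are usable."""
    schema = tool["parameters"]
    errors = []
    # Required fields must be present and non-empty.
    for field in schema.get("required", []):
        value = args.get(field)
        if value is None or value == "":
            errors.append(f"missing required field: {field}")
    # Supplied fields must be declared and have the declared type.
    for field, value in args.items():
        spec = schema["properties"].get(field)
        if spec is None:
            errors.append(f"unexpected field: {field}")
        elif value is not None and not isinstance(value, _PY_TYPES[spec["type"]]):
            errors.append(f"wrong type for {field}: expected {spec['type']}")
    return errors
```

Running the check before the backend call means empty or malformed arguments can be turned into a clarifying question for the caller instead of a failed lookup.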
02

A custom function's backend hangs past its timeout. The agent waits indefinitely with the caller on the line.

Set a function timeout and, on expiry, return a graceful fallback to the model ('I couldn't reach that system; can I take a message?') rather than hanging. Bound the wait so the call stays responsive.

Pass / Fail · AI Platform · critical
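
One way to picture the bounded wait in test 02 is a wrapper on the server that handles the function call. The sketch below uses Python's asyncio.wait_for; the wrapper name call_with_timeout, the 3-second default, and the result shape are assumptions for illustration, and any timeout setting the platform itself offers would sit alongside this rather than be replaced by it.

```python
import asyncio

# Fallback wording taken from the expected behavior above.
FALLBACK = "I couldn't reach that system; can I take a message?"


async def call_with_timeout(backend, args, timeout_s=3.0):
    """Await a backend coroutine, but never longer than timeout_s seconds.

    `backend` is any async callable taking the argument dict (hypothetical).
    The 3-second default is an assumed value, not a recommended setting.
    """
    try:
        result = await asyncio.wait_for(backend(args), timeout=timeout_s)
        return {"ok": True, "result": result}
    except asyncio.TimeoutError:
        # wait_for cancels the pending call; hand the model something speakable.
        return {"ok": False, "result": FALLBACK}
```

Returning the fallback as an ordinary result, instead of raising, lets the model keep the conversation moving while the caller is still on the line.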
03

A function fetches an order status and the result must be spoken. The raw payload contains codes (status: 'IN_TRANSIT_HUB_3') the agent reads verbatim.

Map fetched data into natural language before speaking ('Your order is on its way'); do not read internal status codes/IDs aloud. The function result feeds the model, which should verbalize it for a phone caller.

Pass / Fail · AI Platform · medium
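
For test 03, one possible approach is to translate internal codes before the result reaches the model at all. In the sketch below only the code IN_TRANSIT_HUB_3 comes from the test case; the other status prefixes, the phrases, and the helper names are hypothetical.

```python
# Prefix-to-phrase table. Only IN_TRANSIT_HUB_3 appears in the test case;
# the other prefixes and all wording here are illustrative assumptions.
STATUS_PHRASES = {
    "IN_TRANSIT": "Your order is on its way",
    "DELIVERED": "Your order has been delivered",
    "PROCESSING": "Your order is being prepared",
}
DEFAULT_PHRASE = "I found your order, but I can't see a clear status right now"


def speakable_status(raw_status: str) -> str:
    """Map an internal status code to a phrase suitable for a phone caller."""
    code = raw_status.upper()
    # Try longer prefixes first so the most specific entry wins.
    for prefix in sorted(STATUS_PHRASES, key=len, reverse=True):
        if code.startswith(prefix):
            return STATUS_PHRASES[prefix]
    return DEFAULT_PHRASE


def to_model_result(payload: dict) -> dict:
    """Keep only speakable fields; internal codes and IDs are dropped."""
    return {"order_status": speakable_status(payload.get("status", ""))}
```

Because the raw code never appears in the function result, the model has nothing internal to read aloud, whatever its prompt says.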

How this eval is graded

Grade each test against expected.ideal_behavior and expected.rubric. A test passes when the mean score across rubric criteria is at least 4.0 and no single criterion scores below 3.
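
Read as a mean threshold combined with a per-criterion floor, the pass rule can be sketched as follows. The function name, the criterion labels in the usage note, and this reading of the rule are assumptions; the page does not state the scoring scale.

```python
from statistics import mean


def passes(scores: dict[str, float], mean_threshold: float = 4.0, floor: float = 3.0) -> bool:
    """Apply the stated rule: mean >= 4.0 and no criterion below 3.

    `scores` maps a criterion name to its score. An empty score set fails,
    which is a choice made for this sketch rather than a documented behavior.
    """
    if not scores:
        return False
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor
```

Under this reading, scores of 5, 4, and 3 pass (mean 4.0, lowest 3), while 5, 5, and 2 fail despite the same mean, because one criterion falls below the floor.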

Rubric criteria

  • Retell
  • AI Platform
  • Function Calling Custom Tools

Recommended for

Retell AI · Retell AI customers
