Tools And Mid Call Function Calling
Vapi evals — Tools / Function Calling Mid-call (relift v3 InfraRed)
About Vapi
Vapi is a voice-AI orchestration platform that wires speech-to-text, an LLM, and text-to-speech into low-latency phone and web voice agents, with interruption handling, mid-call function calling, transfers, recordings, and telephony routing.
Sample tests (showing 3 of 9)
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Operator declares a tool with name='check_balance' but no parameters JSON schema. Model emits tool calls with random argument shapes. | Define every tool with the documented OpenAI-style schema {name, description, parameters: {type:'object', properties, required}}. The model uses the JSON Schema to format arguments; without it the arg JSON is ad-hoc and breaks serverUrl handlers. | Pass / Fail · Ai Platform · critical |
| 02 | Tool 'send_followup_sms' is long-running and the operator does not want to block the assistant. They mark it async:true. | async:true makes Vapi not wait for a result before continuing the conversation. Operator's serverUrl still returns 2xx promptly and runs the work in background. Do not use async:true for tools whose result must be spoken, because the assistant won't see the response. | Pass / Fail · Ai Platform · high |
| 03 | Customer asks 'what's my balance?' Tool round-trips in 1.5s and customer hears dead air while the assistant waits. | Set blocking:false (the default) and configure messages with type 'request-start' so the assistant speaks a filler phrase ('one sec, checking that') during tool execution. Use 'request-complete' and 'request-failed' messages to bracket the result. | Pass / Fail · Ai Platform · medium |
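The three cases above can be sketched together as tool definitions plus a small argument check that a serverUrl handler might run before acting on a call. This is a minimal illustration, not Vapi's actual payload format: the field names (name, description, parameters, async, messages, and the three message types) come from the test cases, while the exact nesting, the `content` key, the `account_id`, `phone_number`, and `body` properties, and the `validate_arguments` helper are assumptions made for the example. The validator covers only `required` and primitive `type` checks, not full JSON Schema.

```python
import json

# Hypothetical tool definitions. Field names follow the test cases above;
# the exact nesting and the "content" key are assumptions for illustration.
CHECK_BALANCE_TOOL = {
    "name": "check_balance",
    "description": "Look up the current balance for a customer account.",
    "parameters": {
        "type": "object",
        "properties": {"account_id": {"type": "string"}},
        "required": ["account_id"],
    },
    "async": False,  # the result must be spoken, so the assistant waits for it
    "messages": [
        {"type": "request-start", "content": "One sec, checking that."},
        {"type": "request-complete", "content": "Thanks for waiting."},
        {"type": "request-failed", "content": "Sorry, I couldn't look that up."},
    ],
}

SEND_FOLLOWUP_SMS_TOOL = {
    "name": "send_followup_sms",
    "description": "Queue a follow-up SMS to the caller.",
    "parameters": {
        "type": "object",
        "properties": {
            "phone_number": {"type": "string"},
            "body": {"type": "string"},
        },
        "required": ["phone_number", "body"],
    },
    "async": True,  # fire-and-forget: the assistant never sees the result
}

# JSON Schema primitive types mapped to Python types (partial on purpose).
_PY_TYPES = {
    "string": str, "integer": int, "number": (int, float),
    "boolean": bool, "object": dict, "array": list,
}


def validate_arguments(tool, raw_arguments):
    """Return a list of problems; an empty list means the arguments fit."""
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        return [f"arguments are not valid JSON: {exc.msg}"]
    if not isinstance(args, dict):
        return ["arguments must be a JSON object"]
    schema = tool.get("parameters")
    if not schema:
        return ["tool has no parameters schema"]

    problems = []
    properties = schema.get("properties", {})
    for key in schema.get("required", []):
        if key not in args:
            problems.append(f"missing required argument: {key}")
    for key, value in args.items():
        spec = properties.get(key)
        if spec is None:
            problems.append(f"unexpected argument: {key}")
            continue
        declared = spec.get("type")
        expected = _PY_TYPES.get(declared)
        # bool is a subclass of int in Python, so reject it for non-boolean types.
        stray_bool = isinstance(value, bool) and declared != "boolean"
        if expected and (stray_bool or not isinstance(value, expected)):
            problems.append(f"argument {key} should be of type {declared}")
    return problems
```

With these definitions, `validate_arguments(CHECK_BALANCE_TOOL, '{"account_id": "A-1"}')` returns an empty list, while a tool declared without `parameters` (test 01) is rejected before any argument is inspected. Checking arguments at the handler boundary keeps ad hoc argument shapes from reaching business logic even when a schema was omitted upstream.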
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
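One possible reading of the pass rule, sketched in Python: the mean is taken across the rubric criteria of a single test, and any individual criterion below 3 fails the test regardless of the mean. The numeric scale, the `passes` function name, and the criterion names in the usage note are assumptions; the page does not define them.

```python
from statistics import mean


def passes(scores, mean_threshold=4.0, floor=3):
    """Apply the stated rule to a mapping of criterion name -> numeric score.

    Assumes the mean is computed across criteria; an empty mapping fails.
    """
    if not scores:
        return False
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor
```

For example, hypothetical scores of 5, 5, and 2 average exactly 4.0 but still fail because one criterion falls below the floor of 3.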
Rubric criteria
- Vapi
- Ai Platform
- Tools And Mid Call Function Calling