Telephony And Phone Number Import

Vapi evals — Telephony & Phone-Number Import (relift v3 InfraRed)

About Vapi

Vapi is a voice-AI orchestration platform that wires speech-to-text, an LLM, and text-to-speech into low-latency phone and web voice agents, with interruption handling, mid-call function calling, transfers, recordings, and telephony routing.

Employees: ~50
Industry: Voice AI Orchestration
Headquarters: San Francisco, CA
Website: vapi.ai

Sample tests · showing 3 of 9

# · Input · Expected behavior · Check
01

Operator imports a Twilio number into Vapi via POST /phone-number with provider='twilio', twilioAccountSid='AC...', twilioAuthToken='...'. Calls ring at Twilio but never reach Vapi.

Importing the number does not automatically rewrite the Twilio number's Voice configuration. Confirm that the Twilio number's Voice → A Call Comes In setting points at Vapi's SIP/webhook endpoint as documented. Vapi's import flow surfaces the endpoint to configure, but the operator must also update the Twilio side.

Pass / Fail · AI Platform · critical
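
The check in test 01 can be sketched as a small function that inspects the JSON of a Twilio IncomingPhoneNumber resource. The field names `voice_url`, `voice_application_sid`, and `trunk_sid` come from Twilio's REST API. The function name and the expected URL are hypothetical: the real endpoint must come from Vapi's import flow or documentation, and fetching the resource from Twilio is left out.

```python
from typing import Optional


def voice_routing_problem(number: dict, expected_voice_url: str) -> Optional[str]:
    """Return a description of the misconfiguration, or None if routing looks right.

    `number` is assumed to be the JSON body of a Twilio IncomingPhoneNumber
    resource. `expected_voice_url` is a placeholder for the endpoint that the
    Vapi import flow tells the operator to configure.
    """
    # On Twilio, an attached application or trunk takes precedence over voice_url.
    if number.get("voice_application_sid"):
        return "voice_application_sid is set, so voice_url is ignored"
    if number.get("trunk_sid"):
        return "trunk_sid is set, so calls route to the trunk instead of voice_url"

    # Compare URLs after trimming whitespace and a trailing slash.
    actual = (number.get("voice_url") or "").strip().rstrip("/")
    if not actual:
        return "voice_url is empty"
    if actual != expected_voice_url.strip().rstrip("/"):
        return f"voice_url points at {actual}, not the expected endpoint"
    return None
```

A non-None result explains why calls would ring at Twilio without reaching Vapi. It is a pre-flight check only and does not replace a live test call.
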
02

Operator imports a Telnyx number expecting feature parity with Twilio. Warm transferCall behaves differently.

Per docs, Twilio and Telnyx are both supported but feature parity (warm transfer semantics, voicemail AMD, SIP REFER handling) varies by carrier [REQUIRES-VERIFICATION]. Validate every telephony feature you depend on against the chosen carrier — do not assume Twilio behavior extrapolates.

Pass / Fail · AI Platform · high
03

Post-call analytics groups outcomes by artifact.endedReason. Reasons like 'customer-ended-call', 'assistant-ended-call', 'silence-timed-out', and 'pipeline-error' all get bucketed as 'other'.

Read artifact.endedReason and branch on the documented values; do not lump them into 'other'. At minimum, split customer hangup, assistant hangup, timeout, and pipeline error so QA can distinguish a satisfied caller from a dropped call.

Pass / Fail · AI Platform · medium
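
The bucketing asked for in test 03 might look like the sketch below. It uses only the four reason strings quoted in the test input. The bucket names, the prefix match on 'pipeline-error' (in case more specific variants exist), and the choice to surface unknown reasons verbatim are all assumptions made for illustration, not Vapi's documented taxonomy.

```python
from collections import Counter
from typing import Iterable, Optional

# Reason strings taken from the test input; bucket names are illustrative.
EXACT_BUCKETS = {
    "customer-ended-call": "customer_hangup",
    "assistant-ended-call": "assistant_hangup",
    "silence-timed-out": "timeout",
}


def bucket_ended_reason(reason: Optional[str]) -> str:
    """Map an endedReason string to an analytics bucket."""
    if not reason:
        return "missing"
    if reason in EXACT_BUCKETS:
        return EXACT_BUCKETS[reason]
    # Assumption: more specific pipeline errors share this prefix.
    if reason.startswith("pipeline-error"):
        return "pipeline_error"
    # Keep the raw value visible instead of hiding it in a catch-all.
    return f"unmapped:{reason}"


def summarize(reasons: Iterable[Optional[str]]) -> Counter:
    """Count calls per bucket."""
    return Counter(bucket_ended_reason(r) for r in reasons)
```

Returning `unmapped:<reason>` rather than 'other' means a new or unexpected value shows up by name in reports, which prompts someone to add it to the mapping.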

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. A test passes when the mean score across rubric criteria is at least 4.0 and no individual criterion scores below 3.
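
One reading of this rule, sketched below, treats each test as a set of per-criterion scores and requires both a mean of at least 4.0 and a minimum of at least 3. The function name, the criterion names in the usage example, and the implied 1 to 5 scale are assumptions. The page does not specify how scores are produced or aggregated.

```python
from statistics import mean
from typing import Mapping


def passes(scores: Mapping[str, float],
           mean_threshold: float = 4.0,
           floor: float = 3.0) -> bool:
    """Apply the pass rule: mean >= mean_threshold and no score below floor."""
    if not scores:
        # With nothing graded, there is no evidence of a pass.
        return False
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor
```

For example, `passes({"accuracy": 5, "completeness": 5, "safety": 2})` is false even though the mean is 4.0, because one criterion falls below the floor.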

Rubric criteria

  • Vapi
  • AI Platform
  • Telephony And Phone Number Import

Recommended for

Vapi customers
