For Retell AI · AI Platform

Retell Conversation Flow And Response Engine

Retell AI evals — Conversation Flow & Response Engine (relift v3 InfraRed)

About Retell AI

Retell AI is a platform for building production phone-call voice agents — pairing a conversation engine with telephony, low-latency turn-taking, interruption handling, mid-call functions, post-call analysis, and batch outbound dialing.

Employees: ~40

Industry: Voice AI Agents

Headquarters: San Francisco, CA

Sample tests · showing 3 of 9

# | Input | Expected behavior | Check
01

Input: An operator deploys a custom-llm agent. Their server is reachable at https://host/llm-websocket, and they set llm_websocket_url on the agent to that http(s) URL.

Expected behavior: llm_websocket_url must be a WebSocket URL: use the wss:// scheme (not https://) and include the /llm-websocket path Retell expects, e.g. wss://host/llm-websocket/{call_id}. Retell opens the socket per call; reject a plain https URL at config time.

Check: Pass / Fail · AI Platform · critical
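
The config-time check in test 01 can be sketched as below. This is a minimal illustration, not Retell's own validation: the helper names `check_llm_websocket_url` and `to_websocket_url` are hypothetical, and accepting only `wss` (rather than also `ws`) is an assumption drawn from the expected behavior above.

```python
from urllib.parse import urlparse

REQUIRED_PATH = "/llm-websocket"  # path named in test 01


def check_llm_websocket_url(url: str) -> list[str]:
    """Return a list of problems with the URL; an empty list means it looks valid."""
    problems = []
    parsed = urlparse(url)
    # Assumption: only the secure WebSocket scheme is accepted.
    if parsed.scheme != "wss":
        problems.append(f"scheme must be wss, got {parsed.scheme or 'none'!r}")
    if not parsed.netloc:
        problems.append("URL has no host")
    if not parsed.path.rstrip("/").endswith(REQUIRED_PATH):
        problems.append(f"path must end with {REQUIRED_PATH}")
    return problems


def to_websocket_url(url: str) -> str:
    """Swap an http(s) scheme for the matching WebSocket scheme."""
    parsed = urlparse(url)
    scheme = {"https": "wss", "http": "ws"}.get(parsed.scheme, parsed.scheme)
    return parsed._replace(scheme=scheme).geturl()
```

A config layer could call `check_llm_websocket_url` before saving the agent and refuse the update when the returned list is non-empty, optionally suggesting `to_websocket_url(url)` as the fix.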
02

Input: The caller has gone silent. Retell sends a reminder_required event (not response_required) to the custom-LLM server.

Expected behavior: Treat reminder_required as a nudge after user silence: emit a brief check-in ('Are you still there?') tied to the event's response_id, distinct from a full response_required turn. Do not regenerate the entire prior answer.

Check: Pass / Fail · AI Platform · high
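
One way to keep the two turn types separate, as test 02 expects, is sketched below. The event names and `response_id` come from the test text; the message field names (`interaction_type`, `response_type`, `content`, `content_complete`) and the `build_reply` helper are assumptions for illustration and should be checked against Retell's custom-LLM WebSocket documentation.

```python
from typing import Callable, Optional

CHECK_IN = "Are you still there?"


def build_reply(event: dict, generate_answer: Callable[[dict], str]) -> Optional[dict]:
    """Build an outgoing message for an event, or None if no speech is needed.

    Field names here are assumed, not confirmed by the source page.
    """
    kind = event.get("interaction_type")
    if kind == "reminder_required":
        # Silence nudge: short check-in only, never a replay of the last answer.
        text = CHECK_IN
    elif kind == "response_required":
        # Full turn: delegate to the caller-supplied answer generator.
        text = generate_answer(event)
    else:
        return None
    return {
        "response_type": "response",
        "response_id": event["response_id"],  # tie the reply to the triggering event
        "content": text,
        "content_complete": True,
    }
```

Passing the generator in as a callable keeps the reminder path from ever touching the model, so a nudge cannot regenerate the prior answer by accident.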
03

Input: A long call has periods of silence; the WebSocket idle timeout risks closing the custom-LLM connection.

Expected behavior: If ping_pong is configured, respond to Retell's ping_pong events promptly to keep the socket alive; do not treat ping_pong as a response_required turn. A dropped keepalive ends the call's LLM connection.

Check: Pass / Fail · AI Platform · medium
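
The keepalive handling in test 03 can be sketched as two small predicates. The `ping_pong` and `response_required` names come from the test text; the `interaction_type`, `response_type`, and `timestamp` fields, and the idea of echoing the timestamp back, are assumptions about the message shape rather than confirmed details.

```python
from typing import Optional


def keepalive_reply(event: dict) -> Optional[dict]:
    """Return a pong for a ping_pong event, or None for any other event."""
    if event.get("interaction_type") != "ping_pong":
        return None
    # Assumed shape: echo the timestamp so the peer can match ping to pong.
    return {"response_type": "ping_pong", "timestamp": event.get("timestamp")}


def is_response_turn(event: dict) -> bool:
    """True only for events that call for a full generated response."""
    return event.get("interaction_type") == "response_required"
```

In a receive loop, checking `keepalive_reply` first and sending its result immediately keeps the pong off the slower generation path.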

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Scores are assigned per criterion; a pass requires a mean of at least 4.0 with no criterion below 3.
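
Read as a rule over one test's criterion scores, the threshold above can be sketched as follows. The `passes` helper and the example criterion names are hypothetical, and the sketch assumes the mean is taken across criteria on a numeric scale where 3 and 4.0 are meaningful; the page does not state the scale.

```python
from statistics import mean


def passes(scores: dict[str, float], min_mean: float = 4.0, floor: float = 3.0) -> bool:
    """Apply the grading rule: mean at or above min_mean, nothing below floor."""
    if not scores:
        raise ValueError("no rubric scores to grade")
    values = list(scores.values())
    return mean(values) >= min_mean and min(values) >= floor
```

For example, scores of 4, 5, and 3 pass (mean 4.0, minimum 3), while 5, 5, and 2 fail on the floor despite the same mean.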

Rubric criteria

  • Retell
  • AI Platform
  • Conversation Flow And Response Engine

Recommended for

Retell AI · Retell AI customers
