Realtime API and Reasoning Models

OpenAI API · OpenAI

Foundation Model & API — OpenAI (GPT)

OpenAI evals — Realtime API & Reasoning Models (relift v3 InfraRed)

About OpenAI

OpenAI builds the GPT model family and the OpenAI API — Responses and Chat Completions, function calling, Structured Outputs, embeddings, fine-tuning, the Batch API, moderation, the Realtime API, and the Agents SDK — used by developers to build AI products at scale.

Employees: ~3,000

Industry: Foundation Model

Headquarters: San Francisco, CA

Website: openai.com

Sample tests (showing 3 of 9)

Test 01
Input: In a Realtime voice session the user starts speaking while the model is still talking.
Expected behavior: With server VAD enabled, handle the input_audio_buffer.speech_started event by truncating the in-flight assistant audio (barge-in) and yielding; do not talk over the user.
Check: Pass / Fail · AI Platform · high

Test 02
Input: Team builds a browser voice app over a raw WebSocket and fights audio latency/jitter.
Expected behavior: Use WebRTC for browser/edge audio (handles jitter, echo, NAT); reserve WebSocket for server-to-server. Match transport to the deployment.
Check: Pass / Fail · AI Platform · medium

Test 03
Input: Mid-conversation the Realtime model emits a function call to look up an order.
Expected behavior: Execute the tool and return a function_call_output conversation item, then request a new response; keep audio latency low by streaming the follow-up. Pair output to the call_id.
Check: Pass / Fail · AI Platform · high
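The behavior expected in tests 01 and 03 can be sketched as two small functions that turn a server event into the client events to send back. This is a minimal illustration, not a full client: it assumes the caller owns the socket or data channel, tracks how many milliseconds of assistant audio have actually been played, and stops local playback itself. It also assumes the session leaves automatic response interruption on under server VAD; if that is disabled, a response.cancel event would need to go first. The helper names and the lookup_order tool are hypothetical.

```python
import json
from typing import Any, Callable, Dict, List, Optional

Event = Dict[str, Any]


def on_speech_started(playing_item_id: Optional[str], played_ms: int) -> List[Event]:
    """Client events to send after `input_audio_buffer.speech_started` (test 01).

    Truncates the assistant item at the point the user actually heard, so the
    conversation history does not contain audio that was never played.
    """
    if playing_item_id is None:
        return []  # nothing is playing, so there is nothing to cut off
    return [{
        "type": "conversation.item.truncate",
        "item_id": playing_item_id,
        "content_index": 0,
        "audio_end_ms": played_ms,
    }]


def on_function_call(item: Event, tools: Dict[str, Callable[..., Any]]) -> List[Event]:
    """Client events to send for a completed `function_call` item (test 03)."""
    args = json.loads(item.get("arguments") or "{}")
    result = tools[item["name"]](**args)
    return [
        {
            "type": "conversation.item.create",
            "item": {
                "type": "function_call_output",
                "call_id": item["call_id"],  # pair the output to the originating call
                "output": json.dumps(result),
            },
        },
        {"type": "response.create"},  # ask the model to continue with the result
    ]


def lookup_order(order_id: str) -> Dict[str, str]:
    # Hypothetical tool; a real one would query an order system.
    return {"order_id": order_id, "status": "shipped"}
```

Keeping these handlers free of I/O makes them easy to check in an eval harness: feed in a recorded server event and assert on the outgoing event types and the call_id pairing.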

How this eval is graded

Grade each response against expected.ideal_behavior and expected.rubric. A test passes when the mean score across rubric criteria is at least 4.0 and no single criterion scores below 3.
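One reading of this pass rule, written out as a check. The sketch assumes each rubric criterion receives a numeric score (a 1 to 5 scale is assumed here, since the page does not state one) and that the mean is taken across criteria for a single test; the function name and criterion keys are hypothetical.

```python
from statistics import mean
from typing import Dict


def passes(scores: Dict[str, float], mean_threshold: float = 4.0, floor: float = 3.0) -> bool:
    """Return True when the mean score meets the threshold and no criterion is below the floor."""
    if not scores:
        return False  # nothing was graded, so treat the test as not passing
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor
```

For example, scores of 5, 4, and 3 pass (mean 4.0, minimum 3), while 5, 5, and 2 fail on the floor even though the mean is still 4.0.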

Rubric criteria

  • OpenAI
  • AI Platform
  • Realtime API and Reasoning Models

Recommended for

  • OpenAI API
  • OpenAI customers
