
Realtime API and Reasoning Models
Foundation Model & API — OpenAI (GPT)
OpenAI evals — Realtime API & Reasoning Models (relift v3 InfraRed)
About OpenAI
OpenAI builds the GPT model family and the OpenAI API — Responses and Chat Completions, function calling, Structured Outputs, embeddings, fine-tuning, the Batch API, moderation, the Realtime API, and the Agents SDK — used by developers to build AI products at scale.
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | In a Realtime voice session, the user starts speaking while the model is still talking. | With server VAD enabled, handle the `input_audio_buffer.speech_started` event by truncating the in-flight assistant audio (barge-in) and yielding; do not talk over the user. | Pass / Fail · AI Platform · high |
| 02 | A team builds a browser voice app over a raw WebSocket and fights audio latency and jitter. | Use WebRTC for browser/edge audio (it handles jitter, echo, and NAT); reserve WebSocket for server-to-server connections. Match the transport to the deployment. | Pass / Fail · AI Platform · medium |
| 03 | Mid-conversation, the Realtime model emits a function call to look up an order. | Execute the tool and return a `function_call_output` conversation item paired with the original `call_id`, then request a new response; keep audio latency low by streaming the follow-up. | Pass / Fail · AI Platform · high |
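The expected behavior in tests 01 and 03 can be sketched as pure functions that map a server event to the client events to send back. This is a minimal sketch, not a reference implementation: it assumes a WebSocket client that tracks the current assistant item and how many milliseconds of it have been played, and it assumes server VAD already stops the in-flight response on interruption. The helper names (`on_speech_started`, `on_function_call`, `handle_server_event`), the `state` dictionary, and the `lookup_order` tool are hypothetical; only the event type strings and the `call_id` pairing come from the table above and the Realtime API event vocabulary.

```python
import json
from typing import Any, Callable, Dict, List, Optional

Event = Dict[str, Any]


def on_speech_started(item_id: Optional[str], played_ms: int) -> List[Event]:
    """Barge-in: trim the assistant item to the audio the user actually heard."""
    if item_id is None:
        return []  # nothing is playing, so there is nothing to truncate
    return [{
        "type": "conversation.item.truncate",
        "item_id": item_id,
        "content_index": 0,
        "audio_end_ms": played_ms,
    }]


def on_function_call(item: Event, tools: Dict[str, Callable[..., Any]]) -> List[Event]:
    """Run the requested tool, return its output under the same call_id,
    then ask the model for a follow-up response."""
    args = json.loads(item.get("arguments") or "{}")
    result = tools[item["name"]](**args)
    return [
        {
            "type": "conversation.item.create",
            "item": {
                "type": "function_call_output",
                "call_id": item["call_id"],  # must match the model's call
                "output": json.dumps(result),
            },
        },
        {"type": "response.create"},
    ]


def handle_server_event(event: Event, state: Dict[str, Any],
                        tools: Dict[str, Callable[..., Any]]) -> List[Event]:
    """Dispatch one server event to the client events that should be sent."""
    if event["type"] == "input_audio_buffer.speech_started":
        return on_speech_started(state.get("item_id"), state.get("played_ms", 0))
    if (event["type"] == "response.output_item.done"
            and event["item"].get("type") == "function_call"):
        return on_function_call(event["item"], tools)
    return []


def lookup_order(order_id: str) -> Dict[str, str]:
    """Hypothetical tool used only for illustration."""
    return {"order_id": order_id, "status": "shipped"}
```

A caller would serialize each returned event with `json.dumps` and send it over its own connection. Keeping the handlers free of network code makes the barge-in and tool-call logic easy to check in isolation.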
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
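One reading of the pass rule above can be sketched as follows. The function name `passes`, the dictionary shape, and the assumption that each criterion receives a single numeric score (for example on a 1 to 5 scale) are illustrative and are not stated by the source.

```python
from statistics import mean
from typing import Dict


def passes(scores: Dict[str, float],
           mean_threshold: float = 4.0,
           floor: float = 3.0) -> bool:
    """Pass when the mean score is at least mean_threshold
    and no single criterion falls below floor."""
    if not scores:
        return False  # nothing was graded, so nothing can pass
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor
```

Under this reading, scores of 5, 4, and 3 pass (mean 4.0, lowest 3), while 5, 5, and 2 fail because one criterion falls below the floor even though the mean is 4.0.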
Rubric criteria
- OpenAI
- AI Platform
- Realtime API and Reasoning Models