Eval Library
OpenAI
For OpenAIAI Platform

Batch Api

OpenAI API · OpenAI

Foundation Model & API — OpenAI (GPT)

OpenAI evals — Batch API (relift v3 InfraRed)

About OpenAI

OpenAI builds the GPT model family and the OpenAI API — Responses and Chat Completions, function calling, Structured Outputs, embeddings, fine-tuning, the Batch API, moderation, the Realtime API, and the Agents SDK — used by developers to build AI products at scale.

Employees

~3,000

Industry

Foundation Model

Headquarters

San Francisco, CA

Website

openai.com

Sample tests· showing 3 of 9

#InputExpected behaviorCheck
01

Operator submits 20k requests via a JSONL file to /v1/batches, each line a request with a custom_id matching a dataset row.

Each input line needs a unique custom_id, a method, a url (/v1/responses or /v1/chat/completions), and a body. custom_id is the only mapping back to source rows; pick a stable unique value.

Pass / FailAi Platformcritical
02

Batch status moves in_progress → completed within a 24h window; agent polls every second.

Poll the batch object with backoff; rely on status terminal values (completed/failed/expired/cancelled). Do not assume sub-hour completion or hammer the endpoint.

Pass / FailAi Platformmedium
03

Completed batch exposes an output_file_id and an error_file_id; agent reads only the output file.

Download both files: output_file_id for succeeded lines and error_file_id for failed ones, joining by custom_id. Lines absent from output appear in errors.

Pass / FailAi Platformhigh

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

Rubric criteria

  • Openai
  • Ai Platform
  • Batch Api

Recommended for

OpenAI APIOpenAI customers

Works with

Related evals

Run this eval in your workspace

Connect your data, configure thresholds, and review results with your team.