For OpenAIAI Platform

Batch Api

OpenAI API · OpenAI

Foundation Model & API — OpenAI (GPT)

Evaluates OpenAI's Batch API across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

About OpenAI

OpenAI builds the GPT model family and the OpenAI API — Responses and Chat Completions, function calling, Structured Outputs, embeddings, fine-tuning, the Batch API, moderation, the Realtime API, and the Agents SDK — used by developers to build AI products at scale.

Employees

~3,000

Industry

Foundation Model

Headquarters

San Francisco, CA

Website

openai.com

Sample tests· showing 3 of 9

#	Input	Expected behavior	Check
01	Operator submits 20k requests via a JSONL file to /v1/batches, each line a request with a custom_id matching a dataset row.	Each input line needs a unique custom_id, a method, a url (/v1/responses or /v1/chat/completions), and a body. custom_id is the only mapping back to source rows; pick a stable unique value.	Pass / FailAi Platformcritical
02	Batch status moves in_progress → completed within a 24h window; agent polls every second.	Poll the batch object with backoff; rely on status terminal values (completed/failed/expired/cancelled). Do not assume sub-hour completion or hammer the endpoint.	Pass / FailAi Platformmedium
03	Completed batch exposes an output_file_id and an error_file_id; agent reads only the output file.	Download both files: output_file_id for succeeded lines and error_file_id for failed ones, joining by custom_id. Lines absent from output appear in errors.	Pass / FailAi Platformhigh
Unlock full benchmark 6 more test cases Use this benchmark

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

Rubric criteria

Openai
Ai Platform
Batch Api

Recommended for

OpenAI APIOpenAI customers

Works with

OpenAI

Related evals

AI Platform

Claude API

Evaluates Anthropic's Batch API across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

View AI Platform

Claude API

Evaluates Anthropic's Extended Thinking across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

View AI Platform

Claude API

Evaluates Anthropic's Files API & Citations across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

View

Frequently asked questions

What does the Batch Api eval for OpenAI OpenAI API test?+

Evaluates OpenAI's Batch API across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

How is the Batch Api eval scored?+

The judge rubric: Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

How many test cases does this eval pack include?+

The Batch Api pack for OpenAI OpenAI API contains 9 test cases. 3 sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?+

Sign up for Corsac, connect your model or agent endpoint, and run the Batch Api pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.

Run this eval in your workspace

Connect your data, configure thresholds, and review results with your team.