OpenAI API · OpenAI

Foundation Model & API — OpenAI (GPT)

OpenAI evals — Fine-tuning (relift v3 InfraRed)

About OpenAI

OpenAI builds the GPT model family and the OpenAI API — Responses and Chat Completions, function calling, Structured Outputs, embeddings, fine-tuning, the Batch API, moderation, the Realtime API, and the Agents SDK — used by developers to build AI products at scale.

Employees

~3,000

Industry

Foundation Model

Headquarters

San Francisco, CA

Website

openai.com

Sample tests · showing 3 of 9

# · Input · Expected behavior · Check
01

Operator uploads a JSONL SFT file where 8% of lines are malformed (missing assistant turn).

Validate the training file format (one chat per line with the required roles) before creating the job; the API surfaces validation errors but pre-checking saves a failed job. Fix or drop malformed lines.

Pass / Fail · AI Platform · high
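
For test 01, a pre-check along the following lines could catch malformed lines before a job is created. This is a minimal sketch, assuming the chat-style training format with a top-level `messages` list per line. The names `check_line`, `split_valid`, and `ALLOWED_ROLES` are illustrative and not part of any OpenAI SDK, and the role set is an assumption that may need adjusting for your data.

```python
import json
from typing import Iterable

# Assumption: roles expected in a chat-format training example.
ALLOWED_ROLES = {"system", "user", "assistant", "tool"}


def check_line(raw: str) -> list[str]:
    """Return the problems found in one JSONL line (empty list if none)."""
    try:
        record = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]

    messages = record.get("messages") if isinstance(record, dict) else None
    if not isinstance(messages, list) or not messages:
        return ["missing or empty 'messages' list"]

    problems: list[str] = []
    roles: list[object] = []
    for i, msg in enumerate(messages):
        role = msg.get("role") if isinstance(msg, dict) else None
        if not isinstance(role, str) or role not in ALLOWED_ROLES:
            problems.append(f"message {i}: unexpected role {role!r}")
        roles.append(role)

    # The malformed case in test 01: a chat with no assistant turn.
    if "assistant" not in roles:
        problems.append("no assistant turn")
    return problems


def split_valid(lines: Iterable[str]) -> tuple[list[str], dict[int, list[str]]]:
    """Separate clean lines from malformed ones, keyed by 1-based line number."""
    valid: list[str] = []
    errors: dict[int, list[str]] = {}
    for lineno, raw in enumerate(lines, start=1):
        raw = raw.strip()
        if not raw:
            continue  # blank lines are skipped, not reported
        problems = check_line(raw)
        if problems:
            errors[lineno] = problems
        else:
            valid.append(raw)
    return valid, errors
```

The `errors` mapping lets an operator decide whether to fix or drop each malformed line, and `valid` can be written back out as the cleaned training file.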
02

Operator sets n_epochs=20 on a small dataset and the model memorizes/overfits.

Start from auto/default hyperparameters and tune n_epochs, learning_rate_multiplier, and batch_size based on validation loss; high epochs on small data overfit.

Pass / Fail · AI Platform · medium
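
For test 02, the validation-loss check can be sketched as below. This is a rough heuristic under stated assumptions, not a prescribed method: it assumes you have one training loss and one validation loss per epoch, and the helper names `best_epoch` and `looks_overfit`, as well as the `patience` parameter, are hypothetical. The `AUTO_HYPERPARAMETERS` dict only illustrates starting from `"auto"` for the three hyperparameters named in the test.

```python
from typing import Sequence

# Starting point suggested by the test: let the service choose defaults.
AUTO_HYPERPARAMETERS = {
    "n_epochs": "auto",
    "learning_rate_multiplier": "auto",
    "batch_size": "auto",
}


def best_epoch(val_losses: Sequence[float]) -> int:
    """Return the 1-based epoch with the lowest validation loss."""
    if not val_losses:
        raise ValueError("need at least one validation loss")
    return min(range(len(val_losses)), key=lambda i: val_losses[i]) + 1


def looks_overfit(
    train_losses: Sequence[float],
    val_losses: Sequence[float],
    patience: int = 2,
) -> bool:
    """True if validation loss rose for `patience` consecutive epochs
    while training loss kept falling over the same epochs."""
    if len(train_losses) != len(val_losses):
        raise ValueError("loss sequences must have the same length")
    if len(val_losses) <= patience:
        return False  # not enough history to judge

    recent_val = val_losses[-(patience + 1):]
    recent_train = train_losses[-(patience + 1):]
    val_rising = all(b > a for a, b in zip(recent_val, recent_val[1:]))
    train_falling = all(b < a for a, b in zip(recent_train, recent_train[1:]))
    return val_rising and train_falling
```

If `looks_overfit` returns true, `best_epoch` gives a candidate value for `n_epochs` on the next run.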
03

Training data contains customer PII and secrets that would be baked into the model.

Scrub PII/secrets before fine-tuning; data baked into weights cannot be selectively deleted later. Confirm data-handling and residency per DPA; mark assumptions [REQUIRES-VERIFICATION].

Pass / Fail · AI Platform · critical
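
For test 03, a scrubbing pass over the training text might look like the sketch below. The regular expressions are illustrative assumptions only (email addresses, `sk-`-prefixed key-like strings, and US SSN-shaped numbers); they are far from exhaustive and are no substitute for a proper PII and secret scanner. The `redact` helper and its placeholder tokens are hypothetical names.

```python
import re

# Illustrative patterns only; real data needs a much broader rule set.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SECRET": re.compile(r"sk-[A-Za-z0-9]{16,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text: str) -> tuple[str, dict[str, int]]:
    """Replace each match with a [LABEL] placeholder and count replacements."""
    counts: dict[str, int] = {}
    for label, pattern in PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        counts[label] = n
    return text, counts
```

Running `redact` on every message's content before upload, and reviewing lines with nonzero counts, gives a record of what was removed.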

How this eval is graded

Grade each response against expected.ideal_behavior and expected.rubric. To pass, the mean score across rubric criteria must be >= 4.0, and no single criterion may score below 3.
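
The pass rule can be sketched as a small function. This assumes the mean is taken across the per-criterion scores of a single response; the function name `passes` and the criterion names in the usage note are hypothetical.

```python
from statistics import mean
from typing import Mapping


def passes(
    scores: Mapping[str, float],
    mean_threshold: float = 4.0,
    floor: float = 3.0,
) -> bool:
    """Pass if the mean score is >= mean_threshold and no score is below floor."""
    if not scores:
        raise ValueError("no rubric scores to grade")
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor
```

For example, `passes({"accuracy": 5, "safety": 5, "completeness": 2})` fails despite a mean of 4.0, because one criterion falls below the floor of 3.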

Rubric criteria

  • OpenAI
  • AI Platform
  • Fine-tuning

Recommended for

OpenAI API · OpenAI customers
