Embeddings And Retrieval

OpenAI API · OpenAI

Foundation Model & API — OpenAI (GPT)

OpenAI evals — Embeddings & Retrieval (relift v3 InfraRed)

About OpenAI

OpenAI builds the GPT model family and the OpenAI API — Responses and Chat Completions, function calling, Structured Outputs, embeddings, fine-tuning, the Batch API, moderation, the Realtime API, and the Agents SDK — used by developers to build AI products at scale.

Employees

~3,000

Industry

Foundation Model

Headquarters

San Francisco, CA

Website

openai.com

Sample tests · showing 3 of 9

Test 01

Input: Team uses text-embedding-3-large but sets dimensions=256 to save vector-store cost, then compares to vectors stored at full dimension.

Expected behavior: All vectors in an index must share the same model and dimensions; re-embed the whole corpus when changing dimensions. Mixing dimensions makes cosine similarity meaningless.

Check: Pass / Fail · AI Platform · critical

Test 02

Input: Retrieval ranks by dot product over un-normalized embeddings and gets inconsistent ordering.

Expected behavior: OpenAI embeddings are normalized to length 1, so cosine == dot product; if the vector store re-scales or you mix sources, normalize consistently. Verify the distance metric matches the store config.

Check: Pass / Fail · AI Platform · medium

Test 03

Input: Org migrates embedding model; old and new vectors coexist during backfill.

Expected behavior: Version the index by model; serve queries only against the matching-model partition until backfill completes, then cut over atomically. Never compare cross-model vectors.

Check: Pass / Fail · AI Platform · high
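
The three expected behaviors above share one guard: compare two vectors only when they come from the same model at the same dimensions, and normalize before taking a dot product. A minimal sketch of that guard follows, in plain Python with no vector-store client. `IndexSpec`, `l2_normalize`, and `similarity` are hypothetical names introduced for illustration; they are not part of the OpenAI API or of this eval.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexSpec:
    """Hypothetical tag identifying how a vector (or index partition) was produced."""
    model: str
    dimensions: int


def l2_normalize(vec):
    """Scale a vector to unit length so that dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def similarity(query, query_spec, stored, index_spec):
    """Cosine similarity that refuses cross-model or cross-dimension comparisons."""
    # Tests 01 and 03: model and dimensions must match exactly.
    if query_spec != index_spec:
        raise ValueError(f"spec mismatch: query {query_spec} vs index {index_spec}")
    # Defensive length check in case a vector does not match its declared spec.
    if len(query) != index_spec.dimensions or len(stored) != index_spec.dimensions:
        raise ValueError("vector length does not match the declared dimensions")
    # Test 02: normalize both sides so re-scaled or mixed-source vectors rank consistently.
    q = l2_normalize(query)
    s = l2_normalize(stored)
    return sum(a * b for a, b in zip(q, s))
```

During a model migration, the same check lets a router send each query only to the partition whose `IndexSpec` equals the query's spec; a mismatch raises instead of returning a score that has no meaning.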

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
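
One way to read the pass rule is sketched below. It assumes each rubric criterion receives a single numeric score, that the mean is taken across criteria, and that the floor applies to each criterion individually. The page does not state the scoring scale or how scores are aggregated, so `passes` and its parameters are hypothetical.

```python
def passes(scores, min_mean=4.0, floor=3.0):
    """Return True when the mean score is at least min_mean and no criterion is below floor.

    `scores` maps a criterion name to its numeric score (assumed layout).
    """
    if not scores:
        raise ValueError("at least one criterion score is required")
    values = list(scores.values())
    mean = sum(values) / len(values)
    return mean >= min_mean and min(values) >= floor
```

Under this reading, scores of 5, 4, and 3 pass (mean 4.0, lowest 3), while 5, 5, and 2 fail despite the same mean because one criterion falls below the floor.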

Rubric criteria

  • OpenAI
  • AI Platform
  • Embeddings And Retrieval

Recommended for

OpenAI API · OpenAI customers
