Eval Library
For Unstructured · AI Platform

Embedding And Enrichment

Unstructured (API + Platform) · Unstructured

Document ETL for LLMs — Unstructured (API + Platform)

Unstructured evals — Embedding & Enrichment (relift v3 InfraRed)

About Unstructured

Unstructured turns unstructured documents (PDFs, Office files, HTML, images, email) into clean, structured, LLM-ready data. It covers partitioning into typed elements, table and layout extraction, chunking, and embedding, and it offers a Platform with source and destination connectors. Developers use the Unstructured API and Platform to build the document ETL layer for RAG and agent pipelines.

Employees

~75

Industry

Document ETL

Headquarters

San Francisco, CA

Sample tests · showing 3 of 9

# · Input · Expected behavior · Check
01

An embedding transform is configured before chunking, so a 50-page document is embedded as a single oversized vector.

Place the embedding transform after partition + chunk so each chunk gets its own vector sized for the embedding model's context. Verify the embedded unit is a chunk (bounded by max_characters), not a whole document.

Pass / Fail · AI Platform · high
02

The workflow's embedding model emits 1536-dim vectors but the destination vector index was created for 768 dims; writes silently fail or are rejected.

Match the embedding transform's output dimension to the destination index's configured dimension. On a model change, re-create or migrate the index for the new dimension and re-embed; do not write mismatched-dimension vectors. The exact output dimension of any given model requires verification.

Pass / Fail · AI Platform · critical
03

An NER enrichment tags entities, but the agent treats every extracted entity as ground truth for an automated downstream decision.

Treat NER output as model-generated annotations, not verified facts: record them as metadata with the enrichment provenance, keep human-in-the-loop for consequential decisions, and do not present extracted entities as authoritative without verification.

Pass / Fail · AI Platform · medium
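
The three sample tests above can be expressed as small pre-flight checks. The following is a minimal sketch, not the Unstructured API: the Workflow class, its field names, the node labels ("partition", "chunk", "embed"), and the annotation keys are all hypothetical stand-ins chosen for illustration. Only max_characters and the 1536 and 768 dimensions come from the tests themselves.

```python
from dataclasses import dataclass


@dataclass
class Workflow:
    """Hypothetical workflow description; not the Unstructured Platform schema."""
    nodes: list          # ordered node types, e.g. ["partition", "chunk", "embed"]
    max_characters: int  # chunker's upper bound on chunk length
    embedding_dim: int   # vector size the embedding model emits
    index_dim: int       # vector size the destination index was created with


def embed_runs_after_chunk(wf: Workflow) -> bool:
    """Test 01: the order must be partition, then chunk, then embed."""
    try:
        p, c, e = (wf.nodes.index(n) for n in ("partition", "chunk", "embed"))
    except ValueError:  # a required node is missing from the workflow
        return False
    return p < c < e


def chunks_are_bounded(chunks: list, max_characters: int) -> bool:
    """Test 01: every embedded unit is a chunk no longer than max_characters."""
    return all(len(c) <= max_characters for c in chunks)


def dimensions_match(wf: Workflow) -> bool:
    """Test 02: refuse to write when model and index dimensions differ."""
    return wf.embedding_dim == wf.index_dim


def annotate_entities(entities: list, model: str) -> list:
    """Test 03: record NER output as unverified annotations with provenance."""
    return [
        {"text": text, "source": "ner_enrichment", "model": model, "verified": False}
        for text in entities
    ]
```

A caller would run the first three checks before any vectors are written and treat a False result as a hard stop. The "verified" flag stays False until a human or a separate verification step confirms the entity.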

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. A test passes when the mean score across rubric criteria is >= 4.0 and no single criterion scores below 3.
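
One reading of the pass rule is sketched below. It assumes each rubric criterion receives a single numeric score, that the mean is taken across criteria, and that an empty score set counts as a fail. The function name, parameter names, and criterion keys are hypothetical, and the scoring scale is not stated in the source.

```python
def passes(scores: dict, mean_threshold: float = 4.0, floor: float = 3.0) -> bool:
    """Return True when the mean score meets the threshold and no score is below the floor."""
    if not scores:
        return False  # assumption: nothing graded means no pass
    values = list(scores.values())
    mean = sum(values) / len(values)
    return mean >= mean_threshold and min(values) >= floor
```

For example, scores of 5, 5, and 2 average 4.0 but still fail because one criterion falls below 3, while scores of 5, 4, and 3 pass.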

Rubric criteria

  • Unstructured
  • AI Platform
  • Embedding And Enrichment

Recommended for

Unstructured (API + Platform) · Unstructured customers
