Eval Library
For Unstructured · AI Platform

Strategies

Document ETL for LLMs — Unstructured (API + Platform)

Unstructured evals — Strategies (relift v3 InfraRed)

About Unstructured

Unstructured turns unstructured documents (PDFs, Office files, HTML, images, email) into clean, structured, LLM-ready data — partitioning into typed elements, table/layout extraction, chunking, embedding, and a Platform with source/destination connectors. Developers use the Unstructured API and Platform to build the document ETL layer for RAG and agent pipelines.

Employees

~75

Industry

Document ETL

Headquarters

San Francisco, CA

Sample tests · showing 3 of 9

# · Input · Expected behavior · Check
01

Operator sets strategy=auto for a mixed corpus of clean digital PDFs and scanned images, expecting Unstructured to pick per document.

auto routes per document/page between text extraction and OCR/layout. Use auto when the corpus is heterogeneous; do not assume auto always equals hi_res or always equals fast — verify behavior by inspecting which elements carry coordinates (layout) vs plain text.

Pass / Fail · AI Platform · medium
02

To cut cost the agent forces strategy=fast for every document, including scanned invoices and image-only PDFs.

fast is text-only (no OCR, no layout model) — it works for born-digital docs but yields empty/garbled output on scans. Reserve fast for documents with an embedded text layer; route image/scanned docs to hi_res or ocr_only.

Pass / Fail · AI Platform · high
03

A document with multi-column layout, figures, and tables must be extracted with accurate element typing and table structure.

Use strategy=hi_res so the layout-detection model assigns element types, table structure (with infer_table_structure), and coordinates. Accept that hi_res is slower/costlier than fast — the tradeoff buys layout fidelity the workload requires.

Pass / Fail · AI Platform · high
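The routing described in the three sample tests can be sketched in a few lines. This is a minimal illustration, not Unstructured's own logic: the strategy names (auto, fast, hi_res, ocr_only) come from the tests above, while DocProfile, choose_strategy, and layout_share are hypothetical names. The second helper assumes each partitioned element is a dict whose metadata may contain a coordinates entry, which is how the check in test 01 (layout vs plain text) is modeled here.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional, Sequence


@dataclass
class DocProfile:
    """Hypothetical pre-partition signals; not part of any Unstructured API."""
    has_text_layer: Optional[bool]  # None means "not known in advance"
    needs_layout: bool = False      # tables, figures, or multi-column pages matter


def choose_strategy(doc: DocProfile) -> str:
    """Pick a strategy name following the three sample tests."""
    if doc.has_text_layer is None:
        # Test 01: heterogeneous or unknown input, let auto decide per document.
        return "auto"
    if not doc.has_text_layer:
        # Test 02: fast has no OCR, so scans go to hi_res or ocr_only.
        return "hi_res" if doc.needs_layout else "ocr_only"
    if doc.needs_layout:
        # Test 03: layout fidelity is worth the extra time and cost.
        return "hi_res"
    # Born-digital text with no layout requirement: the cheap path is safe.
    return "fast"


def layout_share(elements: Sequence[Mapping[str, Any]]) -> float:
    """Fraction of elements whose metadata carries coordinates.

    Assumes elements are dicts with an optional metadata.coordinates entry.
    """
    if not elements:
        return 0.0
    with_coords = sum(
        1 for el in elements if (el.get("metadata") or {}).get("coordinates")
    )
    return with_coords / len(elements)
```

Running layout_share over the output of an auto run gives a rough signal of how often the layout path was taken, which is one way to verify behavior instead of assuming auto equals hi_res or fast.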

How this eval is graded

Grade each response against expected.ideal_behavior and expected.rubric. To pass, the mean of the per-criterion scores must be at least 4.0, and no single criterion may score below 3.
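As a sketch of that rule, reading it as "mean of the criterion scores at least 4.0, with a floor of 3 on every criterion": the function name passes and the dict-of-scores input are assumptions for illustration, and the score scale itself is not stated in the source.

```python
from statistics import mean
from typing import Mapping

MEAN_THRESHOLD = 4.0  # minimum mean across criteria, from the grading rule
CRITERION_FLOOR = 3   # no single criterion may score below this


def passes(scores: Mapping[str, float]) -> bool:
    """Return True when the rubric scores satisfy both conditions."""
    if not scores:
        return False  # nothing graded, so nothing can pass
    values = list(scores.values())
    return mean(values) >= MEAN_THRESHOLD and min(values) >= CRITERION_FLOOR
```

Note that the two conditions are independent: scores of 5, 5, and 2 reach the mean threshold but fail on the floor, while 4, 4, and 3 clear the floor but fall short on the mean.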

Rubric criteria

  • Unstructured
  • AI Platform
  • Strategies

Recommended for

Unstructured (API + Platform) · Unstructured customers
