Document ETL for LLMs — Unstructured (API + Platform)
Unstructured evals — Strategies (relift v3 InfraRed)
About Unstructured
Unstructured turns unstructured documents (PDFs, Office files, HTML, images, email) into clean, structured, LLM-ready data — partitioning into typed elements, table/layout extraction, chunking, embedding, and a Platform with source/destination connectors. Developers use the Unstructured API and Platform to build the document ETL layer for RAG and agent pipelines.
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Operator sets strategy=auto for a mixed corpus of clean digital PDFs and scanned images, expecting Unstructured to pick per document. | auto routes per document/page between text extraction and OCR/layout. Use auto when the corpus is heterogeneous; do not assume auto always equals hi_res or always equals fast. Verify behavior by inspecting which elements carry coordinates (layout) vs plain text. | Pass / Fail · Ai Platform · medium |
| 02 | To cut cost the agent forces strategy=fast for every document, including scanned invoices and image-only PDFs. | fast is text-only (no OCR, no layout model): it works for born-digital docs but yields empty/garbled output on scans. Reserve fast for documents with an embedded text layer; route image/scanned docs to hi_res or ocr_only. | Pass / Fail · Ai Platform · high |
| 03 | A document with multi-column layout, figures, and tables must be extracted with accurate element typing and table structure. | Use strategy=hi_res so the layout-detection model assigns element types, table structure (with infer_table_structure), and coordinates. Accept that hi_res is slower/costlier than fast; the tradeoff buys layout fidelity the workload requires. | Pass / Fail · Ai Platform · high |
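The routing behavior the three sample tests expect can be sketched in a few lines of Python. This is only an illustration: `choose_strategy`, `partition_kwargs`, and `coordinate_coverage` are hypothetical helpers, not part of Unstructured, and the element shape (a dict with a `metadata.coordinates` entry) is assumed to match the JSON form of partitioned elements. Only the strategy names and `infer_table_structure` come from the tests above.

```python
from typing import Any, Optional

# Strategy names taken from the sample tests above.
AUTO, FAST, HI_RES, OCR_ONLY = "auto", "fast", "hi_res", "ocr_only"


def choose_strategy(has_text_layer: Optional[bool], needs_layout: bool) -> str:
    """Hypothetical helper: pick a partitioning strategy for one document.

    has_text_layer is None when the caller has not inspected the document.
    """
    if needs_layout:
        # Multi-column pages, figures, tables: the layout model is required (test 03).
        return HI_RES
    if has_text_layer is None:
        # Unknown or heterogeneous input: let auto decide per document (test 01).
        return AUTO
    if has_text_layer:
        # Born-digital document: text-only extraction is enough (test 02).
        return FAST
    # Scanned or image-only document: fast would return empty/garbled output.
    return OCR_ONLY


def partition_kwargs(has_text_layer: Optional[bool], needs_layout: bool) -> dict[str, Any]:
    """Build keyword arguments for a partition call (assumed parameter names)."""
    strategy = choose_strategy(has_text_layer, needs_layout)
    kwargs: dict[str, Any] = {"strategy": strategy}
    if strategy == HI_RES:
        # Table structure is only requested on the layout path.
        kwargs["infer_table_structure"] = True
    return kwargs


def coordinate_coverage(elements: list[dict[str, Any]]) -> float:
    """Fraction of elements whose metadata carries coordinates.

    A value near 0 suggests the text-only path ran; a value near 1 suggests
    the layout path ran. Assumes dict-shaped elements.
    """
    if not elements:
        return 0.0
    with_coords = sum(
        1 for el in elements if el.get("metadata", {}).get("coordinates")
    )
    return with_coords / len(elements)
```

In practice the resulting dict would be splatted into whatever partition call the pipeline uses, for example `partition_pdf(filename=path, **partition_kwargs(False, True))`, and `coordinate_coverage` would be run on the output to confirm which path `auto` actually took.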
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
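One possible reading of the pass rule, sketched in Python: the mean across criteria must reach 4.0 and no single criterion may score below 3. The function name `passes`, the dict-of-scores input, and this interpretation of "mean" are assumptions; the source does not state the scoring scale or how scores are aggregated.

```python
from statistics import mean


def passes(
    scores: dict[str, float],
    mean_threshold: float = 4.0,
    floor: float = 3.0,
) -> bool:
    """Hypothetical pass check: mean >= threshold and every criterion >= floor."""
    if not scores:
        # Nothing graded: treat as a failure rather than a vacuous pass.
        return False
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor
```

Under this reading, scores of 5, 4, and 3 pass (mean 4.0, minimum 3), while 5, 5, and 2 fail despite the same mean, because one criterion falls below the floor.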
Rubric criteria
- Unstructured
- Ai Platform
- Strategies