Partition Document To Elements
Document ETL for LLMs — Unstructured (API + Platform)
Unstructured evals — Partition (document to elements) (relift v3 InfraRed)
About Unstructured
Unstructured turns unstructured documents (PDFs, Office files, HTML, images, email) into clean, structured, LLM-ready data — partitioning into typed elements, table/layout extraction, chunking, embedding, and a Platform with source/destination connectors. Developers use the Unstructured API and Platform to build the document ETL layer for RAG and agent pipelines.
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Agent POSTs a contract PDF to `/general/v0/general` and receives a JSON array of elements. Downstream code only reads `element.text` and discards `element.type`. | Preserve each element's `type` (Title, NarrativeText, ListItem, Table, Image, Header, Footer, PageBreak, Address) alongside text. Downstream RAG/indexing must branch on type (e.g. keep Titles as section anchors, route Table elements to table handling) rather than flatten everything to a text blob. | Pass / Fail · AI Platform · high |
| 02 | An .eml email with sender, subject, quoted reply chain, and an attachment is partitioned. Agent indexes the whole thing as one NarrativeText. | Partition separates email metadata (sender/recipient/subject via element metadata) from body elements and quoted-reply blocks. Preserve the body element types and keep subject/sender available in metadata for downstream filtering, rather than collapsing the message into a single blob. | Pass / Fail · AI Platform · medium |
| 03 | A multi-section report is partitioned. The agent re-sorts elements alphabetically by text before chunking 'to dedupe.' | Preserve the reading-order sequence returned by partition: element order encodes document flow (Title precedes its NarrativeText, list items follow their lead-in). Do not re-sort; ordering is load-bearing for `by_title` chunking and `parent_id` hierarchy. | Pass / Fail · AI Platform · high |
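The behaviors expected in tests 01 and 03 can be illustrated with a small sketch. It assumes the partition response is a JSON array of objects with `type`, `text`, and `metadata` keys, as described above. The sample data, the `group_by_title` helper, and the choice to drop Header/Footer/PageBreak elements are hypothetical and only for illustration; this is not Unstructured's own chunking implementation.

```python
from typing import Any

Element = dict[str, Any]

# Hypothetical sample shaped like a partition response: a list of
# elements in reading order, each with "type", "text", and "metadata".
SAMPLE: list[Element] = [
    {"type": "Title", "text": "1. Term", "metadata": {"page_number": 1}},
    {"type": "NarrativeText", "text": "This agreement runs for two years.",
     "metadata": {"page_number": 1}},
    {"type": "Table", "text": "Year Fee 1 100 2 120",
     "metadata": {"page_number": 1}},
    {"type": "Footer", "text": "Page 1", "metadata": {"page_number": 1}},
    {"type": "Title", "text": "2. Termination", "metadata": {"page_number": 2}},
    {"type": "ListItem", "text": "Either party may terminate with notice.",
     "metadata": {"page_number": 2}},
]

# Assumption: page furniture is not useful for retrieval in this pipeline.
SKIP_TYPES = {"Header", "Footer", "PageBreak"}


def _is_empty(section: dict[str, Any]) -> bool:
    return section["title"] is None and not section["body"] and not section["tables"]


def group_by_title(elements: list[Element]) -> list[dict[str, Any]]:
    """Group elements into sections, branching on type and keeping order."""
    sections: list[dict[str, Any]] = []
    current: dict[str, Any] = {"title": None, "body": [], "tables": []}
    for el in elements:  # iterate in the returned order; never re-sort
        kind = el["type"]
        if kind in SKIP_TYPES:
            continue
        if kind == "Title":
            # A Title closes the previous section and anchors a new one.
            if not _is_empty(current):
                sections.append(current)
            current = {"title": el["text"], "body": [], "tables": []}
        elif kind == "Table":
            # Tables are routed separately so they can get table handling.
            current["tables"].append(el)
        else:
            # Body elements keep their full dict, so type and metadata survive.
            current["body"].append(el)
    if not _is_empty(current):
        sections.append(current)
    return sections


sections = group_by_title(SAMPLE)
```

Because every body element is stored as the full dict, its `type` and `metadata` stay available for downstream filtering, which is also the property test 02 asks for with email metadata.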
How this eval is graded
Grade each response against `expected.ideal_behavior` and `expected.rubric`. A test passes when the mean score across rubric criteria is at least 4.0 and no individual criterion scores below 3.
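One reading of this threshold rule, sketched in Python. The `passes` function, its parameter names, and the criterion names in the example are hypothetical, and the sketch assumes each criterion receives a single numeric score.

```python
from statistics import mean


def passes(scores: dict[str, float],
           mean_threshold: float = 4.0,
           floor: float = 3.0) -> bool:
    """Return True if the mean score meets the threshold and no
    single criterion falls below the floor."""
    if not scores:
        raise ValueError("at least one criterion score is required")
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor


# Hypothetical criterion names, for illustration only.
example = {"preserves_type": 5, "keeps_order": 4, "routes_tables": 3}
result = passes(example)
```

Here the mean is 4.0 and the lowest score is 3, so `result` is `True`; lowering any one score to 2 would fail the floor check even if the mean stayed at 4.0.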
Rubric criteria
- Unstructured
- AI Platform
- Partition Document To Elements