Chunking
Document ETL for LLMs — Unstructured (API + Platform)
Unstructured evals — Chunking (relift v3 InfraRed)
About Unstructured
Unstructured turns unstructured documents (PDFs, Office files, HTML, images, email) into clean, structured, LLM-ready data — partitioning into typed elements, table/layout extraction, chunking, embedding, and a Platform with source/destination connectors. Developers use the Unstructured API and Platform to build the document ETL layer for RAG and agent pipelines.
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Operator uses chunking_strategy=by_title so chunks respect section structure, but the source was partitioned with the fast strategy and has few real Title elements. | by_title starts new chunks at Title/section boundaries, so it depends on accurate Title detection, which comes from good partitioning (often hi_res). Verify that Title elements exist before relying on by_title; otherwise chunks degrade toward size-only splits. | Pass / Fail · AI Platform · high |
| 02 | RAG recall is poor at chunk boundaries; the agent sets overlap to a very large value to compensate, ballooning the index. | Set overlap to a modest fraction of max_characters to preserve boundary context without exploding index size or duplicating content across many chunks. Tune against retrieval metrics rather than maximizing overlap blindly. | Pass / Fail · AI Platform · medium |
| 03 | With by_title, the operator expects sections to be capped at page boundaries, but a section legitimately spans pages and gets split unexpectedly. | Control cross-page section behavior with multipage_sections: allow a by_title section to span pages when the document structure warrants it, or restrict it to single pages when page locality matters. Choose deliberately rather than accepting the default blindly. | Pass / Fail · AI Platform · low |
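The expected behaviors in the sample tests can be sketched as a pre-flight check that runs before chunking. This is a minimal sketch under stated assumptions, not the Platform's implementation: it assumes partitioned elements are available as dicts with a `type` key (the JSON element shape), and the helper names `title_share` and `build_chunking_params`, the 5% Title threshold, and the 10% overlap fraction are hypothetical illustration values. Only `chunking_strategy`, `max_characters`, `overlap`, and `multipage_sections` come from the tests above.

```python
from __future__ import annotations

from typing import Any


def title_share(elements: list[dict[str, Any]]) -> float:
    """Return the fraction of elements whose type is "Title"."""
    if not elements:
        return 0.0
    titles = sum(1 for el in elements if el.get("type") == "Title")
    return titles / len(elements)


def build_chunking_params(
    elements: list[dict[str, Any]],
    max_characters: int = 1000,
    overlap_fraction: float = 0.1,   # hypothetical starting point; tune against retrieval metrics
    min_title_share: float = 0.05,   # hypothetical threshold for "enough Title elements"
    multipage_sections: bool = True,  # chosen explicitly rather than left to a default
) -> tuple[dict[str, Any], list[str]]:
    """Build by_title chunking parameters and collect warnings about the input."""
    if not 0.0 <= overlap_fraction < 1.0:
        raise ValueError("overlap_fraction must be in [0, 1)")

    warnings: list[str] = []
    if title_share(elements) < min_title_share:
        # Test 01: by_title degrades toward size-only splits without Title elements.
        warnings.append("few Title elements; consider re-partitioning before by_title")

    params = {
        "chunking_strategy": "by_title",
        "max_characters": max_characters,
        # Test 02: overlap is a modest fraction of max_characters, not a maximum.
        "overlap": round(max_characters * overlap_fraction),
        # Test 03: cross-page behavior is a deliberate choice.
        "multipage_sections": multipage_sections,
    }
    return params, warnings
```

The returned dict reuses the parameter names from the tests; how it is passed to the API or an SDK depends on the client in use, so treat the wiring as an exercise for the pipeline at hand.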
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
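One reading of the pass rule above can be sketched as follows. This is an illustration only: it assumes each rubric criterion receives a single numeric score and that the mean is taken across criteria, which the text does not spell out. The function name `passes` and the example criterion keys are hypothetical.

```python
from statistics import mean


def passes(scores: dict[str, float], mean_threshold: float = 4.0, floor: float = 3.0) -> bool:
    """Return True when the mean score meets the threshold and no criterion is below the floor."""
    if not scores:
        raise ValueError("at least one criterion score is required")
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor


# Hypothetical scores keyed by criterion name.
example = {"criterion_a": 5, "criterion_b": 4, "criterion_c": 3}
result = passes(example)
```

Under this reading, a single low criterion fails the eval even when the mean is high, so the floor acts as a guard against one strong score masking a weak one.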
Rubric criteria
- Unstructured
- AI Platform
- Chunking