Eval Library
For Unstructured · AI Platform

Chunking

Document ETL for LLMs — Unstructured (API + Platform)

Unstructured evals — Chunking (relift v3 InfraRed)

About Unstructured

Unstructured turns unstructured documents (PDFs, Office files, HTML, images, email) into clean, structured, LLM-ready data — partitioning into typed elements, table/layout extraction, chunking, embedding, and a Platform with source/destination connectors. Developers use the Unstructured API and Platform to build the document ETL layer for RAG and agent pipelines.

Employees

~75

Industry

Document ETL

Headquarters

San Francisco, CA

Sample tests · showing 3 of 9

# | Input | Expected behavior | Check
01

Operator uses chunking_strategy=by_title so that chunks respect section structure, but the source was partitioned with the fast strategy and has few real Title elements.

by_title starts new chunks at Title/section boundaries — it depends on accurate Title detection, which comes from good partitioning (often hi_res). Verify Titles exist before relying on by_title; otherwise chunks degrade toward size-only splits.

Pass / Fail · AI Platform · high
02

RAG recall is poor at chunk boundaries; the agent sets overlap to a very large value to compensate, ballooning the index.

Set overlap to a modest fraction of max_characters to preserve boundary context without exploding index size or duplicating content across many chunks. Tune against retrieval metrics rather than maximizing overlap blindly.

Pass / Fail · AI Platform · medium
03

With by_title, the operator expects sections to be capped at page boundaries but a section legitimately spans pages and gets split unexpectedly.

Control cross-page section behavior with multipage_sections: allow a by_title section to span pages when the document structure warrants it, or restrict sections to single pages when page locality matters. Choose deliberately rather than accepting the default blindly.

Pass / Fail · AI Platform · low
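
The three sample tests above can be read as a pre-flight review of a chunking configuration. The sketch below is one way to express that, not Unstructured's own tooling: review_chunking, MIN_TITLES, and MAX_OVERLAP_FRACTION are hypothetical names and thresholds introduced for illustration, and the function only inspects a plain list of element type names plus a settings dict that reuses the parameter names from the tests (chunking_strategy, max_characters, overlap, multipage_sections).

```python
from collections import Counter

# Hypothetical thresholds, for illustration only; tune them against retrieval metrics.
MIN_TITLES = 2
MAX_OVERLAP_FRACTION = 0.25


def review_chunking(element_types, settings):
    """Return warnings for a proposed chunking configuration.

    element_types: element type names produced by partitioning,
        e.g. ["Title", "NarrativeText", ...].
    settings: dict keyed by the parameter names used in the sample tests.
    """
    warnings = []
    counts = Counter(element_types)
    by_title = settings.get("chunking_strategy") == "by_title"

    # Test 01: by_title needs real Title elements to find section boundaries.
    if by_title and counts["Title"] < MIN_TITLES:
        warnings.append(
            "few Title elements: by_title will degrade toward size-only splits; "
            "consider re-partitioning (often with hi_res)"
        )

    # Test 02: keep overlap a modest fraction of max_characters.
    max_characters = settings.get("max_characters")
    overlap = settings.get("overlap", 0)
    if max_characters and overlap > MAX_OVERLAP_FRACTION * max_characters:
        warnings.append(
            "overlap is a large fraction of max_characters: "
            "expect duplicated content and a larger index"
        )

    # Test 03: make the cross-page choice explicit instead of relying on the default.
    if by_title and "multipage_sections" not in settings:
        warnings.append("multipage_sections is not set: choose it deliberately")

    return warnings
```

For example, calling review_chunking(["NarrativeText"] * 40, {"chunking_strategy": "by_title", "max_characters": 1000, "overlap": 800}) flags all three issues, while a configuration with real Titles, a modest overlap, and an explicit multipage_sections value returns an empty list.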

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. A test passes when the mean score across rubric criteria is at least 4.0 and no individual criterion scores below 3.
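
As a minimal sketch of the pass rule above, assuming each rubric criterion receives a numeric score (the scale is not stated here) and that the mean is taken across criteria: the function name passes and its parameters are hypothetical, not part of the eval platform.

```python
from statistics import mean


def passes(scores, mean_threshold=4.0, floor=3):
    """Apply the pass rule: mean >= mean_threshold and no score below floor.

    scores: dict mapping criterion name to a numeric score.
    """
    if not scores:
        # Assumption: an ungraded test does not pass.
        return False
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor
```

Under this reading, a single low criterion fails the test even when the mean clears the threshold, so strong scores elsewhere cannot mask one weak criterion.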

Rubric criteria

  • Unstructured
  • AI Platform
  • Chunking

Recommended for

Unstructured (API + Platform) · Unstructured customers
