For Unstructured · AI Platform

Chunking

Unstructured (API + Platform) · Unstructured

Document ETL for LLMs — Unstructured (API + Platform)

Evaluates Unstructured's Chunking across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Document ETL for LLMs eval coverage.

About Unstructured

Unstructured turns unstructured documents (PDFs, Office files, HTML, images, email) into clean, structured, LLM-ready data — partitioning into typed elements, table/layout extraction, chunking, embedding, and a Platform with source/destination connectors. Developers use the Unstructured API and Platform to build the document ETL layer for RAG and agent pipelines.

Employees

~75

Industry

Document ETL

Headquarters

San Francisco, CA

Website

unstructured.io

Sample tests · showing 3 of 9

#	Input	Expected behavior	Check
01	Operator uses chunking_strategy=by_title so chunks respect section structure, but the source was partitioned with the fast strategy and has few real Title elements.	by_title starts new chunks at Title/section boundaries, so it depends on accurate Title detection, which comes from good partitioning (often hi_res). Verify that Title elements exist before relying on by_title; otherwise chunks degrade toward size-only splits.	Pass / Fail · AI Platform · high
02	RAG recall is poor at chunk boundaries; the agent sets overlap to a very large value to compensate, ballooning the index.	Set overlap to a modest fraction of max_characters to preserve boundary context without exploding index size or duplicating content across many chunks. Tune against retrieval metrics rather than maximizing overlap blindly.	Pass / Fail · AI Platform · medium
03	Agent sets max_characters=512 expecting it to act as a soft target, then is surprised when a large Table element is split mid-table.	max_characters is a hard cap on chunk size; new_after_n_chars is the soft target. Set max_characters with table-aware expectations (large tables may be split or isolated). Tune new_after_n_chars below max_characters to avoid frequent hard splits.	Pass / Fail · AI Platform · high
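
The three sample cases reduce to checks that can be run on a chunking configuration before indexing. The following is a minimal sketch, not Unstructured's implementation: `ChunkingConfig`, `review_config`, and the thresholds (`min_titles`, the 20% overlap ceiling) are hypothetical names and values chosen for illustration. Only `chunking_strategy`, `max_characters`, `new_after_n_chars`, and `overlap` come from the cases above. Element types are passed as plain strings so the block runs without the Unstructured library installed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ChunkingConfig:
    # Parameter names mirror those used in the sample cases above.
    chunking_strategy: str = "by_title"
    max_characters: int = 500                # hard cap on chunk size
    new_after_n_chars: Optional[int] = None  # soft target; None means "not set"
    overlap: int = 0


def review_config(
    cfg: ChunkingConfig,
    element_types: List[str],
    min_titles: int = 2,                 # hypothetical threshold
    max_overlap_fraction: float = 0.2,   # hypothetical ceiling; tune on retrieval metrics
) -> List[str]:
    """Return warnings for the pitfalls described in cases 01-03."""
    warnings = []

    # Case 01: by_title needs Title elements to find section boundaries.
    titles = sum(1 for t in element_types if t == "Title")
    if cfg.chunking_strategy == "by_title" and titles < min_titles:
        warnings.append(
            f"only {titles} Title element(s) found; by_title may degrade to size-only splits"
        )

    # Case 02: keep overlap a modest fraction of max_characters.
    if cfg.overlap > max_overlap_fraction * cfg.max_characters:
        warnings.append(
            f"overlap={cfg.overlap} exceeds {max_overlap_fraction:.0%} of max_characters"
        )

    # Case 03: the soft target should sit below the hard cap.
    # Assumption: an unset soft target is treated as equal to the cap.
    soft = cfg.new_after_n_chars if cfg.new_after_n_chars is not None else cfg.max_characters
    if soft >= cfg.max_characters:
        warnings.append("new_after_n_chars is not below max_characters; expect hard splits")

    return warnings


# Example: the configuration from case 03 with an oversized overlap and no Titles.
for warning in review_config(
    ChunkingConfig(max_characters=512, overlap=400),
    ["NarrativeText", "Table", "NarrativeText"],
):
    print(warning)
```

The checks are deliberately advisory: they return messages rather than raising, since the right overlap and title thresholds depend on the corpus and should be confirmed against retrieval metrics.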

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
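
One reading of this rule, sketched in Python: a case passes when the mean of its per-criterion judge scores is at least 4.0 and no single criterion scores below 3. The function name, the score dictionary, the criterion names in the example, and the 1-to-5 scale implied by the thresholds are assumptions for illustration, not Corsac's grader.

```python
from statistics import mean
from typing import Dict


def case_passes(scores: Dict[str, float], min_mean: float = 4.0, floor: float = 3.0) -> bool:
    """Apply the pass rule to one test case's per-criterion scores."""
    if not scores:
        raise ValueError("at least one criterion score is required")
    values = list(scores.values())
    # Both conditions must hold: a high average and no weak criterion.
    return mean(values) >= min_mean and min(values) >= floor


# Hypothetical criterion names; the real rubric criteria are not listed here.
print(case_passes({"correctness": 5, "completeness": 4, "safety": 3}))
print(case_passes({"correctness": 5, "completeness": 5, "safety": 2}))
```

The second call shows why the floor matters: the mean is still 4.0, but a single criterion below 3 fails the case.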

Rubric criteria

Unstructured
AI Platform
Chunking

Recommended for

Unstructured (API + Platform) · Unstructured customers

Works with

Unstructured

Related evals

AI Platform

Claude API

Evaluates Anthropic's Batch API across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

View

AI Platform

Claude API

Evaluates Anthropic's Extended Thinking across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

View

AI Platform

Claude API

Evaluates Anthropic's Files API & Citations across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

View

Frequently asked questions

What does the Chunking eval for Unstructured (API + Platform) test?

Evaluates Unstructured's Chunking across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Document ETL for LLMs eval coverage.

How is the Chunking eval scored?

The judge rubric: Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

How many test cases does this eval pack include?

The Chunking pack for Unstructured (API + Platform) contains 9 test cases. 3 sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?

Sign up for Corsac, connect your model or agent endpoint, and run the Chunking pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.
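
Corsac's REST API is not documented on this page, so the sketch below leaves the fetch step out and shows only the gating logic a CI job might apply once per-case results are available. The result shape (`case_id`, `passed`), the function name, and the `min_pass_rate` threshold are all hypothetical.

```python
from typing import Dict, List


def gate_exit_code(results: List[Dict], min_pass_rate: float = 1.0) -> int:
    """Return 0 if enough cases passed, 1 otherwise (shell exit-code convention)."""
    if not results:
        return 1  # no results: fail closed rather than release unverified
    passed = sum(1 for r in results if r["passed"])
    return 0 if passed / len(results) >= min_pass_rate else 1


# Hypothetical results for a 9-case pack in which case 03 failed.
sample = [{"case_id": f"{i:02d}", "passed": i != 3} for i in range(1, 10)]
print(gate_exit_code(sample, min_pass_rate=0.8))
print(gate_exit_code(sample))
```

A CI script would pass the returned value to `sys.exit` so that a failing pack blocks the release step.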

Run this eval in your workspace

Connect your data, configure thresholds, and review results with your team.