For Unstructured · AI Platform

Metadata And Element Schema

Unstructured (API + Platform) · Unstructured

Document ETL for LLMs — Unstructured (API + Platform)

Evaluates Unstructured's Metadata & Element Schema across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Document ETL for LLMs eval coverage.

About Unstructured

Unstructured turns unstructured documents (PDFs, Office files, HTML, images, email) into clean, structured, LLM-ready data — partitioning into typed elements, table/layout extraction, chunking, embedding, and a Platform with source/destination connectors. Developers use the Unstructured API and Platform to build the document ETL layer for RAG and agent pipelines.

Employees

~75

Industry

Document ETL

Headquarters

San Francisco, CA

Website

unstructured.io

Sample tests · showing 3 of 9

#	Input	Expected behavior	Check
01	Citations must deep-link to the source page, but the agent's index stores only element.text and drops metadata.page_number.	Carry metadata.page_number (and filename) through chunking and into the vector store so retrieved chunks can deep-link to the exact source page. Verify page_number survives the chunking step (on the chunk or via orig_elements).	Pass / Fail · AI Platform · high
02	A mixed-source index needs per-document filtering, but the agent never persists metadata.filename / metadata.filetype.	Persist metadata.filename and metadata.filetype on every indexed element/chunk so retrieval can filter by document and type. Use a stable source identifier (filename + a content hash) so re-ingested versions are reconcilable.	Pass / Fail · AI Platform · medium
03	The agent wants section-aware retrieval (return a chunk plus its parent Title) but ignores metadata.parent_id.	Use metadata.parent_id to reconstruct the element hierarchy (NarrativeText/ListItem under their Title). Store parent_id so retrieval can include parent context, and validate the parent chain resolves to a real element_id.	Pass / Fail · AI Platform · medium
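The three sample cases share one pattern: keep the element's metadata next to its text in the index. Below is a minimal sketch, assuming elements arrive as dicts in Unstructured's JSON output shape (`type`, `element_id`, `text`, `metadata`). The helper names, the 12-character SHA-256 prefix used for the source id, and the short sample element IDs are hypothetical. A real pipeline would usually chunk first and take page numbers from the chunk or its `orig_elements`.

```python
import hashlib
from collections import defaultdict
from typing import Any

Element = dict[str, Any]  # one element in Unstructured-style JSON output


def source_ids(elements: list[Element]) -> dict[str, str]:
    """Hypothetical helper: map each filename to 'filename:<content hash>'.

    Assumption: the hash covers the concatenated element text of that file,
    so a re-ingested, edited document keeps its filename prefix but gets a
    new hash, which makes old and new versions reconcilable (test 02).
    """
    texts: dict[str, list[str]] = defaultdict(list)
    for el in elements:
        texts[el.get("metadata", {}).get("filename", "")].append(el.get("text", ""))
    ids = {}
    for name, parts in texts.items():
        digest = hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()[:12]
        ids[name] = f"{name}:{digest}"
    return ids


def to_index_records(elements: list[Element]) -> list[dict[str, Any]]:
    """Hypothetical helper: keep text plus the metadata the sample tests expect."""
    sids = source_ids(elements)
    records = []
    for el in elements:
        meta = el.get("metadata", {})
        filename = meta.get("filename", "")
        records.append({
            "element_id": el["element_id"],
            "type": el.get("type"),
            "text": el.get("text", ""),
            "page_number": meta.get("page_number"),  # test 01: deep-link target
            "filename": filename,                    # test 02: per-document filter
            "filetype": meta.get("filetype"),        # test 02: per-type filter
            "source_id": sids[filename],             # test 02: reconcile re-ingests
            "parent_id": meta.get("parent_id"),      # test 03: section hierarchy
        })
    return records


def broken_parent_links(records: list[dict[str, Any]]) -> list[str]:
    """Return element_ids whose parent_id does not resolve to an indexed element (test 03)."""
    known = {r["element_id"] for r in records}
    return [r["element_id"] for r in records
            if r["parent_id"] is not None and r["parent_id"] not in known]


def with_parent_context(record: dict[str, Any], by_id: dict[str, dict[str, Any]]) -> str:
    """Prepend the parent element's text (e.g. its Title) when the parent is known."""
    parent = by_id.get(record["parent_id"]) if record["parent_id"] else None
    return f"{parent['text']}\n\n{record['text']}" if parent else record["text"]


# Usage with two made-up elements (real element_ids are longer hashes).
elements = [
    {"type": "Title", "element_id": "t1", "text": "Refunds",
     "metadata": {"filename": "policy.pdf", "filetype": "application/pdf", "page_number": 3}},
    {"type": "NarrativeText", "element_id": "n1", "text": "Refunds are issued within 14 days.",
     "metadata": {"filename": "policy.pdf", "filetype": "application/pdf",
                  "page_number": 3, "parent_id": "t1"}},
]
records = to_index_records(elements)
by_id = {r["element_id"]: r for r in records}
print(records[1]["page_number"], records[1]["filename"])
print(broken_parent_links(records))
print(with_parent_context(records[1], by_id))
```

Storing the metadata as flat fields, rather than only inside a nested blob, is a design choice that keeps it usable as vector-store filter keys.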
The full benchmark includes 6 more test cases.

How this eval is graded

The LLM judge grades each response against expected.ideal_behavior and expected.rubric. A pass requires a mean criterion score of at least 4.0, with no individual criterion scored below 3.
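Read literally, the pass rule combines two checks over a case's rubric scores. Below is a minimal sketch, assuming each criterion receives a numeric score (for example on a 1 to 5 scale) and that the mean is taken across criteria. The function name and the criterion names in the usage lines are hypothetical.

```python
from statistics import mean


def case_passes(scores: dict[str, float],
                mean_threshold: float = 4.0,
                floor: float = 3.0) -> bool:
    """Assumed reading: mean of criterion scores >= 4.0 and no criterion below 3."""
    if not scores:
        raise ValueError("no criterion scores to grade")
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor


# Hypothetical criterion names, for illustration only.
print(case_passes({"metadata_preserved": 5, "hierarchy": 4, "reconcilable_ids": 3}))
print(case_passes({"metadata_preserved": 5, "hierarchy": 5, "reconcilable_ids": 2}))
```

The floor matters: the second case averages exactly 4.0 but still fails because one criterion scored 2.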

Rubric criteria

Unstructured
AI Platform
Metadata And Element Schema

Recommended for

Unstructured (API + Platform) · Unstructured customers

Works with

Unstructured

Related evals

Claude API · AI Platform

Evaluates Anthropic's Batch API across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

Claude API · AI Platform

Evaluates Anthropic's Extended Thinking across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

Claude API · AI Platform

Evaluates Anthropic's Files API & Citations across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

Frequently asked questions

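What does the Metadata And Element Schema eval for Unstructured (API + Platform) test?

It evaluates Unstructured's Metadata & Element Schema across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge. The sample cases cover carrying metadata.page_number, metadata.filename / metadata.filetype, and metadata.parent_id through chunking and indexing.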

How is the Metadata And Element Schema eval scored?

The LLM judge grades each response against expected.ideal_behavior and expected.rubric. A pass requires a mean criterion score of at least 4.0, with no individual criterion scored below 3.

How many test cases does this eval pack include?

The Metadata And Element Schema pack for Unstructured (API + Platform) contains 9 test cases. Three sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?

Sign up for Corsac, connect your model or agent endpoint, and run the Metadata And Element Schema pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.

Run this eval in your workspace

Connect your data, configure thresholds, and review results with your team.