Embedding And Enrichment
Document ETL for LLMs — Unstructured (API + Platform)
Unstructured evals — Embedding & Enrichment (relift v3 InfraRed)
About Unstructured
Unstructured turns unstructured documents (PDFs, Office files, HTML, images, email) into clean, structured, LLM-ready data — partitioning into typed elements, table/layout extraction, chunking, embedding, and a Platform with source/destination connectors. Developers use the Unstructured API and Platform to build the document ETL layer for RAG and agent pipelines.
Sample tests (showing 3 of 9)
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | An embedding transform is configured before chunking, so a 50-page document is embedded as a single oversized vector. | Place the embedding transform after partition + chunk so each chunk gets its own vector sized for the embedding model's context. Verify the embedded unit is a chunk (bounded by max_characters), not a whole document. | Pass / Fail · AI Platform · high |
| 02 | The workflow's embedding model emits 1536-dim vectors but the destination vector index was created for 768 dims; writes silently fail or are rejected. | Match the embedding transform's output dimension to the destination index's configured dimension. On a model change, re-create/migrate the index for the new dimension and re-embed; do not write mismatched-dimension vectors. [REQUIRES-VERIFICATION for a given model's exact dimension]. | Pass / Fail · AI Platform · critical |
| 03 | An NER enrichment tags entities, but the agent treats every extracted entity as ground truth for an automated downstream decision. | Treat NER output as model-generated annotations, not verified facts: record them as metadata with the enrichment provenance, keep human-in-the-loop for consequential decisions, and do not present extracted entities as authoritative without verification. | Pass / Fail · AI Platform · medium |
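
The expected behavior in tests 01 and 02 can be illustrated with a minimal Python sketch. It is not the Unstructured API: `chunk_text`, `embed_chunks`, and `fake_embed` are hypothetical helpers, the fixed-width split stands in for a real chunking strategy, and the 4-dimensional vector stands in for a real embedding model. The sketch only shows the ordering (chunk first, then embed each chunk) and a dimension check before anything is written to the index.

```python
from typing import Callable, List, Sequence


def chunk_text(text: str, max_characters: int) -> List[str]:
    """Naive fixed-width split; each piece is at most max_characters long."""
    if max_characters <= 0:
        raise ValueError("max_characters must be positive")
    return [text[i:i + max_characters] for i in range(0, len(text), max_characters)]


def embed_chunks(
    chunks: Sequence[str],
    embed: Callable[[str], Sequence[float]],  # hypothetical embedding function
    index_dim: int,                           # dimension the destination index expects
) -> List[List[float]]:
    """Embed each chunk separately and refuse vectors of the wrong dimension."""
    vectors: List[List[float]] = []
    for chunk in chunks:
        vec = list(embed(chunk))
        if len(vec) != index_dim:
            # Fail loudly instead of writing a mismatched vector to the index.
            raise ValueError(
                f"embedding has {len(vec)} dims, index expects {index_dim}"
            )
        vectors.append(vec)
    return vectors


def fake_embed(text: str) -> List[float]:
    """Stand-in for a real model; always returns a 4-dimensional vector."""
    return [float(len(text)), 0.0, 0.0, 0.0]


document = "a" * 1200
chunks = chunk_text(document, max_characters=500)
vectors = embed_chunks(chunks, fake_embed, index_dim=4)
```

With these inputs the 1,200-character document yields three chunks and three vectors, one per chunk. Passing `index_dim=8` instead raises `ValueError`, which is the behavior test 02 asks for in place of a silent or rejected write.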
How this eval is graded
Grade each response against expected.ideal_behavior and expected.rubric. A pass requires a mean score across criteria of at least 4.0, with no individual criterion scoring below 3.
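
One way to read the pass rule above is sketched below. The function name `passes` and the criterion names in the example are hypothetical, and the page does not state the scoring scale, so the code only applies the two stated thresholds.

```python
from statistics import mean
from typing import Mapping


def passes(
    scores: Mapping[str, float],
    mean_threshold: float = 4.0,  # minimum mean across criteria
    floor: float = 3.0,           # minimum allowed score for any one criterion
) -> bool:
    """Return True when the mean meets the threshold and no score is below the floor."""
    if not scores:
        raise ValueError("at least one criterion score is required")
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor


# Hypothetical criterion names, for illustration only.
ok = passes({"ordering": 5, "dimension_match": 4, "provenance": 3})
low_criterion = passes({"ordering": 5, "dimension_match": 5, "provenance": 2})
low_mean = passes({"ordering": 4, "dimension_match": 4, "provenance": 3})
```

The second example has a mean of 4.0 but still fails because one criterion is below 3; the third clears the floor but its mean falls short of 4.0.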
Rubric criteria
- Unstructured
- AI Platform
- Embedding And Enrichment