Parse Document To Structured
Document Ingestion & Parsing for AI — Reducto
Reducto evals — Parse (Document to Structured) (relift v3 InfraRed)
About Reducto
Reducto is a document ingestion platform for AI pipelines that turns complex documents (PDFs, scans, spreadsheets) into clean, structured, layout-aware data. Its API parses documents into Markdown and typed content blocks, extracts structured fields against a user-defined schema with source citations, and splits bundled files into their constituent documents — feeding retrieval-augmented generation and document-automation workflows.
Employees
~50 (approx — verify)
Industry
Document AI / Data Ingestion
Headquarters
San Francisco, CA (verify)
Website
reducto.ai
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Client POSTs a PDF to the /parse endpoint and receives a result containing the document rendered as layout-aware Markdown plus a structured list of content blocks (each with a type and bounding-box position). The integrator assum… | Consume both the Markdown rendering AND the per-block structure: each block carries a type (heading, paragraph, table, figure, list, etc.) and page-relative position. Downstream RAG indexing should preserve block boundaries rather than collapsing to one string, so retrieval can cite a specific bloc… | Pass / Fail · AI Platform · high |
| 02 | Integrator needs clean Markdown for an LLM prompt but configures the parse request to return the most verbose structured JSON, then re-serializes it to Markdown themselves in application code. | Request the output representation that matches the downstream consumer: a Markdown/text rendering for LLM prompting, the structured block JSON for programmatic indexing. Do not hand-roll a Markdown serializer over the JSON; that reintroduces the reading-order and layout bugs Reducto already solved… | Pass / Fail · AI Platform · medium |
| 03 | A 40-page report has two-column layouts on some pages and single-column on others. The integrator concatenates parsed text in raw top-to-bottom pixel order without trusting Reducto's reading-order reconstruction. | Rely on Reducto's reading-order reconstruction (which sequences multi-column and interleaved layouts into logical reading order) rather than naive top-to-bottom sort. Verify on a sample of mixed-layout pages that columns are not interleaved mid-sentence before trusting the order in bulk indexing. | Pass / Fail · AI Platform · high |
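Tests 01 and 03 can be illustrated with a small sketch. The `Block` shape below (type, page, top, left, content) is a hypothetical stand-in for a parsed content block, not Reducto's documented response schema, and the sample data is invented. It shows one index record per block in the order the parser returned them, next to the naive top-to-bottom sort that interleaves a two-column page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    # Hypothetical block shape; these field names are assumptions,
    # not Reducto's actual response schema.
    type: str
    page: int
    top: float   # page-relative vertical position, 0.0 = top of page
    left: float  # page-relative horizontal position, 0.0 = left edge
    content: str


def index_records(blocks):
    """Build one record per block, keeping the parser's order as `order`."""
    return [
        {
            "id": f"p{b.page}-b{i}",
            "order": i,
            "type": b.type,
            "page": b.page,
            "text": b.content,
        }
        for i, b in enumerate(blocks)
    ]


def naive_pixel_order(blocks):
    """Anti-pattern from test 03: sort by page, then top, then left."""
    return sorted(blocks, key=lambda b: (b.page, b.top, b.left))


# Invented two-column page, listed in logical reading order
# (left column first, then right column).
SAMPLE = [
    Block("heading", 1, 0.05, 0.10, "Results"),
    Block("paragraph", 1, 0.20, 0.10, "Left column, first paragraph."),
    Block("paragraph", 1, 0.60, 0.10, "Left column, second paragraph."),
    Block("paragraph", 1, 0.20, 0.55, "Right column, first paragraph."),
    Block("paragraph", 1, 0.60, 0.55, "Right column, second paragraph."),
]

records = index_records(SAMPLE)
naive = [b.content for b in naive_pixel_order(SAMPLE)]

if __name__ == "__main__":
    print([r["text"] for r in records])  # parser order, one record per block
    print(naive)                         # left and right columns alternate
```

Because each record keeps its own id, type, and page, a retrieval hit can point back to a single block instead of an undifferentiated string. Comparing the two printed lists on a few mixed-layout pages is one cheap way to run the spot check that test 03 asks for.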
How this eval is graded
Grade each response against expected.ideal_behavior and expected.rubric. A pass requires a mean criterion score of at least 4.0, with no individual criterion scoring below 3.
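The pass rule can be sketched as a small function. This is one reading of the rule, assuming each rubric criterion receives a single numeric score; the function name, parameter names, and criterion names are hypothetical and do not come from the eval harness.

```python
from statistics import mean


def passes(scores, mean_threshold=4.0, floor=3):
    """Return True when the mean score meets the threshold and no
    criterion falls below the floor.

    `scores` maps a criterion name to its numeric score. The names and
    the default thresholds mirror the rule as stated; everything else
    is an assumption for illustration.
    """
    if not scores:
        raise ValueError("no criterion scores to grade")
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor


if __name__ == "__main__":
    # Hypothetical criterion names and scores.
    print(passes({"block_structure": 5, "reading_order": 4, "citation": 3}))
    print(passes({"block_structure": 5, "reading_order": 5, "citation": 2}))
```

The two conditions are independent: a response with a strong average still fails if any one criterion drops below the floor.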
Rubric criteria
- Reducto
- AI Platform
- Parse Document To Structured