Extract Schema And Citations
Document Ingestion & Parsing for AI — Reducto
Reducto evals: Extract (Schema-driven + Citations)
About Reducto
Reducto is a document ingestion platform for AI pipelines that turns complex documents (PDFs, scans, spreadsheets) into clean, structured, layout-aware data. Its API parses documents into Markdown and typed content blocks, extracts structured fields against a user-defined schema with source citations, and splits bundled files into their constituent documents — feeding retrieval-augmented generation and document-automation workflows.
Employees
~50 (approx — verify)
Industry
Document AI / Data Ingestion
Headquarters
San Francisco, CA (verify)
Website
reducto.ai
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Integrator calls /extract with a JSON schema describing the fields to pull from an invoice (invoice_number, total, line_items[]). They make every field a loosely typed string and add no field descriptions. | Define a precise schema: correct types (numbers as numbers, dates as dates), arrays for repeated structures (line_items[]), and per-field descriptions that disambiguate (e.g., 'total' = grand total incl. tax). A well-specified schema is the primary lever on extraction accuracy. Validate returned va… | Pass / Fail · AI Platform · high |
| 02 | Extract returns each field value alongside a citation pointing to the source location in the document (page + region/block). The integrator stores only the values and discards the citations. | Persist the citation alongside each extracted value so the field is auditable and a human can verify it against the source. Source-grounded extraction is the core trust feature: discarding citations turns a verifiable extraction into an unverifiable guess. Surface citations in any human-review UI.… | Pass / Fail · AI Platform · critical |
| 03 | Integrator already parses every document, then sends the full parsed text to a general LLM with an ad-hoc 'pull these fields' prompt instead of using /extract. | Prefer the purpose-built /extract path (schema-constrained, source-cited) over an ad-hoc LLM prompt for structured field extraction: it returns typed, validated, grounded output without bespoke prompt engineering or a separate hallucination guard. Reserve the parse-then-LLM path for open-ended summ… | Pass / Fail · AI Platform · medium |
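
The behavior expected in tests 01 and 02 can be sketched roughly as follows: a schema with real types, an array for the repeated line items, and a description on every field, plus a helper that refuses to store a value without its citation. This is a minimal sketch, not Reducto's documented API. The response shape (`value`, `citation`, `page`, `bbox`), the `CitedField` class, and `to_cited_fields` are all assumptions made for illustration; consult the official API reference for the actual field names.

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical schema for the invoice in test 01: real types, an array for the
# repeated line items, and a description on every field.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "invoice_number": {
            "type": "string",
            "description": "Invoice identifier exactly as printed.",
        },
        "total": {
            "type": "number",
            "description": "Grand total including tax.",
        },
        "line_items": {
            "type": "array",
            "description": "One entry per billed line.",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string", "description": "Text of the billed line."},
                    "amount": {"type": "number", "description": "Amount for this line."},
                },
            },
        },
    },
}

# JSON Schema type names mapped to the Python types json.loads would produce.
_PY_TYPES = {"string": str, "number": (int, float), "array": list, "object": dict}


@dataclass(frozen=True)
class CitedField:
    """An extracted value stored together with its source location."""
    name: str
    value: Any
    page: int
    bbox: tuple


def to_cited_fields(response: dict, schema: dict = INVOICE_SCHEMA) -> list:
    """Type-check top-level fields and keep each citation next to its value.

    Assumes (hypothetically) that each response entry looks like
    {"value": ..., "citation": {"page": int, "bbox": [x0, y0, x1, y1]}}.
    """
    fields = []
    for name, spec in schema["properties"].items():
        entry = response[name]
        value = entry["value"]
        expected = _PY_TYPES[spec["type"]]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise TypeError(f"{name}: expected {spec['type']}, got {type(value).__name__}")
        citation = entry.get("citation")
        if citation is None:
            raise ValueError(f"{name}: no citation, so the value cannot be audited")
        fields.append(CitedField(name, value, citation["page"], tuple(citation["bbox"])))
    return fields
```

Raising on a missing citation, rather than silently storing the bare value, is one way to make the "never discard citations" rule hard to violate by accident; a review UI could then render `page` and `bbox` next to each field.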
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
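
One possible reading of this rule, sketched below, takes the mean across a response's rubric criterion scores and also enforces a floor on each individual score. The function name, the criterion names in the usage example, and the interpretation of "mean" are assumptions; the page does not state the scoring scale or how scores are aggregated.

```python
from statistics import mean


def passes(scores: dict, mean_threshold: float = 4.0, floor: float = 3.0) -> bool:
    """Return True when the mean score meets the threshold and no score is below the floor.

    `scores` maps a (hypothetical) criterion name to its numeric score.
    """
    if not scores:
        raise ValueError("at least one criterion score is required")
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor


# Usage: a strong average does not rescue a single criterion below the floor.
print(passes({"schema_precision": 5, "citation_handling": 4, "validation": 3}))
print(passes({"schema_precision": 5, "citation_handling": 5, "validation": 2}))
```

Under this reading the first call passes (mean 4.0, lowest score 3) and the second fails, because one criterion falls below 3 even though the mean is still 4.0.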
Rubric criteria
- Reducto
- AI Platform
- Extract Schema And Citations