OCR and Layout Fidelity
Document Ingestion & Parsing for AI — Reducto
Reducto evals — OCR & Layout Fidelity (relift v3 InfraRed)
About Reducto
Reducto is a document ingestion platform for AI pipelines that turns complex documents (PDFs, scans, spreadsheets) into clean, structured, layout-aware data. Its API parses documents into Markdown and typed content blocks, extracts structured fields against a user-defined schema with source citations, and splits bundled files into their constituent documents — feeding retrieval-augmented generation and document-automation workflows.
Employees: ~50 (approximate; verify)
Industry: Document AI / Data Ingestion
Headquarters: San Francisco, CA (verify)
Website: reducto.ai
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | An OCR'd handwritten form yields plausible-but-wrong characters in a numeric field (a handwritten 7 read as 1). The integrator persists it with no validation. | For high-stakes numeric/identifier fields, validate OCR output against domain constraints (checksum, length, format regex) and route failures to review. OCR on handwriting is error-prone: never persist a raw OCR numeric to an irreversible action without a validation gate. Track per-field OCR error… | Pass / Fail · AI Platform · high |
| 02 | Each OCR'd block carries a bounding box. The integrator wants to highlight the source region in a viewer but assumes pixel coordinates when the API returns normalized (0-1) coordinates. | Confirm the documented coordinate space (absolute pixels vs normalized 0-1 vs PDF points) and the origin convention before rendering highlights, and validate on one document that a known block highlights correctly. A coordinate-space mismatch silently misplaces every highlight. Treat the coordinate… | Pass / Fail · AI Platform · medium |
| 03 | Justified text hyphenates words across line breaks ('inter-' then 'national' on the next line). The integrator concatenates lines verbatim and ends up with broken tokens that hurt embedding and search. | Rely on Reducto's text reconstruction to re-join hyphenated words and normalize intra-paragraph line breaks into flowing text, rather than naive line concatenation. Verify on a justified-text sample that hyphenated words are rejoined. Broken tokens silently degrade retrieval recall. | Pass / Fail · AI Platform · low |
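The validation gate described in test 01 could look roughly like the sketch below. It assumes, purely for illustration, a 16-digit identifier protected by a Luhn checksum; the field format, the `GateResult` type, and the function names are hypothetical and are not part of Reducto's API. The point is only that a format check plus a checksum catches a single misread digit before the value is persisted.

```python
import re
from dataclasses import dataclass

# Hypothetical field format: exactly 16 digits (an assumption for this sketch).
ID_PATTERN = re.compile(r"\d{16}")


@dataclass(frozen=True)
class GateResult:
    value: str
    accepted: bool
    reason: str  # "ok", "format", or "checksum"


def luhn_ok(digits: str) -> bool:
    """Return True if the digit string satisfies the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def validate_ocr_identifier(raw: str) -> GateResult:
    """Check an OCR'd identifier; anything not accepted should go to review."""
    value = raw.strip().replace(" ", "")
    if not ID_PATTERN.fullmatch(value):
        return GateResult(value, False, "format")
    if not luhn_ok(value):
        return GateResult(value, False, "checksum")
    return GateResult(value, True, "ok")
```

A caller would persist the value only when `accepted` is true and queue everything else for human review, logging `reason` per field so the OCR error rate can be tracked over time.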
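For test 02, a minimal sketch of converting a normalized bounding box to pixel coordinates follows. It assumes a top-left origin and a box given as `left`, `top`, `width`, `height` in the 0-1 range; both the `NormBox` type and that field layout are assumptions for illustration, so confirm the actual coordinate space and origin in the API documentation before relying on it.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class NormBox:
    """Hypothetical normalized box: fractions of page size, top-left origin."""
    left: float
    top: float
    width: float
    height: float


def to_pixels(box: NormBox, page_w_px: int, page_h_px: int) -> Tuple[int, int, int, int]:
    """Convert a normalized box to (x0, y0, x1, y1) pixel corners.

    Raises ValueError if any field lies outside 0-1, which usually means the
    values are already in pixels or PDF points rather than normalized.
    """
    for name in ("left", "top", "width", "height"):
        v = getattr(box, name)
        if not 0.0 <= v <= 1.0:
            raise ValueError(f"{name}={v} is outside 0-1; check the coordinate space")
    x0 = round(box.left * page_w_px)
    y0 = round(box.top * page_h_px)
    x1 = round((box.left + box.width) * page_w_px)
    y1 = round((box.top + box.height) * page_h_px)
    return x0, y0, x1, y1
```

The range check is a cheap guard against the mismatch the test describes: pixel or point values passed in by mistake fail loudly instead of producing misplaced highlights.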
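The verification step in test 03 can be approximated with a small check over the parsed text. The sketch below does not rejoin anything itself; it only reports places where a word still ends in a hyphen at a line break, which suggests the text was concatenated naively. The function name and regex are illustrative assumptions, and a compound that legitimately breaks at its hyphen will show up as a false positive.

```python
import re
from typing import List, Tuple

# A word fragment, a hyphen at end of line, then a lowercase continuation.
LINE_END_HYPHEN = re.compile(r"([A-Za-z]+)-[ \t]*\n[ \t]*([a-z]+)")


def leftover_hyphen_breaks(text: str) -> List[Tuple[str, str]]:
    """Return (head, tail) pairs for words still split across a line break."""
    return [(m.group(1), m.group(2)) for m in LINE_END_HYPHEN.finditer(text)]
```

Running this over a justified-text sample and expecting an empty list is one way to confirm that hyphenated words were rejoined before the text is embedded or indexed.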
How this eval is graded
Grade each response against expected.ideal_behavior and expected.rubric. A response passes when the mean score across rubric criteria is >= 4.0 and no single criterion scores below 3.
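One reading of this pass rule, as a sketch: the mean of the per-criterion scores must reach 4.0 and the lowest score must be at least 3. The function name, parameter names, and criterion keys below are hypothetical; the source states only the two thresholds.

```python
from statistics import mean
from typing import Mapping


def passes(scores: Mapping[str, float],
           mean_threshold: float = 4.0,
           floor: float = 3.0) -> bool:
    """Return True if the mean meets the threshold and no score is below the floor."""
    if not scores:
        raise ValueError("at least one criterion score is required")
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor
```

Under this reading, scores of 5, 5, and 2 fail despite a mean of 4.0, because one criterion falls below the floor.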
Rubric criteria
- Reducto
- AI Platform
- OCR and Layout Fidelity