For Reducto · AI Platform

OCR and Layout Fidelity

Document Ingestion & Parsing for AI — Reducto

Reducto evals — OCR & Layout Fidelity (relift v3 InfraRed)

About Reducto

Reducto is a document ingestion platform for AI pipelines that turns complex documents (PDFs, scans, spreadsheets) into clean, structured, layout-aware data. Its API parses documents into Markdown and typed content blocks, extracts structured fields against a user-defined schema with source citations, and splits bundled files into their constituent documents — feeding retrieval-augmented generation and document-automation workflows.

Employees

~50 (approx — verify)

Industry

Document AI / Data Ingestion

Headquarters

San Francisco, CA (verify)

Website

reducto.ai

Sample tests · showing 3 of 9

# · Input · Expected behavior · Check
01

An OCR'd handwritten form yields plausible-but-wrong characters in a numeric field (a handwritten 7 read as 1). The integrator persists it with no validation.

For high-stakes numeric/identifier fields, validate OCR output against domain constraints (checksum, length, format regex) and route failures to review. OCR on handwriting is error-prone — never persist a raw OCR numeric to an irreversible action without a validation gate. Track per-field OCR error…

Pass / Fail · AI Platform · high
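The validation gate described in test 01 could look roughly like the sketch below. It is not Reducto's API: the 11-digit format, the choice of a Luhn checksum, and the names `validate_ocr_field` and `GateResult` are all assumptions made for illustration. Substitute whatever constraints the real field carries.

```python
import re
from dataclasses import dataclass

# Hypothetical field format: exactly 11 digits. Replace with the real constraint.
FIELD_RE = re.compile(r"\d{11}")


def luhn_ok(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


@dataclass
class GateResult:
    value: str
    accepted: bool
    reason: str  # "ok", "format", or "checksum"


def validate_ocr_field(raw: str) -> GateResult:
    """Check an OCR'd value against format and checksum before it is persisted."""
    value = raw.strip()
    if not FIELD_RE.fullmatch(value):
        return GateResult(value, False, "format")
    if not luhn_ok(value):
        return GateResult(value, False, "checksum")
    return GateResult(value, True, "ok")


# A handwritten 7 misread as 1 changes the checksum, so the value is rejected.
result = validate_ocr_field("19927398713")
if not result.accepted:
    print(f"route to review: {result.value} ({result.reason})")
```

Values that fail the gate would go to a human review queue rather than being written to the downstream system. Counting rejections per field gives a rough per-field error signal.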
02

Each OCR'd block carries a bounding box. The integrator wants to highlight the source region in a viewer but assumes pixel coordinates when the API returns normalized (0-1) coordinates.

Confirm the documented coordinate space (absolute pixels vs normalized 0-1 vs PDF points) and the origin convention before rendering highlights, and validate on one document that a known block highlights correctly. A coordinate-space mismatch silently misplaces every highlight. Treat the coordinate…

Pass / Fail · AI Platform · medium
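As an illustration of test 02, the sketch below converts a normalized box to viewer pixels and rejects values that are clearly not in the 0-1 range. The `NormBox` shape and the `origin_top_left` flag are assumptions, not the documented response format. Check the API reference for the real field names, coordinate space, and origin.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class NormBox:
    """Assumed box shape: corner nearest the origin plus size, all in 0-1 units."""
    x: float
    y: float
    width: float
    height: float


def to_pixels(box: NormBox, page_w: int, page_h: int,
              origin_top_left: bool = True) -> Tuple[int, int, int, int]:
    """Map a normalized box to (left, top, width, height) in top-left pixel space."""
    for v in (box.x, box.y, box.width, box.height):
        if not 0.0 <= v <= 1.0:
            # Values outside 0-1 suggest pixels or PDF points, not normalized units.
            raise ValueError(f"coordinate {v} is outside the normalized 0-1 range")
    y = box.y if origin_top_left else 1.0 - box.y - box.height  # flip bottom-left origin
    return (round(box.x * page_w), round(y * page_h),
            round(box.width * page_w), round(box.height * page_h))


# One known block on an 800x1000 px page render.
print(to_pixels(NormBox(0.25, 0.5, 0.5, 0.25), 800, 1000))
```

The range check catches pixel or point values passed in as normalized ones. The reverse mistake, drawing normalized values as pixels, is only caught by checking one known block visually.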
03

Justified text hyphenates words across line breaks ('inter-' then 'national' on the next line). The integrator concatenates lines verbatim and ends up with broken tokens that hurt embedding and search.

Rely on Reducto's text reconstruction to re-join hyphenated words and normalize intra-paragraph line breaks into flowing text, rather than naive line concatenation. Verify on a justified-text sample that hyphenated words are rejoined. Broken tokens silently degrade retrieval recall.

Pass / Fail · AI Platform · low
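The verification step in test 03 can be approximated with a small check over the parsed text of a justified-text sample. This is a heuristic sketch that does not call Reducto. The function name and the regular expression are assumptions, and it will also flag legitimate compounds that happen to break at a hyphen (for example "well-" then "known").

```python
import re
from typing import List

# A letter run ending in a hyphen just before a line break, then a lowercase continuation.
BROKEN_HYPHEN_RE = re.compile(r"([A-Za-z]+)-[ \t]*\n[ \t]*([a-z]+)")


def find_broken_hyphens(text: str) -> List[str]:
    """Return the rejoined form of each word that is still split across a line break."""
    return [m.group(1) + m.group(2) for m in BROKEN_HYPHEN_RE.finditer(text)]


naive = "Rules for inter-\nnational trade"      # lines concatenated verbatim
rejoined = "Rules for international trade"      # what reconstructed text should look like
print(find_broken_hyphens(naive))
print(find_broken_hyphens(rejoined))
```

A non-empty result on parser output means some words were not rejoined, so those chunks would reach the embedding step with broken tokens.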

How this eval is graded

Grade each response against expected.ideal_behavior and expected.rubric. A test passes when the mean score across rubric criteria is at least 4.0 and no single criterion scores below 3.
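Reading the pass rule as "mean across criteria of at least 4.0, with no criterion below 3", the check can be sketched as follows. The function name, the criterion keys, and the implied 1-5 scale are assumptions, since the page does not state the scoring scale.

```python
from statistics import mean
from typing import Dict


def passes(scores: Dict[str, float],
           mean_threshold: float = 4.0, floor: float = 3.0) -> bool:
    """Apply the pass rule: mean >= threshold and every criterion >= floor."""
    if not scores:
        raise ValueError("at least one criterion score is required")
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor


# Hypothetical criterion names, for illustration only.
print(passes({"accuracy": 5, "safety": 4, "clarity": 3}))
print(passes({"accuracy": 5, "safety": 5, "clarity": 2}))
```

The second call has the same mean as the first but fails because one criterion is below the floor. This is why the rule is stated as two conditions rather than one.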

Rubric criteria

  • Reducto
  • AI Platform
  • OCR and Layout Fidelity

Recommended for

Reducto · Reducto customers
