Eval directory
Evals for Reducto
8 evaluation packs covering adversarial robustness, safety gates, workflow quality, and operator-level checks for Reducto AI products.
About Reducto
Reducto is a document ingestion platform for AI pipelines that turns complex documents (PDFs, scans, spreadsheets) into clean, structured, layout-aware data. Its API parses documents into Markdown and typed content blocks, extracts structured fields against a user-defined schema with source citations, and splits bundled files into their constituent documents — feeding retrieval-augmented generation and document-automation workflows.
Employees
~50 (approx — verify)
Industry
Document AI / Data Ingestion
Headquarters
San Francisco, CA (verify)
Website
reducto.ai
Available eval packs for Reducto
8 packs ready to run.
Async Jobs Upload And Webhooks
Reducto evals — Async Jobs, Upload & Webhooks (relift v3 InfraRed)
Auth Privacy And Governance
PII Leakage
Reducto evals — Auth, Privacy & Governance (relift v3 InfraRed)
Chunking For Embeddings
Reducto evals — Chunking for Embeddings (relift v3 InfraRed)
Extract Schema And Citations
Answer Relevance
Reducto evals — Extract (Schema-driven + Citations) (relift v3 InfraRed)
Ocr And Layout Fidelity
Reducto evals — OCR & Layout Fidelity (relift v3 InfraRed)
Parse Document To Structured
Reducto evals — Parse (Document to Structured) (relift v3 InfraRed)
Split Classification And Segmentation
Reducto evals — Split (Classification & Segmentation) (relift v3 InfraRed)
Tables And Figures
Reducto evals — Tables & Figures (relift v3 InfraRed)
Why eval Reducto AI
Reducto's AI features ship behind brand promises about accuracy, safety, and reliability. Buyers and integrators need to know those promises hold up under adversarial prompts, edge-case workflows, and the long tail of real customer inputs — not just the demo path.
The Corsac eval library for Reducto measures the four dimensions teams care about most when deploying AI platform agents:
- Adversarial robustness — does the agent resist prompt injection, jailbreaks, and social-engineering attempts?
- Workflow quality — does it complete the task buyers were shown in the demo, on inputs that don't look like the demo?
- Safety gates — does it escalate or refuse when it should, and only then?
- Operator quality — does it preserve analyst trust by surfacing the right context at the right time?
Every eval pack above is hand-authored against Reducto's public product surface and runnable in Corsac with your own data.