Eval Library
For LlamaIndex · AI Platform

Structured Outputs And Extraction

LlamaIndex (+ LlamaCloud) · LlamaIndex

RAG / Data Framework — LlamaIndex

LlamaIndex evals — Structured Outputs & Extraction (relift v3 InfraRed)

About LlamaIndex

LlamaIndex is a data framework for building RAG and agent applications over private data. Its core pieces are documents and nodes, indexes (such as VectorStoreIndex), retrievers and query engines, and the IngestionPipeline. LlamaParse and LlamaCloud add managed document parsing and retrieval.

Employees: ~50

Industry: RAG Framework

Headquarters: San Francisco, CA

Sample tests · showing 3 of 9

Test 01

Input: A query engine is built with output_cls=InvoiceSummary (a Pydantic model). When the LLM returns JSON missing a required field, the integrator catches the ValidationError and returns an empty object as success.

Expected behavior: On Pydantic validation failure from output_cls/structured_predict, do not silently substitute an empty/partial object as success: retry with a corrective prompt, surface the validation error, or route to review. The schema contract must hold or the failure must be visible.

Check: Pass / Fail · AI Platform · high
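
The expected behavior in test 01 can be sketched without any LlamaIndex dependency. This is a minimal illustration, not LlamaIndex's own retry logic: `call_llm`, `parse_invoice`, `extract_with_retry`, and `ExtractionFailed` are hypothetical names, the two fields on `InvoiceSummary` are assumed, and a plain dataclass with hand-written checks stands in for the Pydantic model so the example runs on the standard library alone.

```python
import json
from dataclasses import dataclass
from typing import Callable, List


class ExtractionFailed(Exception):
    """Raised when no attempt produced a schema-valid object."""


@dataclass
class InvoiceSummary:
    # Assumed fields; the source does not list the real schema.
    vendor: str
    total: float


def parse_invoice(raw: str) -> InvoiceSummary:
    """Stand-in for Pydantic validation: raise ValueError on any schema violation."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in ("vendor", "total") if key not in data]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    if not isinstance(data["vendor"], str):
        raise ValueError("vendor must be a string")
    total = data["total"]
    if isinstance(total, bool) or not isinstance(total, (int, float)):
        raise ValueError("total must be a number")
    return InvoiceSummary(vendor=data["vendor"], total=float(total))


def extract_with_retry(
    call_llm: Callable[[str], str], prompt: str, max_attempts: int = 2
) -> InvoiceSummary:
    """Return a valid InvoiceSummary or raise; never return an empty placeholder."""
    errors: List[str] = []
    current_prompt = prompt
    for _ in range(max_attempts):
        raw = call_llm(current_prompt)
        try:
            return parse_invoice(raw)
        except ValueError as exc:
            errors.append(str(exc))
            # Corrective prompt: tell the model exactly what was rejected.
            current_prompt = (
                f"{prompt}\n\nYour previous answer was rejected: {exc}. "
                "Return JSON containing every required field."
            )
    # All attempts failed: make the failure visible to the caller.
    raise ExtractionFailed("; ".join(errors))
```

With a real model, `call_llm` would wrap the LLM call and the validation step would be the Pydantic model passed as output_cls. The design point is that there are only two outcomes: a valid object, or a raised error the caller can log or route to review.
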
Test 02

Input: Instead of llm.structured_predict / as_structured_llm with a Pydantic model, the integrator prompts for JSON in free text and json.loads the completion, which breaks when the model wraps JSON in prose or code fences.

Expected behavior: Use LlamaIndex's structured prediction (structured_predict / as_structured_llm / Pydantic program) so parsing is schema-driven and robust to formatting, rather than json.loads on free-form text. This leverages native structured output / tool schemas where the model supports them.

Check: Pass / Fail · AI Platform · medium
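
The fragility described in test 02 is easy to reproduce with the standard library alone. The sketch below feeds three hypothetical completions carrying the same payload to json.loads; the payload, the variable names, and the helper `parses_with_json_loads` are illustrative assumptions.

```python
import json

# Build the fence marker programmatically so this example contains no literal fence.
FENCE = "`" * 3

payload = '{"vendor": "Acme", "total": 12.5}'

# Three hypothetical completions carrying the same payload.
completions = {
    "bare": payload,
    "fenced": f"{FENCE}json\n{payload}\n{FENCE}",
    "prose": f"Here is the JSON you asked for:\n{payload}",
}


def parses_with_json_loads(text: str) -> bool:
    """Return True if json.loads accepts the text exactly as returned."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


outcome = {name: parses_with_json_loads(text) for name, text in completions.items()}
print(outcome)
```

Only the bare completion parses. Schema-driven calls such as structured_predict or as_structured_llm avoid this class of failure because parsing is tied to the Pydantic model rather than to the exact layout of the text.
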
Test 03

Input: An extraction pipeline pulls 'effective_date' from contracts. For a contract that has no effective date, the model confidently returns a plausible date not present in the document.

Expected behavior: Constrain extraction to values present in the source: allow null/'not found' for absent fields, and verify extracted values against the source text (e.g. via source nodes / citations) rather than accepting hallucinated-but-plausible values. An unfound field must be null, not invented.

Check: Pass / Fail · AI Platform · critical
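
One way to enforce "null, not invented" in test 03 is a grounding check after extraction. The sketch below is an illustration under stated assumptions, not a LlamaIndex API: `ground_value` is a hypothetical helper that keeps a value only if it appears in the source text after whitespace and case normalization. In a LlamaIndex pipeline the source text would presumably be assembled from the response's source nodes.

```python
from typing import Optional


def _normalize(text: str) -> str:
    """Collapse whitespace and ignore case so line wraps do not cause false misses."""
    return " ".join(text.split()).casefold()


def ground_value(value: Optional[str], source_text: str) -> Optional[str]:
    """Return value only if it occurs in source_text; otherwise None.

    Hypothetical helper: substring matching is deliberately strict, so a value
    the model reformatted (for example an ISO date for a written-out date) is
    also rejected and would need review or format-aware matching.
    """
    if value is None:
        return None
    needle = _normalize(value)
    if not needle:
        return None
    return value if needle in _normalize(source_text) else None


# Made-up contract snippets for illustration only.
with_date = "This Agreement is effective as of January 5,\n2024 between the parties."
without_date = "This Agreement is entered into between the parties."

print(ground_value("January 5, 2024", with_date))     # January 5, 2024
print(ground_value("January 5, 2024", without_date))  # None
```

Declaring the field as optional in the schema gives the model a legitimate way to answer "not found", and the check above catches the cases where it invents a value anyway.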

How this eval is graded

Grade each response against expected.ideal_behavior and expected.rubric. A test passes when the mean score across rubric criteria is at least 4.0 and no single criterion scores below 3.
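
The pass rule above can be read as a mean threshold combined with a per-criterion floor. The sketch below encodes that reading; the function name `passes_rubric`, the example scores, and the choice to treat an empty score set as a failure are assumptions, and the scoring scale itself is not stated in the source.

```python
from typing import Mapping


def passes_rubric(
    scores: Mapping[str, float], mean_threshold: float = 4.0, floor: float = 3.0
) -> bool:
    """Pass when the mean is at least mean_threshold and no score is below floor."""
    if not scores:
        # Assumption: nothing graded means nothing passed.
        return False
    values = list(scores.values())
    mean = sum(values) / len(values)
    return mean >= mean_threshold and min(values) >= floor


# Illustrative scores only; criterion names follow the rubric list.
balanced = {"LlamaIndex": 5, "AI Platform": 4, "Structured Outputs And Extraction": 3}
one_weak = {"LlamaIndex": 5, "AI Platform": 5, "Structured Outputs And Extraction": 2}

print(passes_rubric(balanced))  # True: mean 4.0, lowest score 3
print(passes_rubric(one_weak))  # False: mean 4.0, but one score is below 3
```

The floor matters because a high mean can hide a single badly failed criterion, as the second example shows.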

Rubric criteria

  • LlamaIndex
  • AI Platform
  • Structured Outputs And Extraction

Recommended for

LlamaIndex (+ LlamaCloud) · LlamaIndex customers
