Structured Outputs And Extraction
LlamaIndex evals — Structured Outputs & Extraction (relift v3 InfraRed)
About LlamaIndex
LlamaIndex is a data framework for building RAG and agent applications over private data. Its core pieces are documents and nodes, indexes (such as VectorStoreIndex), retrievers and query engines, and the IngestionPipeline; LlamaParse and LlamaCloud add managed document parsing and retrieval.
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | A query engine is built with output_cls=InvoiceSummary (a Pydantic model). When the LLM returns JSON missing a required field, the integrator catches the ValidationError and returns an empty object as success. | On Pydantic validation failure from output_cls/structured_predict, do not silently substitute an empty/partial object as success. Instead, retry with a corrective prompt, surface the validation error, or route to review. The schema contract must hold or the failure must be visible. | Pass / Fail · AI Platform · high |
| 02 | Instead of llm.structured_predict / as_structured_llm with a Pydantic model, the integrator prompts for JSON in free text and calls json.loads on the completion, which breaks when the model wraps JSON in prose or code fences. | Use LlamaIndex's structured prediction (structured_predict / as_structured_llm / Pydantic program) so parsing is schema-driven and robust to formatting, rather than json.loads on free-form text. This leverages native structured output / tool schemas where the model supports them. | Pass / Fail · AI Platform · medium |
| 03 | An extraction pipeline pulls 'effective_date' from contracts. For a contract that has no effective date, the model confidently returns a plausible date not present in the document. | Constrain extraction to values present in the source: allow null/'not found' for absent fields, and verify extracted values against the source text (e.g. via source nodes / citations) rather than accepting hallucinated-but-plausible values. An unfound field must be null, not invented. | Pass / Fail · AI Platform · critical |
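The expected behavior in tests 01 and 03 can be sketched without any LlamaIndex dependency. The block below is a minimal standard-library illustration, not LlamaIndex's API. As assumptions for the example: `validate` stands in for Pydantic validation of an output_cls, `call_llm` is a hypothetical callable that returns the raw completion, and the field names, `ExtractionFailed`, `extract_with_retry` and `ground_field` are all invented here.

```python
import json
from typing import Callable, Optional

# Hypothetical required fields, standing in for a Pydantic model's schema.
REQUIRED_FIELDS = ("invoice_number", "total")


class ExtractionFailed(Exception):
    """Raised when no attempt produced a schema-valid object."""


def validate(raw: str) -> dict:
    """Stand-in for schema validation: parse JSON and check required keys."""
    data = json.loads(raw)  # JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [name for name in REQUIRED_FIELDS if name not in data]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return data


def extract_with_retry(
    call_llm: Callable[[str], str], prompt: str, max_attempts: int = 2
) -> dict:
    """Test 01: retry with corrective feedback, then fail visibly."""
    last_error: Optional[ValueError] = None
    for _ in range(max_attempts):
        attempt_prompt = prompt
        if last_error is not None:
            # Feed the validation error back instead of hiding it.
            attempt_prompt = (
                f"{prompt}\n\nThe previous answer was invalid: {last_error}. "
                "Return only a corrected JSON object."
            )
        try:
            return validate(call_llm(attempt_prompt))
        except ValueError as err:
            last_error = err
    # Never return an empty object as success; surface the failure.
    raise ExtractionFailed(
        f"schema validation failed after {max_attempts} attempts: {last_error}"
    )


def ground_field(value: Optional[str], source_text: str) -> Optional[str]:
    """Test 03: keep a value only if it appears in the source, else None."""
    if value is None:
        return None
    # Whitespace- and case-insensitive substring match. A real pipeline would
    # likely normalize formats (e.g. dates) before comparing.
    needle = " ".join(value.split()).lower()
    haystack = " ".join(source_text.split()).lower()
    return value if needle and needle in haystack else None
```

A caller would typically catch `ExtractionFailed` and route the document to review, and would run `ground_field` on each extracted value against the text of the retrieved source nodes before accepting it.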
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
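One possible reading of the pass rule, sketched as a small function. This assumes each rubric criterion receives a single numeric score, that the mean is taken across criteria, and that the floor applies to each individual criterion; the function name, parameter names, and example scores are hypothetical.

```python
from statistics import mean
from typing import Mapping


def eval_passes(
    scores: Mapping[str, float],
    mean_threshold: float = 4.0,
    floor: float = 3.0,
) -> bool:
    """Pass when the mean score meets the threshold and no score is below the floor."""
    if not scores:
        return False  # nothing graded, so nothing can pass
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor


# Hypothetical scores keyed by the rubric criteria listed on this page.
example = {
    "LlamaIndex": 5,
    "AI Platform": 4,
    "Structured Outputs And Extraction": 3,
}
result = eval_passes(example)
```

Under this reading a single weak criterion fails the eval even when the average is high, which matches the "no criterion below 3" clause.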
Rubric criteria
- LlamaIndex
- AI Platform
- Structured Outputs And Extraction