
Documents Nodes And Ingestion

LlamaIndex (+ LlamaCloud) · LlamaIndex

RAG / Data Framework — LlamaIndex

This eval tests LlamaIndex's Documents, Nodes & Ingestion features across 9 scenario-based test cases. An LLM judge grades each case against an expected-behavior rubric. The pack is part of Corsac's RAG / Data Framework eval coverage.

About LlamaIndex

LlamaIndex is a data framework for building RAG and agent applications over private data. It provides documents and nodes, indexes (such as VectorStoreIndex), retrievers and query engines, and the IngestionPipeline, plus LlamaParse and LlamaCloud for managed document parsing and retrieval.

Employees: ~50

Industry: RAG Framework

Headquarters: San Francisco, CA

Website: www.llamaindex.ai

Sample tests · showing 3 of 9

#	Input	Expected behavior	Check
01	An IngestionPipeline re-runs nightly over a folder of contracts. The loader assigns a fresh random Document.id_ on every run instead of a stable doc_id derived from the source file.	Set a stable Document.id_ (e.g. derived from the file path or a content/source key) so the pipeline's docstore can detect unchanged documents and dedup them. With a docstore attached, unchanged docs are skipped and changed docs are upserted; random ids defeat dedup and re-embed everything every ru…	Pass / Fail · AI Platform · high
02	Each Document carries metadata including a large 'raw_html' blob and an internal 'tenant_id'. The integrator wants tenant_id usable for filtering but never sent to the LLM or embedded, and raw_html excluded from both.	Populate Node.excluded_llm_metadata_keys and excluded_embed_metadata_keys so raw_html and tenant_id are stripped from the text seen by the LLM and the embedding model, while remaining available as metadata for filters. Verify via node.get_content(metadata_mode=...) which keys leak into each context.	Pass / Fail · AI Platform · high
03	A SentenceSplitter is configured with chunk_size=4096 tokens and chunk_overlap=0 for an embedding model whose max input is 512 tokens.	Choose chunk_size compatible with the embedding model's max sequence length so nodes are not silently truncated at embed time. Set a sensible chunk_overlap to preserve cross-boundary context. Verify node token lengths against the embedder's limit before indexing.	Pass / Fail · AI Platform · high
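
Sample test 01 turns on document IDs that stay the same between runs. The sketch below is a plain-Python model of that idea, not LlamaIndex's implementation: `stable_doc_id`, `content_hash`, and `plan_ingest` are hypothetical helpers, and the `seen` dict stands in for a docstore. It assumes the ID is derived from the source path and that change detection compares a content hash.

```python
import hashlib


def stable_doc_id(source_path: str) -> str:
    """Hypothetical helper: derive a deterministic ID from the source file path."""
    return hashlib.sha256(source_path.encode("utf-8")).hexdigest()[:32]


def content_hash(text: str) -> str:
    """Hypothetical helper: fingerprint the document text to detect changes."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_ingest(seen: dict, docs: dict) -> dict:
    """Toy docstore check. `seen` maps doc id -> content hash from earlier runs
    and is updated in place; `docs` maps source path -> current text.
    Returns source path -> 'insert', 'skip', or 'upsert'."""
    plan = {}
    for path, text in docs.items():
        doc_id = stable_doc_id(path)
        digest = content_hash(text)
        if doc_id not in seen:
            plan[path] = "insert"   # never ingested before
        elif seen[doc_id] == digest:
            plan[path] = "skip"     # unchanged: no re-embedding needed
        else:
            plan[path] = "upsert"   # same source, new content
        seen[doc_id] = digest
    return plan


seen = {}
night1 = plan_ingest(seen, {"contracts/a.pdf": "v1", "contracts/b.pdf": "v1"})
night2 = plan_ingest(seen, {"contracts/a.pdf": "v1", "contracts/b.pdf": "v2"})
print(night1)
print(night2)
```

If the ID were random on each run, every document would fall into the "insert" branch every night, which is the failure the test case describes. In a real pipeline the derived value would be supplied as the Document's id rather than tracked by hand.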
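
Sample test 02 depends on metadata that stays attached to a node for filtering while being left out of the text sent to the LLM and the embedding model. The following is a toy plain-Python model of those semantics, not LlamaIndex code: `render_content` and `matches_filter` are hypothetical, and only the two `excluded_*` list names mirror the attributes named in the test case. With the real library, the check the test case names is `node.get_content(metadata_mode=...)`.

```python
def render_content(text: str, metadata: dict, excluded_keys: list) -> str:
    """Hypothetical renderer: prepend 'key: value' lines for every metadata
    key that is not excluded, then the document text."""
    lines = [f"{key}: {value}" for key, value in metadata.items()
             if key not in excluded_keys]
    return "\n".join(lines + [text])


def matches_filter(metadata: dict, key: str, value: str) -> bool:
    """Hypothetical metadata filter: operates on the full metadata dict,
    so excluded keys are still usable for filtering."""
    return metadata.get(key) == value


metadata = {
    "title": "Master Services Agreement",
    "tenant_id": "t-42",
    "raw_html": "<html><body>large blob</body></html>",
}
excluded_llm_metadata_keys = ["tenant_id", "raw_html"]
excluded_embed_metadata_keys = ["tenant_id", "raw_html"]

llm_view = render_content("Contract body.", metadata, excluded_llm_metadata_keys)
embed_view = render_content("Contract body.", metadata, excluded_embed_metadata_keys)

# Inspect each view for leaked keys before indexing.
leaked = [key for key in ("tenant_id", "raw_html")
          if key in llm_view or key in embed_view]
print(leaked)
print(matches_filter(metadata, "tenant_id", "t-42"))
```

The point of the model is that exclusion changes what is rendered, not what is stored: the filter still sees `tenant_id` even though neither rendered view contains it.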

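Sample test 03 asks for chunk settings that fit the embedding model's input limit and for a length check before indexing. Here is a minimal sketch of both checks in plain Python. All function names are hypothetical, and `whitespace_token_count` is only a crude stand-in: a real check should count tokens with the embedding model's own tokenizer, and it assumes chunk_size and the embedder limit are measured in the same token units.

```python
def whitespace_token_count(text: str) -> int:
    """Crude stand-in for a tokenizer (assumption): counts whitespace-separated
    words. Subword tokenizers usually report more tokens than this."""
    return len(text.split())


def validate_chunking(chunk_size: int, chunk_overlap: int,
                      embed_max_tokens: int) -> list:
    """Return human-readable problems with a splitter configuration."""
    problems = []
    if chunk_size > embed_max_tokens:
        problems.append(
            f"chunk_size {chunk_size} exceeds embedder limit {embed_max_tokens}")
    if chunk_overlap >= chunk_size:
        problems.append("chunk_overlap must be smaller than chunk_size")
    if chunk_overlap == 0:
        problems.append("chunk_overlap is 0, so cross-boundary context is lost")
    return problems


def oversized_chunks(chunks: list, max_tokens: int,
                     count_tokens=whitespace_token_count) -> list:
    """Return (index, token_count) for every chunk longer than max_tokens."""
    result = []
    for index, chunk in enumerate(chunks):
        n_tokens = count_tokens(chunk)
        if n_tokens > max_tokens:
            result.append((index, n_tokens))
    return result


# The configuration from the test case, then a hypothetical repaired one.
print(validate_chunking(chunk_size=4096, chunk_overlap=0, embed_max_tokens=512))
print(validate_chunking(chunk_size=384, chunk_overlap=48, embed_max_tokens=512))
print(oversized_chunks(["a b c", "a b c d e f"], max_tokens=4))
```

The values 384 and 48 are illustrative only. The useful habit is running the per-chunk check on the actual node text that will be embedded, since a configuration check alone does not catch chunks that grow past the limit for other reasons.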
How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
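
The pass rule above can be sketched as a small function. This is one reading of the rule, under the assumption that the judge returns a numeric score per rubric criterion and that the mean is taken across those criteria; the score scale, the criterion names, and the function `case_passes` are hypothetical and not taken from Corsac's implementation.

```python
from statistics import mean


def case_passes(criterion_scores: dict, mean_threshold: float = 4.0,
                floor: float = 3) -> bool:
    """Return True when the mean score meets the threshold and no single
    criterion falls below the floor."""
    if not criterion_scores:
        raise ValueError("at least one criterion score is required")
    values = list(criterion_scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor


# Hypothetical criterion names and scores, for illustration only.
print(case_passes({"correctness": 5, "completeness": 4, "safety": 3}))
print(case_passes({"correctness": 5, "completeness": 5, "safety": 2}))
print(case_passes({"correctness": 4, "completeness": 4, "safety": 3}))
```

The second example shows why both conditions matter: its mean reaches 4.0, yet it fails because one criterion is below 3.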

Rubric criteria

LlamaIndex
AI Platform
Documents Nodes And Ingestion

Recommended for

LlamaIndex (+ LlamaCloud) · LlamaIndex customers

Works with

LlamaIndex

Related evals

AI Platform

Claude API

Evaluates Anthropic's Batch API across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

AI Platform

Claude API

Evaluates Anthropic's Extended Thinking across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

AI Platform

Claude API

Evaluates Anthropic's Files API & Citations across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.


Frequently asked questions

What does the Documents Nodes And Ingestion eval for LlamaIndex (+ LlamaCloud) test?

How is the Documents Nodes And Ingestion eval scored?

The judge rubric: Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

How many test cases does this eval pack include?

The Documents Nodes And Ingestion pack for LlamaIndex (+ LlamaCloud) contains 9 test cases. Three sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?

Sign up for Corsac, connect your model or agent endpoint, and run the Documents Nodes And Ingestion pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.
