Documents Nodes And Ingestion
LlamaIndex evals — Documents, Nodes & Ingestion (relift v3 InfraRed)
About LlamaIndex
LlamaIndex is a data framework for building RAG and agent applications over private data. It covers documents and nodes, indexes (such as VectorStoreIndex), retrievers and query engines, and the IngestionPipeline, plus LlamaParse and LlamaCloud for managed document parsing and retrieval.
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | An IngestionPipeline re-runs nightly over a folder of contracts. The loader assigns a fresh random Document.id_ on every run instead of a stable doc_id derived from the source file. | Set a stable Document.id_ (e.g. derived from the file path or a content/source key) so the pipeline's docstore can detect unchanged documents and dedup them. With a docstore attached, unchanged docs are skipped and changed docs are upserted; random ids defeat dedup and re-embed everything every ru… | Pass / Fail · AI Platform · high |
| 02 | After custom splitting, the integrator builds TextNodes manually but does not set NodeRelationship.SOURCE / PREVIOUS / NEXT. Later, prev/next-window expansion retrieval returns nothing. | Preserve node relationships (SOURCE back to the Document, PREVIOUS/NEXT between adjacent chunks), either by using a built-in node parser or by setting relationships explicitly. Postprocessors like PrevNextNodePostprocessor depend on the PREVIOUS/NEXT links, and auto-merging retrieval depends in the same way on PARENT/CHILD links. | Pass / Fail · AI Platform · medium |
| 03 | Each Document carries metadata including a large 'raw_html' blob and an internal 'tenant_id'. The integrator wants tenant_id usable for filtering but never sent to the LLM or embedded, and raw_html excluded from both. | Populate Node.excluded_llm_metadata_keys and excluded_embed_metadata_keys so raw_html and tenant_id are stripped from the text seen by the LLM and the embedding model, while remaining available as metadata for filters. Verify with node.get_content(metadata_mode=...) which keys leak into each context. | Pass / Fail · AI Platform · high |
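The dedup behavior that test 01 relies on can be illustrated with a small, library-free sketch. This is a simplified model of the idea (a stable id plus a content hash), not LlamaIndex's actual docstore implementation. The helper names `stable_doc_id`, `content_hash`, and `plan_ingest`, and the plain-dict "docstore", are hypothetical and exist only for illustration.

```python
import hashlib
from pathlib import PurePosixPath


def stable_doc_id(source_path: str) -> str:
    """Derive a deterministic id from the source path (hypothetical helper)."""
    # Normalize separators so the same file yields the same id on every run.
    normalized = PurePosixPath(source_path.replace("\\", "/")).as_posix()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]


def content_hash(text: str) -> str:
    """Hash the document text so content changes can be detected."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_ingest(docstore: dict[str, str], docs: dict[str, str]) -> dict[str, list[str]]:
    """Decide which documents to skip and which to upsert.

    docstore maps doc id -> last seen content hash (a toy stand-in for a real
    docstore); docs maps source path -> current text.
    """
    plan: dict[str, list[str]] = {"skipped": [], "upserted": []}
    for path, text in docs.items():
        doc_id = stable_doc_id(path)
        digest = content_hash(text)
        if docstore.get(doc_id) == digest:
            # Same id and same content: nothing to re-embed.
            plan["skipped"].append(path)
        else:
            # New id or changed content: record the new hash and re-process.
            docstore[doc_id] = digest
            plan["upserted"].append(path)
    return plan
```

If the id were random on each run, `docstore.get(doc_id)` would never match and every document would land in `upserted`, which is the failure test 01 describes. In LlamaIndex itself, a derived id like this would typically be passed when constructing the document, for example `Document(text=..., id_=stable_doc_id(path))`.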
How this eval is graded
Grade each response against expected.ideal_behavior and expected.rubric. A pass requires a mean score of at least 4.0 and no criterion scoring below 3.
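The pass rule can be sketched as a small function. This is an illustrative reading, assuming that scores are numeric, that the mean is taken across the criterion scores of one response, and that an empty score set fails. The function name `passes` and the criterion keys in the usage note are hypothetical.

```python
from statistics import mean


def passes(scores: dict[str, float], min_mean: float = 4.0, floor: float = 3.0) -> bool:
    """Return True if the mean score meets min_mean and no score is below floor."""
    if not scores:
        # Assumption: with nothing graded, the response does not pass.
        return False
    values = list(scores.values())
    return mean(values) >= min_mean and min(values) >= floor
```

For example, `passes({"accuracy": 5, "completeness": 4, "safety": 3})` returns `True` (mean 4.0, lowest score 3), while `passes({"accuracy": 5, "completeness": 5, "safety": 2})` returns `False` because one criterion falls below 3 even though the mean is 4.0.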
Rubric criteria
- LlamaIndex
- AI Platform
- Documents Nodes And Ingestion