Eval Library
For LlamaIndex · AI Platform

Indexes

LlamaIndex (+ LlamaCloud) · LlamaIndex

RAG / Data Framework — LlamaIndex

LlamaIndex evals — Indexes (relift v3 InfraRed)

About LlamaIndex

LlamaIndex is a data framework for building RAG and agent applications over private data. Its core pieces are documents and nodes, indexes (such as VectorStoreIndex), retrievers and query engines, and the IngestionPipeline; LlamaParse and LlamaCloud add managed document parsing and retrieval.

Employees: ~50

Industry: RAG Framework

Headquarters: San Francisco, CA

Sample tests · showing 3 of 9

# | Input | Expected behavior | Check
01

A VectorStoreIndex built with the default SimpleVectorStore is used in-process. Each time the service restarts, it rebuilds the index from documents, re-embedding the whole corpus on every boot.

Persist via storage_context.persist(persist_dir=...) (docstore + index_store + vector store) and reload with load_index_from_storage(StorageContext.from_defaults(persist_dir=...)). Do not re-run from_documents on every boot. For external vector stores, reconstruct the index from the existing store,…

Pass / Fail · AI Platform · high
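
The persist-and-reload behavior expected in test 01 might look like the sketch below. It assumes the `llama_index.core` import layout of recent LlamaIndex releases, a hypothetical `./storage` directory, and that `docstore.json` is the default file name written by `persist`; the helper names `has_persisted_index` and `load_or_build_index` are illustrative, not part of the library.

```python
from pathlib import Path

# Hypothetical persist location. "docstore.json" is assumed to be the default
# file name that StorageContext.persist writes for the docstore.
PERSIST_DIR = Path("./storage")
DOCSTORE_FILE = "docstore.json"


def has_persisted_index(persist_dir: Path) -> bool:
    """Return True when an earlier run already persisted an index here."""
    return (persist_dir / DOCSTORE_FILE).is_file()


def load_or_build_index(documents_dir: str, persist_dir: Path = PERSIST_DIR):
    """Reload the persisted index if present; otherwise embed once and persist."""
    # Imported lazily so has_persisted_index works without llama_index installed.
    from llama_index.core import (
        SimpleDirectoryReader,
        StorageContext,
        VectorStoreIndex,
        load_index_from_storage,
    )

    if has_persisted_index(persist_dir):
        # Warm boot: read docstore, index store and vector store from disk.
        storage_context = StorageContext.from_defaults(persist_dir=str(persist_dir))
        return load_index_from_storage(storage_context)

    # Cold boot: the only path that embeds the corpus.
    documents = SimpleDirectoryReader(documents_dir).load_data()
    index = VectorStoreIndex.from_documents(documents)
    index.storage_context.persist(persist_dir=str(persist_dir))
    return index
```

Calling `load_or_build_index("./docs")` at service start would then embed the corpus only on the first boot; later boots take the reload branch.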
02

An integrator calls VectorStoreIndex.from_documents(docs) expecting it to write to their Qdrant collection, but never passes a StorageContext, so nodes land in the in-memory SimpleVectorStore.

Pass storage_context=StorageContext.from_defaults(vector_store=QdrantVectorStore(...)) (or build via VectorStoreIndex.from_vector_store(...)). Confirm vectors actually land in Qdrant — the default in-memory store silently 'works' until restart, masking the misconfiguration.

Pass / Fail · AI Platform · critical
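
One way to wire the index to Qdrant and confirm that vectors landed there, as test 02 expects, is sketched below. It assumes the `llama-index-vector-stores-qdrant` and `qdrant-client` packages, a hypothetical local Qdrant URL, and illustrative helper names (`verify_collection_populated`, `build_qdrant_index`) that are not part of either library.

```python
def verify_collection_populated(client, collection_name: str) -> int:
    """Return the number of points stored in the collection; fail loudly on zero."""
    stored = client.count(collection_name=collection_name, exact=True).count
    if stored == 0:
        raise RuntimeError(
            f"Qdrant collection {collection_name!r} is empty; nodes may have "
            "gone to the in-memory SimpleVectorStore instead."
        )
    return stored


def build_qdrant_index(documents, collection_name: str,
                       url: str = "http://localhost:6333"):
    """Index documents into Qdrant and confirm the vectors landed there."""
    # Lazy imports: these need llama-index-vector-stores-qdrant and qdrant-client.
    from qdrant_client import QdrantClient
    from llama_index.core import StorageContext, VectorStoreIndex
    from llama_index.vector_stores.qdrant import QdrantVectorStore

    client = QdrantClient(url=url)  # hypothetical endpoint
    vector_store = QdrantVectorStore(client=client, collection_name=collection_name)

    # Without this storage_context, from_documents falls back to SimpleVectorStore.
    storage_context = StorageContext.from_defaults(vector_store=vector_store)
    index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

    verify_collection_populated(client, collection_name)
    return index
```

The explicit count check is the point of the sketch: a query against the in-memory default would also return answers, so only inspecting the collection itself exposes the misconfiguration before a restart does.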
03

An integrator builds a SummaryIndex over a 50k-document corpus and uses its default query engine, then is surprised by very high token cost and latency per query.

Understand the index trade-off: SummaryIndex's default mode traverses all nodes (good for small sets / full-corpus summarization), while VectorStoreIndex does top-k semantic retrieval (right for large corpora). For 50k docs use a VectorStoreIndex (or a retriever mode on the summary index), not full…

Pass / Fail · AI Platform · medium
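
A rough back-of-envelope estimate shows why test 03 treats full traversal as the wrong default for a large corpus. The figures are assumptions chosen for illustration only: one node per document, an average of 512 tokens per node, and a top-k of 5.

```python
def context_tokens_per_query(nodes_read: int, avg_tokens_per_node: int) -> int:
    """Approximate context tokens sent to the LLM for a single query."""
    return nodes_read * avg_tokens_per_node


# Illustrative assumptions, not measurements.
CORPUS_NODES = 50_000        # one node per document
AVG_TOKENS_PER_NODE = 512
TOP_K = 5

# SummaryIndex default mode reads every node; top-k retrieval reads only TOP_K.
full_traversal = context_tokens_per_query(CORPUS_NODES, AVG_TOKENS_PER_NODE)
top_k_retrieval = context_tokens_per_query(TOP_K, AVG_TOKENS_PER_NODE)

print(f"Full traversal: {full_traversal:,} tokens per query")   # 25,600,000
print(f"Top-{TOP_K} retrieval: {top_k_retrieval:,} tokens per query")  # 2,560
print(f"Ratio: {full_traversal // top_k_retrieval:,}x")         # 10,000x
```

Under these assumptions the top-k path reads four orders of magnitude less context per query. In LlamaIndex that path would typically be a query engine created with something like `VectorStoreIndex.as_query_engine(similarity_top_k=5)`.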

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
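
The pass rule above can be read as a small check over rubric scores. The sketch below is one interpretation, assuming each rubric criterion receives a single numeric score on a 1 to 5 scale; the function name `passes_rubric` and the example scores are hypothetical.

```python
from statistics import mean


def passes_rubric(scores: dict[str, float],
                  mean_threshold: float = 4.0,
                  floor: float = 3.0) -> bool:
    """Pass when the mean score is >= mean_threshold and no score is below floor."""
    if not scores:
        raise ValueError("at least one rubric criterion score is required")
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor


# Mean is exactly 4.0 and the lowest score is 3, so this passes.
print(passes_rubric({"LlamaIndex": 5, "AI Platform": 4, "Indexes": 3}))  # True

# Mean is still 4.0, but one criterion falls below the floor of 3.
print(passes_rubric({"LlamaIndex": 5, "AI Platform": 5, "Indexes": 2}))  # False
```

The floor matters because a high mean alone can hide one badly failed criterion, as the second example shows.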

Rubric criteria

  • LlamaIndex
  • AI Platform
  • Indexes

Recommended for

LlamaIndex (+ LlamaCloud) · LlamaIndex customers
