Indexes
LlamaIndex evals — Indexes (relift v3 InfraRed)
About LlamaIndex
LlamaIndex is a data framework for building RAG and agent applications over private data. Its core pieces are documents and nodes, indexes (such as VectorStoreIndex), retrievers and query engines, and the IngestionPipeline, with LlamaParse and LlamaCloud providing managed document parsing and retrieval.
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | A VectorStoreIndex built with the default SimpleVectorStore is used in-process, then the service restarts and rebuilds the index from documents every boot, re-embedding the whole corpus. | Persist via storage_context.persist(persist_dir=...) (docstore + index_store + vector store) and reload with load_index_from_storage(StorageContext.from_defaults(persist_dir=...)). Do not re-run from_documents on every boot. For external vector stores, reconstruct the index from the existing store,… | Pass / Fail · AI Platform · high |
| 02 | An integrator calls VectorStoreIndex.from_documents(docs) expecting it to write to their Qdrant collection, but never passed a StorageContext, so nodes land in the in-memory SimpleVectorStore. | Pass storage_context=StorageContext.from_defaults(vector_store=QdrantVectorStore(...)) (or build via VectorStoreIndex.from_vector_store(...)). Confirm vectors actually land in Qdrant; the default in-memory store silently 'works' until restart, masking the misconfiguration. | Pass / Fail · AI Platform · critical |
| 03 | An integrator builds a SummaryIndex over a 50k-document corpus and uses its default query engine, then is surprised by very high token cost and latency per query. | Understand the index trade-off: SummaryIndex's default mode traverses all nodes (good for small sets / full-corpus summarization), while VectorStoreIndex does top-k semantic retrieval (right for large corpora). For 50k docs use a VectorStoreIndex (or a retriever mode on the summary index), not full… | Pass / Fail · AI Platform · medium |
How this eval is graded
Grade each response against expected.ideal_behavior and expected.rubric. Scores are assigned per criterion; passing requires a mean of at least 4.0 across criteria, with no single criterion below 3.
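One way to read the pass rule is sketched below. It assumes the mean is taken across the rubric criteria of a single test; the scoring scale is not stated here, and the function name passes and the example scores are hypothetical.

```python
from statistics import mean

MEAN_THRESHOLD = 4.0  # minimum mean score across criteria
CRITERION_FLOOR = 3   # no single criterion may score below this


def passes(scores: dict[str, float]) -> bool:
    """Return True if the per-criterion scores satisfy both pass conditions."""
    if not scores:
        return False  # nothing graded, so nothing can pass
    values = list(scores.values())
    return mean(values) >= MEAN_THRESHOLD and min(values) >= CRITERION_FLOOR


# Hypothetical scores keyed by the rubric criteria.
ok = passes({"LlamaIndex": 5, "AI Platform": 4, "Indexes": 3})        # mean 4.0, floor met
low_floor = passes({"LlamaIndex": 5, "AI Platform": 5, "Indexes": 2})  # mean 4.0, floor missed
low_mean = passes({"LlamaIndex": 4, "AI Platform": 4, "Indexes": 3})   # mean below 4.0
```

Under this reading, a strong average cannot compensate for one criterion scoring below 3.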
Rubric criteria
- LlamaIndex
- AI Platform
- Indexes