Retrievers And Query Engines
LlamaIndex evals — Retrievers & Query Engines (relift v3 InfraRed)
About LlamaIndex
LlamaIndex is a data framework for building RAG and agent applications over private data. Its core pieces are documents and nodes, indexes (such as VectorStoreIndex), retrievers and query engines, and the IngestionPipeline; LlamaParse and LlamaCloud add managed document parsing and retrieval.
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | A RetrieverQueryEngine is left at similarity_top_k=2 for a corpus where relevant evidence is routinely spread across 5-8 chunks; answers are incomplete. | Tune similarity_top_k to the evidence spread of the corpus and the LLM's context budget: raise k (and optionally add a reranker to trim) when answers need more chunks, while watching context overflow and cost. Treat k as a measured parameter, not a default. | Pass / Fail · AI Platform · medium |
| 02 | A query engine returns a Response. The application renders response.response text to users as cited answers but never inspects response.source_nodes. | Use response.source_nodes (NodeWithScore) to attribute each answer to the retrieved nodes: render citations from node metadata (source, page) and verify that the answer's claims are supported by those nodes. An answer with no source nodes, or only low-scoring ones, should be flagged as ungrounded, not presented as cite… | Pass / Fail · AI Platform · critical |
| 03 | The integrator wants higher precision, so they set similarity_top_k=3 and add an LLMRerank node postprocessor, expecting it to improve recall. | Retrieve a wider candidate set (higher similarity_top_k) and let the reranker (LLMRerank / SentenceTransformerRerank / Cohere rerank) cut to a smaller top_n. Reranking cannot recover documents the retriever never fetched, so set retrieval k > rerank top_n. | Pass / Fail · AI Platform · high |
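The behaviors expected in tests 02 and 03 can be sketched in plain Python, without importing LlamaIndex. Everything named here is an assumption made for illustration: `RetrievedNode` is a hypothetical stand-in for a scored node (not the library's NodeWithScore class), `validate_rerank_config` and `cite_or_flag` are made-up helpers, and the `min_score` threshold of 0.5 is arbitrary. The sketch only filters by score; checking that each claim in the answer is actually supported by the cited text would need a separate step.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class RetrievedNode:
    """Hypothetical stand-in for a retrieved node with a similarity score."""
    text: str
    score: Optional[float]
    metadata: Dict[str, str] = field(default_factory=dict)


def validate_rerank_config(similarity_top_k: int, top_n: int) -> None:
    """Raise if the retriever fetches no more candidates than the reranker keeps."""
    if similarity_top_k <= top_n:
        raise ValueError(
            f"similarity_top_k={similarity_top_k} must exceed rerank top_n={top_n}"
        )


def cite_or_flag(
    nodes: List[RetrievedNode], min_score: float = 0.5
) -> Tuple[bool, List[str]]:
    """Return (grounded, citations); grounded is False when no node clears min_score."""
    supported = [n for n in nodes if n.score is not None and n.score >= min_score]
    citations = [
        f"{n.metadata.get('source', 'unknown')}, p. {n.metadata.get('page', '?')}"
        for n in supported
    ]
    return bool(supported), citations


# Illustrative data only; the file name and scores are invented.
nodes = [
    RetrievedNode("Refunds are issued within 14 days.", 0.82,
                  {"source": "policy.pdf", "page": "3"}),
    RetrievedNode("Unrelated footer text.", 0.12,
                  {"source": "policy.pdf", "page": "9"}),
]

validate_rerank_config(similarity_top_k=10, top_n=3)  # wide retrieval, narrow rerank
grounded, citations = cite_or_flag(nodes)
print(grounded, citations)
```

In a real application the same checks would presumably run over `response.source_nodes`, with the threshold chosen from measurements on the corpus instead of a fixed default.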
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
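One possible reading of the pass rule above, as a minimal sketch: the mean over all criterion scores must be at least 4.0, and no single criterion may score below 3. The function name `eval_passes`, the criterion names, and the assumption that scores are plain numbers are all hypothetical; the page does not specify the scoring scale or how the grader is implemented.

```python
from statistics import mean
from typing import Dict


def eval_passes(scores: Dict[str, float],
                min_mean: float = 4.0,
                floor: float = 3.0) -> bool:
    """Pass only if the mean meets min_mean and every criterion meets the floor."""
    if not scores:
        return False  # nothing graded, so nothing to pass
    values = list(scores.values())
    return mean(values) >= min_mean and min(values) >= floor


# Hypothetical criterion names and scores, for illustration only.
print(eval_passes({"grounding": 4, "retrieval_config": 4, "clarity": 4}))
print(eval_passes({"grounding": 5, "retrieval_config": 5, "clarity": 2}))
```

The second call shows why both conditions matter: the mean is exactly 4.0, but one criterion falls below the floor of 3, so the result is a fail.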
Rubric criteria
- LlamaIndex
- AI Platform
- Retrievers And Query Engines