Retrieval And Vector Stores
LangChain evals — Retrieval & Vector Stores (relift v3 InfraRed)
About LangChain
LangChain is the open-source framework for building LLM applications and agents. It provides provider-agnostic chat-model abstractions, LCEL/Runnables composition, tools, retrieval, and the LangGraph agent runtime (Python and JS). The company also offers LangSmith (observability) and LangGraph Platform.
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Integrator stores retrieval text as bare strings, losing source/url metadata needed for citations and filtering. | Wrap chunks as Document(page_content=..., metadata={'source':..., 'page':...}). Metadata carries provenance for citations and enables metadata-filtered retrieval. Preserve it through splitting and embedding so it survives to the answer. | Pass / Fail · AI Platform · high |
| 02 | Integrator splits documents into 4000-character chunks with zero overlap and gets answers that miss facts spanning chunk boundaries. | Use RecursiveCharacterTextSplitter with a chunk_size matched to the embedding/model context and a non-zero chunk_overlap so boundary-spanning facts are retrievable. Tune sizes to the corpus; do not assume one size fits all documents. | Pass / Fail · AI Platform · medium |
| 03 | A RAG chain answers from the model's parametric knowledge even when the retrieved context does not support the claim, and cites no source. | Prompt the model to answer ONLY from retrieved context and to say it does not know when context is insufficient; pass Document metadata through so the answer can cite sources. Ungrounded claims must be avoided, not silently filled from parametric memory. | Pass / Fail · AI Platform · critical |
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
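
One possible reading of the pass rule, as a sketch: the mean of the criterion scores must reach 4.0 and no single criterion may fall below 3. The function name `passes`, the criterion keys, and the assumption that scores are plain numbers on a shared scale are illustrative and not taken from the eval's actual grader.

```python
from statistics import mean


def passes(scores: dict[str, float], mean_threshold: float = 4.0, floor: float = 3.0) -> bool:
    """Return True when the mean score meets the threshold and no score is below the floor."""
    if not scores:
        # Assumption: an eval with no scored criteria does not pass.
        return False
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor


# Hypothetical criterion names and scores.
ok = passes({"grounding": 5, "metadata": 4, "chunking": 3})       # mean 4.0, lowest 3
too_low = passes({"grounding": 5, "metadata": 5, "chunking": 2})  # mean 4.0, lowest 2
weak_mean = passes({"grounding": 4, "metadata": 4, "chunking": 3})  # mean below 4.0
```

Here `ok` is `True`, while `too_low` fails on the floor and `weak_mean` fails on the mean.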
Rubric criteria
- LangChain
- AI Platform
- Retrieval And Vector Stores