For LangChain · AI Platform · Answer Relevance

Retrieval And Vector Stores

LangChain (+ LangGraph) · LangChain

LLM Orchestration Framework — LangChain

Evaluates LangChain's Retrieval & Vector Stores across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's LLM Orchestration Framework eval coverage.

About LangChain

LangChain is the open-source framework for building LLM applications and agents — provider-agnostic chat-model abstractions, LCEL/Runnables composition, tools, retrieval, and the LangGraph agent runtime (Python & JS). The company also offers LangSmith (observability) and LangGraph Platform.

Employees

~200

Industry

Agent Framework

Headquarters

San Francisco, CA

Website

www.langchain.com

Sample tests · showing 3 of 9

#	Input	Expected behavior	Check
01	Integrator stores retrieval text as bare strings, losing source/url metadata needed for citations and filtering.	Wrap chunks as Document(page_content=..., metadata={'source':..., 'page':...}). Metadata carries provenance for citations and enables metadata-filtered retrieval. Preserve it through splitting and embedding so it survives to the answer.	Pass / Fail · AI Platform · high
02	Integrator splits documents into 4000-character chunks with zero overlap and gets answers that miss facts spanning chunk boundaries.	Use RecursiveCharacterTextSplitter with a chunk_size matched to the embedding/model context and a non-zero chunk_overlap so boundary-spanning facts are retrievable. Tune sizes to the corpus; do not assume one size fits all documents.	Pass / Fail · AI Platform · medium
03	Integrator indexes documents with one embedding model and queries with a different embedding model, producing meaningless similarity scores.	Use the same Embeddings model (same provider/model/dimension) for indexing and querying. Embeddings exposes embed_documents and embed_query; mixing models or dimensions makes the vector spaces incomparable. Re-embed the corpus if the model changes.	Pass / Fail · AI Platform · high
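The three sample behaviors above can be illustrated with a minimal, standard-library-only sketch. It does not call LangChain: `Doc` is a hypothetical stand-in for `Document`, `split_with_overlap` is a simplified fixed-width splitter rather than the algorithm behind `RecursiveCharacterTextSplitter`, and `check_same_embedding_space` with its config dictionaries is an assumed convention for recording which embedding model built an index.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document: text plus provenance."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def split_with_overlap(doc: Doc, chunk_size: int, chunk_overlap: int) -> list[Doc]:
    """Cut doc into fixed-width character windows that share chunk_overlap characters.

    Every chunk gets a copy of the parent's metadata, so source/page survive splitting.
    """
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    text = doc.page_content
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(Doc(text[start:start + chunk_size], dict(doc.metadata)))
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def check_same_embedding_space(index_cfg: dict, query_cfg: dict) -> None:
    """Raise if indexing and querying disagree on provider, model, or dimension."""
    for key in ("provider", "model", "dimension"):
        if index_cfg.get(key) != query_cfg.get(key):
            raise ValueError(
                f"embedding {key} mismatch: index={index_cfg.get(key)!r}, "
                f"query={query_cfg.get(key)!r}; re-embed the corpus or fix the query model"
            )


# Toy usage: the fact "def" straddles a chunk boundary when overlap is zero.
page = Doc("abcdefghij", {"source": "report.pdf", "page": 3})
no_overlap = split_with_overlap(page, chunk_size=4, chunk_overlap=0)
with_overlap = split_with_overlap(page, chunk_size=4, chunk_overlap=2)
```

In this toy input, the substring "def" is split across two chunks when the overlap is zero and sits whole inside one chunk when the overlap is 2, which mirrors case 02. Copying the metadata dictionary per chunk mirrors case 01, and calling the guard before each query is one possible way to catch the mismatch described in case 03.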

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
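As a rough illustration, the pass rule can be read as a mean threshold plus a per-criterion floor. The sketch below assumes the judge returns one numeric score per rubric criterion and that the mean is taken across criteria; the score scale, the criterion names, and the function name `passes` are hypothetical and not taken from Corsac.

```python
from statistics import mean


def passes(scores: dict[str, float], min_mean: float = 4.0, floor: float = 3.0) -> bool:
    """Return True when the mean score is >= min_mean and no criterion is below floor."""
    if not scores:
        return False  # assumption: a case with no judged criteria does not pass
    values = list(scores.values())
    return mean(values) >= min_mean and min(values) >= floor


# Hypothetical criterion names, for illustration only.
example = {"uses_document_metadata": 5, "preserves_provenance": 4, "explains_tradeoffs": 3}
result = passes(example)
```

Under this reading, a single criterion scored below 3 fails the case even when the other scores pull the mean up to 4.0 or higher.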

Rubric criteria

LangChain
AI Platform
Retrieval And Vector Stores

Recommended for

LangChain (+ LangGraph) · LangChain customers

Works with

LangChain

Related evals

AI Platform

Claude API

Evaluates Anthropic's Batch API across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

AI Platform

Claude API

Evaluates Anthropic's Extended Thinking across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

AI Platform

Claude API

Evaluates Anthropic's Files API & Citations across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.


Frequently asked questions

What does the Retrieval And Vector Stores eval for LangChain (+ LangGraph) test?

How is the Retrieval And Vector Stores eval scored?

The judge rubric: Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

How many test cases does this eval pack include?

The Retrieval And Vector Stores pack for LangChain (+ LangGraph) contains 9 test cases. Three sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?

Sign up for Corsac, connect your model or agent endpoint, and run the Retrieval And Vector Stores pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.

Run this eval in your workspace

Connect your data, configure thresholds, and review results with your team.