
Memory & State (LangGraph)

LangChain (+ LangGraph) · LangChain

LLM Orchestration Framework — LangChain

Evaluates LangChain's Memory & State (LangGraph) across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's LLM Orchestration Framework eval coverage.

About LangChain

LangChain is the open-source framework for building LLM applications and agents — provider-agnostic chat-model abstractions, LCEL/Runnables composition, tools, retrieval, and the LangGraph agent runtime (Python & JS). The company also offers LangSmith (observability) and LangGraph Platform.

Employees: ~200
Industry: Agent Framework
Headquarters: San Francisco, CA
Website: www.langchain.com

Sample tests · showing 3 of 9

#	Input	Expected behavior	Check	Category	Severity
01	Integrator ships MemorySaver (in-process dict) to production and loses all conversation state on every deploy/restart.	Use MemorySaver only for development/tests; for production use a durable checkpointer (SqliteSaver for single-node, PostgresSaver for multi-node) so thread state survives restarts and is shared across workers. Match the checkpointer to the deployment topology.	Pass / Fail	AI Platform	high
02	Two users share one thread_id, so user B sees user A's conversation history through the checkpointer.	Scope thread_id to a single conversation/user and derive it from an authenticated identity, never a guessable/shared value. State restored for a thread_id is visible to anyone using it, so thread_id is a tenancy boundary.	Pass / Fail	AI Platform	critical
03	Integrator uses PostgresSaver but never runs its setup/migration, so checkpoint writes fail at runtime with missing-table errors.	Initialize the persistent checkpointer (e.g., call its setup() / run migrations) before first use, and manage the connection lifecycle. Verify checkpoint reads/writes against the backend in staging; do not assume tables exist.	Pass / Fail	AI Platform	high
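The three sample cases all concern the checkpointer lifecycle. The sketch below is a stdlib-only toy, not LangGraph's API: `ToyCheckpointer`, `thread_id_for`, and `load_for_user` are hypothetical names, and Python's `sqlite3` stands in for a durable backend. It illustrates case 01 (state written to a database file survives a simulated restart, which an in-process dict would not), case 03 (writes fail until `setup()` creates the table), and case 02 (a thread_id derived from the authenticated user, plus an ownership check before state is restored). In LangGraph itself the corresponding pieces are the checkpointer passed to `compile(checkpointer=...)`, the persistent checkpointer's `setup()`, and the thread_id supplied via `config={"configurable": {"thread_id": ...}}`; check the LangGraph persistence docs for exact usage.

```python
import hashlib
import hmac
import json
import os
import sqlite3
import tempfile
from typing import Optional


class ToyCheckpointer:
    """Hypothetical stand-in for a durable checkpointer (not LangGraph's API).

    Stores one JSON state blob per thread_id in a SQLite file, so state
    outlives the process that wrote it, unlike an in-process dict.
    """

    def __init__(self, path: str) -> None:
        self.conn = sqlite3.connect(path)

    def setup(self) -> None:
        # Idempotent schema creation; must run before the first read or write.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS checkpoints ("
            "thread_id TEXT PRIMARY KEY, state TEXT NOT NULL)"
        )
        self.conn.commit()

    def put(self, thread_id: str, state: dict) -> None:
        # Upsert the latest state for this thread.
        self.conn.execute(
            "INSERT OR REPLACE INTO checkpoints (thread_id, state) VALUES (?, ?)",
            (thread_id, json.dumps(state)),
        )
        self.conn.commit()

    def get(self, thread_id: str) -> Optional[dict]:
        row = self.conn.execute(
            "SELECT state FROM checkpoints WHERE thread_id = ?", (thread_id,)
        ).fetchone()
        return None if row is None else json.loads(row[0])

    def close(self) -> None:
        self.conn.close()


def thread_id_for(user_id: str, conversation_id: str, secret: bytes) -> str:
    """Hypothetical helper: bind a thread to an authenticated user.

    Assumes user_id and conversation_id contain no ':' characters.
    The HMAC tag makes the id unguessable without the server secret.
    """
    msg = f"{user_id}:{conversation_id}".encode()
    tag = hmac.new(secret, msg, hashlib.sha256).hexdigest()[:16]
    return f"{user_id}:{conversation_id}:{tag}"


def load_for_user(cp: ToyCheckpointer, user_id: str, thread_id: str, secret: bytes) -> Optional[dict]:
    """Hypothetical helper: restore state only for the thread's owner."""
    parts = thread_id.split(":")
    if len(parts) != 3:
        raise PermissionError("malformed thread_id")
    # Recompute the id this user would own and compare in constant time.
    expected = thread_id_for(user_id, parts[1], secret)
    if not hmac.compare_digest(thread_id.encode(), expected.encode()):
        raise PermissionError("thread_id does not belong to this user")
    return cp.get(thread_id)


if __name__ == "__main__":
    secret = b"example-secret"  # hypothetical; load from a secret store in practice
    path = os.path.join(tempfile.mkdtemp(), "checkpoints.db")

    # Case 03: writing before setup() fails because the table does not exist.
    cp = ToyCheckpointer(path)
    try:
        cp.put("t", {})
    except sqlite3.OperationalError as exc:
        print("before setup():", exc)
    cp.setup()

    # Case 02: the thread_id is derived from the authenticated user.
    tid = thread_id_for("alice", "conv1", secret)
    cp.put(tid, {"messages": ["hi"]})
    cp.close()

    # Case 01: a simulated restart reopens the same file; a dict would be empty.
    cp = ToyCheckpointer(path)
    cp.setup()
    print(load_for_user(cp, "alice", tid, secret))  # {'messages': ['hi']}
    try:
        load_for_user(cp, "bob", tid, secret)
    except PermissionError as exc:
        print("bob:", exc)
    cp.close()
```

A single SQLite file only covers one node; per case 01, multiple workers need a shared backend such as PostgresSaver. Deriving the thread_id server-side means a client can neither choose nor guess another user's thread.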

How this eval is graded

The LLM judge grades each response against expected.ideal_behavior and expected.rubric. A passing grade requires a mean score of at least 4.0 across criteria, with no single criterion scoring below 3.
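As a worked reading of the pass rule (mean of at least 4.0, no criterion below 3), the sketch below applies both conditions to one case's criterion scores. The page does not state the score scale; integer scores from 1 to 5 are an assumption here, and `case_passes` and the criterion names are hypothetical.

```python
from statistics import mean


def case_passes(scores: dict[str, float]) -> bool:
    """Hypothetical helper: pass iff mean >= 4.0 and no criterion below 3."""
    if not scores:
        raise ValueError("at least one criterion score is required")
    values = list(scores.values())
    # Both conditions must hold: a high average and no weak criterion.
    return mean(values) >= 4.0 and min(values) >= 3


# Hypothetical criterion names for a checkpointer case.
print(case_passes({"durable_backend": 5, "topology_match": 4, "dev_only_memory": 4}))  # True
print(case_passes({"durable_backend": 5, "topology_match": 5, "dev_only_memory": 2}))  # False
print(case_passes({"durable_backend": 4, "topology_match": 4, "dev_only_memory": 3}))  # False
```

Both conditions bind independently: 5/5/2 meets the mean exactly but trips the floor, while 4/4/3 clears the floor but averages only about 3.67.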

Rubric criteria

LangChain
AI Platform
Memory & State (LangGraph)

Recommended for

LangChain (+ LangGraph) · LangChain customers

Works with

LangChain

Related evals

AI Platform

Claude API

Evaluates Anthropic's Batch API across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

AI Platform

Claude API

Evaluates Anthropic's Extended Thinking across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

AI Platform

Claude API

Evaluates Anthropic's Files API & Citations across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.


Frequently asked questions

What does the Memory & State (LangGraph) eval for LangChain (+ LangGraph) test?

How is the Memory & State (LangGraph) eval scored?

An LLM judge grades each case against expected.ideal_behavior and expected.rubric. A passing grade requires a mean score of at least 4.0 across criteria, with no single criterion scoring below 3.

How many test cases does this eval pack include?

The Memory & State (LangGraph) pack for LangChain (+ LangGraph) contains 9 test cases. 3 sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?

Sign up for Corsac, connect your model or agent endpoint, and run the Memory & State (LangGraph) pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.

Run this eval in your workspace

Connect your data, configure thresholds, and review results with your team.