LangChain evals — Memory & State (LangGraph) (relift v3 InfraRed)
About LangChain
LangChain is the open-source framework for building LLM applications and agents — provider-agnostic chat-model abstractions, LCEL/Runnables composition, tools, retrieval, and the LangGraph agent runtime (Python & JS). The company also offers LangSmith (observability) and LangGraph Platform.
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check | Category | Severity |
|---|---|---|---|---|---|
| 01 | Integrator ships MemorySaver (in-process dict) to production and loses all conversation state on every deploy/restart. | Use MemorySaver only for development/tests; for production use a durable checkpointer (SqliteSaver for single-node, PostgresSaver for multi-node) so thread state survives restarts and is shared across workers. Match the checkpointer to the deployment topology. | Pass / Fail | AI Platform | high |
| 02 | Two users share one thread_id, so user B sees user A's conversation history through the checkpointer. | Scope thread_id to a single conversation/user and derive it from an authenticated identity, never a guessable/shared value. State restored for a thread_id is visible to anyone using it, so thread_id is a tenancy boundary. | Pass / Fail | AI Platform | critical |
| 03 | Integrator stuffs durable user preferences into the per-thread checkpointer, so they vanish when a new conversation thread starts. | Use the thread checkpointer for short-term, within-conversation state and a BaseStore (long-term memory) for facts that must persist across threads (e.g., user preferences). Choose the layer by lifetime, not convenience. | Pass / Fail | AI Platform | medium |
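The three expected behaviors above can be sketched as small decision helpers. This is a minimal, standard-library-only illustration, not LangGraph's own API: the function names, the topology labels (`dev`, `single-node`, `multi-node`), and the choice of HMAC-SHA256 for deriving a thread_id are all assumptions made for this example. The checkpointer import paths are listed as strings so the sketch runs without the LangGraph packages installed.

```python
import hashlib
import hmac
import json

# Checkpointer class paths, kept as strings so no LangGraph install is needed.
# The topology labels used as keys are hypothetical names for this sketch.
CHECKPOINTERS = {
    "dev": "langgraph.checkpoint.memory.MemorySaver",
    "single-node": "langgraph.checkpoint.sqlite.SqliteSaver",
    "multi-node": "langgraph.checkpoint.postgres.PostgresSaver",
}


def choose_checkpointer(topology: str) -> str:
    """Test 01: match the checkpointer to the deployment topology."""
    try:
        return CHECKPOINTERS[topology]
    except KeyError:
        raise ValueError(f"unknown topology: {topology!r}") from None


def derive_thread_id(secret: bytes, user_id: str, conversation_id: str) -> str:
    """Test 02: bind thread_id to an authenticated user and one conversation.

    HMAC is an assumption of this sketch; the point is only that the value
    is not guessable and cannot be shared across users by accident.
    """
    # JSON-encode the pair so ("a:b", "c") and ("a", "b:c") cannot collide.
    message = json.dumps([user_id, conversation_id]).encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def choose_layer(must_outlive_thread: bool) -> str:
    """Test 03: pick the memory layer by lifetime, not convenience."""
    return "store" if must_outlive_thread else "checkpointer"
```

In a LangGraph application the derived value would typically be passed per invocation as `{"configurable": {"thread_id": thread_id}}`, with `user_id` taken from the authenticated session rather than from client-supplied input.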
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
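One possible reading of the pass rule, sketched in Python. It assumes each rubric criterion receives a numeric score and that the mean is taken across criteria; the function name `passes`, its parameters, and the treatment of an empty score set as a failure are hypothetical choices for this example, not part of the eval's definition.

```python
from statistics import mean


def passes(scores: dict[str, float],
           mean_floor: float = 4.0,
           criterion_floor: float = 3.0) -> bool:
    """Return True when the mean score meets mean_floor and no single
    criterion falls below criterion_floor."""
    if not scores:
        # Assumption: nothing graded means nothing passed.
        return False
    values = list(scores.values())
    return mean(values) >= mean_floor and min(values) >= criterion_floor
```

Under this reading, a single low criterion fails the eval even when the other scores pull the mean up to the threshold.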
Rubric criteria
- LangChain
- AI Platform
- Memory and State (LangGraph)