LLM Orchestration Framework — LangChain
LangChain evals — Agents (LangGraph) (relift v3 InfraRed)
About LangChain
LangChain is the open-source framework for building LLM applications and agents — provider-agnostic chat-model abstractions, LCEL/Runnables composition, tools, retrieval, and the LangGraph agent runtime (Python & JS). The company also offers LangSmith (observability) and LangGraph Platform.
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Integrator wants a tool-calling agent and hand-builds a StateGraph from scratch, reimplementing the model/tool loop and introducing routing bugs. | Use langgraph.prebuilt.create_react_agent(model, tools) for the standard ReAct tool-calling loop; it wires the model node, ToolNode, and conditional routing back to the model. Drop to a custom StateGraph only when the prebuilt loop is insufficient. | Pass / Fail · AI Platform · high |
| 02 | An agent loops between two tools forever; the integrator has no recursion_limit set and the run consumes tokens until it is killed manually. | Set a sensible recursion_limit in the config so LangGraph raises GraphRecursionError when the step budget is exceeded, then handle it (surface to the user / inspect state). Do not raise the limit blindly to 'make it finish'; investigate the loop instead. | Pass / Fail · AI Platform · critical |
| 03 | Integrator builds a new agent on the legacy AgentExecutor / initialize_agent path and expects checkpointing and interrupts to work. | Prefer the LangGraph agent (create_react_agent) for new builds; the legacy AgentExecutor is maintained but does not offer LangGraph's persistence, streaming, and interrupt features. If maintaining legacy code, do not assume LangGraph capabilities are present. | Pass / Fail · AI Platform · medium |
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
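
The pass rule above can be sketched as a small function. This is one possible reading, not the platform's actual grader: it assumes each rubric criterion receives a single numeric score, that the mean is taken across criteria, and that the thresholds are inclusive. The function name `rubric_passes` and the sample scores are hypothetical.

```python
from statistics import mean


def rubric_passes(scores: dict[str, float],
                  mean_threshold: float = 4.0,
                  floor: float = 3.0) -> bool:
    """Return True when the mean score meets the threshold and no score is below the floor."""
    values = list(scores.values())
    if not values:
        # Assumption: a test with no scored criteria cannot pass.
        return False
    return mean(values) >= mean_threshold and min(values) >= floor


# Hypothetical scores keyed by rubric criterion.
ok = rubric_passes({"LangChain": 5, "AI Platform": 4, "Agents (LangGraph)": 3})
low_floor = rubric_passes({"LangChain": 5, "AI Platform": 5, "Agents (LangGraph)": 2})
low_mean = rubric_passes({"LangChain": 4, "AI Platform": 4, "Agents (LangGraph)": 3})
print(ok, low_floor, low_mean)
```

The second case shows why the floor matters: its mean still reaches 4.0, but a single criterion scored 2 fails the test.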
Rubric criteria
- LangChain
- AI Platform
- Agents (LangGraph)