Tools And Tool Calling
LLM Orchestration Framework — LangChain
LangChain evals — Tools & Tool Calling (relift v3 InfraRed)
About LangChain
LangChain is the open-source framework for building LLM applications and agents — provider-agnostic chat-model abstractions, LCEL/Runnables composition, tools, retrieval, and the LangGraph agent runtime (Python & JS). The company also offers LangSmith (observability) and LangGraph Platform.
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Integrator defines a tool function with no type hints and a one-line docstring, then wonders why the model passes wrong-typed arguments. | Decorate with @tool and provide type hints + a clear docstring; LangChain derives the tool name, description, and args schema from them. The description is what the model uses to route, so it must describe purpose and arguments precisely. | Pass / Fail · Ai Platform · high |
| 02 | The model returns two tool_calls in one AIMessage; the integrator returns a single merged ToolMessage without matching tool_call_id. | For each AIMessage.tool_calls entry, execute the tool and return one ToolMessage whose tool_call_id matches the call's id. The next model turn requires every tool_call to have a matching ToolMessage; mismatched or missing ids break the conversation contract. | Pass / Fail · Ai Platform · critical |
| 03 | Two tools, get_weather (current) and get_forecast (future), have near-identical descriptions, so the model routes 'will it rain tomorrow?' to the wrong tool. | Write distinct, scope-bounding descriptions: get_weather 'current conditions only; for future dates use get_forecast'. The model routes from descriptions, so disambiguation must live there, not in a system-prompt tie-breaker that biases against one tool. | Pass / Fail · Ai Platform · medium |
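
The contract in test 02 (one reply per tool call, matched by id) can be sketched without the library. This is an illustration under stated assumptions, not LangChain's implementation: plain dicts stand in for `AIMessage.tool_calls` entries and `ToolMessage` objects, the tool bodies are hypothetical stubs, and `answer_tool_calls` and `unanswered_ids` are helper names invented for this sketch. The docstrings reuse the scope-bounding wording from test 03.

```python
from typing import Any, Callable


def get_weather(city: str) -> str:
    """Current conditions only; for future dates use get_forecast."""
    return f"Sunny in {city} right now"  # hypothetical stub


def get_forecast(city: str, days_ahead: int) -> str:
    """Forecast for a future date; for current conditions use get_weather."""
    return f"Rain expected in {city} in {days_ahead} day(s)"  # hypothetical stub


# Registry keyed by tool name, so each call can be dispatched by its "name" field.
TOOLS: dict[str, Callable[..., str]] = {
    "get_weather": get_weather,
    "get_forecast": get_forecast,
}


def answer_tool_calls(tool_calls: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return one tool message per call, each carrying the id of the call it answers."""
    messages = []
    for call in tool_calls:
        result = TOOLS[call["name"]](**call["args"])
        messages.append(
            {"role": "tool", "content": result, "tool_call_id": call["id"]}
        )
    return messages


def unanswered_ids(
    tool_calls: list[dict[str, Any]], tool_messages: list[dict[str, Any]]
) -> set[str]:
    """Ids of tool calls that have no matching tool message."""
    answered = {m["tool_call_id"] for m in tool_messages}
    return {c["id"] for c in tool_calls} - answered


# Two calls in one model turn, shaped like tool_calls entries (name, args, id).
calls = [
    {"name": "get_weather", "args": {"city": "Oslo"}, "id": "call_1"},
    {"name": "get_forecast", "args": {"city": "Oslo", "days_ahead": 1}, "id": "call_2"},
]
replies = answer_tool_calls(calls)
```

A merged single reply would leave one id unanswered, which is the failure test 02 describes; running `unanswered_ids` before the next model turn is one way to catch it early.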
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
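
One possible reading of the pass rule, as a minimal sketch: each rubric criterion receives a single score, and the eval passes when the mean across criteria is at least 4.0 and no individual criterion scores below 3. The function name `eval_passes`, the dict-of-scores input, and the example scores are assumptions made for illustration; the actual grader may aggregate differently.

```python
from statistics import mean


def eval_passes(
    criterion_scores: dict[str, float],
    min_mean: float = 4.0,   # threshold on the mean across criteria
    min_score: float = 3.0,  # floor for any single criterion
) -> bool:
    """Apply the mean threshold and the per-criterion floor together."""
    scores = list(criterion_scores.values())
    if not scores:
        return False  # assumption: nothing graded means no pass
    return mean(scores) >= min_mean and min(scores) >= min_score


# Hypothetical scores for the three rubric criteria.
borderline = {"Langchain": 5, "Ai Platform": 4, "Tools And Tool Calling": 3}
one_low = {"Langchain": 5, "Ai Platform": 5, "Tools And Tool Calling": 2}
```

Under this reading, `borderline` passes (mean 4.0, lowest score 3), while `one_low` fails despite the same mean because one criterion falls below the floor.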
Rubric criteria
- LangChain
- Ai Platform
- Tools And Tool Calling