Chat Models And Messages
LLM Orchestration Framework — LangChain
LangChain evals — Chat Models & Messages (relift v3 InfraRed)
About LangChain
LangChain is the open-source framework for building LLM applications and agents — provider-agnostic chat-model abstractions, LCEL/Runnables composition, tools, retrieval, and the LangGraph agent runtime (Python & JS). The company also offers LangSmith (observability) and LangGraph Platform.
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Integrator wants a provider-agnostic chat model and calls init_chat_model('claude-3-5-sonnet-latest') without model_provider, then later swaps to a model id that init_chat_model cannot map to a provider. | When the model id is unambiguous, rely on init_chat_model's provider inference; when ambiguous, pass model_provider explicitly (e.g., 'anthropic', 'openai'). On an unmappable id, init_chat_model raises; surface the error and require an explicit model_provider rather than defaulting to a hardcoded … | Pass / Fail · Ai Platform · high |
| 02 | Integrator needs token-by-token UI output but calls model.invoke and then splits the final string client-side to fake streaming. | Use model.stream (or astream) to receive AIMessageChunk pieces as they arrive; use .batch for independent parallel inputs (honoring max_concurrency); use .invoke for a single blocking call. Do not post-hoc chunk a completed string and call it streaming. | Pass / Fail · Ai Platform · medium |
| 03 | Integrator builds a chat history as a list of plain dicts and tuples mixing {'role':'system'} strings with raw assistant text, then passes it to model.invoke. | Use langchain_core.messages types (SystemMessage, HumanMessage, AIMessage, ToolMessage) or the documented ('system','...') tuple shorthand. A SystemMessage carries instructions; the first turn need not be Human, but the system role must not be smuggled into a HumanMessage. Preserve AIMessage.too… | Pass / Fail · Ai Platform · high |
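The expected behaviors in the three sample tests can be sketched as small helpers. This is a minimal illustration, not the eval's reference solution. The helper names (build_chat_model, stream_text, to_messages), the init_fn parameter, and the shape of the history dicts are hypothetical. The sketch assumes that init_chat_model is importable from langchain.chat_models and signals an unmappable id with ValueError, and that chunk content is a plain string.

```python
from typing import Any, Callable, Iterable, Optional


def build_chat_model(
    model_id: str,
    model_provider: Optional[str] = None,
    init_fn: Optional[Callable[..., Any]] = None,
) -> Any:
    """Test 01: rely on provider inference, else demand an explicit provider."""
    if init_fn is None:
        # Lazy import so the helper can be exercised with a stub (init_fn is hypothetical).
        from langchain.chat_models import init_chat_model as init_fn
    kwargs = {"model_provider": model_provider} if model_provider else {}
    try:
        return init_fn(model_id, **kwargs)
    except ValueError as exc:
        if model_provider is not None:
            raise  # the provider was explicit, so the problem lies elsewhere
        # Surface the failure; do not fall back to a hardcoded default model.
        raise ValueError(
            f"Cannot map {model_id!r} to a provider; pass model_provider explicitly."
        ) from exc


def stream_text(model: Any, messages: list, on_token: Callable[[str], None]) -> str:
    """Test 02: forward chunks as they arrive instead of splitting a finished string."""
    parts = []
    for chunk in model.stream(messages):  # a chat model yields AIMessageChunk objects
        # Simplification: non-string content (e.g. lists of content blocks) is skipped.
        text = chunk.content if isinstance(chunk.content, str) else ""
        if text:
            on_token(text)
            parts.append(text)
    return "".join(parts)


def to_messages(history: Iterable[dict]) -> list:
    """Test 03: turn role/content dicts (assumed shape) into typed messages."""
    from langchain_core.messages import (
        AIMessage, HumanMessage, SystemMessage, ToolMessage,
    )
    out = []
    for item in history:
        role, content = item["role"], item["content"]
        if role == "system":
            out.append(SystemMessage(content=content))
        elif role in ("user", "human"):
            out.append(HumanMessage(content=content))
        elif role in ("assistant", "ai"):
            # Carry tool calls over rather than flattening the turn to raw text.
            out.append(AIMessage(content=content, tool_calls=item.get("tool_calls", [])))
        elif role == "tool":
            out.append(ToolMessage(content=content, tool_call_id=item["tool_call_id"]))
        else:
            raise ValueError(f"Unknown role: {role!r}")
    return out
```

A caller might write stream_text(build_chat_model('claude-3-5-sonnet-latest'), to_messages(history), print), which requires the relevant provider package and credentials to be available. For independent inputs, model.batch(inputs, config={"max_concurrency": 4}) is the parallel counterpart, and model.invoke remains the single blocking call.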
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
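One reading of the pass rule can be sketched as follows. The sketch assumes each rubric criterion receives a single numeric score, that the mean is taken across criteria, and that the floor of 3 applies to every criterion; the function name passes and the score mapping are hypothetical, and the actual grader may aggregate differently.

```python
from statistics import mean
from typing import Mapping

MEAN_THRESHOLD = 4.0  # "mean >= 4.0" from the grading rule
FLOOR = 3.0           # "no criterion below 3"


def passes(scores: Mapping[str, float]) -> bool:
    """Return True when the mean meets the threshold and no score is under the floor."""
    if not scores:
        raise ValueError("at least one criterion score is required")
    values = list(scores.values())
    return mean(values) >= MEAN_THRESHOLD and min(values) >= FLOOR
```

Under this reading, scores of 5, 4, and 3 pass (mean 4.0, minimum 3), while 5, 5, and 2 fail on the floor even though the mean is still 4.0.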
Rubric criteria
- Langchain
- Ai Platform
- Chat Models And Messages