Eval Library
For LangChain · AI Platform

Chat Models And Messages

LangChain (+ LangGraph) · LangChain

LLM Orchestration Framework — LangChain

LangChain evals — Chat Models & Messages (relift v3 InfraRed)

About LangChain

LangChain is an open-source framework for building LLM applications and agents. It provides provider-agnostic chat-model abstractions, LCEL/Runnables composition, tools, retrieval, and the LangGraph agent runtime (Python and JS). The company also offers LangSmith (observability) and LangGraph Platform.

Employees

~200

Industry

Agent Framework

Headquarters

San Francisco, CA

Sample tests · showing 3 of 9

# · Input · Expected behavior · Check
01

Integrator wants a provider-agnostic chat model and calls init_chat_model('claude-3-5-sonnet-latest') without model_provider, then later swaps to a model id that init_chat_model cannot map to a provider.

When the model id is unambiguous, rely on init_chat_model's provider inference; when ambiguous, pass model_provider explicitly (e.g., 'anthropic', 'openai'). On an unmappable id, init_chat_model raises — surface the error and require an explicit model_provider rather than defaulting to a hardcoded …

Pass / Fail · AI Platform · high
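
The behavior expected in test 01 can be sketched roughly as follows. This is a minimal illustration, not LangChain's own code: `load_chat_model` and its `init` parameter are hypothetical names introduced here, and the sketch assumes the failure raised for an unmappable id is a `ValueError`. The `init` hook exists only so the helper can be exercised with a stand-in; by default it uses `init_chat_model` from `langchain.chat_models`.

```python
from typing import Any, Callable, Optional


def load_chat_model(
    model_id: str,
    model_provider: Optional[str] = None,
    init: Optional[Callable[..., Any]] = None,
) -> Any:
    """Initialize a chat model without ever defaulting to a hardcoded provider."""
    if init is None:
        # Imported lazily so the helper can be tested without LangChain installed.
        from langchain.chat_models import init_chat_model as init

    try:
        # With model_provider=None, provider inference is left to init_chat_model.
        return init(model_id, model_provider=model_provider)
    except ValueError as exc:  # assumption: inference failures raise ValueError
        if model_provider is not None:
            # The caller already chose a provider; surface the error unchanged.
            raise
        # Surface the failure and ask for an explicit provider instead of guessing.
        raise ValueError(
            f"Could not infer a provider for {model_id!r}; "
            "pass model_provider explicitly (e.g. 'anthropic', 'openai')."
        ) from exc
```

A caller would use `load_chat_model("claude-3-5-sonnet-latest")` when the id is unambiguous, and `load_chat_model(some_id, model_provider="openai")` after swapping to an id that cannot be mapped.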
02

Integrator needs token-by-token UI output but calls model.invoke and then splits the final string client-side to fake streaming.

Use model.stream (or astream) to receive AIMessageChunk pieces as they arrive; use .batch for independent parallel inputs (honoring max_concurrency); use .invoke for a single blocking call. Do not post-hoc chunk a completed string and call it streaming.

Pass / Fail · AI Platform · medium
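
To make the distinction in test 02 concrete, here is a small sketch of the streaming and batch paths. The helper names `stream_reply` and `answer_many` and the `emit` callback are hypothetical; `model` is assumed to be any LangChain chat model (or other object exposing `.stream` and `.batch`). For simplicity, the sketch only forwards chunks whose `content` is a plain string.

```python
from typing import Any, Callable, List


def stream_reply(model: Any, prompt: str, emit: Callable[[str], None]) -> str:
    """Forward each chunk to the UI as it arrives and return the full text."""
    pieces: List[str] = []
    for chunk in model.stream(prompt):  # yields AIMessageChunk-like objects
        # Simplification: skip non-string content (e.g. structured content blocks).
        text = chunk.content if isinstance(chunk.content, str) else ""
        if text:
            emit(text)  # real incremental output, not a post-hoc split
            pieces.append(text)
    return "".join(pieces)


def answer_many(model: Any, prompts: List[str], max_concurrency: int = 4) -> List[Any]:
    """Run independent prompts in parallel while capping concurrency."""
    return model.batch(prompts, config={"max_concurrency": max_concurrency})
```

A single blocking call stays as `model.invoke(prompt)`; an async UI would iterate `model.astream(prompt)` with `async for` instead.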
03

Integrator builds a chat history as a list of plain dicts and tuples mixing {'role':'system'} strings with raw assistant text, then passes it to model.invoke.

Use langchain_core.messages types — SystemMessage, HumanMessage, AIMessage, ToolMessage — (or the documented ('system','...') tuple shorthand). A SystemMessage carries instructions; the first turn need not be Human but the system role must not be smuggled into a HumanMessage. Preserve AIMessage.too…

Pass / Fail · AI Platform · high
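
One way to clean up the loose history described in test 03 is to normalize it into the ('role', 'content') tuple shorthand before calling the model. The sketch below is illustrative only: `normalize_history` and `_ROLE_ALIASES` are hypothetical names, and it deliberately rejects tool results, since those need a ToolMessage (with its tool call id) rather than a tuple.

```python
from typing import Any, List, Sequence, Tuple

# Loose role names mapped to the roles used in the tuple shorthand.
_ROLE_ALIASES = {
    "system": "system",
    "user": "human",
    "human": "human",
    "assistant": "ai",
    "ai": "ai",
}


def normalize_history(history: Sequence[Any]) -> List[Tuple[str, str]]:
    """Convert role dicts and 2-tuples into (role, content) tuples, rejecting the rest."""
    normalized: List[Tuple[str, str]] = []
    for i, item in enumerate(history):
        if isinstance(item, dict):
            role, content = item.get("role"), item.get("content")
        elif isinstance(item, (tuple, list)) and len(item) == 2:
            role, content = item
        else:
            # Raw text with no role is ambiguous, so refuse to guess who said it.
            raise TypeError(f"history[{i}] has no role: {item!r}")
        if role not in _ROLE_ALIASES:
            # Tool results and unknown roles need a proper message type instead.
            raise ValueError(f"history[{i}] has unsupported role {role!r}")
        if not isinstance(content, str):
            raise TypeError(f"history[{i}] content must be a string")
        normalized.append((_ROLE_ALIASES[role], content))
    return normalized
```

The result can be passed straight to `model.invoke(...)`. Histories that include tool calls are better built from the `langchain_core.messages` classes directly.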

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. A pass requires a mean criterion score >= 4.0 with no criterion below 3.
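
The pass threshold can be expressed as a small check. This is a sketch under one reading of the rule, namely that the mean is taken across the rubric criteria for a single graded response. The function name `passes`, its parameters, and the criterion keys in the usage note are hypothetical.

```python
from statistics import mean
from typing import Mapping


def passes(
    scores: Mapping[str, float],
    min_mean: float = 4.0,
    min_score: float = 3.0,
) -> bool:
    """Return True when the mean is >= min_mean and no criterion is below min_score."""
    if not scores:
        raise ValueError("at least one criterion score is required")
    values = list(scores.values())
    return mean(values) >= min_mean and min(values) >= min_score
```

For example, `passes({"criterion_a": 5, "criterion_b": 4, "criterion_c": 3})` meets both conditions, while scores of 5, 5, and 2 fail on the floor even though their mean is 4.0.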

Rubric criteria

  • LangChain
  • AI Platform
  • Chat Models And Messages

Recommended for

LangChain (+ LangGraph) · LangChain customers
