Eval Library
For LangChain · AI Platform

Streaming Callbacks And Safety

LangChain (+ LangGraph) · LangChain

LLM Orchestration Framework — LangChain

LangChain evals — Streaming, Callbacks & Safety (relift v3 InfraRed)

About LangChain

LangChain is the open-source framework for building LLM applications and agents — provider-agnostic chat-model abstractions, LCEL/Runnables composition, tools, retrieval, and the LangGraph agent runtime (Python & JS). The company also offers LangSmith (observability) and LangGraph Platform.

Employees

~200

Industry

Agent Framework

Headquarters

San Francisco, CA

Sample tests · showing 3 of 10

# | Input | Expected behavior | Check
01

Integrator wants token-level events from a complex chain but only uses .stream, which yields chain output chunks, not granular per-node events.

Use astream_events (v2) to receive granular events (on_chat_model_stream for tokens, on_tool_start/on_tool_end, on_chain_*) across the whole chain. Filter by event type/name; do not conflate the coarse .stream output with the event stream.

Pass / Fail · AI Platform · medium
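
The filtering described in test 01 can be sketched as follows. This is a minimal illustration, not LangChain's own code: `fake_events` is a hypothetical stand-in for `chain.astream_events(inputs, version="v2")`, and it yields plain dicts with the same `event` / `name` / `data` keys. In a real chain, `data["chunk"]` on an `on_chat_model_stream` event is an `AIMessageChunk` (read its `.content`), not a bare string as assumed here.

```python
import asyncio
from typing import Any, AsyncIterator


async def fake_events() -> AsyncIterator[dict[str, Any]]:
    """Hypothetical stand-in for chain.astream_events(inputs, version="v2")."""
    events = [
        {"event": "on_chain_start", "name": "pipeline", "data": {}},
        {"event": "on_chat_model_stream", "name": "model", "data": {"chunk": "Hel"}},
        {"event": "on_tool_start", "name": "search", "data": {"input": {"q": "x"}}},
        {"event": "on_tool_end", "name": "search", "data": {"output": "ok"}},
        {"event": "on_chat_model_stream", "name": "model", "data": {"chunk": "lo"}},
        {"event": "on_chain_end", "name": "pipeline", "data": {"output": "Hello"}},
    ]
    for event in events:
        yield event


async def collect(events: AsyncIterator[dict[str, Any]]) -> tuple[str, list[str]]:
    """Route events by type: tokens from the model, names of tools that started."""
    tokens: list[str] = []
    tools_started: list[str] = []
    async for event in events:
        kind = event["event"]
        if kind == "on_chat_model_stream":
            # Assumption: chunk is a plain string in this sketch.
            tokens.append(event["data"]["chunk"])
        elif kind == "on_tool_start":
            tools_started.append(event["name"])
        # on_chain_* events are ignored here; they are not token output.
    return "".join(tokens), tools_started


text, tools = asyncio.run(collect(fake_events()))
```

Keeping the dispatch on `event["event"]` in one place makes it harder to mistake the coarse chain output (the `on_chain_end` payload) for the token stream.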
02

While streaming, the integrator tries to execute a tool call from the first chunk, but the tool_calls arguments arrive incrementally and are incomplete.

Accumulate AIMessageChunks (they add together) until the tool_calls / tool_call_chunks are fully assembled before executing. Do not act on partial tool-call arguments; wait for the complete, parsed tool_calls on the aggregated message.

Pass / Fail · AI Platform · high
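
The accumulation in test 02 can be illustrated with plain dicts. In LangChain itself, adding `AIMessageChunk` objects together performs this merge and the aggregated message exposes the parsed `tool_calls`; the sketch below only imitates that behavior. The helper names `merge_tool_call_chunks` and `parse_complete_calls`, and the sample `get_weather` call, are hypothetical.

```python
import json


def merge_tool_call_chunks(chunks):
    """Merge partial tool-call chunks by index, concatenating argument fragments."""
    merged = {}
    for chunk in chunks:
        slot = merged.setdefault(chunk["index"], {"name": "", "args": "", "id": None})
        slot["name"] += chunk.get("name") or ""
        slot["args"] += chunk.get("args") or ""
        slot["id"] = slot["id"] or chunk.get("id")
    return [merged[i] for i in sorted(merged)]


def parse_complete_calls(merged):
    """Return only calls whose accumulated arguments parse as JSON."""
    calls = []
    for item in merged:
        try:
            args = json.loads(item["args"])
        except json.JSONDecodeError:
            continue  # arguments are still partial; do not execute
        calls.append({"name": item["name"], "args": args, "id": item["id"]})
    return calls


# Simulated stream: the name arrives first, the arguments in two fragments.
stream = [
    {"index": 0, "id": "call_1", "name": "get_weather", "args": ""},
    {"index": 0, "id": None, "name": None, "args": '{"city": '},
    {"index": 0, "id": None, "name": None, "args": '"Paris"}'},
]

partial = parse_complete_calls(merge_tool_call_chunks(stream[:2]))
complete = parse_complete_calls(merge_tool_call_chunks(stream))
```

Here `partial` is empty because the arguments do not parse yet, while `complete` holds one executable call. The parse check is a second guard; the safer trigger for execution is still the end of the stream.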
03

Integrator passes a provider API key as a plain constructor argument that gets logged via callbacks/metadata into traces.

Supply secrets via environment variables (or a secrets manager) read by the integration, not inline literals that land in metadata/traces; keep keys out of RunnableConfig metadata and logs. Redact any secret-shaped values before emitting telemetry.

Pass / Fail · AI Platform · critical
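
A minimal sketch of the handling expected in test 03, using only the standard library. The environment variable name `PROVIDER_API_KEY`, the list of sensitive key names, and the secret-shaped regex are all assumptions chosen for illustration; a real deployment would tune them to its providers. The key literal in the sample metadata is fake.

```python
import os
import re

# Assumed secret shapes: prefixes such as "sk-" followed by 8+ key characters.
SECRET_PATTERN = re.compile(r"\b(?:sk|pk|key)-[A-Za-z0-9_-]{8,}\b")
# Assumed field names that should never reach traces.
SENSITIVE_KEYS = ("api_key", "token", "secret", "password", "authorization")
REDACTED = "[REDACTED]"


def load_api_key(var: str = "PROVIDER_API_KEY") -> str:
    """Read the key from the environment instead of an inline literal."""
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"environment variable {var} is not set")
    return key


def redact(value):
    """Return a copy with sensitive fields and secret-shaped strings masked."""
    if isinstance(value, dict):
        return {
            k: REDACTED if any(s in k.lower() for s in SENSITIVE_KEYS) else redact(v)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [redact(v) for v in value]
    if isinstance(value, str):
        return SECRET_PATTERN.sub(REDACTED, value)
    return value


# Metadata as it might look just before being emitted as telemetry.
metadata = {
    "user": "u1",
    "api_key": "sk-abcdefgh12345678",
    "note": "called provider with sk-abcdefgh12345678",
    "nested": [{"token": "t"}],
}
safe = redact(metadata)
```

Running `redact` on metadata immediately before it is emitted keeps the original dict usable by the application while the copy that reaches logs or traces carries no key material.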

How this eval is graded

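Grade each response against expected.ideal_behavior and expected.rubric. Each rubric criterion is scored separately; a pass requires a mean score of at least 4.0 across the criteria, with no single criterion scoring below 3.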
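
The pass rule can be read as the following check. This is one interpretation, assuming numeric per-criterion scores (the page does not state the scale); the function name `passes` and the criterion keys are hypothetical.

```python
from statistics import mean


def passes(scores: dict[str, float], min_mean: float = 4.0, floor: float = 3) -> bool:
    """Pass when the mean is at least min_mean and no criterion is below floor."""
    if not scores:
        return False  # nothing graded, so nothing to pass
    values = list(scores.values())
    return mean(values) >= min_mean and min(values) >= floor


ok = passes({"langchain": 5, "ai_platform": 4, "streaming_callbacks_and_safety": 3})
low_floor = passes({"langchain": 5, "ai_platform": 5, "streaming_callbacks_and_safety": 2})
low_mean = passes({"langchain": 4, "ai_platform": 4, "streaming_callbacks_and_safety": 3})
```

The second case has a mean of 4.0 but still fails because one criterion falls below the floor; the third clears the floor but its mean is under 4.0.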

Rubric criteria

  • LangChain
  • AI Platform
  • Streaming, Callbacks and Safety

Recommended for

LangChain (+ LangGraph) · LangChain customers
