Streaming Callbacks And Safety
LLM Orchestration Framework — LangChain
LangChain evals — Streaming, Callbacks & Safety (relift v3 InfraRed)
About LangChain
LangChain is an open-source framework for building LLM applications and agents. It provides provider-agnostic chat-model abstractions, LCEL/Runnables composition, tools, retrieval, and the LangGraph agent runtime (Python & JS). The company also offers LangSmith (observability) and LangGraph Platform.
Sample tests · showing 3 of 10
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Integrator wants token-level events from a complex chain but only uses `.stream`, which yields chain output chunks, not granular per-node events. | Use `astream_events` (v2) to receive granular events (`on_chat_model_stream` for tokens, `on_tool_start`/`on_tool_end`, `on_chain_*`) across the whole chain. Filter by event type/name; do not conflate the coarse `.stream` output with the event stream. | Pass / Fail · AI Platform · medium |
| 02 | While streaming, the integrator tries to execute a tool call from the first chunk, but the `tool_calls` arguments arrive incrementally and are incomplete. | Accumulate `AIMessageChunk` objects (they add together) until the `tool_calls` / `tool_call_chunks` are fully assembled before executing. Do not act on partial tool-call arguments; wait for the complete, parsed `tool_calls` on the aggregated message. | Pass / Fail · AI Platform · high |
| 03 | Integrator passes a provider API key as a plain constructor argument that gets logged via callbacks/metadata into traces. | Supply secrets via environment variables (or a secrets manager) read by the integration, not inline literals that land in metadata/traces; keep keys out of `RunnableConfig` metadata and logs. Redact any secret-shaped values before emitting telemetry. | Pass / Fail · AI Platform · critical |
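The filtering step in sample test 01 can be illustrated without a live model. The sketch below is an assumption-laden stand-in: `fake_event_stream` is a hypothetical generator that only mimics the `event` / `name` / `data` keys of events emitted by `astream_events`, and its chunks are plain strings, whereas a real stream carries message chunk objects whose text you would read from the chunk itself.

```python
import asyncio
from typing import Any, AsyncIterator, Dict, List


async def fake_event_stream() -> AsyncIterator[Dict[str, Any]]:
    """Hypothetical stand-in for the event stream of a real chain."""
    events = [
        {"event": "on_chain_start", "name": "rag_chain", "data": {}},
        {"event": "on_tool_start", "name": "search", "data": {"input": {"q": "cats"}}},
        {"event": "on_tool_end", "name": "search", "data": {"output": "3 results"}},
        {"event": "on_chat_model_stream", "name": "chat_model", "data": {"chunk": "Hel"}},
        {"event": "on_chat_model_stream", "name": "chat_model", "data": {"chunk": "lo"}},
        {"event": "on_chain_end", "name": "rag_chain", "data": {"output": "Hello"}},
    ]
    for event in events:
        yield event


async def collect_tokens(stream: AsyncIterator[Dict[str, Any]]) -> List[str]:
    """Keep only token events; tool and chain events are skipped."""
    tokens: List[str] = []
    async for event in stream:
        if event["event"] == "on_chat_model_stream":
            # Assumption: chunks are plain strings in this stand-in.
            tokens.append(event["data"]["chunk"])
    return tokens


tokens = asyncio.run(collect_tokens(fake_event_stream()))
```

With a real chain, the same `if event["event"] == ...` filter would sit inside the loop over the event stream, optionally combined with a check on `event["name"]` to isolate one node.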
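Sample test 02 hinges on accumulating fragments before acting. The following is a minimal pure-Python sketch of that idea, not LangChain's own merge logic: the fragment dictionaries (`index`, `id`, `name`, `args`) are an assumed shape loosely modeled on tool-call chunks, and `assemble_tool_calls` is a hypothetical helper.

```python
import json
from typing import Any, Dict, Iterable, List


def assemble_tool_calls(chunks: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Merge partial tool-call fragments by index; parse args only at the end."""
    merged: Dict[int, Dict[str, Any]] = {}
    for chunk in chunks:
        slot = merged.setdefault(chunk["index"], {"name": "", "args": "", "id": None})
        slot["name"] += chunk.get("name") or ""
        slot["args"] += chunk.get("args") or ""
        slot["id"] = slot["id"] or chunk.get("id")

    calls: List[Dict[str, Any]] = []
    for index in sorted(merged):
        slot = merged[index]
        # json.loads raises ValueError while the argument string is incomplete,
        # which prevents executing a tool on partial arguments.
        calls.append(
            {"name": slot["name"], "args": json.loads(slot["args"]), "id": slot["id"]}
        )
    return calls


# Illustrative fragments, as they might arrive across three streamed chunks.
stream = [
    {"index": 0, "id": "call_1", "name": "get_weather", "args": ""},
    {"index": 0, "id": None, "name": None, "args": '{"city": "Pa'},
    {"index": 0, "id": None, "name": None, "args": 'ris"}'},
]
complete_calls = assemble_tool_calls(stream)
```

Calling the helper on only the first two fragments raises an error instead of returning a half-built call, which is the behavior the test expects: execution waits for the fully assembled arguments.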
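For sample test 03, redaction before telemetry might look like the sketch below. Everything here is illustrative: the `redact` helper, the key-name heuristics, and the `sk-` value pattern are assumptions, and a production setup would need patterns matched to the providers actually in use.

```python
import re
from typing import Any

# Assumed heuristics: key names that suggest a secret, and one secret-shaped value pattern.
SECRET_KEY_NAME = re.compile(r"(api[_-]?key|secret|token|password)", re.IGNORECASE)
SECRET_VALUE = re.compile(r"\bsk-[A-Za-z0-9_-]{16,}")
REDACTED = "[REDACTED]"


def redact(value: Any, key: str = "") -> Any:
    """Return a copy of value with secret-looking strings replaced."""
    if isinstance(value, dict):
        return {k: redact(v, str(k)) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [redact(v, key) for v in value]
    if isinstance(value, str):
        if key and SECRET_KEY_NAME.search(key):
            # The key name alone marks the whole value as sensitive.
            return REDACTED
        # Otherwise scrub only the secret-shaped substrings.
        return SECRET_VALUE.sub(REDACTED, value)
    return value


metadata = {
    "user": "alice",
    "openai_api_key": "sk-ABCDEFGHIJKLMNOPQRSTUV",
    "note": "used key sk-ABCDEFGHIJKLMNOPQRSTUV in test",
    "nested": {"token": "xyz"},
}
safe_metadata = redact(metadata)
```

Redaction is a backstop. The primary control in the expected behavior is still to load the key from the environment or a secrets manager so it never enters metadata in the first place.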
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
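One possible reading of the pass rule, sketched in Python: the mean of the criterion scores must reach 4.0 and no single criterion may score below 3. The function name, the score scale, and the treatment of an empty score set are assumptions, since the page does not define them.

```python
from statistics import mean
from typing import Mapping


def passes(
    scores: Mapping[str, float],
    mean_threshold: float = 4.0,
    floor: float = 3.0,
) -> bool:
    """Pass when the mean meets the threshold and every criterion meets the floor."""
    if not scores:
        # Assumption: an ungraded eval does not pass.
        return False
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor


borderline = passes({"a": 5, "b": 4, "c": 3})     # mean 4.0, lowest score 3
one_low_score = passes({"a": 5, "b": 5, "c": 2})  # mean 4.0, lowest score 2
low_mean = passes({"a": 4, "b": 4, "c": 3})       # mean below 4.0
```

The floor matters because a high average can hide one failing criterion, as in the second case.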
Rubric criteria
- LangChain
- AI Platform
- Streaming Callbacks And Safety