For LangSmith · AI Platform

LangGraph Platform and Studio

LLM Observability & Evaluation Platform — LangSmith (LangChain)

LangSmith evals — LangGraph Platform & Studio (relift v3 InfraRed)

About LangSmith

LangSmith is LangChain's LLM observability and evaluation platform: tracing, datasets, evaluators (LLM-as-judge, code, and human), experiments, prompt management, and online monitoring used by AI teams to measure and improve LLM apps in production.

Employees

~200

Industry

LLM Observability

Headquarters

San Francisco, CA

Sample tests · showing 3 of 9

# · Input · Expected behavior · Check
01

Operator wants to deploy a LangGraph app to LangGraph Platform so traces land in LangSmith automatically.

Define langgraph.json with the graph entry, push via the LangGraph CLI / Platform UI, and link the deployment to a LangSmith project. LangGraph runtime emits traces using the same @traceable / RunTree machinery so the LangSmith UI shows node-level spans. [REQUIRES-VERIFICATION] on exact CLI flags a…

Pass / Fail · AI Platform · high
02

A graph node requires human approval before proceeding (e.g., a 'send email' step). The operator uses LangGraph interrupts.

Set interrupt_before (to pause before the node runs) or interrupt_after (to pause once it has run) on the gated node. The graph pauses with a recoverable checkpoint. Surface the pending state to the human via Studio or an integrated app; on approval, resume by invoking the graph again with the same thread_id. The pause is fully recorded as a span in LangSmith.

Pass / Fail · AI Platform · critical
03

Operator opens LangGraph Studio for a failing thread and wants to see each node's inputs, outputs, and the LLM trace per node.

Studio shows per-node inputs/outputs and deep-links into the LangSmith trace for that node. Verify the deployment is linked to a LangSmith project so deep-links resolve. Use Studio's edit-and-rerun to test a fix on a specific checkpoint without redeploying.

Pass / Fail · AI Platform · medium
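
The pause-and-resume flow described in test 02 can be sketched without any dependencies. The block below is a toy model only, assuming a linear graph: ToyGraph, its run method, and the node names are hypothetical and are not the LangGraph API. It illustrates the two ideas the test checks for: the pause stores a recoverable checkpoint keyed by thread_id, and resuming with that same thread_id continues at the gated node.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

Node = Callable[[dict], dict]  # a node reads the state and returns updates


@dataclass
class ToyGraph:
    """Library-free stand-in for a checkpointed graph (NOT the LangGraph API)."""

    nodes: list[tuple[str, Node]]  # executed in order
    interrupt_before: set[str]
    checkpoints: dict[str, tuple[int, dict]] = field(default_factory=dict)

    def run(self, thread_id: str, state: Optional[dict] = None) -> dict:
        if state is None:
            # Resume: reload the saved position and state for this thread.
            index, state = self.checkpoints.pop(thread_id)
            approved = True
        else:
            index, approved = 0, False
        while index < len(self.nodes):
            name, node = self.nodes[index]
            if name in self.interrupt_before and not approved:
                # Pause: persist a recoverable checkpoint keyed by thread_id.
                self.checkpoints[thread_id] = (index, dict(state))
                return {"status": "paused", "next": name, "state": state}
            approved = False  # approval covers only the node we resumed at
            state = {**state, **node(state)}
            index += 1
        return {"status": "done", "next": None, "state": state}


graph = ToyGraph(
    nodes=[
        ("draft_email", lambda s: {"draft": f"Hi {s['to']}"}),
        ("send_email", lambda s: {"sent": True}),
    ],
    interrupt_before={"send_email"},
)

paused = graph.run("thread-1", {"to": "ops@example.com"})
resumed = graph.run("thread-1")  # human approved: resume with the same thread_id
```

In this sketch the first call stops before send_email and returns the pending state for review; the second call, which passes only the thread_id, picks up from the checkpoint and runs the gated node.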

How this eval is graded

Grade each response against expected.ideal_behavior and expected.rubric. Scores are assigned per criterion; a pass requires a mean score >= 4.0 across the criteria and no individual criterion below 3.
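
As a minimal sketch of the pass rule above, assuming it means "mean across criteria of at least 4.0, with no single criterion under 3": the function name passes_rubric and the example scores are hypothetical, and the score scale is not stated in the source.

```python
from statistics import mean


def passes_rubric(scores: dict[str, float],
                  mean_threshold: float = 4.0,
                  floor: float = 3.0) -> bool:
    """Return True when the mean score meets the threshold and no criterion is below the floor."""
    if not scores:
        raise ValueError("at least one criterion score is required")
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor


# Illustrative scores only; the criterion names follow the rubric list.
borderline_pass = passes_rubric(
    {"LangSmith": 5, "AI Platform": 4, "LangGraph Platform and Studio": 3}
)
fails_on_floor = passes_rubric(
    {"LangSmith": 5, "AI Platform": 5, "LangGraph Platform and Studio": 2}
)
fails_on_mean = passes_rubric(
    {"LangSmith": 4, "AI Platform": 4, "LangGraph Platform and Studio": 3}
)
```

The second case shows why both conditions matter under this reading: its mean is 4.0, but one criterion falls below 3, so the eval fails.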

Rubric criteria

  • LangSmith
  • AI Platform
  • LangGraph Platform and Studio

Recommended for

LangSmith customers
