Eval Library
For LangSmith · AI Platform

Tracing And Runs Api


LLM Observability & Evaluation Platform — LangSmith (LangChain)

LangSmith evals — Tracing & Runs API (relift v3 InfraRed)

About LangSmith

LangSmith is LangChain's LLM observability and evaluation platform: tracing, datasets, evaluators (LLM-as-judge, code, and human), experiments, prompt management, and online monitoring used by AI teams to measure and improve LLM apps in production.

Employees

~200

Industry

LLM Observability

Headquarters

San Francisco, CA

Sample tests · showing 3 of 9

# | Input | Expected behavior | Check
01

Agent wraps a multi-step pipeline with @traceable on the orchestrator and child @traceable on each step. A trace tree is expected with the orchestrator as parent_run and three child runs under it.

Decorate the orchestrator and each step with @traceable; rely on LANGSMITH_TRACING=true and Python contextvars (or run_tree propagation) to auto-attach child runs to the active parent. Do NOT pass parent_run_id manually unless crossing a thread/process boundary — incorrect manual ids create orphane…

Pass / Fail · AI Platform · high
02

LLM call streams tokens via SSE. The client aborts mid-stream after 12 of an expected 200 tokens. Operator wants the span to land in LangSmith with the partial output and an error marker.

Finalize the run on cancellation with end_time set, the partial outputs captured, and error set to a descriptive message (e.g., 'client disconnect after N tokens'). Do not leave the run in 'running' state — orphan runs skew P50/P99 latency and inflate active-run counts.

Pass / Fail · AI Platform · critical
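A minimal sketch of the finalization rule in test 02, again stdlib-only and not the SDK: the `Run` record and the `traced_stream` wrapper are hypothetical. The consumer stops after 12 of 200 tokens and closes the generator, which raises `GeneratorExit` at the suspended `yield`; the `except` branch sets a descriptive `error`, and the `finally` block records `end_time` and the partial output, so the run never stays in a 'running' state. How these fields map onto the real SDK's run-ending calls is an assumption to check against the SDK reference.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Run:
    """Hypothetical run record; end_time=None means the run is still 'running'."""
    name: str
    start_time: float = field(default_factory=time.time)
    end_time: Optional[float] = None
    outputs: dict = field(default_factory=dict)
    error: Optional[str] = None


def traced_stream(run: Run, tokens: list[str]) -> Iterator[str]:
    """Yield tokens and always finalize `run`, even if the consumer disconnects."""
    emitted: list[str] = []
    try:
        for tok in tokens:
            emitted.append(tok)
            yield tok
    except GeneratorExit:
        # Raised at the paused yield when the client closes the stream early.
        run.error = f"client disconnect after {len(emitted)} tokens"
        raise
    except Exception as exc:
        run.error = repr(exc)
        raise
    finally:
        # Runs on normal completion, early close, or any exception.
        run.outputs = {"output": "".join(emitted), "tokens_emitted": len(emitted)}
        run.end_time = time.time()


run = Run(name="llm_stream")
stream = traced_stream(run, [f"tok{i} " for i in range(200)])
received: list[str] = []
for tok in stream:
    received.append(tok)
    if len(received) == 12:
        break  # simulated client abort mid-stream
stream.close()  # deterministic close; triggers GeneratorExit inside the generator

print(run.error)  # client disconnect after 12 tokens
```

Closing the generator explicitly, rather than relying on garbage collection, makes the finalization happen at a predictable point, which matters if the run is to be flushed to the backend promptly.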
03

Operator wants to export only top-level production traces (no nested child runs) for batch analysis via client.list_runs.

Pass is_root=True to client.list_runs along with a project_name filter. The API then returns only runs where parent_run_id is null. Iterating list_runs without is_root pulls every child span as well, which in deep traces can inflate the result size by an order of magnitude.

Pass / Fail · AI Platform · medium
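The sketch below shows the shape of the `is_root=True` call from test 03. The `list_runs(project_name=..., is_root=True)` call is the one the expected behavior names; `FakeClient`, `FakeRun` and `export_root_runs` are hypothetical in-memory stand-ins so the example runs offline, and the trace sizes (three traces, each one root plus nine children) are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class FakeRun:
    id: str
    parent_run_id: Optional[str]


class FakeClient:
    """Hypothetical offline stand-in for langsmith.Client; only mimics is_root filtering."""

    def __init__(self, runs: list[FakeRun]) -> None:
        self._runs = runs

    def list_runs(self, *, project_name: str, is_root: Optional[bool] = None) -> Iterator[FakeRun]:
        # project_name is accepted but ignored: every fake run lives in one project.
        for r in self._runs:
            if is_root and r.parent_run_id is not None:
                continue  # root filter: keep only runs with no parent
            yield r


def export_root_runs(client, project_name: str) -> list:
    """Export top-level traces only (runs whose parent_run_id is null)."""
    return list(client.list_runs(project_name=project_name, is_root=True))


# Three invented traces, each a root run with nine child spans.
runs: list[FakeRun] = []
for t in range(3):
    runs.append(FakeRun(id=f"root-{t}", parent_run_id=None))
    runs.extend(FakeRun(id=f"child-{t}-{c}", parent_run_id=f"root-{t}") for c in range(9))

client = FakeClient(runs)
roots = export_root_runs(client, project_name="prod")
everything = list(client.list_runs(project_name="prod"))
print(len(roots), len(everything))  # 3 30
```

Against a real workspace the same helper would be handed a `langsmith.Client()` instance instead of the fake. Even for these shallow invented trees, omitting `is_root` returns ten times as many runs.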

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. A response passes only when the mean score across rubric criteria is at least 4.0 and no individual criterion scores below 3.

Rubric criteria

  • Langsmith
  • Ai Platform
  • Tracing And Runs Api
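The pass rule can be written as a small check. This is a sketch under assumptions: "mean" is read as the mean across criteria, scores are taken to be on a 1 to 5 scale (the source does not state the scale), they are keyed by the three rubric criteria, and `passes` is a hypothetical helper name.

```python
from __future__ import annotations

from statistics import mean


def passes(scores: dict[str, float], min_mean: float = 4.0, floor: float = 3.0) -> bool:
    """Pass only if the mean score is >= min_mean and no single criterion is below floor."""
    if not scores:
        raise ValueError("no criterion scores supplied")
    return mean(scores.values()) >= min_mean and min(scores.values()) >= floor


ok = {"Langsmith": 5, "Ai Platform": 4, "Tracing And Runs Api": 3}
low_floor = {"Langsmith": 5, "Ai Platform": 5, "Tracing And Runs Api": 2}
print(passes(ok), passes(low_floor))  # True False
```

The second case shows why both conditions are needed: its mean is also exactly 4.0, but one criterion sits at 2.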

Recommended for

LangSmith customers
