Tracing and Runs API
LLM Observability & Evaluation Platform — LangSmith (LangChain)
LangSmith evals — Tracing & Runs API (relift v3 InfraRed)
About LangSmith
LangSmith is LangChain's LLM observability and evaluation platform: tracing, datasets, evaluators (LLM-as-judge, code, and human), experiments, prompt management, and online monitoring used by AI teams to measure and improve LLM apps in production.
Employees
~200
Industry
LLM Observability
Headquarters
San Francisco, CA
Website
www.langchain.com/langsmith
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check | Category | Severity |
|---|---|---|---|---|---|
| 01 | Agent wraps a multi-step pipeline with @traceable on the orchestrator and child @traceable on each step. A trace tree is expected with the orchestrator as parent_run and three child runs under it. | Decorate the orchestrator and each step with @traceable; rely on LANGSMITH_TRACING=true and Python contextvars (or run_tree propagation) to auto-attach child runs to the active parent. Do NOT pass parent_run_id manually unless crossing a thread/process boundary; incorrect manual ids create orphane… | Pass / Fail | AI Platform | high |
| 02 | LLM call streams tokens via SSE. The client aborts mid-stream after 12 of an expected 200 tokens. Operator wants the span to land in LangSmith with the partial output and an error marker. | Finalize the run on cancellation with end_time set, the partial outputs captured, and error set to a descriptive message (e.g., 'client disconnect after N tokens'). Do not leave the run in 'running' state; orphan runs skew P50/P99 latency and inflate active-run counts. | Pass / Fail | AI Platform | critical |
| 03 | Operator wants to export only top-level production traces (no nested child runs) for batch analysis via client.list_runs. | Pass is_root=True to client.list_runs along with a project_name filter. The API returns only runs where parent_run_id is null. Iterating list_runs without is_root pulls every child span and inflates result size by an order of magnitude in deep traces. | Pass / Fail | AI Platform | medium |
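The three sample tests above can be sketched in Python roughly as follows. This is a minimal illustration, not the eval's reference solution: it assumes the `langsmith` Python SDK, and the function names (`retrieve`, `summarize`, `format_answer`, `orchestrator`, `finalize_cancelled_run`, `export_root_runs`) and the `partial_output` key are hypothetical. The `client` argument is assumed to be a `langsmith.Client`, and the `ImportError` fallback exists only so the sketch runs without the SDK installed.

```python
from datetime import datetime, timezone

try:
    from langsmith import traceable  # real SDK decorator, if installed
except ImportError:
    # Stand-in so the sketch still runs without the SDK; it records nothing.
    def traceable(*args, **kwargs):
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]
        return lambda fn: fn


# Test 01: nested @traceable calls; no parent_run_id is passed by hand.
@traceable(name="retrieve")
def retrieve(query: str) -> list:
    return [f"doc:{query}"]


@traceable(name="summarize")
def summarize(docs: list) -> str:
    return " | ".join(docs)


@traceable(name="format_answer")
def format_answer(summary: str) -> str:
    return f"Answer: {summary}"


@traceable(name="orchestrator")
def orchestrator(query: str) -> str:
    # Each step is called inside the orchestrator's context, so the SDK can
    # attach it as a child run when LANGSMITH_TRACING=true.
    return format_answer(summarize(retrieve(query)))


# Test 02: close out a run whose stream was aborted mid-way.
def finalize_cancelled_run(client, run_id, tokens: list) -> None:
    client.update_run(
        run_id,
        outputs={"partial_output": "".join(tokens)},  # illustrative key name
        error=f"client disconnect after {len(tokens)} tokens",
        end_time=datetime.now(timezone.utc),
    )


# Test 03: fetch only root runs (parent_run_id is null) for one project.
def export_root_runs(client, project_name: str) -> list:
    return list(client.list_runs(project_name=project_name, is_root=True))
```

With the real SDK, the last two helpers would be called with `langsmith.Client()` and an actual run id or project name; both require a configured API key, so they are left uncalled here.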
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
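One way to read the pass rule is sketched below. It assumes each rubric criterion receives a single numeric score, and that a test passes when the mean of those scores is at least 4.0 and no individual score falls below 3; the function name `passes` and the dictionary shape are hypothetical, and the page does not state the scoring scale.

```python
from statistics import mean


def passes(scores: dict, mean_threshold: float = 4.0, floor: float = 3.0) -> bool:
    """Return True when the mean score meets the threshold and no criterion is under the floor."""
    if not scores:
        return False  # nothing graded, so nothing can pass
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor
```

Under this reading, scores of 5, 4, and 3 pass (mean 4.0, minimum 3), while 5, 5, and 2 fail on the floor even though the mean is 4.0.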
Rubric criteria
- LangSmith
- AI Platform
- Tracing and Runs API