Eval Library
For LangSmith · AI Platform

Online Monitoring And Feedback

LLM Observability & Evaluation Platform — LangSmith (LangChain)

LangSmith evals — Online Monitoring & Feedback (relift v3 InfraRed)

About LangSmith

LangSmith is LangChain's LLM observability and evaluation platform: tracing, datasets, evaluators (LLM-as-judge, code, and human), experiments, prompt management, and online monitoring used by AI teams to measure and improve LLM apps in production.

Employees

~200

Industry

LLM Observability

Headquarters

San Francisco, CA

Sample tests · showing 3 of 9

# · Input · Expected behavior · Check
01

Operator wants a Slack alert when error rate on a production project exceeds 5% over a 10-minute window.

Create a project-scoped alert rule: metric='error_rate', threshold=0.05, window=10m, recipients=[slack-webhook]. The rule fires only when both the threshold and the window conditions are met. Use a separate alert rule per severity (warning vs. page). Confirm via the 'test alert' button in the UI.

Pass / Fail · AI Platform · high
02

Operator routes critical-severity LangSmith alerts to PagerDuty.

Connect via the documented PagerDuty integration (Events API key). Each alert rule maps to a service. Per the docs, LangSmith deduplicates repeats of the same alert within one hour; PagerDuty applies its own dedup window on top of that. Confirm end-to-end with a test alert.

Pass / Fail · AI Platform · medium
03

Operator wants to page when LLM-run P99 latency exceeds 8 seconds over 5 minutes.

Configure a latency alert with metric='p99_latency_ms', threshold=8000, window=5m. Scope it to run_type='llm' to exclude tool runs. Include a [REQUIRES-VERIFICATION] note to check the exact metric key string against the current alerting docs, since LangSmith's alert metric names change over time.

Pass / Fail · AI Platform · high
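
The threshold-and-window condition in tests 01 and 03 can be illustrated with a small local sketch. This is not LangSmith's alerting API: the Run and AlertRule classes, the function names, the strict "greater than" comparison, and the nearest-rank P99 are all assumptions made for illustration, and the metric key strings are the unverified ones quoted in the tests.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class Run:
    """Hypothetical, simplified record of one traced run."""
    end_time: datetime
    run_type: str       # e.g. "llm" or "tool"
    latency_ms: float
    error: bool


@dataclass(frozen=True)
class AlertRule:
    """Hypothetical local model of an alert rule; not a LangSmith API object."""
    metric: str                     # "error_rate" or "p99_latency_ms" (unverified keys)
    threshold: float
    window: timedelta
    run_type: Optional[str] = None  # None means "all run types"


def runs_in_window(runs: List[Run], rule: AlertRule, now: datetime) -> List[Run]:
    """Keep runs that ended inside the trailing window and match the rule's scope."""
    start = now - rule.window
    return [
        r for r in runs
        if start < r.end_time <= now
        and (rule.run_type is None or r.run_type == rule.run_type)
    ]


def metric_value(runs: List[Run], metric: str) -> Optional[float]:
    """Compute the metric over the given runs; None when there is no data."""
    if not runs:
        return None
    if metric == "error_rate":
        return sum(1 for r in runs if r.error) / len(runs)
    if metric == "p99_latency_ms":
        ordered = sorted(r.latency_ms for r in runs)
        rank = math.ceil(0.99 * len(ordered))  # nearest-rank percentile
        return ordered[rank - 1]
    raise ValueError(f"unsupported metric: {metric}")


def should_fire(runs: List[Run], rule: AlertRule, now: datetime) -> bool:
    """Fire only when the windowed metric strictly exceeds the threshold."""
    value = metric_value(runs_in_window(runs, rule, now), rule.metric)
    return value is not None and value > rule.threshold
```

Under these assumptions, test 01 corresponds to AlertRule("error_rate", 0.05, timedelta(minutes=10)) and test 03 to AlertRule("p99_latency_ms", 8000, timedelta(minutes=5), run_type="llm"). An empty window never fires in this sketch; how a real alerting system treats missing data should be checked against its docs.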

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
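
One possible reading of this pass rule, sketched in code: each rubric criterion receives a numeric score, and a test passes when the mean across criteria is at least 4.0 and no single criterion is below 3. The function name, the dictionary shape, and this interpretation of "per-criterion" are assumptions, not the platform's documented grader.

```python
from statistics import mean
from typing import Mapping


def passes(scores: Mapping[str, float],
           mean_floor: float = 4.0,
           criterion_floor: float = 3.0) -> bool:
    """Return True when the mean meets mean_floor and every score meets criterion_floor."""
    if not scores:
        raise ValueError("at least one criterion score is required")
    values = list(scores.values())
    return mean(values) >= mean_floor and min(values) >= criterion_floor
```

With this reading, scores of 5, 4, and 3 pass (mean 4.0, minimum 3), while 5, 5, and 2 fail on the per-criterion floor even though the mean is still 4.0.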

Rubric criteria

  • LangSmith
  • AI Platform
  • Online Monitoring And Feedback

Recommended for

LangSmith customers
