Online Monitoring And Feedback
LLM Observability & Evaluation Platform — LangSmith (LangChain)
About LangSmith
LangSmith is LangChain's LLM observability and evaluation platform: tracing, datasets, evaluators (LLM-as-judge, code, and human), experiments, prompt management, and online monitoring used by AI teams to measure and improve LLM apps in production.
Employees
~200
Industry
LLM Observability
Headquarters
San Francisco, CA
Website
www.langchain.com/langsmith
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Operator wants a Slack alert when the error rate on a production project exceeds 5% over a 10-minute window. | Create a project-scoped alert rule: metric='error_rate', threshold=0.05, window=10m, recipients=[slack-webhook]. The rule fires only when both the threshold and window conditions are met. Use a separate alert rule per severity (warning vs page). Confirm via the 'test alert' button in the UI. | Pass / Fail · AI Platform · high |
| 02 | Operator routes critical-severity LangSmith alerts to PagerDuty. | Connect via the documented PagerDuty integration (events API key). Each alert rule maps to a service. Per the docs, LangSmith deduplicates the same alert within one hour on its end; PagerDuty applies its own dedup window. Confirm end-to-end with a test alert. | Pass / Fail · AI Platform · medium |
| 03 | Operator wants to page when LLM-run P99 latency exceeds 8 seconds over 5 minutes. | Configure a latency alert with metric='p99_latency_ms', threshold=8000, window=5m. Scope to run_type='llm' to exclude tool runs. Include a [REQUIRES-VERIFICATION] note to check the exact metric key string against the current alerting docs, because LangSmith's alert metric names evolve. | Pass / Fail · AI Platform · high |
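The threshold-plus-window logic in the sample tests above can be sketched in plain Python. This is an illustration of the evaluation semantics only, not LangSmith's implementation or API. The `Run` record, the function names, and the nearest-rank percentile method are all assumptions made for the example. The one-hour dedup interval is taken from test 02.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Run:
    """Hypothetical, simplified run record (not LangSmith's schema)."""
    end_time: datetime
    run_type: str      # e.g. "llm" or "tool"
    latency_ms: float
    error: bool


def in_window(runs: List[Run], now: datetime, window: timedelta) -> List[Run]:
    """Keep only runs that finished inside the trailing window ending at `now`."""
    start = now - window
    return [r for r in runs if start < r.end_time <= now]


def error_rate(runs: List[Run]) -> float:
    """Fraction of runs that errored; 0.0 when there are no runs."""
    return sum(r.error for r in runs) / len(runs) if runs else 0.0


def p99(values: List[float]) -> Optional[float]:
    """Nearest-rank 99th percentile; None for empty input."""
    if not values:
        return None
    ordered = sorted(values)
    rank = (99 * len(ordered) + 99) // 100  # integer ceil(0.99 * n)
    return ordered[rank - 1]


def error_rate_alert(runs: List[Run], now: datetime,
                     threshold: float = 0.05,
                     window: timedelta = timedelta(minutes=10)) -> bool:
    """Test 01: fire when the windowed error rate strictly exceeds the threshold."""
    return error_rate(in_window(runs, now, window)) > threshold


def p99_latency_alert(runs: List[Run], now: datetime,
                      threshold_ms: float = 8000,
                      window: timedelta = timedelta(minutes=5)) -> bool:
    """Test 03: fire when P99 latency of LLM runs in the window exceeds the threshold."""
    latencies = [r.latency_ms for r in in_window(runs, now, window)
                 if r.run_type == "llm"]  # tool runs are excluded
    value = p99(latencies)
    return value is not None and value > threshold_ms


def should_notify(last_sent: Optional[datetime], now: datetime,
                  dedup: timedelta = timedelta(hours=1)) -> bool:
    """Test 02: suppress a repeat of the same alert inside the dedup interval."""
    return last_sent is None or now - last_sent >= dedup
```

Treating "exceeds" as a strict comparison means a window sitting exactly at 5% does not fire. Whether the real rule is strict or inclusive should be checked against the alerting docs.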
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
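A minimal sketch of the pass rule, under one reading of the sentence above: the mean of the per-criterion scores must be at least 4.0 and no individual criterion may score below 3. The 1 to 5 scale, the function name, and the dictionary shape are assumptions for illustration; the grader's actual data model is not shown in this page.

```python
from statistics import mean
from typing import Dict


def passes_rubric(scores: Dict[str, float],
                  min_mean: float = 4.0,
                  floor: float = 3.0) -> bool:
    """Return True when the mean score meets `min_mean` and every criterion meets `floor`."""
    if not scores:
        return False  # nothing graded, so nothing can pass
    values = list(scores.values())
    return mean(values) >= min_mean and min(values) >= floor


# Example using the rubric criteria listed on this page (scores are made up).
example = {
    "LangSmith": 5,
    "AI Platform": 4,
    "Online Monitoring And Feedback": 3,
}
result = passes_rubric(example)
```

The floor check matters because a high mean can hide one failing criterion: scores of 5, 5, and 2 average 4.0 but do not pass under this reading.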
Rubric criteria
- LangSmith
- AI Platform
- Online Monitoring And Feedback