Annotation Queues
LangSmith evals — Annotation Queues (relift v3)
About LangSmith
LangSmith is LangChain's LLM observability and evaluation platform. It covers tracing, datasets, evaluators (LLM-as-judge, code, and human), experiments, prompt management, and online monitoring, and AI teams use it to measure and improve LLM apps in production.
Employees
~200
Industry
LLM Observability
Headquarters
San Francisco, CA
Website
www.langchain.com/langsmith
Sample tests · showing 3 of 7
| # | Input | Expected behavior | Check | Category | Severity |
|---|---|---|---|---|---|
| 01 | Moderators need a queue filtered to safety runs with feedback.score < 0.5. | Create the annotation queue in the UI or SDK with project scope and a filter; route flagged runs to it; document the queue's purpose and reviewer RBAC. | Pass / Fail | AI Platform | high |
| 02 | list_runs returns candidate runs; a script should add them to the queue without manual UI clicks. | Use annotation-queues-sdk patterns to add runs by id; make the enqueue idempotent; log the queue id in the job artifact. | Pass / Fail | AI Platform | medium |
| 03 | The queue shows customer emails; reviewers are vendors without a DPA. | Apply gateway redaction before enqueue where possible; restrict queue RBAC; document the purge workflow for reviewed runs [REQUIRES-VERIFICATION on vendor access policy]. | Pass / Fail | AI Platform | critical |
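
The idempotent enqueue that test 02 expects can be sketched as below. This is a minimal illustration under stated assumptions, not the eval's reference solution: `enqueue_once`, `job_artifact`, and the `already_enqueued` ledger are hypothetical names introduced here, and the client is only assumed to expose `add_runs_to_annotation_queue(queue_id, run_ids=...)`, which is the method name the LangSmith Python SDK is understood to use. Verify it against the SDK version you run.

```python
import json
from typing import Iterable, List, Protocol


class QueueClient(Protocol):
    """Smallest client surface this sketch needs.

    Assumption: langsmith.Client provides a method with this name and
    keyword argument; check the SDK reference before relying on it.
    """

    def add_runs_to_annotation_queue(self, queue_id, *, run_ids) -> None: ...


def enqueue_once(
    client: QueueClient,
    queue_id: str,
    run_ids: Iterable[str],
    already_enqueued: Iterable[str],
) -> List[str]:
    """Add only run ids that are not in the ledger; return the ids added.

    `already_enqueued` is a hypothetical ledger (for example, ids read
    back from a previous job artifact). Re-running with an up-to-date
    ledger makes no API call, which is what keeps the job idempotent.
    """
    seen = {str(r) for r in already_enqueued}
    new_ids: List[str] = []
    for run_id in run_ids:
        rid = str(run_id)
        if rid not in seen:  # also drops duplicates within this batch
            seen.add(rid)
            new_ids.append(rid)
    if new_ids:
        client.add_runs_to_annotation_queue(queue_id, run_ids=new_ids)
    return new_ids


def job_artifact(queue_id: str, enqueued: Iterable[str]) -> str:
    """Serialize the queue id and enqueued run ids for the job log."""
    return json.dumps(
        {"queue_id": str(queue_id), "enqueued_run_ids": sorted(str(r) for r in enqueued)}
    )
```

In a real job the `run_ids` would come from the ids of the runs returned by `list_runs`, and the artifact from the previous run would supply `already_enqueued`. The filter expression for low safety scores is left out on purpose, since its exact syntax should be taken from the LangSmith documentation rather than from this sketch.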
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Penalize failure_modes.
Rubric criteria
- LangSmith
- AI Platform
- Annotation Queues