Eval Library
L
For LangSmithAI Platform

Annotation Queues

LangSmith · LangSmith

LLM observability and evaluation — LangSmith

LangSmith evals — Annotation Queues (relift v3)

About LangSmith

LangSmith is LangChain's LLM observability and evaluation platform: tracing, datasets, evaluators (LLM-as-judge, code, and human), experiments, prompt management, and online monitoring used by AI teams to measure and improve LLM apps in production.

Employees

~200

Industry

LLM Observability

Headquarters

San Francisco, CA

Sample tests· showing 3 of 7

#InputExpected behaviorCheck
01

Moderators need queue filtered to feedback.score<0.5 safety runs.

Create annotation queue in UI or SDK with project scope and filter; route flagged runs; document queue purpose and reviewer RBAC.

Pass / FailAi Platformhigh
02

list_runs returns candidates; script should add to queue without manual UI clicks.

Use annotation-queues-sdk patterns to add runs by id; idempotent enqueue; log queue id in job artifact.

Pass / FailAi Platformmedium
03

Queue shows customer emails; reviewers are vendors without DPA.

Apply gateway redaction before enqueue where possible; restrict queue RBAC; document purge workflow for reviewed runs [REQUIRES-VERIFICATION on vendor access policy].

Pass / FailAi Platformcritical

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Penalize failure_modes.

Rubric criteria

  • Langsmith
  • Ai Platform
  • Annotation Queues

Recommended for

LangSmithLangSmith customers

Works with

Related evals

Run this eval in your workspace

Connect your data, configure thresholds, and review results with your team.