
Prompt Hub And Prompt Management


LLM Observability & Evaluation Platform — LangSmith (LangChain)

Evaluates LangSmith's Prompt Hub / Prompt Management across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's LLM Observability & Evaluation Platform eval coverage.

About LangSmith

LangSmith is LangChain's LLM observability and evaluation platform: tracing, datasets, evaluators (LLM-as-judge, code, and human), experiments, prompt management, and online monitoring used by AI teams to measure and improve LLM apps in production.

Employees

~200

Industry

LLM Observability

Headquarters

San Francisco, CA

Website

www.langchain.com/langsmith

Sample tests · showing 3 of 9

#	Input	Expected behavior	Check	Category	Severity
01	Operator pushes a new prompt 'support-triage' to the Prompt Hub with client.push_prompt.	push_prompt creates a new commit hash for the prompt. Record the returned commit hash in the deployment manifest (e.g., infra repo). Subsequent pull_prompt('support-triage:<commit_hash>') pins to that exact version. Pushing the same content twice is a no-op (same hash).	Pass / Fail	AI Platform	high
02	Production deployment must pin to a specific prompt commit so a UI edit cannot silently change behavior.	Call client.pull_prompt('support-triage:<commit_hash>') at deploy time and cache the resulting prompt object in the process. Treat the commit hash as part of the deployment artifact. NEVER pull bare 'support-triage' in prod: that resolves to the latest commit and is mutable.	Pass / Fail	AI Platform	critical
03	Prompt is published as a runnable chain with include_model=True so the model binding lives alongside the prompt.	client.pull_prompt(name, include_model=True) returns a runnable that already binds the documented model+params. Use this only when the team wants model+prompt to version together. For ops control over model rollouts, keep model selection outside the prompt and pass include_model=False (default).	Pass / Fail	AI Platform	medium
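The pinning and caching behavior expected in cases 02 and 03 can be sketched roughly as follows. This is a minimal illustration, not LangSmith's own implementation: the helper names `require_pinned` and `load_pinned_prompt` are hypothetical, and the regex assumes a commit hash is a lowercase hex string of at least 8 characters, which may not match every identifier form the Hub accepts. The only SDK call used is `client.pull_prompt(identifier, include_model=...)`, and the client is passed in so the sketch does not depend on credentials.

```python
import re

# Assumption: a pinned identifier looks like "name:<hex commit hash>".
# The 8-character minimum is an illustrative guess, not a documented rule.
_PINNED = re.compile(r"^[^:]+:[0-9a-f]{8,}$")

# Process-level cache so the prompt is pulled once per deployment.
_prompt_cache = {}


def require_pinned(identifier: str) -> str:
    """Reject bare names such as 'support-triage', which resolve to latest."""
    if not _PINNED.match(identifier):
        raise ValueError(
            f"{identifier!r} is not pinned to a commit hash; "
            "use 'name:<commit_hash>' in production"
        )
    return identifier


def load_pinned_prompt(client, identifier: str, include_model: bool = False):
    """Pull a pinned prompt once and reuse the cached object afterwards.

    `client` is expected to expose pull_prompt(identifier, include_model=...),
    as langsmith.Client does.
    """
    require_pinned(identifier)
    key = (identifier, include_model)
    if key not in _prompt_cache:
        _prompt_cache[key] = client.pull_prompt(
            identifier, include_model=include_model
        )
    return _prompt_cache[key]
```

In a deployment, `identifier` would typically be read from the manifest that records the commit hash, and `client` would be a `langsmith.Client()`. Keeping `include_model=False` as the default mirrors case 03's advice to leave model selection outside the prompt unless the team wants both to version together.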

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
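One way to read the pass rule above is sketched below. It assumes the judge returns one numeric score per rubric criterion (the page does not state the scale; 1 to 5 is a guess), that "mean" is taken across those criteria, and that the function and criterion names are hypothetical.

```python
from statistics import mean


def rubric_passes(scores, mean_threshold=4.0, floor=3.0):
    """Return True when the mean score meets the threshold and no
    individual criterion falls below the floor.

    `scores` maps a criterion name to the judge's numeric score.
    """
    if not scores:
        raise ValueError("at least one criterion score is required")
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor


# Hypothetical criterion names, for illustration only.
example = {"pins_commit": 5, "caches_prompt": 4, "avoids_bare_name": 3}
print(rubric_passes(example))
```

Under this reading a single low criterion fails the case even when the average is high, which is why the floor is checked separately from the mean.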

Rubric criteria

LangSmith
AI Platform
Prompt Hub And Prompt Management

Recommended for

LangSmith customers

Works with

LangSmith

Related evals

AI Platform

Claude API

Evaluates Anthropic's Batch API across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

AI Platform

Claude API

Evaluates Anthropic's Extended Thinking across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

AI Platform

Claude API

Evaluates Anthropic's Files API & Citations across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.


Frequently asked questions

What does the Prompt Hub And Prompt Management eval for LangSmith test?

How is the Prompt Hub And Prompt Management eval scored?

The judge rubric: Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

How many test cases does this eval pack include?

The Prompt Hub And Prompt Management pack for LangSmith contains 9 test cases. Three sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?

Sign up for Corsac, connect your model or agent endpoint, and run the Prompt Hub And Prompt Management pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.
