For Sourcegraph · Code Assistant

Code Insights And Ownership

Sourcegraph (Cody + Amp) · Sourcegraph

Code Intelligence — Sourcegraph

Evaluates Sourcegraph's Code Insights & Ownership across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Code Intelligence eval coverage.

About Sourcegraph

Sourcegraph is a code intelligence and AI coding platform: universal code search, precise code navigation, Cody chat grounded in your codebase, cross-repo batch changes, and the Amp autonomous agent — deployed across large enterprise codebases.

Employees

~150

Industry

Code Intelligence

Headquarters

San Francisco, CA

Website

sourcegraph.com

Sample tests · showing 3 of 9

#	Input	Expected behavior	Check
01	Operator wants a weekly trend of `TODO(security)` count across the codebase. Defines a Code Insight with a search query and a 30-day window.	Per docs/code_insights/references, a search-based insight runs the query at each sample point against historical commits, returning a time series. Verify the query is well-scoped (`type:diff` for added/removed counts or plain content for current state). Sampling cadence and backfill horizon are con…	Pass / Fail · Code Assistant · medium
02	Operator wants a chart of language version counts: each capture of `node>=([0-9.]+)` in `package.json` is grouped into a series.	Per docs/code_insights, capture-group insights use `patterntype:regexp` with a single capture group whose match becomes the series key. Verify the regexp is anchored and the capture group is restrictive enough to avoid swallowing trailing JSON. Cite the capture-group insight guide.	Pass / Fail · Code Assistant · medium
03	Operator creates a personal dashboard with 6 insights, then asks why teammates do not see it.	Per docs/code_insights, dashboards have user/organization/global scope. Personal dashboards are not visible to others. Operator should re-scope the dashboard to org or share the insight URLs; viewers still see only repos their ACL covers.	Pass / Fail · Code Assistant · medium
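
To make sample test 02 concrete, here is a minimal sketch of how a single capture group can become a series key. It uses Python's `re` module as a stand-in and is not Sourcegraph's implementation; the function name `series_counts` and the sample lines are invented for illustration.

```python
import re
from collections import Counter

# Pattern taken from sample test 02; the one capture group is the series key.
NODE_VERSION = re.compile(r"node>=([0-9.]+)")


def series_counts(lines):
    """Count matches per captured value: one series per distinct capture."""
    counts = Counter()
    for line in lines:
        for match in NODE_VERSION.finditer(line):
            counts[match.group(1)] += 1
    return dict(counts)


# Hypothetical matched lines, made up for this example.
sample_lines = [
    "node>=18.17.0",
    "node>=20.0.0",
    "node>=18.17.0",
    "node>=16",
]

print(series_counts(sample_lines))
```

Note that the character class `[0-9.]+` also accepts a trailing dot, so an input such as `node>=18.` would produce the series key `18.`. That is the kind of over-capture the expected behavior asks the assistant to check for.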

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
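
The pass rule above can be sketched as a small check. This is an illustration only, assuming each criterion receives a numeric score on a 1 to 5 scale (the page does not state the scale); the function name `case_passes` and the criterion names are hypothetical.

```python
from statistics import mean


def case_passes(scores, mean_threshold=4.0, floor=3):
    """Return True when the mean score meets the threshold and
    no single criterion falls below the floor."""
    if not scores:
        raise ValueError("at least one criterion score is required")
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor


# Hypothetical criterion names and scores, assumed to be on a 1-5 scale.
print(case_passes({"accuracy": 5, "grounding": 4, "scope": 3}))
print(case_passes({"accuracy": 5, "grounding": 5, "scope": 2}))
```

The second call shows why the floor matters: its mean is still 4.0, but one criterion scored below 3, so the case fails.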

Rubric criteria

Sourcegraph
Code Assistant
Code Insights And Ownership

Recommended for

Sourcegraph (Cody + Amp) · Sourcegraph customers

Works with

Sourcegraph

Related evals

Code Assistant

Browserbase

Evaluates Browserbase's Captcha Handling across scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Browser infrastructure eval coverage.

Code Assistant

Browserbase

Evaluates Browserbase's Concurrency & Rate Limits across scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Browser infrastructure eval coverage.

Code Assistant

Browserbase

Evaluates Browserbase's Live Debugging & Session Inspector across scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Browser infrastructure eval coverage.


Frequently asked questions

What does the Code Insights And Ownership eval for Sourcegraph (Cody + Amp) test?

Evaluates Sourcegraph's Code Insights & Ownership across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Code Intelligence eval coverage.

How is the Code Insights And Ownership eval scored?

The judge rubric: Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

How many test cases does this eval pack include?

The Code Insights And Ownership pack for Sourcegraph (Cody + Amp) contains 9 test cases. 3 sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?

Sign up for Corsac, connect your model or agent endpoint, and run the Code Insights And Ownership pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.
