Batch Changes

Sourcegraph (Cody + Amp) · Sourcegraph

Code Intelligence — Sourcegraph

Evaluates Sourcegraph's Batch Changes across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Code Intelligence eval coverage.

About Sourcegraph

Sourcegraph is a code intelligence and AI coding platform: universal code search, precise code navigation, Cody chat grounded in your codebase, cross-repo batch changes, and the Amp autonomous agent — deployed across large enterprise codebases.

Employees

~150

Industry

Code Intelligence

Headquarters

San Francisco, CA

Website

sourcegraph.com

Sample tests · showing 3 of 9

#	Input	Expected behavior	Check	Category	Severity
01	Operator hand-writes a batch spec with top-level `name`, `on`, `steps`, `changesetTemplate`. They forget the `on:` block.	Per docs/batch_changes, `on:` is required: it lists `repositoriesMatchingQuery` and/or `repository` entries that scope the change. Without it the spec is invalid. Reject the spec at `src batch preview` time; cite the schema reference.	Pass / Fail	Code Assistant	critical
02	Spec previously generated 50 changesets. Operator re-runs `src batch apply` with the same spec on the same batch change.	Per docs/batch_changes, batch changes reconcile against the named batch change: re-applying the same spec converges to the same target state (no duplicate changesets). Changes in the spec produce diffs against existing changesets, which can be republished. Verify by namespace + batch-change name; …	Pass / Fail	Code Assistant	high
03	Operator runs `src batch apply -f spec.yaml` directly without a preview, publishing 240 changesets in one shot.	Per docs/batch_changes, `src batch preview -f spec.yaml` should always precede apply on multi-repo changes: preview returns a URL where humans can review each changeset diff before publishing. `src batch apply` should only be used after preview (or for trivial single-repo specs).	Pass / Fail	Code Assistant	critical
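The expected behavior in case 01 can be illustrated with a small pre-flight check. This is only a sketch under stated assumptions: it takes an already-parsed spec as a Python dict, treats `on` as required as the test case describes, and is not how `src batch preview` validates specs. The function name `check_on_block` and its messages are hypothetical.

```python
# Hypothetical pre-flight check for a parsed batch spec (a plain dict).
# Assumption: `on` is required and each entry carries either
# `repositoriesMatchingQuery` or `repository`, as case 01 states.
SCOPE_KEYS = {"repositoriesMatchingQuery", "repository"}


def check_on_block(spec: dict) -> list[str]:
    """Return a list of problems with the `on` block; empty means none found."""
    if "on" not in spec:
        return ["missing required top-level key: on"]

    entries = spec["on"]
    if not isinstance(entries, list) or not entries:
        return ["`on` must be a non-empty list"]

    problems = []
    for index, entry in enumerate(entries):
        # Each entry must name at least one of the two scoping keys.
        if not isinstance(entry, dict) or not (SCOPE_KEYS & entry.keys()):
            problems.append(
                f"on[{index}] needs repositoriesMatchingQuery or repository"
            )
    return problems
```

A spec that omits `on` yields one problem, which a wrapper script could use to stop before any preview or apply step is attempted.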
How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
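One plausible reading of this threshold is that a case passes when the mean of its criterion scores is at least 4.0 and no single criterion falls below 3. The sketch below encodes that reading; the function name, the score dictionary shape, and the criterion names are assumptions for illustration, not Corsac's actual grader.

```python
from statistics import mean


def case_passes(scores: dict[str, float],
                mean_threshold: float = 4.0,
                floor: float = 3.0) -> bool:
    """Apply the pass rule to one test case's per-criterion scores.

    Assumption: `scores` maps a criterion name to the judge's numeric score.
    """
    if not scores:
        raise ValueError("at least one criterion score is required")
    values = list(scores.values())
    # Both conditions must hold: a high enough average and no weak criterion.
    return mean(values) >= mean_threshold and min(values) >= floor
```

Under this reading, scores of 5, 5, and 2 fail despite a mean of 4.0, because one criterion is below the floor.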

Rubric criteria

Sourcegraph
Code Assistant
Batch Changes

Recommended for

Sourcegraph (Cody + Amp) · Sourcegraph customers

Works with

Sourcegraph

Related evals

Code Assistant

Browserbase

Evaluates Browserbase's Captcha Handling across scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Browser infrastructure eval coverage.

Code Assistant

Browserbase

Evaluates Browserbase's Concurrency & Rate Limits across scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Browser infrastructure eval coverage.

Code Assistant

Browserbase

Evaluates Browserbase's Live Debugging & Session Inspector across scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Browser infrastructure eval coverage.

Frequently asked questions

What does the Batch Changes eval for Sourcegraph (Cody + Amp) test?

Evaluates Sourcegraph's Batch Changes across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Code Intelligence eval coverage.

How is the Batch Changes eval scored?

The judge rubric: Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

How many test cases does this eval pack include?

The Batch Changes pack for Sourcegraph (Cody + Amp) contains 9 test cases. 3 sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?

Sign up for Corsac, connect your model or agent endpoint, and run the Batch Changes pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.