Batch Changes
Sourcegraph evals — Batch Changes (relift v3 InfraRed)
About Sourcegraph
Sourcegraph is a code intelligence and AI coding platform: universal code search, precise code navigation, Cody chat grounded in your codebase, cross-repo batch changes, and the Amp autonomous agent — deployed across large enterprise codebases.
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Operator hand-writes a batch spec that should have top-level `name`, `on`, `steps`, and `changesetTemplate`, but forgets the `on:` block. | Per docs/batch_changes, `on:` is required: it lists `repositoriesMatchingQuery` and/or `repository` entries that scope the change. Without it the spec is invalid. Reject the spec at `src batch preview` time; cite the schema reference. | Pass / Fail · Code Assistant · critical |
| 02 | Spec previously generated 50 changesets. Operator re-runs `src batch apply` with the same spec on the same batch change. | Per docs/batch_changes, batch changes reconcile against the named batch change: re-applying the same spec converges to the same target state (no duplicate changesets). Changes in the spec produce diffs against existing changesets, which can be republished. Verify by namespace + batch-change name; … | Pass / Fail · Code Assistant · high |
| 03 | Spec's `changesetTemplate` has `title`, `body`, `branch`, `commit`, and `published: false`. Operator expects PRs to be auto-published. | Per docs/batch_changes/references/batch_spec_yaml_reference, `published: false` keeps changesets in the unpublished state for review; flip it to `true` (or use the UI publish action) once the preview looks correct. Operator can also set `published` to a list of `<repository pattern>: <value>` entries for per-repo control. | Pass / Fail · Code Assistant · medium |
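The checks in sample tests 01 and 03 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Sourcegraph's own validation: the spec is taken to be already parsed into a dict, the set of required keys is copied from test 01 rather than from the official schema, and `missing_keys` and `is_published` are hypothetical helper names. The per-repository form of `published` is assumed to be a list of single-entry pattern-to-boolean mappings, matched with simple globbing where the last match wins.

```python
from fnmatch import fnmatch

# Top-level keys named in sample test 01. Treating all four as required is
# an assumption taken from this page, not from the official schema.
REQUIRED_KEYS = ("name", "on", "steps", "changesetTemplate")


def missing_keys(spec: dict) -> list[str]:
    """Return the required top-level keys absent from a parsed batch spec."""
    return [key for key in REQUIRED_KEYS if key not in spec]


def is_published(published, repo: str) -> bool:
    """Resolve a `published` value for one repository (hypothetical helper).

    Accepts a plain boolean, or a list of single-entry {pattern: bool}
    mappings. Glob matching and "last match wins" are assumptions of this
    sketch; other values such as drafts are out of scope.
    """
    if isinstance(published, bool):
        return published
    result = False  # unmatched repositories stay unpublished
    for entry in published:
        for pattern, value in entry.items():
            if fnmatch(repo, pattern):
                result = value is True
    return result
```

For example, `missing_keys({"name": "demo", "steps": [], "changesetTemplate": {}})` returns `['on']`, which is the condition test 01 expects to be rejected at preview time.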
How this eval is graded
Grade against `expected.ideal_behavior` and `expected.rubric`. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
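The pass rule above can be written as a small function. This is a sketch under one reading of the rule: the mean is taken across the per-criterion scores, and every individual score must be at least 3. The function name `passes`, the dict layout, and the numeric score scale are assumptions for illustration; the page does not specify them.

```python
from statistics import mean


def passes(scores: dict[str, float],
           mean_threshold: float = 4.0,
           floor: float = 3.0) -> bool:
    """Apply the stated rule: mean >= 4.0 and no criterion below 3.

    `scores` maps a criterion name to its score (hypothetical layout).
    An empty score set is treated as a fail.
    """
    if not scores:
        return False
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor
```

Under this reading, scores of 5, 4, and 3 pass (mean 4.0, minimum 3), while 5, 5, and 2 fail despite the same mean, because one criterion falls below the floor.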
Rubric criteria
- Sourcegraph
- Code Assistant
- Batch Changes