For GitHub Copilot · Code Assistant

Copilot in GitHub.com & PR Review

AI Coding Assistant — GitHub Copilot

Evaluates GitHub Copilot in the "Copilot in GitHub.com & PR Review" area across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge. Part of Corsac's AI Coding Assistant eval coverage.

About GitHub Copilot

GitHub Copilot is GitHub's AI coding assistant — inline ghost-text completions, Copilot Chat with slash commands and @workspace context, the Copilot coding agent and Workspace for repo-wide multi-file tasks, PR summaries and code review on GitHub.com, and gh copilot suggest/explain on the CLI. Copilot ships across VS Code, JetBrains, Visual Studio, the GitHub.com PR/issue surface, and the gh CLI, with a multi-vendor model picker, repo-level custom instructions, public-code / duplication filtering, and enterprise content-exclusion and audit logs.

Employees

~3,000 (GitHub)

Industry

AI Coding Assistant

Headquarters

San Francisco, CA

Website

github.com/features/copilot

Sample tests · showing 3 of 9

#	Input	Expected behavior	Check
01	Developer opens a PR with 14 changed files. They click 'Generate Copilot summary'.	Summary must be grounded strictly in the actual diff: each summary section must correspond to a real changed file or hunk. Do not introduce claims about files not in the diff. Render a 'changes walkthrough' section that maps to the file tree so the reviewer can audit.	Pass / Fail · Code Assistant · high
02	Developer requests Copilot as a reviewer on a PR. Copilot posts inline review comments.	Per Copilot code review docs, requesting Copilot as a reviewer triggers an asynchronous review that posts file-anchored inline comments tied to specific lines. Each comment must include a suggestion or actionable observation, not vague praise. Review state is updated to 'Commented' (not 'Approved').	Pass / Fail · Code Assistant · critical
03	Developer addresses Copilot review comments and force-pushes a new commit. They want Copilot to re-review the latest revision.	Copilot must scope the re-review to the diff between the previously reviewed commit SHA and the new HEAD; do not re-comment on already-addressed lines unless they are still affected. Outdated comments on stale ranges must be marked outdated by the review machinery.	Pass / Fail · Code Assistant · high
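
The grounding requirement in case 01 can be illustrated with a small check. This is only a sketch under stated assumptions: the function name `ungrounded_paths` and the idea that file paths have already been extracted from the summary and from the diff are hypothetical, not part of the eval pack or of any Copilot API.

```python
def ungrounded_paths(summary_paths, diff_paths):
    """Return paths the summary mentions that are absent from the diff.

    Both arguments are plain lists of file paths (hypothetical inputs,
    assumed to be extracted upstream from the summary and the PR diff).
    """
    changed = set(diff_paths)
    return sorted(p for p in set(summary_paths) if p not in changed)


# Example: the summary refers to one file that is not in the diff.
diff_paths = ["src/app.py", "src/util.py", "README.md"]
summary_paths = ["src/app.py", "docs/guide.md"]
print(ungrounded_paths(summary_paths, diff_paths))
```

An empty result would mean every file the summary refers to is actually in the diff; any returned path is a candidate ungrounded claim for the judge to inspect.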

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
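
One possible reading of this pass rule, sketched in Python. The assumptions here are mine: that each criterion receives a single numeric score, that the mean is taken across criteria, and that the names `case_passes`, `MEAN_THRESHOLD`, and `MIN_CRITERION` are illustrative only. The source does not specify the score scale or the aggregation details.

```python
from statistics import mean

MEAN_THRESHOLD = 4.0  # from the rubric text: mean >= 4.0
MIN_CRITERION = 3     # from the rubric text: no criterion below 3


def case_passes(scores):
    """Apply the stated rule to a dict of criterion name -> score.

    Assumption: one numeric score per criterion; an empty dict fails.
    """
    if not scores:
        return False
    values = list(scores.values())
    return mean(values) >= MEAN_THRESHOLD and min(values) >= MIN_CRITERION


print(case_passes({"grounding": 5, "actionability": 4, "scope": 3}))
print(case_passes({"grounding": 5, "actionability": 5, "scope": 2}))
```

The second call shows why both conditions matter: the mean still reaches 4.0, but one criterion falls below 3, so the case fails under this reading.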

Rubric criteria

GitHub Copilot
Code Assistant
Copilot in GitHub.com & PR Review

Recommended for

GitHub Copilot customers

Works with

GitHub Copilot

Related evals

Code Assistant

Browserbase

Evaluates Browserbase's Captcha Handling across scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Browser infrastructure eval coverage.

Code Assistant

Browserbase

Evaluates Browserbase's Concurrency & Rate Limits across scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Browser infrastructure eval coverage.

Code Assistant

Browserbase

Evaluates Browserbase's Live Debugging & Session Inspector across scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Browser infrastructure eval coverage.


Frequently asked questions

What does the Copilot in GitHub.com & PR Review eval for GitHub Copilot test?

It evaluates GitHub Copilot across 9 scenario-based test cases covering Copilot on GitHub.com and in PR review, each graded against an expected-behavior rubric by an LLM judge.

How is the Copilot in GitHub.com & PR Review eval scored?

The LLM judge grades each case against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

How many test cases does this eval pack include?

The Copilot in GitHub.com & PR Review pack for GitHub Copilot contains 9 test cases. 3 sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?

Sign up for Corsac, connect your model or agent endpoint, and run the Copilot in GitHub.com & PR Review pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.
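
As a rough illustration of gating a release on per-case results, the sketch below turns a list of case outcomes into a CI exit code. Everything about the data shape is an assumption: the `passed` field, the `min_pass_rate` threshold, and the function name `gate_exit_code` are hypothetical, and the sketch deliberately does not call the REST API, whose endpoints and response format are not described here.

```python
def gate_exit_code(case_results, min_pass_rate=1.0):
    """Return 0 if enough cases passed, else 1 (suitable for sys.exit).

    case_results: list of dicts with a boolean "passed" key
    (hypothetical shape, not the documented API response).
    min_pass_rate: fraction of cases that must pass; an assumed,
    customizable threshold.
    """
    if not case_results:
        return 1  # treat missing results as a failed gate
    passed = sum(1 for case in case_results if case.get("passed"))
    return 0 if passed / len(case_results) >= min_pass_rate else 1


# Example with made-up results for three cases.
results = [{"passed": True}, {"passed": True}, {"passed": False}]
print(gate_exit_code(results))
print(gate_exit_code(results, min_pass_rate=0.6))
```

In a CI job, the returned value could be passed to `sys.exit` so that a failed gate stops the pipeline.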

Run this eval in your workspace

Connect your data, configure thresholds, and review results with your team.