GitHub Copilot evals — Copilot in GitHub.com & PR Review (relift v3 InfraRed)

About GitHub Copilot

GitHub Copilot is GitHub's AI coding assistant — inline ghost-text completions, Copilot Chat with slash commands and @workspace context, the Copilot coding agent and Workspace for repo-wide multi-file tasks, PR summaries and code review on GitHub.com, and gh copilot suggest/explain on the CLI. Copilot ships across VS Code, JetBrains, Visual Studio, the GitHub.com PR/issue surface, and the gh CLI, with a multi-vendor model picker, repo-level custom instructions, public-code / duplication filtering, and enterprise content-exclusion and audit logs.

Employees: ~3,000 (GitHub)

Industry: AI Coding Assistant

Headquarters: San Francisco, CA

Sample tests · showing 3 of 9

Columns: # · Input · Expected behavior · Check

01
Input: Developer opens a PR with 14 changed files. They click 'Generate Copilot summary'.
Expected behavior: Summary must be grounded strictly in the actual diff: each summary section must correspond to a real changed file / hunk. Do not introduce claims about files not in the diff. Render a 'changes walkthrough' section that maps to the file tree so the reviewer can audit.
Check: Pass / Fail · Code Assistant · high

02
Input: Developer requests Copilot as a reviewer on a PR. Copilot posts inline review comments.
Expected behavior: Per Copilot code review docs, requesting Copilot as a reviewer triggers an asynchronous review that posts file-anchored inline comments tied to specific lines. Each comment must include a suggestion or actionable observation, not vague praise. Review state is updated to 'Commented' (not 'Approved').
Check: Pass / Fail · Code Assistant · critical

03
Input: Repo has .github/copilot-instructions.md specifying a custom PR template (sections: Summary, Risk, Tests).
Expected behavior: Per custom-instructions docs, the PR summary surface should honor repo-level instructions; output must use the configured section names rather than Copilot's default format. If instructions cannot be parsed, fall back to default with a notice.
Check: Pass / Fail · Code Assistant · medium
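
The mechanical parts of sample tests 01 to 03 could be pre-checked before a human or model grader scores the rest. The following is a minimal sketch, not the eval's actual harness: the function names, the dict shape used for a review, and the assumption that summary sections appear as Markdown-style `#` headings are all hypothetical. Whether a comment is "actionable" rather than vague praise is not something this sketch tries to decide.

```python
import re


def ungrounded_files(summary_files, diff_files):
    """Test 01: files the summary mentions that are not in the diff."""
    return sorted(set(summary_files) - set(diff_files))


def review_problems(review):
    """Test 02: violations in a review given as a plain dict.

    Hypothetical shape: {"state": str, "comments": [{"path", "line", "body"}]}.
    """
    problems = []
    if review.get("state") != "Commented":
        problems.append(f"unexpected review state: {review.get('state')!r}")
    for i, comment in enumerate(review.get("comments", [])):
        # A comment counts as anchored only if it names a file and a line.
        if not comment.get("path") or not isinstance(comment.get("line"), int):
            problems.append(f"comment {i} is not anchored to a file and line")
        if not comment.get("body", "").strip():
            problems.append(f"comment {i} has an empty body")
    return problems


def missing_sections(summary_text, required=("Summary", "Risk", "Tests")):
    """Test 03: configured section names with no matching heading.

    Assumes headings are written as lines starting with one or more '#'.
    """
    headings = {
        match.group(1)
        for match in re.finditer(
            r"^#+[ \t]*(.+?)[ \t]*$", summary_text, re.MULTILINE
        )
    }
    return [name for name in required if name not in headings]
```

An empty list from each function means the mechanical part of that test holds; any non-empty result can be attached to the Pass / Fail verdict as evidence for the reviewer.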

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. A test passes when the mean score across the rubric criteria is at least 4.0 and no single criterion scores below 3.
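
The pass rule can be written as a short check. This is a sketch under stated assumptions: the source does not give the score scale or say how an empty score set is treated, so the function name `passes`, the criterion names in the usage line, and the choice to fail on empty input are illustrative only.

```python
from statistics import mean


def passes(scores, min_mean=4.0, min_score=3):
    """Apply the pass rule to a mapping of criterion name -> numeric score."""
    if not scores:
        # Assumption: with nothing graded, the test does not pass.
        return False
    values = list(scores.values())
    # Both conditions must hold: a high enough mean and no low outlier.
    return mean(values) >= min_mean and min(values) >= min_score


# Hypothetical criterion names; the mean is exactly 4.0 and the minimum is 3.
print(passes({"grounding": 5, "format": 4, "tone": 3}))  # True
```

The floor matters independently of the mean: scores of 5, 5, and 2 also average 4.0 but fail because one criterion is below 3.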

Rubric criteria

  • GitHub Copilot
  • Code Assistant
  • Copilot in GitHub.com and PR Review

Recommended for

GitHub Copilot customers
