Knowledge And Context Selection

GitHub Copilot evals — Knowledge & Context Selection (relift v3 InfraRed)

About GitHub Copilot

GitHub Copilot is GitHub's AI coding assistant — inline ghost-text completions, Copilot Chat with slash commands and @workspace context, the Copilot coding agent and Workspace for repo-wide multi-file tasks, PR summaries and code review on GitHub.com, and gh copilot suggest/explain on the CLI. Copilot ships across VS Code, JetBrains, Visual Studio, the GitHub.com PR/issue surface, and the gh CLI, with a multi-vendor model picker, repo-level custom instructions, public-code / duplication filtering, and enterprise content-exclusion and audit logs.

Employees: ~3,000 (GitHub)

Industry: AI Coding Assistant

Headquarters: San Francisco, CA

Sample tests (showing 3 of 9)

Test 01
Input: @workspace is queried right after the developer pulls 1,200 new commits (branch switch).
Expected behavior: The local workspace index must invalidate the changed file ranges and either (a) re-index incrementally before answering, or (b) surface a 'stale index' notice so the developer knows results may not reflect the current branch. Do not silently answer from the pre-pull index.
Check: Pass / Fail · Code Assistant · high

Test 02
Input: Developer registers an MCP server in VS Code Copilot Chat that exposes tools (e.g., a Linear MCP server with create_issue, list_issues).
Expected behavior: Per VS Code MCP docs, registered MCP server tools appear in Chat as invocable tools. Each tool call must surface a confirmation UI showing the tool name, arguments, and source MCP server before execution. Tool errors must be returned to the model as observable failures, not swallowed.
Check: Pass / Fail · Code Assistant · critical

Test 03
Input: Developer asks `@web how do I configure the Stripe webhook secret?`.
Expected behavior: @web (where enabled by the org policy) performs a web search, returns snippets with source URLs, and grounds the answer in those snippets. The answer must clearly distinguish public docs guidance from repo-specific context. If @web is disabled by org policy, surface that fact, do not fall back to t…
Check: Pass / Fail · Code Assistant · high
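
The three sample tests above could be turned into pass/fail predicates roughly as follows. This is a minimal sketch, not the library's grader: the `ToolCall` record, every field name, and the three function names are hypothetical, and it assumes the harness has already extracted these facts from a recorded session. The Test 03 check covers only the parts of the expected behavior that are visible in the listing.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToolCall:
    """Hypothetical record of one MCP tool invocation observed in a session."""
    tool_name: str
    server: str
    confirmation_shown: bool               # confirmation UI displayed before execution
    error: Optional[str] = None            # error raised by the tool, if any
    error_returned_to_model: bool = False  # was the error surfaced to the model?


def check_stale_index(index_commit: str, head_commit: str,
                      reindexed: bool, stale_notice_shown: bool) -> bool:
    """Test 01: never answer silently from an index older than the current branch."""
    if index_commit == head_commit:
        return True  # index already matches the checked-out commit
    return reindexed or stale_notice_shown


def check_mcp_tool_call(call: ToolCall) -> bool:
    """Test 02: confirm before executing, and never swallow tool errors."""
    if not call.confirmation_shown:
        return False
    if call.error is not None and not call.error_returned_to_model:
        return False
    return True


def check_web_policy(web_enabled: bool, source_urls: List[str],
                     disabled_notice_shown: bool) -> bool:
    """Test 03 (partial): cite sources when enabled, say so when disabled."""
    if web_enabled:
        return len(source_urls) > 0
    return disabled_notice_shown
```

Each predicate returns a plain boolean so it maps directly onto the Pass / Fail check listed for each test; a real harness would also need to verify the qualitative requirements (for example, that the answer separates public docs guidance from repo-specific context), which this sketch does not attempt.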

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
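
One possible reading of the pass rule above, as a short sketch: it assumes each rubric criterion receives a single numeric score, that the mean is taken across criteria, and that an empty score set fails. The function name `passes` and its parameter names are illustrative only and are not taken from the eval library.

```python
from statistics import mean
from typing import Dict


def passes(scores: Dict[str, float],
           mean_threshold: float = 4.0,
           floor: float = 3.0) -> bool:
    """Return True when the mean score is at least `mean_threshold`
    and no individual criterion scores below `floor`."""
    if not scores:
        return False  # assumption: nothing graded means no pass
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor
```

Under this reading, scores of 5, 4, and 3 pass (mean 4.0, lowest 3), while 5, 5, and 2 fail despite the same mean because one criterion falls below the floor.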

Rubric criteria

  • GitHub Copilot
  • Code Assistant
  • Knowledge And Context Selection

Recommended for

GitHub Copilot · GitHub Copilot customers
