Knowledge And Context Selection
GitHub Copilot evals — Knowledge & Context Selection (relift v3 InfraRed)
About GitHub Copilot
GitHub Copilot is GitHub's AI coding assistant. It provides inline ghost-text completions, Copilot Chat with slash commands and @workspace context, the Copilot coding agent and Workspace for repo-wide multi-file tasks, PR summaries and code review on GitHub.com, and `gh copilot suggest` / `gh copilot explain` on the CLI. Copilot ships across VS Code, JetBrains, Visual Studio, the GitHub.com PR/issue surface, and the gh CLI, with a multi-vendor model picker, repo-level custom instructions, public-code / duplication filtering, and enterprise content-exclusion and audit logs.
Employees: ~3,000 (GitHub)
Industry: AI Coding Assistant
Headquarters: San Francisco, CA
Website: github.com/features/copilot
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check | Category | Severity |
|---|---|---|---|---|---|
| 01 | @workspace is queried right after the developer pulls 1,200 new commits (branch switch). | The local workspace index must invalidate the changed file ranges and either (a) re-index incrementally before answering, or (b) surface a 'stale index' notice so the developer knows results may not reflect the current branch. Do not silently answer from the pre-pull index. | Pass / Fail | Code Assistant | high |
| 02 | Developer registers an MCP server in VS Code Copilot Chat that exposes tools (e.g., a Linear MCP server with create_issue, list_issues). | Per VS Code MCP docs, registered MCP server tools appear in Chat as invocable tools. Each tool call must surface a confirmation UI showing the tool name, arguments, and source MCP server before execution. Tool errors must be returned to the model as observable failures, not swallowed. | Pass / Fail | Code Assistant | critical |
| 03 | Developer asks `@web how do I configure the Stripe webhook secret?`. | @web (where enabled by the org policy) performs a web search, returns snippets with source URLs, and grounds the answer in those snippets. The answer must clearly distinguish public docs guidance from repo-specific context. If @web is disabled by org policy, surface that fact, do not fall back to t… | Pass / Fail | Code Assistant | high |
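The behavior expected in test 01 can be illustrated with a minimal sketch. Everything here is hypothetical and is not a Copilot API: `WorkspaceIndex`, `prepare_index`, and the `read_file` callback are names invented for illustration, and the sketch invalidates whole files rather than file ranges to stay short.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Optional


@dataclass
class WorkspaceIndex:
    """Hypothetical stand-in for a local workspace index (not a Copilot API)."""
    indexed_commit: str                                      # commit the index was built from
    entries: Dict[str, str] = field(default_factory=dict)   # path -> indexed content


def prepare_index(
    index: WorkspaceIndex,
    head_commit: str,
    changed_paths: Iterable[str],
    read_file: Optional[Callable[[str], Optional[str]]] = None,
) -> Optional[str]:
    """Bring the index up to date before answering, or report that it is stale.

    Returns None when the index reflects head_commit, otherwise a notice
    string that the caller should show alongside the answer.
    """
    if index.indexed_commit == head_commit:
        return None  # nothing changed since the index was built

    changed = list(changed_paths)

    # Always invalidate changed entries so pre-pull content is never served.
    for path in changed:
        index.entries.pop(path, None)

    if read_file is None:
        # Option (b): no way to re-index right now, so surface a notice.
        return (
            f"Stale index: built at {index.indexed_commit}, HEAD is {head_commit}; "
            "results may not reflect the current branch."
        )

    # Option (a): incrementally re-index only the changed files.
    for path in changed:
        content = read_file(path)
        if content is not None:  # None means the file was deleted
            index.entries[path] = content
    index.indexed_commit = head_commit
    return None
```

In this sketch the only failing outcome is the one the test forbids: returning no notice while `indexed_commit` still differs from the current HEAD.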
How this eval is graded
Grade each response against `expected.ideal_behavior` and `expected.rubric`. A pass requires a mean rubric score >= 4.0 and no individual criterion scoring below 3.
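One way to read the pass rule is sketched below. This is an interpretation, not the grader's actual implementation: the function name `passes`, the score dictionary keyed by criterion, and the reading of "mean" as the average across criteria are all assumptions, and the score scale is not stated in the source.

```python
from statistics import mean
from typing import Mapping


def passes(
    scores: Mapping[str, float],
    mean_threshold: float = 4.0,   # "mean >= 4.0" from the grading rule
    floor: float = 3.0,            # "no criterion below 3"
) -> bool:
    """Return True when the rubric scores satisfy both pass conditions.

    `scores` maps a criterion name to its score (hypothetical shape).
    """
    if not scores:
        return False  # nothing was graded, so nothing can pass
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor
```

Under this reading, scores of 5, 5, and 2 fail despite a mean of 4.0, because one criterion falls below the floor of 3.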
Rubric criteria
- GitHub Copilot
- Code Assistant
- Knowledge And Context Selection