Copilot Chat in the IDE
GitHub Copilot evals — Copilot Chat in the IDE (relift v3 InfraRed)
About GitHub Copilot
GitHub Copilot is GitHub's AI coding assistant. Its features include inline ghost-text completions; Copilot Chat, with slash commands and @workspace context; the Copilot coding agent and Copilot Workspace for repo-wide, multi-file tasks; PR summaries and code review on GitHub.com; and gh copilot suggest / gh copilot explain on the CLI. Copilot ships in VS Code, JetBrains IDEs, Visual Studio, the GitHub.com PR and issue surfaces, and the gh CLI. It offers a multi-vendor model picker, repo-level custom instructions, public-code (duplication) filtering, and, for enterprises, content exclusion and audit logs.
Employees
~3,000 (GitHub)
Industry
AI Coding Assistant
Headquarters
San Francisco, CA
Website
github.com/features/copilot
Sample tests (3 of 9 shown)
| # | Input | Expected behavior | Check | Category | Severity |
|---|---|---|---|---|---|
| 01 | Developer selects a 40-line regex-heavy function and invokes /explain in Copilot Chat. | Per the Copilot Chat docs, /explain takes the active selection as primary context and returns a structured explanation grounded in the selected code. The Chat payload must carry the selection range along with the file path so that referenced identifiers can be deep-linked. Do not silently substitute the whole fil… | Pass / Fail | Code Assistant | high |
| 02 | Developer has a failing test and invokes /fix on the failing function; Chat returns a code block. | /fix should return a minimal diff scoped to the failing function: preserve the signature unless the fix requires changing it, and do not rewrite unrelated code in the same buffer. The reply must render an Apply-in-editor affordance so the developer can preview and accept the change. | Pass / Fail | Code Assistant | critical |
| 03 | The repo uses pytest with fixtures in conftest.py. Developer invokes /tests on a selected Python function. | /tests must detect the active test framework (pytest signals: conftest.py, a [tool.pytest.ini_options] table in pyproject.toml, existing test_*.py files) and emit pytest-style fixtures and assertions, not unittest.TestCase or Jest. If detection is ambiguous, ask the user to pick a framework. | Pass / Fail | Code Assistant | high |
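The framework-detection rule in test 03 can be sketched as a small heuristic. This is a minimal Python sketch, not Copilot's actual implementation: the function names are hypothetical, the pyproject.toml check is a plain substring match rather than a TOML parse, and the Jest signal (a jest.config.* file at the repo root) is an assumption added only so the ambiguous case has something to collide with. When zero or several frameworks are detected, the function returns None, which a caller would treat as "ask the user to pick a framework".

```python
from __future__ import annotations

from pathlib import Path


def pytest_signals(repo_root: Path) -> list[str]:
    """Collect the pytest signals listed in test 03 that exist under repo_root."""
    signals = []
    if any(repo_root.rglob("conftest.py")):
        signals.append("conftest.py")
    pyproject = repo_root / "pyproject.toml"
    # Substring match keeps the sketch dependency-free; a real check might parse TOML.
    if pyproject.is_file() and "[tool.pytest.ini_options]" in pyproject.read_text(encoding="utf-8"):
        signals.append("[tool.pytest.ini_options]")
    if any(repo_root.rglob("test_*.py")):
        signals.append("test_*.py")
    return signals


def jest_signals(repo_root: Path) -> list[str]:
    """Hypothetical Jest signal, assumed for illustration: a root-level jest.config.* file."""
    return [p.name for p in repo_root.glob("jest.config.*")]


def detect_test_framework(repo_root: Path) -> str | None:
    """Return the single detected framework, or None when the user should be asked."""
    candidates = (
        ("pytest", pytest_signals(repo_root)),
        ("jest", jest_signals(repo_root)),
    )
    detected = [name for name, signals in candidates if signals]
    # Exactly one match is unambiguous; zero or several means "ask the user".
    return detected[0] if len(detected) == 1 else None
```

A repo containing only conftest.py resolves to "pytest"; adding a jest.config.js alongside it makes the result None, so the assistant would prompt instead of guessing.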
How this eval is graded
Grade each response against expected.ideal_behavior and expected.rubric. A response passes only if its mean score across the rubric criteria is at least 4.0 and no individual criterion scores below 3.
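The pass threshold above reduces to two checks on the per-criterion scores. This is a minimal sketch assuming each response receives one numeric score per rubric criterion (a 1 to 5 scale is assumed but not required by the logic); the function name and the criterion keys in the usage note are hypothetical.

```python
from __future__ import annotations

from statistics import mean


def passes(criterion_scores: dict[str, float]) -> bool:
    """Pass iff the mean score is at least 4.0 and no criterion scores below 3."""
    if not criterion_scores:
        raise ValueError("at least one criterion score is required")
    scores = list(criterion_scores.values())
    # Both conditions must hold: a high average cannot mask one weak criterion.
    return mean(scores) >= 4.0 and min(scores) >= 3
```

For example, passes({"grounding": 5, "scope": 4, "format": 3}) returns True (mean exactly 4.0, minimum 3), while passes({"grounding": 5, "scope": 5, "format": 2}) returns False even though its mean is also 4.0, because one criterion falls below 3.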
Rubric criteria
- GitHub Copilot
- Code Assistant
- Copilot Chat in the IDE