GitHub Copilot evals — Copilot Chat in the IDE (relift v3 InfraRed)

About GitHub Copilot

GitHub Copilot is GitHub's AI coding assistant — inline ghost-text completions, Copilot Chat with slash commands and @workspace context, the Copilot coding agent and Workspace for repo-wide multi-file tasks, PR summaries and code review on GitHub.com, and gh copilot suggest/explain on the CLI. Copilot ships across VS Code, JetBrains, Visual Studio, the GitHub.com PR/issue surface, and the gh CLI, with a multi-vendor model picker, repo-level custom instructions, public-code / duplication filtering, and enterprise content-exclusion and audit logs.

Employees

~3,000 (GitHub)

Industry

AI Coding Assistant

Headquarters

San Francisco, CA

Sample tests (showing 3 of 9)

# | Input | Expected behavior | Check
01

Developer selects a 40-line regex-heavy function and invokes /explain in Copilot Chat.

Per Copilot Chat docs, /explain takes the active selection as primary context and returns a structured explanation grounded in the selected code. The Chat payload must carry the selection range with the file path so referenced identifiers can be deep-linked. Do not silently substitute the whole fil…

Pass / Fail · Code Assistant · high
02

Developer has a failing test and invokes /fix on the failing function. Chat returns a code block.

/fix should return a minimal diff scoped to the failing function: preserve the signature unless the fix requires changing it, and do not rewrite unrelated code in the same buffer. The reply must render an Apply-in-editor affordance so the developer can preview and accept the change.

Pass / Fail · Code Assistant · critical
03

Repo uses pytest with fixtures in conftest.py. Developer invokes /tests on a selected Python function.

/tests must detect the active test framework (pytest signals: conftest.py, a [tool.pytest.ini_options] table in pyproject.toml, existing test_*.py files) and emit pytest-style fixtures and assertions, not unittest.TestCase classes and not jest. If detection is ambiguous, Copilot should ask the user to pick a framework.

Pass / Fail · Code Assistant · high
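The detection step in test 03 can be sketched in Python as a reference point for graders. This is a hypothetical helper, not Copilot's implementation: `detect_test_framework` takes repo-relative file paths plus the text of pyproject.toml and returns "pytest" or None, where None stands for "ambiguous, ask the user". As a stated design assumption, it treats test_*.py files on their own as ambiguous, because unittest's default discovery pattern (test*.py) matches them too; only conftest.py or a [tool.pytest.ini_options] table counts as decisive.

```python
from pathlib import PurePosixPath
from typing import Iterable, Optional


def detect_test_framework(paths: Iterable[str], pyproject_text: str = "") -> Optional[str]:
    """Return "pytest" when decisive pytest signals exist, else None (ambiguous).

    Hypothetical helper for illustration; it is not Copilot's detector.
    """
    # Compare base names so conftest.py is found at any depth in the repo.
    names = {PurePosixPath(p).name for p in paths}

    # Decisive signal 1: a conftest.py file anywhere in the tree.
    has_conftest = "conftest.py" in names

    # Decisive signal 2: a [tool.pytest.ini_options] table header in pyproject.toml.
    has_ini_options = any(
        line.strip() == "[tool.pytest.ini_options]"
        for line in pyproject_text.splitlines()
    )

    if has_conftest or has_ini_options:
        return "pytest"

    # test_*.py files alone also match unittest's default discovery pattern,
    # so this sketch reports ambiguity and expects the assistant to ask.
    return None
```

For example, a tree containing tests/conftest.py yields "pytest", while a tree containing only tests/test_app.py yields None, which corresponds to the expected behavior of asking the user to pick a framework.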

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. A test passes only when the mean rubric score is at least 4.0 and no single criterion scores below 3.
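The pass rule can be written as a small Python function. This is a minimal sketch, assuming each rubric criterion receives one numeric score per test (the scoring scale is not stated here) and that the mean is taken over all criteria; the function name `passes` and its parameters are hypothetical.

```python
from typing import Mapping


def passes(scores: Mapping[str, float], min_mean: float = 4.0, floor: float = 3.0) -> bool:
    """Pass iff the mean criterion score >= min_mean and no criterion is below floor."""
    if not scores:
        raise ValueError("at least one criterion score is required")
    values = list(scores.values())
    mean = sum(values) / len(values)
    # Both conditions must hold: a high mean cannot offset one very low criterion.
    return mean >= min_mean and min(values) >= floor
```

With scores of 5, 4 and 3 the mean is exactly 4.0 and the lowest score is 3, so the test passes; with 5, 5 and 2 the mean is also 4.0, but the test fails because one criterion falls below 3.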

Rubric criteria

  • GitHub Copilot
  • Code Assistant
  • Copilot Chat in the IDE

Recommended for

GitHub Copilot customers
