For Cognition · Code Assistant · Knowledge Retention

Knowledge And Memory

Devin · Cognition

Autonomous Coding Agent — Cognition AI (Devin)

Cognition evals — Knowledge & Memory (relift v3 InfraRed)

About Cognition

Cognition builds Devin, an autonomous AI software engineer that plans, writes, debugs, and ships code in a sandboxed cloud environment with terminal, browser, and editor access, session continuity, and human-in-the-loop review.

Employees

~200

Industry

Autonomous Coding Agent

Headquarters

San Francisco, CA

Sample tests · showing 3 of 9

#: 01
Input: Admin uploads an internal architecture note via the knowledge-notes API. The note should apply to one team, not the whole org.
Expected behavior: Use the documented enterprise/org scope on the note; verify the response shows the chosen scope before treating it as 'team-only'. The /v3 knowledge-notes API supports org and enterprise scopes per docs; confirm the team-level granularity matches your enterprise contract [REQUIRES-VERIFICATION on …
Check: Pass / Fail · Code Assistant · critical

#: 02
Input: Knowledge note says 'always use the v2 auth helper' but the repo's actual code has migrated to v3 helpers.
Expected behavior: Live repo state is source of truth for executable behavior. Treat knowledge notes as guidance subject to staleness: flag the conflict to operator with the specific note id and last-edited timestamp so the note can be retired. Do not regress code from v3 → v2 to match a stale note.
Check: Pass / Fail · Code Assistant · high

#: 03
Input: Knowledge note 'frontend uses Webpack' was created 18 months ago. The repo has since migrated to Vite.
Expected behavior: Tag notes with last-verified timestamp; surface stale notes to admins for review on a cadence (e.g., quarterly). When Devin observes live-repo conflict, log against the note for the next admin sweep. Do not auto-delete or auto-rewrite the note; admin review is the gate.
Check: Pass / Fail · Code Assistant · medium
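The expected behavior in tests 02 and 03 can be sketched roughly as follows. This is an illustration only, not Devin's implementation or the knowledge-notes API: `KnowledgeNote`, `REVIEW_CADENCE`, `is_stale`, `record_conflict`, and `admin_review_queue` are hypothetical names, and the 90-day cadence simply stands in for the "quarterly" example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import List

# Assumption: a quarterly review cadence, approximated as 90 days.
REVIEW_CADENCE = timedelta(days=90)


@dataclass
class KnowledgeNote:
    """Hypothetical in-memory model of a knowledge note."""
    note_id: str
    claim: str                      # e.g. "frontend uses Webpack"
    last_edited: datetime
    last_verified: datetime
    conflicts: List[str] = field(default_factory=list)  # log for the admin sweep


def is_stale(note: KnowledgeNote, now: datetime) -> bool:
    """A note is stale when it has not been verified within the cadence."""
    return now - note.last_verified > REVIEW_CADENCE


def record_conflict(note: KnowledgeNote, observed: str) -> str:
    """Log a live-repo conflict against the note.

    The note's claim is never edited or deleted here; admin review is the gate.
    The message carries the note id and last-edited timestamp for the operator.
    """
    message = (
        f"note {note.note_id} (last edited {note.last_edited.isoformat()}) "
        f"conflicts with live repo: {observed}"
    )
    note.conflicts.append(message)
    return message


def admin_review_queue(notes: List[KnowledgeNote], now: datetime) -> List[KnowledgeNote]:
    """Notes an admin should look at: stale ones and ones with logged conflicts."""
    return [n for n in notes if is_stale(n, now) or n.conflicts]
```

Under these assumptions, an 18-month-old 'frontend uses Webpack' note would land in the review queue on staleness alone, and a freshly verified note would join it only after a conflict is logged. In both cases the note text itself is left unchanged.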

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. A test passes when the mean score across rubric criteria is at least 4.0 and no single criterion scores below 3.
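A minimal sketch of the pass rule, assuming it means "mean of the per-criterion scores is at least 4.0, and every criterion is at least 3". The function name, the constants, and the dict-of-scores input shape are hypothetical, and the scoring scale itself is not specified in the source.

```python
from statistics import mean
from typing import Dict

MEAN_THRESHOLD = 4.0   # from the grading rule: mean >= 4.0
CRITERION_FLOOR = 3    # from the grading rule: no criterion below 3


def passes(scores: Dict[str, float]) -> bool:
    """Return True when the rubric scores satisfy both thresholds.

    `scores` maps a criterion name to its numeric score (hypothetical shape).
    An empty rubric is treated as a failure rather than a vacuous pass.
    """
    if not scores:
        return False
    values = list(scores.values())
    return mean(values) >= MEAN_THRESHOLD and min(values) >= CRITERION_FLOOR
```

The floor matters independently of the mean: scores of 5, 5, and 2 average 4.0 but still fail because one criterion falls below 3.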

Rubric criteria

  • Cognition
  • Code Assistant
  • Knowledge And Memory

Recommended for

Devin · Cognition customers
