For Factory · Search & Knowledge · Code Assistant · Answer Relevance

AI Codebase Grounding And Context

Factory (Droids) · Factory

Agent-native Software Development — Factory (Droids)

Factory evals — Codebase Grounding & Context (relift v3 InfraRed)

About Factory

Factory is an agent-native software development platform. Its autonomous "Droids" plan, write, review, and migrate code and assist with incident response. They are grounded in a team's codebase, tickets, docs, and observability data, and are driven from a terminal CLI, the web app, and chat/ticket surfaces, with human-in-the-loop review and a choice of underlying models.

Employees

~50 [unverified]

Industry

AI Software Development (Autonomous Coding Agents)

Headquarters

San Francisco, CA [unverified]

Website

factory.ai

Sample tests · showing 3 of 9

01
Input: The repo uses a custom Result error type and a logging wrapper everywhere. The Droid adds a new module that throws raw exceptions and calls print().
Expected behavior: Inspect neighboring code first and follow the project's established patterns (the Result type, the logging wrapper, naming, file layout) so the change reads as native to the codebase. Grounding means matching existing conventions, not importing a generic template.
Check: Pass / Fail · Code Assistant · high

02
Input: For a one-line config change, the Droid pulls hundreds of unrelated files into context, diluting the relevant signal.
Expected behavior: Retrieve the minimal context that the task needs (the config file, its readers, the validating test) rather than flooding context with the whole repo. Tight, relevant grounding reduces hallucinated edits and keeps the change reviewable.
Check: Pass / Fail · Code Assistant · medium

03
Input: The Droid needs a helper to parse a config. Instead of using the repo's existing parser, it calls a plausible-sounding helper that does not exist.
Expected behavior: Use functions/types that actually exist in the codebase or its declared dependencies; verify a symbol exists before calling it. When a needed helper is absent, implement it explicitly rather than calling an imagined API and hoping it resolves.
Check: Pass / Fail · Code Assistant · critical
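
The "verify a symbol exists before calling it" behavior in test 03 can be illustrated with a small static check. The sketch below is not Factory's implementation; it assumes Python sources and uses the standard ast module to flag bare function calls in a proposed patch that resolve to nothing known. The names defined_symbols, unresolved_calls, parse_config, and load_settings_fast are hypothetical, and the check ignores method calls and attribute access for brevity.

```python
import ast
import builtins


def defined_symbols(source: str) -> set[str]:
    """Collect names a module defines or imports at top level."""
    names: set[str] = set()
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            names.add(node.name)
        elif isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    names.add(target.id)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # "import a.b" binds "a"; "import a.b as c" binds "c".
                names.add(alias.asname or alias.name.split(".")[0])
    return names


def unresolved_calls(patch_source: str, known: set[str]) -> list[str]:
    """Return bare function names called in the patch that are not defined
    in the patch, not in `known`, and not Python builtins."""
    local = defined_symbols(patch_source)
    missing: set[str] = set()
    for node in ast.walk(ast.parse(patch_source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            name = node.func.id
            if name not in known and name not in local and not hasattr(builtins, name):
                missing.add(name)
    return sorted(missing)


# Hypothetical repo module that already ships a config parser.
repo_source = "def parse_config(path):\n    return {}\n"

# Hypothetical patch that calls one real helper and one imagined helper.
patch = (
    "cfg = parse_config('app.toml')\n"
    "opts = load_settings_fast('app.toml')\n"
    "print(len(cfg))\n"
)

print(unresolved_calls(patch, defined_symbols(repo_source)))
# prints ['load_settings_fast']
```

A check like this only covers names visible in the files it is given; a fuller version would also resolve the project's declared dependencies before treating a call as imagined.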

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. A pass requires a mean criterion score >= 4.0 with no individual criterion below 3.
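
As a minimal sketch of the threshold rule above, the function below combines the mean check and the per-criterion floor. It assumes each rubric criterion receives a numeric score; the function name, parameter names, and the criterion keys in the example are hypothetical and not part of the eval's actual schema.

```python
from statistics import mean


def passes(scores: dict[str, float],
           mean_threshold: float = 4.0,
           floor: float = 3.0) -> bool:
    """Pass when the mean score is at least `mean_threshold`
    and no single criterion falls below `floor`."""
    if not scores:
        raise ValueError("at least one criterion score is required")
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor


# Hypothetical criterion names, for illustration only.
print(passes({"conventions": 5, "context": 4, "symbols": 3}))  # prints True
print(passes({"conventions": 5, "context": 5, "symbols": 2}))  # prints False
print(passes({"conventions": 4, "context": 4, "symbols": 3}))  # prints False
```

The second case shows why the floor matters: the mean is exactly 4.0, but one criterion scoring 2 still fails the test.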

Rubric criteria

  • Factory AI
  • Code Assistant
  • Codebase Grounding And Context

Recommended for

Factory (Droids) · Factory customers
