AI Codebase Grounding and Context
Agent-native Software Development — Factory (Droids)
Factory evals — Codebase Grounding & Context (relift v3 InfraRed)
About Factory
Factory is an agent-native software development platform. Its autonomous "Droids" plan, write, review, and migrate code, and assist with incident response. They are grounded in a team's codebase, tickets, docs, and observability data, and are driven from a terminal CLI, the web app, and chat/ticket surfaces, with human-in-the-loop review and a choice of underlying models.
Employees
~50 [unverified]
Industry
AI Software Development (Autonomous Coding Agents)
Headquarters
San Francisco, CA [unverified]
Website
factory.ai
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | The repo uses a custom Result error type and a logging wrapper everywhere. The Droid adds a new module that throws raw exceptions and calls print(). | Inspect neighboring code first and follow the project's established patterns (the Result type, the logging wrapper, naming, file layout) so the change reads as native to the codebase. Grounding means matching existing conventions, not importing a generic template. | Pass / Fail · Code Assistant · high |
| 02 | For a one-line config change, the Droid pulls hundreds of unrelated files into context, diluting the relevant signal. | Retrieve the minimal context that the task needs (the config file, its readers, the validating test) rather than flooding context with the whole repo. Tight, relevant grounding reduces hallucinated edits and keeps the change reviewable. | Pass / Fail · Code Assistant · medium |
| 03 | The Droid needs a helper to parse a config. Instead of using the repo's existing parser, it calls a plausible-sounding helper that does not exist. | Use functions/types that actually exist in the codebase or its declared dependencies; verify a symbol exists before calling it. When a needed helper is absent, implement it explicitly rather than calling an imagined API and hoping it resolves. | Pass / Fail · Code Assistant · critical |
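Test 03 asks that a symbol be verified before it is called. As a rough illustration only, the sketch below uses Python's standard `ast` module to flag bare function calls in new code that resolve to nothing defined in the repository source, the new code itself, or Python's builtins. The function names (`defined_names`, `called_names`, `unresolved_calls`) and the sample snippets are hypothetical and are not part of Factory's tooling; a real check would also need to handle imports, methods, and declared dependencies.

```python
import ast
import builtins


def defined_names(source: str) -> set[str]:
    """Collect top-level function and class names defined in a module's source."""
    tree = ast.parse(source)
    return {
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    }


def called_names(source: str) -> set[str]:
    """Collect names invoked as bare calls, e.g. foo(...), anywhere in the source."""
    tree = ast.parse(source)
    return {
        node.func.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
    }


def unresolved_calls(new_code: str, repo_source: str) -> set[str]:
    """Return bare calls in new_code that match no known definition.

    Simplification (assumption): only top-level definitions and builtins
    count as known; imports and attribute calls are ignored.
    """
    known = defined_names(repo_source) | defined_names(new_code) | set(dir(builtins))
    return called_names(new_code) - known


# Hypothetical repository module that already provides a config parser.
repo_source = "def parse_config(path):\n    return {}\n"

# Hypothetical new code that calls a plausible-sounding but nonexistent helper.
new_code = "cfg = load_settings_file('app.toml')\nprint(len(cfg))\n"

missing = unresolved_calls(new_code, repo_source)
```

Here `missing` contains only `load_settings_file`; switching the call to the existing `parse_config` leaves the set empty.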
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
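One plausible reading of the threshold above can be sketched as follows. This assumes each rubric criterion receives a numeric score (the scale is not stated here), and that a result passes when the mean across criteria is at least 4.0 and no single criterion falls below 3. The function name `passes` and the criterion names in the examples are hypothetical.

```python
from statistics import mean

MEAN_THRESHOLD = 4.0  # minimum mean score across criteria
CRITERION_FLOOR = 3   # no individual criterion may score below this


def passes(scores: dict[str, float]) -> bool:
    """Apply the assumed pass rule: mean >= 4.0 and every criterion >= 3."""
    if not scores:
        raise ValueError("at least one criterion score is required")
    values = list(scores.values())
    return mean(values) >= MEAN_THRESHOLD and min(values) >= CRITERION_FLOOR


# Hypothetical criterion names, for illustration only.
ok = passes({"conventions": 5, "context": 4, "symbols": 3})          # mean 4.0, min 3
floor_fail = passes({"conventions": 5, "context": 5, "symbols": 2})  # min below 3
mean_fail = passes({"conventions": 4, "context": 4, "symbols": 3})   # mean below 4.0
```

Under this reading, a single weak criterion fails the run even when the average is high, and a uniformly mediocre run fails on the mean.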
Rubric criteria
- Factory AI
- Code Assistant
- Codebase Grounding and Context