
AI Code Generation and Refactoring

Factory (Droids) · Factory

Agent-native Software Development — Factory (Droids)

Factory evals — Code Generation & Refactoring (relift v3 InfraRed)

About Factory

Factory is an agent-native software development platform. Its autonomous "Droids" plan, write, review, and migrate code and assist with incident response, grounded in a team's codebase, tickets, docs, and observability data. Droids are driven from a terminal CLI, the web app, and chat/ticket surfaces, with human-in-the-loop review and a choice of underlying models.

Employees

~50 [unverified]

Industry

AI Software Development (Autonomous Coding Agents)

Headquarters

San Francisco, CA [unverified]

Website

factory.ai

Sample tests · showing 3 of 9

# · Input · Expected behavior · Check
01

A Droid finishes a refactor and opens a PR without running the build or the test suite locally.

Run the project's build and test commands and confirm green before declaring the work done; a change that does not compile or fails tests is not complete. Include the test result in the handoff so the reviewer sees verification, not just a diff.

Pass / Fail · Code Assistant · critical
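
The expected behavior in test 01 (run the build and tests, then report the result in the handoff) could be sketched roughly as follows. This is a minimal illustration, not Factory's implementation: `StepResult`, `verify`, and `handoff_summary` are hypothetical names, and the actual build and test commands depend on the project.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class StepResult:
    """Outcome of one verification command (hypothetical helper type)."""
    name: str
    command: list[str]
    returncode: int
    output: str

    @property
    def ok(self) -> bool:
        return self.returncode == 0


def run_step(name: str, command: list[str]) -> StepResult:
    """Run one command and capture its stdout and stderr."""
    proc = subprocess.run(command, capture_output=True, text=True)
    return StepResult(name, command, proc.returncode, proc.stdout + proc.stderr)


def verify(steps: list[tuple[str, list[str]]]) -> list[StepResult]:
    """Run the steps in order and stop at the first failure."""
    results = []
    for name, command in steps:
        result = run_step(name, command)
        results.append(result)
        if not result.ok:
            break
    return results


def handoff_summary(results: list[StepResult], expected_steps: int) -> str:
    """Render a short verification note for the PR description."""
    lines = [
        f"- {r.name}: {'passed' if r.ok else 'FAILED'} (exit {r.returncode})"
        for r in results
    ]
    # The work only counts as done if every expected step ran and succeeded.
    complete = len(results) == expected_steps and all(r.ok for r in results)
    lines.append(
        "Status: ready for review" if complete
        else "Status: not complete, do not open the PR"
    )
    return "\n".join(lines)
```

A caller would pass the project's real commands as `steps` (for example a build step followed by a test step) and paste the summary into the PR, so the reviewer sees the verification result alongside the diff.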
02

A rename touches a function definition and 12 call sites. The Droid updates the definition and 9 of the 12 call sites.

Apply cross-cutting changes atomically: every call site updated together, verified by compile/type-check, so the branch is never left half-migrated. A partial rename that breaks 3 call sites is a regression, not progress.

Pass / Fail · Code Assistant · high
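
One way to picture the atomic rename in test 02 is an all-or-nothing pass over every file, followed by a check that no stale reference survives. The sketch below is an assumption-laden illustration: it works on an in-memory mapping of path to source text, uses a whole-word regex (which is not scope-aware and will also touch strings and comments), and only syntax-checks Python sources with the built-in `compile`. A real migration would more likely lean on a language server or type checker. The function names are hypothetical.

```python
import re


def stale_references(sources: dict[str, str], old: str) -> list[tuple[str, int]]:
    """List (path, line number) pairs that still mention `old` as a whole word."""
    pattern = re.compile(rf"\b{re.escape(old)}\b")
    return [
        (path, lineno)
        for path, text in sorted(sources.items())
        for lineno, line in enumerate(text.splitlines(), start=1)
        if pattern.search(line)
    ]


def rename_everywhere(sources: dict[str, str], old: str, new: str) -> dict[str, str]:
    """Return a fully renamed copy of `sources`, or raise without changing anything.

    The input mapping is never mutated, so the caller either receives every
    file updated together or keeps the original, un-migrated state.
    """
    pattern = re.compile(rf"\b{re.escape(old)}\b")
    updated = {path: pattern.sub(new, text) for path, text in sources.items()}

    # Syntax check only: raises SyntaxError if a rewritten file no longer parses.
    for path, text in updated.items():
        compile(text, path, "exec")

    remaining = stale_references(updated, old)
    if remaining:
        raise RuntimeError(f"rename incomplete, stale references at: {remaining}")
    return updated
```

Running `stale_references` against the branch before opening the PR is the check that would catch the 9-of-12 case: any non-empty result means the rename is still half-applied.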
03

To format a date, the Droid adds a new third-party dependency when the repo already uses a standard library / in-house utility for that.

Prefer the standard library or an existing in-repo utility before adding a dependency; when a new dependency is genuinely warranted, justify it (license, maintenance, size) and call it out for review. Do not expand the dependency surface for trivial needs.

Pass / Fail · Code Assistant · medium
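
For test 03, the point is that date formatting rarely needs a new package. As a hedged illustration in Python (the eval itself does not name a language, and the function names here are hypothetical), the standard `datetime` module already covers the common cases:

```python
from datetime import date, datetime, timezone


def format_invoice_date(day: date) -> str:
    """Format a calendar date as YYYY-MM-DD using only the standard library."""
    return day.strftime("%Y-%m-%d")


def format_timestamp_utc(moment: datetime) -> str:
    """Format a timezone-aware datetime as a UTC timestamp with a trailing Z."""
    if moment.tzinfo is None:
        # Naive datetimes are ambiguous, so refuse them rather than guess a zone.
        raise ValueError("expected a timezone-aware datetime")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```

If the repo already ships an in-house date helper, that helper would take precedence over even this; the sketch only shows that the trivial case does not justify widening the dependency surface.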

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
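
The pass rule above can be read as two conditions that must both hold. The sketch below assumes each rubric criterion receives a single numeric score and that the mean is taken across criteria; the page does not state the score scale, and the function and parameter names are hypothetical.

```python
from statistics import mean


def passes(scores: dict[str, float], min_mean: float = 4.0, floor: float = 3.0) -> bool:
    """Return True when the mean score is >= min_mean and no score is below floor."""
    if not scores:
        raise ValueError("at least one criterion score is required")
    values = list(scores.values())
    return mean(values) >= min_mean and min(values) >= floor
```

Under this reading, a run with scores of 5, 5, and 2 fails even though its mean is 4.0, because one criterion falls below the floor of 3.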

Rubric criteria

  • Factory AI
  • Code Assistant
  • Code Generation and Refactoring

Recommended for

Factory (Droids) · Factory customers
