For Factory · Code Assistant

AI Migrations and Large-Scale Changes

Agent-native Software Development — Factory (Droids)

Factory evals — Migrations & Large-Scale Changes (relift v3 InfraRed)

About Factory

Factory is an agent-native software development platform. Its autonomous "Droids" plan, write, review, and migrate code and assist with incident response — grounded in a team's codebase, tickets, docs, and observability data — driven from a terminal CLI, the web app, and chat/ticket surfaces, with human-in-the-loop review and a choice of underlying models.

Employees

~50 [unverified]

Industry

AI Software Development (Autonomous Coding Agents)

Headquarters

San Francisco, CA [unverified]

Website

factory.ai

Sample tests · showing 3 of 9

# · Input · Expected behavior · Check
01

A library API change must be applied to 600 call sites across the repo. The Droid starts editing them one by one by hand.

For a mechanical change at scale, prefer a deterministic codemod / AST transform that applies uniformly and is reviewable as a rule, then verify by compile + tests — rather than 600 inconsistent hand edits. Reserve hand edits for the genuine exceptions the codemod cannot handle.

Pass / Fail · Code Assistant · high
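
The codemod approach in test 01 can be sketched with Python's standard `ast` module. This is a minimal illustration, not Factory's implementation: the function names `old_fetch` and `new_fetch` are hypothetical, and `ast.unparse` (Python 3.9+) drops comments and original formatting, so a real migration would more likely use a formatting-preserving codemod tool.

```python
import ast


class RenameCall(ast.NodeTransformer):
    """Rewrite every call to one function name into another (names are hypothetical)."""

    def __init__(self, old_name, new_name):
        self.old_name = old_name
        self.new_name = new_name
        self.rewritten = 0  # count of call sites changed, useful for review

    def visit_Call(self, node):
        self.generic_visit(node)  # transform nested calls first
        if isinstance(node.func, ast.Name) and node.func.id == self.old_name:
            new_func = ast.Name(id=self.new_name, ctx=ast.Load())
            node.func = ast.copy_location(new_func, node.func)
            self.rewritten += 1
        return node


def apply_codemod(source, old_name="old_fetch", new_name="new_fetch"):
    """Return (rewritten_source, number_of_call_sites_changed)."""
    tree = ast.parse(source)
    transformer = RenameCall(old_name, new_name)
    new_tree = ast.fix_missing_locations(transformer.visit(tree))
    return ast.unparse(new_tree), transformer.rewritten
```

Because the rule only matches call nodes, an unrelated identifier such as `old_fetch_count` is left alone, which a plain text search-and-replace would not guarantee. The returned count gives reviewers a number to compare against the expected call-site total before running compile and tests.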
02

Migrating a date library, the Droid swaps in the new API but the new default timezone differs, silently shifting timestamps.

Migrations must preserve observable behavior unless a change is explicitly intended: audit defaults that differ between old and new APIs (timezone, locale, rounding) and pin them so behavior is unchanged. Characterization tests should catch silent shifts; add them where coverage is thin.

Pass / Fail · Code Assistant · critical
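
As a rough sketch of the pinning and characterization-test idea in test 02, the example below uses only Python's standard `datetime` module. The functions `format_event_old` and `format_event_new` are hypothetical stand-ins for the legacy and migrated library calls, and the assumption that the legacy behavior was UTC is made purely for illustration.

```python
from datetime import datetime, timezone


def format_event_old(epoch_seconds):
    """Hypothetical stand-in for the legacy library, assumed to render in UTC."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M")


def format_event_new(epoch_seconds, tz=timezone.utc):
    """Migrated call with the timezone pinned explicitly.

    Without the tz argument, datetime.fromtimestamp uses the machine's
    local zone, which is the kind of differing default that shifts timestamps.
    """
    moment = datetime.fromtimestamp(epoch_seconds, tz=tz)
    return moment.strftime("%Y-%m-%d %H:%M")


# Sample inputs chosen for illustration, including a value just before midnight.
CHARACTERIZATION_CASES = [0, 86_399, 1_700_000_000]


def behavior_mismatches(cases=CHARACTERIZATION_CASES):
    """Return the inputs for which old and new formatting disagree."""
    return [ts for ts in cases if format_event_old(ts) != format_event_new(ts)]
```

An empty list from `behavior_mismatches()` means the migrated call reproduces the old output on the sampled inputs. Passing a different `tz` to `format_event_new` shows how quickly the rendered date can change when a default differs.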
03

A multi-commit migration leaves the repo non-compiling on intermediate commits, breaking bisect and CI on the branch.

Order migration commits so each one leaves the tree compiling and tests passing where feasible; a chain of broken intermediate commits defeats bisect and blocks CI gating. If an atomic step is unavoidable, make it a single coherent commit, not a broken chain.

Pass / Fail · Code Assistant · medium
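
One way to check the property in test 03 is to walk the branch commit by commit and run the build or test command at each one. The sketch below assumes a Git checkout with a clean working tree; the helper names and the check command are hypothetical, and only `git rev-list --reverse` and `git checkout --quiet` are relied on from Git itself.

```python
import subprocess


def list_commits(base, head="HEAD"):
    """Return commit hashes in base..head, oldest first."""
    result = subprocess.run(
        ["git", "rev-list", "--reverse", f"{base}..{head}"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.split()


def commit_is_green(sha, check_cmd):
    """Check out one commit and run the project's check command on it.

    check_cmd is an assumption about the project, e.g. ["make", "test"].
    Note: this leaves the repository on a detached HEAD.
    """
    subprocess.run(["git", "checkout", "--quiet", sha], check=True)
    return subprocess.run(check_cmd).returncode == 0


def broken_commits(commits, is_green):
    """Return the commits for which the is_green callable reports failure."""
    return [sha for sha in commits if not is_green(sha)]
```

A call such as `broken_commits(list_commits("main"), lambda sha: commit_is_green(sha, ["make", "test"]))` would list every intermediate commit that fails, where `main` and `make test` are placeholders for the real base branch and check. An empty result suggests the chain is safe to bisect.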

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. A test passes when the mean score across the rubric criteria is >= 4.0 and no single criterion scores below 3.
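
The pass rule can be expressed as a small function. This is one reading of the thresholds stated above, not the platform's grading code; the function name and the criterion names in the usage example are hypothetical.

```python
from statistics import mean


def passes(scores, mean_threshold=4.0, floor=3):
    """Return True when the mean score meets the threshold and no score is below the floor.

    scores maps a criterion name to its numeric score.
    """
    if not scores:
        return False  # nothing graded, so nothing can pass
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor
```

For example, `passes({"correctness": 5, "safety": 5, "clarity": 2})` is false even though the mean is 4.0, because one criterion falls below the floor of 3.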

Rubric criteria

  • Factory AI
  • Code Assistant
  • Migrations and Large-Scale Changes

Recommended for

Factory (Droids) · Factory customers
