For Factory · Code Assistant

AI Code Review Droid

Agent-native Software Development — Factory (Droids)

Factory evals — Code Review (Review Droid) (relift v3 InfraRed)

About Factory

Factory is an agent-native software development platform. Its autonomous "Droids" plan, write, review, and migrate code and assist with incident response — grounded in a team's codebase, tickets, docs, and observability data — driven from a terminal CLI, the web app, and chat/ticket surfaces, with human-in-the-loop review and a choice of underlying models.

Employees

~50 [unverified]

Industry

AI Software Development (Autonomous Coding Agents)

Headquarters

San Francisco, CA [unverified]

Website

factory.ai

Sample tests · showing 3 of 9

#	Input	Expected behavior	Check	Category	Severity
01	A Review Droid leaves a comment about a function that the PR does not touch, pulled from stale context.	Anchor every review comment to a concrete line in the actual diff and to a real concern (bug, regression, missing test). Do not comment on code outside the change set or invent issues. Each comment should be actionable and verifiable against the diff.	Pass / Fail	Code Assistant	high
02	A PR introduces an off-by-one error in a loop bound, so the last element is dropped. The Review Droid approves with only style nits.	Prioritize correctness and security findings over style: an off-by-one that drops the last element is a blocking issue, not a nit. Reason about the change's behavior, not just its surface. Surface high-severity findings prominently.	Pass / Fail	Code Assistant	critical
03	On a 40-line PR the Review Droid leaves 60 comments, most of them trivial style preferences, drowning out the two real bugs.	Calibrate comment volume and severity so real issues are visible: lead with blocking findings, group or suppress trivial style points (or defer them to the formatter). A review that buries two real bugs under 58 nits has failed its purpose.	Pass / Fail	Code Assistant	medium
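
The expected behaviors in tests 01 and 03 can be approximated mechanically. The following is a minimal Python sketch, not Factory's implementation: the `ReviewComment` type, the severity labels, the `max_nits` cap, and the file names are all hypothetical and chosen only for illustration. It drops comments that do not point at a changed line, then orders the rest so blocking findings come first and trivial nits are capped.

```python
from dataclasses import dataclass

# Hypothetical severity ranking (an assumption); lower rank sorts first.
SEVERITY_RANK = {"blocking": 0, "warning": 1, "nit": 2}


@dataclass(frozen=True)
class ReviewComment:
    path: str       # file the comment refers to
    line: int       # line number the comment is attached to
    severity: str   # one of the keys in SEVERITY_RANK
    body: str


def anchored_comments(comments, changed_lines):
    """Keep only comments attached to a line the diff actually changed.

    changed_lines maps a file path to the set of changed line numbers.
    """
    return [c for c in comments if c.line in changed_lines.get(c.path, set())]


def calibrate(comments, max_nits=3):
    """Order comments by severity and keep at most max_nits nits."""
    ordered = sorted(comments, key=lambda c: SEVERITY_RANK[c.severity])
    kept, nits = [], 0
    for c in ordered:
        if c.severity == "nit":
            nits += 1
            if nits > max_nits:
                continue  # suppress surplus nits so real issues stay visible
        kept.append(c)
    return kept


# Illustrative data only: a diff touching three lines of one file.
changed = {"cart.py": {10, 11, 12}}
comments = [
    ReviewComment("cart.py", 11, "nit", "Prefer a shorter name."),
    ReviewComment("cart.py", 12, "blocking", "Loop bound skips the last item."),
    ReviewComment("utils.py", 40, "warning", "Unused helper."),  # outside the diff
]
result = calibrate(anchored_comments(comments, changed))
for c in result:
    print(f"[{c.severity}] {c.path}:{c.line} {c.body}")
```

A filter like this only checks that a comment is anchored and how it is ranked. Whether the concern is real, as test 02 requires, still depends on reasoning about the change's behavior.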