For Mercor · AI Platform

Operations And Payments


AI Talent Marketplace & Data Labeling — Mercor

Mercor evals — Operations & Payments (relift v3 InfraRed)

About Mercor

Mercor is an AI talent marketplace and human-data infrastructure provider for frontier AI labs and enterprises. It runs ~20-minute AI-led video interviews, matches a global network of domain experts to projects, and operates labeling, RLHF preference data, rubric authoring, and evaluation framework workflows for customers including leading AI labs.

Employees

~200

Industry

AI Talent & Data Labeling

Headquarters

San Francisco, CA

Website

mercor.com

Sample tests · showing 3 of 9

#: 01
Input: The task router consistently routes high-paying tasks to a small set of veteran labelers; new labelers cannot accumulate the prior-task history required to qualify for higher-tier work.
Expected behavior: Track per-cohort task-distribution metrics (new vs. veteran). Implement a documented allocation that ensures new labelers get enough onboarding tasks to qualify. Avoid winner-take-all dynamics that price out new entrants. Surface the allocation policy to labelers.
Check: Pass / Fail · AI Platform · high

#: 02
Input: A labeler is paid $0 for a 4-hour shift because the reviewer rejected all their work. The labeler claims the rejection is wrong.
Expected behavior: Provide a documented dispute channel: an independent senior reviewer adjudicates, disputed pay is held in escrow during review, the rubric basis for the rejection is surfaced to the labeler, and the resolution is documented. Publish an SLA for resolution. Do not leave labelers in 'pending' without communication.
Check: Pass / Fail · AI Platform · critical

#: 03
Input: A contractor in Germany needs detailed per-task earnings statements for VAT filing. Mercor's only export is a monthly aggregate total.
Expected behavior: Provide per-task and per-project earnings breakdowns with timestamps, currency, FX rate, fees, and net. Make exports machine-readable (CSV/JSON) so contractors can hand them off to accountants. Honor jurisdiction-specific tax-receipt requirements where the contractor base needs them.
Check: Pass / Fail · AI Platform · medium
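
The export described in test 03 could look roughly like the sketch below. It is illustrative only: the TaskEarning record, its field names, and the export_csv / export_json helpers are hypothetical and do not reflect Mercor's actual schema or API. Amounts use Decimal to avoid floating-point rounding in money values, and the sample values in the usage note are made up.

```python
import csv
import io
import json
from dataclasses import dataclass, asdict
from decimal import Decimal


@dataclass
class TaskEarning:
    """One paid task (hypothetical schema, for illustration only)."""
    task_id: str
    project: str
    completed_at: str   # ISO 8601 timestamp
    currency: str       # currency the task was paid in
    gross: Decimal      # amount earned before fees
    fx_rate: Decimal    # assumed: rate applied when converting the payout
    fees: Decimal       # platform or payment fees deducted

    @property
    def net(self) -> Decimal:
        return self.gross - self.fees


FIELDS = ["task_id", "project", "completed_at", "currency",
          "gross", "fx_rate", "fees", "net"]


def to_rows(earnings):
    """Flatten records to string-valued dicts, one per task."""
    rows = []
    for e in earnings:
        row = asdict(e)
        row["net"] = e.net
        rows.append({k: str(row[k]) for k in FIELDS})
    return rows


def export_csv(earnings) -> str:
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(to_rows(earnings))
    return buf.getvalue()


def export_json(earnings) -> str:
    return json.dumps(to_rows(earnings), indent=2)
```

Calling export_csv on a list of TaskEarning records yields one header row plus one row per task, which an accountant can open directly; export_json returns the same rows as a JSON array. Amounts are serialized as strings so that precision survives the round trip.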

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
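
One reading of this threshold rule, as a minimal sketch: a graded response passes when the mean of its criterion scores is at least 4.0 and no individual criterion scores below 3. The passes function, the criterion names in the usage note, and the implied 1-5 scale are assumptions for illustration; the page does not specify them.

```python
from statistics import mean


def passes(scores, mean_threshold=4.0, floor=3):
    """Apply the pass rule to a mapping of criterion name -> score.

    Assumed interpretation: the mean is taken across criteria, and the
    floor applies to each criterion individually.
    """
    if not scores:
        return False  # nothing graded, so nothing can pass
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor
```

For example, scores of 5, 4, and 3 on three hypothetical criteria pass (mean 4.0, lowest 3), while 5, 5, and 2 fail despite the same mean, because one criterion falls below the floor.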

Rubric criteria

  • Mercor
  • AI Platform
  • Operations And Payments

Recommended for

Mercor · Mercor customers
