Candidate Sourcing And Matching

AI Talent Marketplace & Data Labeling — Mercor

Mercor evals — Candidate Sourcing & Matching (relift v3 InfraRed)

About Mercor

Mercor is an AI talent marketplace and human-data infrastructure provider for frontier AI labs and enterprises. It runs ~20-minute AI-led video interviews, matches a global network of domain experts to projects, and operates labeling, RLHF preference data, rubric authoring, and evaluation framework workflows for customers including leading AI labs.

Employees

~200

Industry

AI Talent & Data Labeling

Headquarters

San Francisco, CA

Website

mercor.com

Sample tests · showing 3 of 9

# · Input · Expected behavior · Check
01

Candidate uploads a multi-column PDF resume with a research-experience block rendered as a two-column table. The Mercor intake parser extracts 'Senior Researcher' as a skill instead of a job title.

Resume parse pipeline must distinguish job-title sections from skills/keywords blocks, preserve role-employer-dates triples, and surface a confidence score per extracted field. Low-confidence fields should be flagged for candidate confirmation in the intake UI rather than silently dropped into a ma…

Pass / Fail · AI Platform · high
02

A lab needs 'PhD-level immunology, vaccine adjuvants subfield.' Mercor's domain tag system has only 'biology' as the leaf tag.

Domain taxonomy should be deep enough to express subfield-level expertise (e.g., immunology → adjuvants) with operator-extensible tags. When the taxonomy is insufficient for a project, the system should let the operator add tags and re-index existing candidates, not silently widen the search to 'bi…

Pass / Fail · AI Platform · medium
03

Customer Lab A flagged candidate X as 'rejected — quality.' Three weeks later Lab B opens a similar project. The matcher recommends candidate X to Lab B without surfacing Lab A's rejection.

Customer-specific quality flags must stay scoped to the customer that created them — Lab A's rejection should NOT leak to Lab B (privacy + competitive concerns). Internally, Mercor should track patterns of cross-customer rejection for QC, but never surface a customer's private flag to another custo…

Pass / Fail · AI Platform · critical
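
The scoping rule in test 03 can be illustrated with a minimal Python sketch. Everything named here (`QualityFlag`, `FlagStore`, the field names, the customer and candidate IDs) is hypothetical and chosen for illustration; nothing on this page describes Mercor's actual data model. The idea is that a customer-facing read only returns flags that customer created, while a separate internal view aggregates rejections across customers for QC.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityFlag:
    candidate_id: str
    customer_id: str  # the customer that created the flag (hypothetical field)
    reason: str


class FlagStore:
    """Hypothetical in-memory store for customer-created quality flags."""

    def __init__(self):
        self._flags = []

    def add(self, flag):
        self._flags.append(flag)

    def visible_to(self, customer_id, candidate_id):
        # Customer-facing read: only flags created by the requesting customer.
        return [
            f for f in self._flags
            if f.customer_id == customer_id and f.candidate_id == candidate_id
        ]

    def internal_rejection_counts(self):
        # Internal QC view: number of distinct customers that flagged each
        # candidate. Never exposed through a customer-facing path.
        customers = defaultdict(set)
        for f in self._flags:
            customers[f.candidate_id].add(f.customer_id)
        return {cand: len(custs) for cand, custs in customers.items()}


store = FlagStore()
store.add(QualityFlag("candidate_x", "lab_a", "rejected: quality"))

lab_b_view = store.visible_to("lab_b", "candidate_x")  # empty: Lab A's flag stays private
lab_a_view = store.visible_to("lab_a", "candidate_x")  # Lab A sees its own flag
qc_counts = store.internal_rejection_counts()
```

Keeping the customer-facing read and the internal aggregate as separate methods makes it harder for a matcher to surface another customer's flag by accident; a real system would also need access control on the internal view.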

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
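
One reading of the pass rule can be sketched in Python. This assumes each rubric criterion receives a single numeric score (a 1 to 5 scale is assumed, since the page does not state one), that "mean" is taken across criteria, and that "below 3" applies to each individual criterion score. The function name and the criterion names are hypothetical.

```python
from statistics import mean


def passes(scores, mean_threshold=4.0, floor=3):
    """Return True if the criterion scores satisfy both conditions.

    scores: mapping of criterion name to numeric score (assumed 1-5 scale).
    """
    if not scores:
        return False  # nothing graded, so nothing can pass
    values = list(scores.values())
    # Both conditions must hold: a high average, and no single weak criterion.
    return mean(values) >= mean_threshold and min(values) >= floor


# Hypothetical criterion names, for illustration only.
ok = passes({"scoping": 5, "qc_tracking": 4, "explanation": 3})
weak_criterion = passes({"scoping": 5, "qc_tracking": 5, "explanation": 2})
low_mean = passes({"scoping": 4, "qc_tracking": 4, "explanation": 3})
```

Under this reading the floor matters independently of the mean: the second example averages 4.0 but still fails because one criterion scores 2.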

Rubric criteria

  • Mercor
  • AI Platform
  • Candidate Sourcing And Matching

Recommended for

Mercor, Mercor customers
