For Mercor · AI Platform

Candidate Sourcing And Matching

Mercor

AI Talent Marketplace & Data Labeling — Mercor

Evaluates Mercor's Candidate Sourcing & Matching across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's AI Talent Marketplace & Data Labeling eval coverage.

About Mercor

Mercor is an AI talent marketplace and human-data infrastructure provider for frontier AI labs and enterprises. It runs ~20-minute AI-led video interviews, matches a global network of domain experts to projects, and operates labeling, RLHF preference data, rubric authoring, and evaluation framework workflows for customers including leading AI labs.

Employees

~200

Industry

AI Talent & Data Labeling

Headquarters

San Francisco, CA

Website

mercor.com

Sample tests · showing 3 of 9

#	Input	Expected behavior	Check
01	Candidate uploads a multi-column PDF resume with a research-experience block rendered as a two-column table. The Mercor intake parser extracts 'Senior Researcher' as a skill instead of a job title.	Resume parse pipeline must distinguish job-title sections from skills/keywords blocks, preserve role-employer-dates triples, and surface a confidence score per extracted field. Low-confidence fields should be flagged for candidate confirmation in the intake UI rather than silently dropped into a ma…	Pass / Fail · AI Platform · high
02	A customer lab opens a project requiring 'mathematical biology + RLHF rubric authoring experience.' The semantic matcher returns candidates whose only signal is the word 'biology' in their resume.	Matching must require AND-style satisfaction of the conjunction of required skills, not OR-style keyword overlap. Surface per-candidate evidence spans for each required skill so operators can audit the match. [REQUIRES-VERIFICATION] on Mercor's published precision/recall metrics for the matcher.	Pass / Fail · AI Platform · high
03	A candidate re-applies under a slightly different email (the same Gmail address with an added dot) and a misspelled legal name. The matcher treats them as two distinct candidates, and both end up in the same lab's pool with conflicting ratings.	Dedupe at intake using a combination of normalized email (Gmail dot/plus rules), phone, government-ID hash (post-KYC), and resume fingerprint. On a suspected match, merge the profiles and reconcile ratings; do not deliver two copies of the same person to a customer lab.	Pass / Fail · AI Platform · high
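A minimal Python sketch of the confidence routing that case 01 expects. The ExtractedField type, the route_fields helper, the 0.8 threshold, and the sample values are all hypothetical; the rubric does not say how Mercor's parser scores fields. The point is only that a low-confidence field is routed to the candidate for confirmation instead of being dropped or misfiled.

```python
from dataclasses import dataclass

# Hypothetical threshold; the rubric does not specify a value.
CONFIRM_THRESHOLD = 0.8

@dataclass
class ExtractedField:
    section: str       # e.g. "experience" or "skills"
    name: str          # e.g. "job_title", "employer", "skill"
    value: str
    confidence: float  # parser's confidence in [0, 1]

def route_fields(fields, threshold=CONFIRM_THRESHOLD):
    """Split parsed fields into auto-accepted ones and ones the candidate must confirm.

    Nothing is discarded: low-confidence fields go to the intake UI for
    confirmation instead of being silently dropped or misfiled.
    """
    accepted, needs_confirmation = [], []
    for field in fields:
        if field.confidence >= threshold:
            accepted.append(field)
        else:
            needs_confirmation.append(field)
    return accepted, needs_confirmation

# Illustrative parser output for the two-column resume in case 01.
fields = [
    ExtractedField("experience", "job_title", "Senior Researcher", 0.55),
    ExtractedField("experience", "employer", "Example Lab", 0.93),
    ExtractedField("skills", "skill", "Python", 0.97),
]
accepted, pending = route_fields(fields)
print([f.value for f in pending])  # ['Senior Researcher']
```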
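Case 02's AND-style requirement can be illustrated with a small Python sketch. It uses case-insensitive whole-phrase matching as a stand-in for a semantic matcher (an assumption; the page does not describe Mercor's matcher), and the match_candidate and evidence_spans names are hypothetical. A candidate is surfaced only if every required skill has at least one evidence span, and the spans are returned so an operator can audit the decision.

```python
import re

def evidence_spans(text, term):
    """Return (start, end) offsets of whole-phrase, case-insensitive matches of term."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [m.span() for m in pattern.finditer(text)]

def match_candidate(resume_text, required_skills):
    """AND-style match: every required skill needs at least one evidence span.

    Returns (is_match, evidence), where evidence maps each skill to its spans
    so an operator can see why the candidate was or was not surfaced.
    """
    evidence = {skill: evidence_spans(resume_text, skill) for skill in required_skills}
    is_match = all(evidence.values())  # an empty span list means the skill is unsupported
    return is_match, evidence

required = ["mathematical biology", "RLHF rubric authoring"]
resume_a = "PhD in mathematical biology; two years of RLHF rubric authoring for LLM evals."
resume_b = "Research assistant in a biology lab."

ok_a, evidence_a = match_candidate(resume_a, required)
ok_b, evidence_b = match_candidate(resume_b, required)
print(ok_a, ok_b)  # True False
```

An OR-style matcher would surface resume_b on the word "biology" alone; requiring evidence for every skill is what rules it out.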
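For case 03, the Python sketch below shows one way to build dedupe keys. The profile field names, the digits-only phone normalization, and the whitespace-collapsed SHA-256 resume fingerprint are illustrative assumptions, not Mercor's pipeline. Gmail ignores dots in the local part and treats anything after "+" as a tag, and googlemail.com is an alias for gmail.com, so those variants collapse to one key. The sketch only flags a suspected duplicate; merging profiles and reconciling ratings would be a separate step.

```python
import hashlib

# Gmail ignores dots in the local part and treats "+tag" as a label;
# googlemail.com is an alias of gmail.com.
GMAIL_DOMAINS = {"gmail.com", "googlemail.com"}

def normalize_email(email):
    """Lowercase; for Gmail addresses, drop '+tag' suffixes and dots in the local part."""
    local, _, domain = email.strip().lower().partition("@")
    if domain in GMAIL_DOMAINS:
        local = local.split("+", 1)[0].replace(".", "")
        domain = "gmail.com"
    return f"{local}@{domain}"

def normalize_phone(phone):
    """Keep digits only (a production pipeline would likely parse to E.164)."""
    return "".join(ch for ch in phone if ch.isdigit())

def resume_fingerprint(text):
    """SHA-256 of the resume text with case and whitespace normalized."""
    canonical = " ".join(text.lower().split())
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def dedupe_keys(profile):
    """Collect normalized identity keys, skipping fields that are missing or empty."""
    keys = set()
    if "@" in profile.get("email", ""):
        keys.add(("email", normalize_email(profile["email"])))
    phone = normalize_phone(profile.get("phone", ""))
    if phone:
        keys.add(("phone", phone))
    if profile.get("resume_text"):
        keys.add(("resume", resume_fingerprint(profile["resume_text"])))
    if profile.get("gov_id_hash"):  # hypothetical field, present only after KYC
        keys.add(("gov_id", profile["gov_id_hash"]))
    return keys

def is_suspected_duplicate(a, b):
    """Flag a pair for merge review if any normalized key is shared."""
    return bool(dedupe_keys(a) & dedupe_keys(b))

# Illustrative profiles: same Gmail inbox, misspelled legal name.
first = {"email": "jane.doe@gmail.com", "legal_name": "Jane Doe"}
second = {"email": "janedoe+apply2@gmail.com", "legal_name": "Jane Deo"}
print(is_suspected_duplicate(first, second))  # True
```

The legal names are deliberately not used as keys, since a typo would defeat an exact match on them.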

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
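The pass rule can be read as a small predicate. This Python sketch assumes the rule applies to one test case's set of criterion scores on an integer scale, which the page does not state explicitly; case_passes is a hypothetical name.

```python
from statistics import mean

def case_passes(criterion_scores, min_mean=4.0, floor=3):
    """Pass iff the mean criterion score is >= min_mean and no criterion is below floor."""
    if not criterion_scores:
        raise ValueError("need at least one criterion score")
    return mean(criterion_scores) >= min_mean and min(criterion_scores) >= floor

print(case_passes([5, 4, 4]))  # True
print(case_passes([5, 5, 2]))  # False
print(case_passes([4, 4, 3]))  # False
```

Under this reading, [5, 5, 2] fails despite a mean of exactly 4.0, because one criterion falls below 3.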

Rubric criteria

Mercor
AI Platform
Candidate Sourcing And Matching

Recommended for

Mercor customers

Works with

Mercor

Related evals

AI Platform

Claude API

Evaluates Anthropic's Batch API across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

AI Platform

Claude API

Evaluates Anthropic's Extended Thinking across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

AI Platform

Claude API

Evaluates Anthropic's Files API & Citations across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.


Frequently asked questions

What does the Candidate Sourcing And Matching eval for Mercor test?

How is the Candidate Sourcing And Matching eval scored?

The judge rubric: Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

How many test cases does this eval pack include?

The Candidate Sourcing And Matching pack for Mercor contains 9 test cases. Three sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?

Sign up for Corsac, connect your model or agent endpoint, and run the Candidate Sourcing And Matching pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.

Run this eval in your workspace

Connect your data, configure thresholds, and review results with your team.