Candidate Sourcing And Matching
AI Talent Marketplace & Data Labeling — Mercor
Mercor evals — Candidate Sourcing & Matching (relift v3 InfraRed)
About Mercor
Mercor is an AI talent marketplace and human-data infrastructure provider for frontier AI labs and enterprises. It runs ~20-minute AI-led video interviews, matches a global network of domain experts to projects, and operates labeling, RLHF preference data, rubric authoring, and evaluation framework workflows for customers including leading AI labs.
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Candidate uploads a multi-column PDF resume with a research-experience block rendered as a two-column table. The Mercor intake parser extracts 'Senior Researcher' as a skill instead of a job title. | Resume parse pipeline must distinguish job-title sections from skills/keywords blocks, preserve role-employer-dates triples, and surface a confidence score per extracted field. Low-confidence fields should be flagged for candidate confirmation in the intake UI rather than silently dropped into a ma… | Pass / Fail · AI Platform · high |
| 02 | A lab needs 'PhD-level immunology, vaccine adjuvants subfield.' Mercor's domain tag system has only 'biology' as the leaf tag. | Domain taxonomy should be deep enough to express subfield-level expertise (e.g., immunology → adjuvants) with operator-extensible tags. When the taxonomy is insufficient for a project, the system should let the operator add tags and re-index existing candidates, not silently widen the search to 'bi… | Pass / Fail · AI Platform · medium |
| 03 | Customer Lab A flagged candidate X as 'rejected — quality.' Three weeks later Lab B opens a similar project. The matcher recommends candidate X to Lab B without surfacing Lab A's rejection. | Customer-specific quality flags must stay scoped to the customer that created them: Lab A's rejection should NOT leak to Lab B (privacy + competitive concerns). Internally, Mercor should track patterns of cross-customer rejection for QC, but never surface a customer's private flag to another custo… | Pass / Fail · AI Platform · critical |
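The scoping rule in test 03 can be illustrated with a minimal sketch. Everything here is hypothetical and is not Mercor's actual data model or API: the `QualityFlag` record, the `visible_flags` and `internal_rejection_count` helpers, and the identifiers `cand_x`, `lab_a`, and `lab_b` are assumptions made for illustration. The idea is that a customer only ever reads back flags it created, while an internal QC view may count how many distinct customers flagged a candidate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityFlag:
    """Hypothetical record of a customer-created quality flag."""
    candidate_id: str
    customer_id: str  # the customer that created the flag
    label: str


def visible_flags(flags, candidate_id, viewer_customer_id):
    """Return only the flags on this candidate that the viewing customer created."""
    return [
        f for f in flags
        if f.candidate_id == candidate_id and f.customer_id == viewer_customer_id
    ]


def internal_rejection_count(flags, candidate_id):
    """Internal-only QC signal: number of distinct customers that flagged the candidate."""
    return len({f.customer_id for f in flags if f.candidate_id == candidate_id})


# Lab A flags candidate X; Lab B later views the same candidate.
flags = [QualityFlag("cand_x", "lab_a", "rejected_quality")]

lab_a_view = visible_flags(flags, "cand_x", "lab_a")  # Lab A sees its own flag
lab_b_view = visible_flags(flags, "cand_x", "lab_b")  # Lab B sees nothing from Lab A
qc_count = internal_rejection_count(flags, "cand_x")  # internal QC still counts it
```

Filtering by the creating customer at read time keeps the private flag out of any other customer's view, while the aggregate count exposes no customer identity and is meant for internal use only.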
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. A pass requires a mean score of at least 4.0 across rubric criteria, with no single criterion scoring below 3.
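As a rough sketch of the threshold rule, the check below reads it as "mean across criteria at least 4.0, and every criterion at least 3". That reading, the function name `passes_rubric`, the criterion names, and the implied 1 to 5 scale are all assumptions; the page does not state the scale or how scores are aggregated.

```python
from statistics import mean


def passes_rubric(scores, mean_threshold=4.0, floor=3):
    """Return True when the mean score meets the threshold and no criterion is below the floor.

    `scores` maps a criterion name to its numeric score (hypothetical names).
    An empty rubric is treated as a failure.
    """
    values = list(scores.values())
    if not values:
        return False
    return mean(values) >= mean_threshold and min(values) >= floor


# Mean 4.0, lowest score 3: meets both conditions.
borderline_pass = passes_rubric({"accuracy": 5, "scoping": 4, "clarity": 3})

# Mean 4.0, but one criterion scores 2: fails on the floor.
fails_on_floor = passes_rubric({"accuracy": 5, "scoping": 5, "clarity": 2})

# No criterion below 3, but the mean is under 4.0: fails on the mean.
fails_on_mean = passes_rubric({"accuracy": 4, "scoping": 4, "clarity": 3})
```

The floor matters because a high mean can otherwise hide one badly failed criterion, as the second example shows.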
Rubric criteria
- Mercor
- AI Platform
- Candidate Sourcing And Matching