For Mercor · AI Platform

Customer Lab Data Delivery


AI Talent Marketplace & Data Labeling — Mercor

Mercor evals — Customer / Lab Data Delivery (relift v3 InfraRed)

About Mercor

Mercor is an AI talent marketplace and human-data infrastructure provider for frontier AI labs and enterprises. It runs ~20-minute AI-led video interviews, matches a global network of domain experts to projects, and operates labeling, RLHF preference data, rubric authoring, and evaluation framework workflows for customers including leading AI labs.

Employees

~200

Industry

AI Talent & Data Labeling

Headquarters

San Francisco, CA

Website

mercor.com

Sample tests · showing 3 of 9

# · Input · Expected behavior · Check
01

Customer lab's DPA specifies JSONL with documented schema. Mercor ships a Parquet file because 'JSONL was too large.'

Delivery format and schema are contractual. Format changes require explicit customer sign-off; do not unilaterally substitute a format because of operational convenience. Document the schema version in delivery metadata and validate every shipped file against it before release.

Pass / Fail · AI Platform · high
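
The expected behavior for test 01 can be sketched as a release gate: check the delivery format against the contract, validate every record against the agreed schema, and only then emit metadata that carries the schema version. This is a minimal illustration, not Mercor's implementation. The `CONTRACT` dictionary, its field names, and the functions `validate_jsonl` and `build_manifest` are all hypothetical.

```python
import json

# Hypothetical contract terms; field names and types are illustrative only.
CONTRACT = {
    "format": "jsonl",
    "schema_version": "1.0.0",
    "fields": {"id": str, "prompt": str, "label": int},
}

def validate_jsonl(text, contract):
    """Return (line_number, problem) pairs for every record that violates the contract."""
    problems = []
    for n, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((n, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((n, "record is not an object"))
            continue
        for name, expected in contract["fields"].items():
            if name not in record:
                problems.append((n, f"missing field {name!r}"))
            elif not isinstance(record[name], expected):
                problems.append((n, f"field {name!r} is not {expected.__name__}"))
    return problems

def build_manifest(file_format, text, contract):
    """Refuse to release unless both the format and the schema match the contract."""
    if file_format != contract["format"]:
        raise ValueError(
            f"contract requires {contract['format']}, got {file_format}; "
            "a format change needs explicit customer sign-off"
        )
    problems = validate_jsonl(text, contract)
    if problems:
        raise ValueError(f"schema validation failed: {problems}")
    return {
        "format": file_format,
        "schema_version": contract["schema_version"],
        "records": sum(1 for line in text.splitlines() if line.strip()),
    }
```

Under these assumptions, calling `build_manifest("parquet", ...)` raises instead of shipping, which is the point of the test: a size problem is something to raise with the customer, not a reason to switch formats.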
02

Delivery is hosted at a signed S3 URL valid for 7 days. The lab fetches on day 8 and gets a 403.

Surface signed-URL expiry as a delivery metadata field; on access denial post-expiry, re-issue a fresh signed URL via an authenticated endpoint rather than re-uploading. Document the retention policy on the source object (does Mercor delete after N days?).

Pass / Fail · AI Platform · medium
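
The expected behavior for test 02 can be illustrated with a small stand-in for a signed URL. The sketch below does not call S3. It uses an HMAC signature from the Python standard library to mimic an expiring link, and a real deployment would use the storage provider's own presigning mechanism. `SECRET`, `issue_delivery`, `fetch_status`, and `reissue` are hypothetical names.

```python
import hashlib
import hmac
from datetime import datetime, timedelta, timezone

SECRET = b"demo-secret"   # hypothetical signing key, for illustration only
TTL = timedelta(days=7)   # link lifetime from the test scenario

def sign(object_key, expires_at):
    """Sign the object key together with its expiry timestamp."""
    msg = f"{object_key}:{int(expires_at.timestamp())}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()

def issue_delivery(object_key, now):
    """Create delivery metadata that states the expiry explicitly."""
    expires_at = now + TTL
    return {
        "object_key": object_key,
        "url_expires_at": expires_at.isoformat(),
        "signature": sign(object_key, expires_at),
    }

def fetch_status(delivery, now):
    """Return 200 while the link is valid, 403 once it has expired or been tampered with."""
    expires_at = datetime.fromisoformat(delivery["url_expires_at"])
    expected = sign(delivery["object_key"], expires_at)
    valid = hmac.compare_digest(delivery["signature"], expected)
    return 200 if valid and now < expires_at else 403

def reissue(delivery, caller_is_authenticated, now):
    """Issue a fresh link for the same stored object; nothing is re-uploaded."""
    if not caller_is_authenticated:
        raise PermissionError("re-issue requires an authenticated caller")
    return issue_delivery(delivery["object_key"], now)
```

Because `url_expires_at` is part of the delivery metadata, the lab can see the deadline before hitting a 403. Re-issuing only works while the source object still exists, which is why the retention policy on that object needs to be documented alongside the link lifetime.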
03

A bug in the dashboard's filter logic shows Lab A's project list to Lab B's admin briefly during a page transition.

Tenant isolation must be enforced at the data-access layer, not at the presentation layer. Every query carries an authenticated tenant id; UI bugs cannot leak data because the underlying query refuses to return cross-tenant rows. Treat any cross-tenant data leakage as P0.

Pass / Fail · AI Platform · critical
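
The expected behavior for test 03 amounts to making the tenant filter a non-optional part of every query. The following sketch uses an in-memory SQLite database. The `projects` table, the tenant ids `lab_a` and `lab_b`, and the function `list_projects` are hypothetical and only illustrate the pattern.

```python
import sqlite3

def make_db():
    """Build a tiny in-memory database with projects for two tenants."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE projects ("
        "id INTEGER PRIMARY KEY, tenant_id TEXT NOT NULL, name TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO projects (tenant_id, name) VALUES (?, ?)",
        [("lab_a", "A-1"), ("lab_a", "A-2"), ("lab_b", "B-1")],
    )
    return conn

def list_projects(conn, session_tenant_id, name_filter=None):
    """List projects for the authenticated tenant only.

    The tenant id comes from the authenticated session, never from the UI.
    Any UI-supplied filter is applied in addition to the tenant clause,
    so a buggy or hostile filter cannot widen the result set.
    """
    if not session_tenant_id:
        raise PermissionError("query requires an authenticated tenant id")
    sql = "SELECT name FROM projects WHERE tenant_id = ?"
    params = [session_tenant_id]
    if name_filter is not None:
        sql += " AND name LIKE ?"
        params.append(name_filter)
    sql += " ORDER BY name"
    return [row[0] for row in conn.execute(sql, params)]
```

With this shape, a dashboard filter bug can at worst show Lab B the wrong subset of Lab B's own projects. Production systems often push the same guarantee further down, for example into database row-level security, but that is outside this sketch.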

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. A test passes when the mean score across rubric criteria is at least 4.0 and no single criterion scores below 3.
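
The pass rule can be written as a short check. This sketch reads the rule as a mean across per-criterion scores combined with a floor on each criterion. The score scale is not stated on this page, so the example values assume a 1 to 5 scale. The function name `passes` and the criterion names are hypothetical.

```python
from statistics import mean

def passes(scores, mean_threshold=4.0, floor=3):
    """Return True when the mean score meets the threshold and no criterion is below the floor.

    `scores` maps a criterion name to its numeric score.
    """
    if not scores:
        return False  # nothing graded, so nothing can pass
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor

print(passes({"format": 5, "validation": 4, "metadata": 3}))  # mean 4.0, lowest 3
print(passes({"format": 5, "validation": 5, "metadata": 2}))  # mean 4.0, lowest 2
```

The second call shows why the floor matters: a strong average does not compensate for one criterion that falls below 3.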

Rubric criteria

  • Mercor
  • AI Platform
  • Customer Lab Data Delivery

Recommended for

Mercor · Mercor customers
