Customer Lab Data Delivery
Mercor evals — Customer / Lab Data Delivery (relift v3 InfraRed)
About Mercor
Mercor is an AI talent marketplace and human-data infrastructure provider for frontier AI labs and enterprises. It runs AI-led video interviews of about 20 minutes, matches a global network of domain experts to projects, and operates workflows for labeling, RLHF preference data, rubric authoring, and evaluation frameworks for customers that include leading AI labs.
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Customer lab's DPA specifies JSONL with documented schema. Mercor ships a Parquet file because 'JSONL was too large.' | Delivery format and schema are contractual. Format changes require explicit customer sign-off; do not unilaterally substitute a format because of operational convenience. Document the schema version in delivery metadata and validate every shipped file against it before release. | Pass / Fail · Ai Platform · high |
| 02 | Delivery is hosted at a signed S3 URL valid for 7 days. The lab fetches on day 8 and gets a 403. | Surface signed-URL expiry as a delivery metadata field; on access denial post-expiry, re-issue a fresh signed URL via an authenticated endpoint rather than re-uploading. Document the retention policy on the source object (does Mercor delete after N days?). | Pass / Fail · Ai Platform · medium |
| 03 | A bug in the dashboard's filter logic shows Lab A's project list to Lab B's admin briefly during a page transition. | Tenant isolation must be enforced at the data-access layer, not at the presentation layer. Every query carries an authenticated tenant id; UI bugs cannot leak data because the underlying query refuses to return cross-tenant rows. Treat any cross-tenant data leakage as P0. | Pass / Fail · Ai Platform · critical |
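The expected behavior in test 03 (tenant isolation enforced at the data-access layer) can be illustrated with a minimal sketch. It assumes a hypothetical `projects` table with a `tenant_id` column and uses Python's standard `sqlite3` module as a stand-in for whatever datastore is actually in use. The table layout, tenant ids, and function names are all illustrative and not taken from Mercor's systems.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory database with projects for two hypothetical tenants."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE projects ("
        "id INTEGER PRIMARY KEY, tenant_id TEXT NOT NULL, name TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO projects (tenant_id, name) VALUES (?, ?)",
        [
            ("lab_a", "RLHF preference batch"),
            ("lab_a", "Rubric authoring"),
            ("lab_b", "Eval framework pilot"),
        ],
    )
    return conn


def list_projects(conn: sqlite3.Connection, tenant_id: str) -> list[str]:
    """Return project names for one tenant only.

    The tenant filter lives in the query itself, so a caller (for example a
    buggy UI filter) cannot obtain rows that belong to another tenant.
    """
    if not tenant_id:
        # Assumption: an empty id means the request was never authenticated.
        raise PermissionError("query refused: no authenticated tenant id")
    rows = conn.execute(
        "SELECT name FROM projects WHERE tenant_id = ? ORDER BY id",
        (tenant_id,),
    ).fetchall()
    return [name for (name,) in rows]
```

In this sketch a call such as `list_projects(conn, "lab_b")` can only return Lab B's rows, whatever the presentation layer does with the result. In a real deployment the tenant id would come from the authenticated session, not from a caller-supplied argument.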
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
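One reading of this rule can be sketched as follows: a test passes when the mean of its criterion scores is at least 4.0 and no single criterion scores below 3. The score scale is not stated on this page, and the criterion names used below are hypothetical, so treat this as an illustration of the threshold logic and not as the grader's actual implementation.

```python
from statistics import mean

MEAN_THRESHOLD = 4.0   # from the grading rule: mean >= 4.0
CRITERION_FLOOR = 3    # from the grading rule: no criterion below 3


def passes(scores: dict[str, float]) -> bool:
    """Apply the two-part rule to a mapping of criterion name -> score."""
    if not scores:
        raise ValueError("no criterion scores to grade")
    values = list(scores.values())
    return mean(values) >= MEAN_THRESHOLD and min(values) >= CRITERION_FLOOR


# Hypothetical criterion names, for illustration only.
example = {"format_contract": 5, "schema_validation": 4, "sign_off": 3}
result = passes(example)  # mean is 4.0 and the lowest score is 3
```

The floor matters because a high mean can hide one failing criterion: scores of 5, 5, and 2 also average 4.0 but fail under this reading.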
Rubric criteria
- Mercor
- Ai Platform
- Customer Lab Data Delivery