FAQ

Running evals, bringing your own, and expert review

Everything an AI or ML team needs to go from a library pack, or its own eval, to gated releases with human judgment where it matters.

Running a Corsac eval

Start from the library and have results in CI in minutes.

How do I run a Corsac eval pack?

Find a pack in the eval library, add it to your workspace, point it at your model or agent endpoint, and run it. Most teams run packs from CI: your runner executes the cases, scores them locally, and pushes the result to Corsac via a single API call (POST /api/app/evals/{spec_id}/external-run). Corsac stores the run, applies your pass/fail gate, and routes anything flagged to review.
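Here is a minimal Python sketch of the push step a CI runner might perform, using only the standard library. Only the endpoint path and bearer-token auth come from this FAQ. The base URL https://corsac.example.com and the payload contents are placeholders, so check the API docs for the real host and EvalResult schema. Building the request separately from sending it keeps the URL and headers easy to test without network access.

```python
import json
import urllib.request
from urllib.parse import quote

# Placeholder host (assumption): replace with your workspace's API base URL.
BASE_URL = "https://corsac.example.com"


def build_external_run_request(spec_id: str, api_key: str, result: dict) -> urllib.request.Request:
    """Build, without sending, the POST to the external-run endpoint.

    The path and bearer auth follow the FAQ; the body shape is an assumption.
    """
    url = f"{BASE_URL}/api/app/evals/{quote(spec_id, safe='')}/external-run"
    return urllib.request.Request(
        url,
        data=json.dumps(result).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def push_run(spec_id: str, api_key: str, result: dict, timeout: float = 30.0) -> dict:
    """Send the run and decode the response, assuming it is JSON.

    urlopen raises urllib.error.HTTPError on non-2xx responses.
    """
    request = build_external_run_request(spec_id, api_key, result)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

In CI you would typically read the key from a secret store rather than hard-coding it. For example, push_run(spec_id, os.environ["CORSAC_API_KEY"], result), where the environment variable name is also hypothetical.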

What does an eval pack actually contain?

Each pack is a versioned spec plus a dataset of test cases. The spec defines the use cases it covers, the rubric criteria, and how each case is scored: pass/fail assertions, a 1–5 LLM-judge score, or a mix. Each pack's detail page lists the use cases it checks, sample tests, the rubric, and the scoring method.

How are evals scored?

Depending on the pack, cases are scored as binary pass/fail assertions, graded 1–5 by an LLM judge against the rubric, or a combination. The scoring method is shown on each eval's page so you know exactly how a result is reached before you run it. You set the pass threshold that gates your release.
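To make the gate concrete, here is a small Python sketch of how mixed per-case scores might roll up into a release decision. It assumes a case passes only if its assertions pass and its judge score meets a cutoff (4 of 5 here). That rule and the CaseResult type are illustrative assumptions. The FAQ only says that scoring varies by pack and that you choose the pass threshold.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class CaseResult:
    case_id: str
    assertions_passed: Optional[bool] = None  # None if the case has no assertions
    judge_score: Optional[int] = None         # 1-5 LLM-judge score, None if unjudged


def case_passes(case: CaseResult, min_judge_score: int = 4) -> bool:
    """A case passes only if every scoring method applied to it passes."""
    if case.assertions_passed is None and case.judge_score is None:
        raise ValueError(f"case {case.case_id!r} has no score")
    if case.assertions_passed is False:
        return False
    if case.judge_score is not None and case.judge_score < min_judge_score:
        return False
    return True


def release_gate(cases: List[CaseResult], pass_threshold: float,
                 min_judge_score: int = 4) -> Tuple[float, bool]:
    """Return (pass_rate, gate_open); an empty run never opens the gate."""
    if not cases:
        return 0.0, False
    passed = sum(case_passes(c, min_judge_score) for c in cases)
    pass_rate = passed / len(cases)
    return pass_rate, pass_rate >= pass_threshold
```

Raising on a case with no score at all, rather than counting it as a pass, keeps a misconfigured harness from silently inflating the pass rate.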

Do I need to sign in to browse evals?

No. The full eval library, including each pack's use cases, rubric, and scoring method, is public so you can evaluate fit before signing up. Signing in unlocks the full sample dataset rows and lets you add a pack to your workspace to run it on your own data.

Bring your own eval

Already have evals? Corsac runs and governs them alongside the library.

Can I bring my own eval to Corsac?

Yes. You can run your existing eval in your own harness and push the result to Corsac through the same external-run API the library packs use. Corsac becomes the system of record: it stores every run, gates releases on the result, tracks regressions over time, and routes flagged outputs to review — regardless of who authored the eval.

What format does a bring-your-own eval need to be in?

You send an EvalResult payload (per-case metrics, aggregates, and run metadata) to the external-run endpoint. The exact field shape is in the API docs and the live OpenAPI schema. Because scoring happens in your harness, you are not locked into a particular framework; Corsac ingests the outcome.
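Here is a sketch of assembling an EvalResult-style payload in your own harness. The three sections mirror the description above: per-case metrics, aggregates, and run metadata. Every field name below (case_id, passed, score, n_cases, pass_rate, mean_score) is hypothetical. The API docs and OpenAPI schema define the real shape, so treat this as a structural outline only.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class CaseMetric:
    case_id: str
    passed: bool
    score: Optional[float] = None  # e.g. a 1-5 judge score, if any


@dataclass
class EvalResult:
    cases: List[CaseMetric]
    metadata: dict = field(default_factory=dict)  # e.g. model, commit, harness

    def aggregates(self) -> dict:
        """Roll per-case metrics up into run-level numbers."""
        n = len(self.cases)
        scores = [c.score for c in self.cases if c.score is not None]
        return {
            "n_cases": n,
            "pass_rate": sum(c.passed for c in self.cases) / n if n else 0.0,
            "mean_score": sum(scores) / len(scores) if scores else None,
        }

    def to_payload(self) -> dict:
        """Assemble the three sections into one JSON-serializable dict."""
        return {
            "cases": [asdict(c) for c in self.cases],
            "aggregates": self.aggregates(),
            "metadata": self.metadata,
        }

    def to_json(self) -> str:
        return json.dumps(self.to_payload())
```

Computing aggregates from the same per-case list you send keeps the two sections consistent, whichever framework produced the scores.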

Can I customize a library pack instead of starting from scratch?

Yes. Clone any library pack, adjust the cases, rubric, or thresholds to match your workflow, and run the customized version. This is the fastest path for most teams: start from a proven pack for your connector or use case, then tailor it.

Manual review & the expert network

Automated scoring catches most issues. Corsac's expert network adjudicates the rest.

What is Corsac managed review?

Managed review routes low-confidence, high-stakes, or policy-sensitive outputs to human reviewers instead of relying on automated scoring alone. The human decision (approved or rejected, with notes) is recorded against the run, giving you an auditable accountability trail for every critical AI decision.

How do I enhance an automated eval with manual review?

Define a routing rule that sends cases to review: for example, every case the judge scores below your threshold, every case tagged high-severity, or a sampled percentage of cases. Matching cases go to Corsac's reviewer queue; the rest pass through automatically. You get automated coverage at scale plus human judgment exactly where it matters.
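A routing rule like the one described above can be expressed as a predicate over scored cases. This Python sketch only illustrates the logic and is not Corsac configuration. The ScoredCase type, the default threshold of 4, and the "high-severity" tag string are assumptions. For sampling, it hashes the case id instead of calling a random number generator. That way, the same case lands in the same bucket on every run, so the reviewed sample is reproducible.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ScoredCase:
    case_id: str
    judge_score: Optional[int] = None  # 1-5 LLM-judge score, if the case was judged
    tags: set = field(default_factory=set)


def in_sample(case_id: str, rate: float, salt: str = "review-sample") -> bool:
    """Map the case id to a stable value in [0, 1) and keep it if below rate."""
    digest = hashlib.sha256(f"{salt}:{case_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:6], "big") / 2**48  # exact in a float, always < 1
    return bucket < rate


def needs_review(case: ScoredCase, threshold: int, review_tags: frozenset,
                 sample_rate: float) -> bool:
    """Apply the three example rules: low judge score, flagged tag, or sample."""
    if case.judge_score is not None and case.judge_score < threshold:
        return True
    if case.tags & review_tags:
        return True
    return in_sample(case.case_id, sample_rate)


def route(cases, threshold=4, review_tags=frozenset({"high-severity"}), sample_rate=0.0):
    """Split cases into (to_review, auto_pass), preserving input order."""
    to_review, auto_pass = [], []
    for case in cases:
        target = to_review if needs_review(case, threshold, review_tags, sample_rate) else auto_pass
        target.append(case)
    return to_review, auto_pass
```

Changing the salt draws a different but equally reproducible sample, which is useful if you want to rotate which cases get spot-checked between releases.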

Who reviews the flagged outputs?

Corsac's managed expert network. For domain-specific work (clinical, legal, security, financial), reviews are handled by reviewers with the relevant expertise so the adjudication is credible to your stakeholders and auditors. You can also keep review in-house and use Corsac purely to record your own team's decisions.

Is the human review decision auditable?

Yes. Every review decision is stored against the specific run with the deciding user, the decision, and free-text rationale, and is readable back through the API. That record is the point: it turns 'a human checked this' into a queryable, exportable artifact.
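Because each decision can be read back through the API, exporting it for auditors takes only a few lines. This sketch assumes you have already fetched decisions into ReviewDecision records. ReviewDecision is a hypothetical type carrying the run, the deciding user, the decision, and the rationale. The sketch then writes those records to CSV with the Python standard library. The real API response fields may be named differently.

```python
import csv
import io
from dataclasses import asdict, dataclass

FIELDS = ["run_id", "reviewer", "decision", "rationale"]


@dataclass(frozen=True)
class ReviewDecision:
    run_id: str
    reviewer: str   # the deciding user
    decision: str   # "approved" or "rejected"
    rationale: str  # free-text notes

    def __post_init__(self) -> None:
        if self.decision not in {"approved", "rejected"}:
            raise ValueError(f"unexpected decision: {self.decision!r}")


def export_csv(decisions: list) -> str:
    """Serialize review decisions to CSV text for an audit export."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for d in decisions:
        writer.writerow(asdict(d))
    return buffer.getvalue()
```

The csv module quotes rationales that contain commas or newlines, so free-text notes survive the round trip intact.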

Access & getting started

How do I get access to Corsac?

Request access from the Get started page. You'll get a workspace and an API key to browse evals, push runs, and configure review routing. The eval library is browsable without an account.

Is there an API?

Yes: a REST API over HTTPS with bearer-token auth. You can list evals, push run results, list and fetch runs, and read or write the human review decision on each run. See the API docs for endpoints and copy-pasteable cURL, TypeScript, and Python examples.

Still have a question?

Browse the eval library, read the API docs, or talk to us about a pilot.