Evals for Applied Compute

3 evaluation packs covering adversarial robustness, safety gates, workflow quality, and operator-level checks for Applied Compute AI products.

Medical & Clinical AI

About Applied Compute

Applied Compute builds 'Specific Intelligence' for enterprises: proprietary agents that are trained on a company's own data and workflows via reinforcement learning, learn continuously in production, and run in the customer's environment. Its platform trains custom models on customer data, builds automated graders (reward models) that encode a customer's quality bar, and runs an AC Context Engine (Remember/Refine/Retrieve) that distills tasks and procedures into a live Contextbase that agents query at runtime. All of this is managed from a single control plane, deployable serverless or fully in-VPC, with RBAC and audit logs on every dispatch. The company was founded in early 2025 by ex-OpenAI researchers Rhythm Garg, Linden Li, and Yash Patil; early customers include DoorDash, Mercor, and Harvey.

Employees

~30 (est.)

Industry

Enterprise AI / Custom-Model & Agent Platform

Headquarters

San Francisco, CA

Website

www.appliedcompute.com

Available eval packs for Applied Compute

3 packs ready to run.

Why eval Applied Compute AI

Applied Compute's AI features ship behind brand promises about accuracy, safety, and reliability. Buyers and integrators need to know those promises hold up under adversarial prompts, edge-case workflows, and the long tail of real customer inputs — not just the demo path.

The Corsac eval library for Applied Compute measures the four dimensions teams care about most when deploying medical & clinical AI agents:

Adversarial robustness: does the agent resist prompt injection, jailbreaks, and social-engineering attempts?
Workflow quality: does it complete the task buyers were shown in the demo, on inputs that don't look like the demo?
Safety gates: does it escalate or refuse when it should, and only then?
Operator quality: does it preserve analyst trust by surfacing the right context at the right time?
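
As a rough illustration of how results across these four dimensions might be rolled up, here is a minimal Python sketch. It is not Corsac's API or schema: the EvalResult record, the dimension labels, the case IDs, and the pass_rates helper are all hypothetical names chosen for this example. It only assumes that each eval case belongs to one dimension and ends in a pass or a fail.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical labels for the four dimensions listed above (not a Corsac schema).
DIMENSIONS = (
    "adversarial_robustness",
    "workflow_quality",
    "safety_gates",
    "operator_quality",
)


@dataclass(frozen=True)
class EvalResult:
    """One graded eval case (hypothetical record shape)."""
    case_id: str
    dimension: str
    passed: bool


def pass_rates(results):
    """Return the fraction of passing cases for each dimension that has results."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for r in results:
        if r.dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {r.dimension}")
        totals[r.dimension] += 1
        passes[r.dimension] += int(r.passed)
    # Dimensions with no cases are omitted rather than reported as 0.0.
    return {d: passes[d] / totals[d] for d in DIMENSIONS if totals[d]}


# Illustrative data only; these are not real eval outcomes.
results = [
    EvalResult("inj-001", "adversarial_robustness", True),
    EvalResult("inj-002", "adversarial_robustness", False),
    EvalResult("gate-001", "safety_gates", True),
]
rates = pass_rates(results)
```

Reporting a separate rate per dimension, instead of one blended score, keeps a weak area such as safety gates from being masked by strong results elsewhere.
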

Every eval pack above is hand-authored against Applied Compute's public product surface and runnable in Corsac with your own data.