Evals for Trajectory

One evaluation pack covering adversarial robustness, safety gates, workflow quality, and operator-level checks for Trajectory AI products.

Medical & Clinical AI

About Trajectory

Trajectory is a continual-learning platform for agentic AI. A lightweight SDK captures the signal already present in production usage (traces, corrections, re-prompts, and edits) in a standardized 'Trajectory' format. The platform then continuously post-trains the model, harness, and prompts, gating every change behind a customer eval suite and a human-approval step before anything ships. The company was founded by Ronak Malde (CEO), Michael Elabd, and Arjun Karanam, with a research team from DeepMind, OpenAI, Apple, Meta, and Scale AI, and it raised a $15M seed round led by Conviction. Early partners include Clay, Harvey, Decagon, Mercor, and Rogo.

Employees

~20

Industry

Continual-Learning AI Platform

Headquarters

San Francisco, CA

Website

trajectory.ai

Available eval packs for Trajectory

1 pack ready to run.

Joint Optimization Across Model Harness Prompts
Trajectory Accuracy
26 graded scenarios covering edge cases, failure modes, and quality checks.

Why eval Trajectory AI

Trajectory's AI features ship behind brand promises about accuracy, safety, and reliability. Buyers and integrators need to know those promises hold up under adversarial prompts, edge-case workflows, and the long tail of real customer inputs — not just the demo path.

The Corsac eval library for Trajectory measures the four dimensions teams care most about when deploying medical & clinical AI agents:

Adversarial robustness: does the agent resist prompt injection, jailbreaks, and social-engineering attempts?
Workflow quality: does it complete the task buyers were shown in the demo, on inputs that don't look like the demo?
Safety gates: does it escalate or refuse when it should, and only then?
Operator quality: does it preserve analyst trust by surfacing the right context at the right time?
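
As a rough illustration of how graded scenarios along these dimensions could be rolled up, here is a minimal Python sketch. It is not Corsac's or Trajectory's actual grading code: the `ScenarioResult` fields, the dimension labels, and the pass rule are all assumptions made for the example. The one idea it tries to capture is the "and only then" part of the safety-gate check, where a scenario fails both when the agent misses a required escalation and when it escalates unnecessarily.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ScenarioResult:
    """One graded scenario (hypothetical shape, not a real Corsac schema)."""
    dimension: str         # e.g. "safety_gates", "workflow_quality"
    should_escalate: bool  # what the scenario author expects
    did_escalate: bool     # what the agent actually did
    task_passed: bool      # grader verdict on the task itself


def scenario_passes(result: ScenarioResult) -> bool:
    # Pass only if the task was graded as correct AND the escalation
    # decision matches the expectation in both directions.
    return result.task_passed and result.should_escalate == result.did_escalate


def pass_rates(results: list[ScenarioResult]) -> dict[str, float]:
    """Return the fraction of passing scenarios per dimension."""
    totals: dict[str, int] = defaultdict(int)
    passed: dict[str, int] = defaultdict(int)
    for result in results:
        totals[result.dimension] += 1
        if scenario_passes(result):
            passed[result.dimension] += 1
    return {dim: passed[dim] / totals[dim] for dim in totals}


if __name__ == "__main__":
    sample = [
        ScenarioResult("safety_gates", True, True, True),        # correct escalation
        ScenarioResult("safety_gates", False, True, True),       # unnecessary escalation
        ScenarioResult("workflow_quality", False, False, True),  # normal completion
    ]
    print(pass_rates(sample))
```

Treating over-escalation as a failure is a design choice in this sketch; a stricter or looser rule would change the per-dimension rates.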

Every eval pack above is hand-authored against Trajectory's public product surface and runnable in Corsac with your own data.