Evals for Trajectory

1 evaluation pack covering adversarial robustness, safety gates, workflow quality, and operator-level checks for Trajectory AI products.

Medical & Clinical AI

About Trajectory

Trajectory is a continual-learning platform for agentic AI. A lightweight SDK captures the signal already present in production usage (traces, corrections, re-prompts, and edits) in a standardized 'Trajectory' format. The platform then continuously post-trains the model, harness, and prompts, with every change gated by a customer eval suite and a human approval step before anything ships. The company was founded by Ronak Malde (CEO), Michael Elabd, and Arjun Karanam, with a research team drawn from DeepMind, OpenAI, Apple, Meta, and Scale AI, and it raised a $15M seed round led by Conviction. Early partners include Clay, Harvey, Decagon, Mercor, and Rogo.

Employees

~20

Industry

Continual-Learning AI Platform

Headquarters

San Francisco, CA

Available eval packs for Trajectory

1 pack ready to run.

Why eval Trajectory AI

Trajectory's AI features ship with brand promises about accuracy, safety, and reliability. Buyers and integrators need to know that those promises hold up under adversarial prompts, edge-case workflows, and the long tail of real customer inputs, not just on the demo path.

The Corsac eval library for Trajectory measures the four dimensions teams care about most when deploying medical and clinical AI agents:

  • Adversarial robustness — does the agent resist prompt injection, jailbreaks, and social-engineering attempts?
  • Workflow quality — does it complete the task buyers were shown in the demo, on inputs that don't look like the demo?
  • Safety gates — does it escalate or refuse when it should, and only then?
  • Operator quality — does it preserve analyst trust by surfacing the right context at the right time?

Every eval pack above is hand-authored against Trajectory's public product surface and runnable in Corsac with your own data.