
Evals for Antithesis
8 evaluation packs covering adversarial robustness, safety gates, workflow quality, and operator-level checks for Antithesis AI products.
About Antithesis
Antithesis is an autonomous, deterministic software testing platform. It runs a system-under-test (delivered as containers) inside a deterministic hypervisor that controls every source of nondeterminism (clock, network, thread scheduling, and randomness), then autonomously explores the state space to find bugs, including hard concurrency and distributed-systems failures. Every bug it finds is perfectly reproducible from a recorded seed and can be examined with time-travel debugging across a branching 'multiverse' of explored futures. The platform also provides fault injection (network partitions, crashes, disk faults, clock skew) and SDK assertions (Always / Sometimes / Reachable).
[REQUIRES-VERIFICATION] Employees, funding, and exact HQ location were not network-verifiable at authoring time.
Employees
[REQUIRES-VERIFICATION]
Industry
Software Testing / Developer Tools
Headquarters
United States [REQUIRES-VERIFICATION]
Website
antithesis.com
Available eval packs for Antithesis
8 packs ready to run.
Autonomous Exploration
Antithesis evals — Autonomous State-Space Exploration (relift v3 InfraRed)
CI/CD Auth And Governance
Antithesis evals — CI/CD, Auth & Governance (relift v3 InfraRed)
Deterministic Simulation And Reproducibility
Antithesis evals — Deterministic Simulation & Reproducibility (relift v3 InfraRed)
Fault Injection
Antithesis evals — Fault Injection (relift v3 InfraRed)
SDK Assertions
Antithesis evals — SDK Assertions (Always / Sometimes / Reachable) (relift v3 InfraRed)
SUT Setup Containers
Antithesis evals — System-Under-Test Setup (Containers) (relift v3 InfraRed)
Test Composer And Workloads
Antithesis evals — Test Composer & Workload Drivers (relift v3 InfraRed)
Triage Reports And Debugging
Antithesis evals — Triage Reports & Multiverse Debugging (relift v3 InfraRed)
Why eval Antithesis AI
Antithesis's AI features ship behind brand promises about accuracy, safety, and reliability. Buyers and integrators need to know those promises hold up under adversarial prompts, edge-case workflows, and the long tail of real customer inputs — not just the demo path.
The Corsac eval library for Antithesis measures the four dimensions teams care about most when deploying AI platform agents:
- Adversarial robustness: does the agent resist prompt injection, jailbreaks, and social-engineering attempts?
- Workflow quality: does it complete the task buyers were shown in the demo, on inputs that don't look like the demo?
- Safety gates: does it escalate or refuse when it should, and only then?
- Operator quality: does it preserve analyst trust by surfacing the right context at the right time?
Every eval pack above is hand-authored against Antithesis's public product surface and runnable in Corsac with your own data.