
Test Composer And Workloads
Autonomous Deterministic Testing — Antithesis
Antithesis evals — Test Composer & Workload Drivers (relift v3 InfraRed)
About Antithesis
Antithesis is an autonomous, deterministic software testing platform. It runs a system under test (SUT), delivered as containers, inside a deterministic hypervisor that controls every source of nondeterminism: clock, network, thread scheduling, and randomness. It then autonomously explores the state space to find bugs, including hard concurrency and distributed-systems failures. Every bug it finds is perfectly reproducible from a recorded seed, with time-travel debugging across a branching 'multiverse' of explored futures. The platform also provides fault injection (network partitions, crashes, disk faults, clock skew) and SDK assertions (Always / Sometimes / Reachable). [REQUIRES-VERIFICATION] Employee count, funding, and exact HQ location could not be verified over the network at authoring time.
Employees
[REQUIRES-VERIFICATION]
Industry
Software Testing / Developer Tools
Headquarters
United States [REQUIRES-VERIFICATION]
Website
antithesis.com
Sample tests (showing 3 of 9)
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Operator writes a workload that always issues the same 100 requests in the same order, expecting Antithesis to find bugs. | Write the workload as a parameterized driver that exposes choices (which operation, which key, which value) to the platform so the explorer can vary them across runs, rather than a single hardcoded script. The platform's value comes from autonomously searching the space of behaviors; a fixed script… | Pass / Fail · Ai Platform · high |
| 02 | The platform may run a driver command multiple times across branches. The driver appends to a shared external log on each invocation. | Make drivers safe to run repeatedly and across branched futures: avoid side effects outside the simulated SUT, and key any in-SUT effects so re-execution is well-defined. Branching/replay can execute the same command from a shared prefix, so non-idempotent external side effects corrupt results. | Pass / Fail · Ai Platform · medium |
| 03 | Operator wants a writer workload and an independent reader workload to run concurrently against the SUT. | Compose multiple independent test commands/drivers that the platform can run and interleave (the 'test composer' model), letting the simulator schedule them against each other to surface concurrency bugs. Keep each driver self-contained and idempotent so the composer can run them in any order/paral… | Pass / Fail · Ai Platform · medium |
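
The expected behaviors in tests 01 and 02 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Antithesis SDK or the Test Composer interface: `FakeStore`, `run_driver`, and the `driver_id:step` operation key are hypothetical names, and a seeded `random.Random` stands in for whatever source of choices the platform supplies. The driver draws each operation, key, and value from that source instead of hardcoding them, and every write carries a key so that re-running the same command leaves the store unchanged.

```python
import random
from dataclasses import dataclass, field


@dataclass
class FakeStore:
    """In-memory stand-in for the system under test (hypothetical)."""
    data: dict = field(default_factory=dict)
    applied: set = field(default_factory=set)

    def put(self, op_id, key, value):
        # Keyed write: an op_id that was already applied is skipped,
        # so re-executing the same command does not change the state.
        if op_id in self.applied:
            return False
        self.applied.add(op_id)
        self.data[key] = value
        return True

    def get(self, key):
        return self.data.get(key)


OPERATIONS = ("put", "get")
KEYS = ("a", "b", "c")


def run_driver(store, rng, driver_id, steps=100):
    """Issue `steps` operations; `rng` picks each operation, key, and value."""
    trace = []
    for step in range(steps):
        op = rng.choice(OPERATIONS)
        key = rng.choice(KEYS)
        if op == "put":
            value = rng.randrange(1000)
            # Assumption: driver id plus step index identifies one write.
            store.put(f"{driver_id}:{step}", key, value)
            trace.append((op, key, value))
        else:
            trace.append((op, key, store.get(key)))
    return trace


if __name__ == "__main__":
    store = FakeStore()
    trace = run_driver(store, random.Random(7), driver_id="writer-1")
    print(len(trace), "operations issued; final state:", store.data)
```

Because the only inputs are the store and the choice source, the same seed yields the same trace on a fresh store, while a different seed exercises a different sequence. The driver also writes nothing outside the store, which is the property test 02 asks for.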
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
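
The pass threshold above can be written as a small check. This is a sketch under one assumption that the text does not spell out: the mean is taken across the rubric criteria of a single graded response. The function name `passes` and the example scores are hypothetical.

```python
from statistics import mean


def passes(scores, min_mean=4.0, floor=3):
    """Return True if the criterion scores meet both thresholds.

    `scores` maps a criterion name to its numeric score.
    Assumption: the mean is computed across criteria.
    """
    values = list(scores.values())
    if not values:
        return False  # nothing graded, so nothing can pass
    return mean(values) >= min_mean and min(values) >= floor


if __name__ == "__main__":
    example = {
        "Antithesis": 5,
        "Ai Platform": 4,
        "Test Composer And Workloads": 3,
    }
    print(passes(example))
```

Under this reading, a single low score fails the eval even when the mean clears 4.0, for example scores of 5, 5, and 2.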
Rubric criteria
- Antithesis
- Ai Platform
- Test Composer And Workloads