Evals for Inferact
3 evaluation packs covering adversarial robustness, safety gates, workflow quality, and operator-level checks for Inferact AI products.
About Inferact
Inferact is the company founded by the creators and core maintainers of vLLM, the open-source, high-throughput LLM inference and serving engine known for PagedAttention, continuous batching, an OpenAI-compatible API server, and support for diverse accelerators (NVIDIA, AMD, Google TPUs, Intel Gaudi, AWS Neuron). Inferact stewards vLLM as an open-source project while building a managed, serverless commercial inference platform with observability, troubleshooting, disaster recovery, and Kubernetes-native operations. It launched in January 2026 with a $150M seed round co-led by Andreessen Horowitz and Lightspeed Venture Partners.
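PagedAttention stores each sequence's KV cache in fixed-size blocks, mapped through a per-sequence block table, instead of in one contiguous buffer sized for the maximum sequence length. The sketch below illustrates that bookkeeping under stated assumptions: a block size of 16 tokens and a hypothetical `BlockAllocator` class. It is an illustration of the idea, not vLLM's implementation or API.

```python
import math
from dataclasses import dataclass, field


@dataclass
class BlockAllocator:
    """Hypothetical PagedAttention-style KV cache block allocator (illustrative only)."""

    num_blocks: int       # total physical KV blocks in the pool
    block_size: int = 16  # tokens per block (assumed value for this sketch)
    free: list[int] = field(init=False)  # ids of unallocated physical blocks
    tables: dict[str, list[int]] = field(init=False, default_factory=dict)  # seq_id -> block table

    def __post_init__(self) -> None:
        # Every physical block starts out free.
        self.free = list(range(self.num_blocks))

    def blocks_needed(self, num_tokens: int) -> int:
        # A sequence of n tokens occupies ceil(n / block_size) blocks.
        return math.ceil(num_tokens / self.block_size)

    def grow(self, seq_id: str, num_tokens: int) -> bool:
        """Give seq_id enough blocks for num_tokens; return False if the pool is exhausted."""
        table = self.tables.setdefault(seq_id, [])
        missing = self.blocks_needed(num_tokens) - len(table)
        if missing > len(self.free):
            # Not enough free blocks: a real scheduler would preempt or queue here.
            return False
        for _ in range(missing):
            table.append(self.free.pop())
        return True

    def release(self, seq_id: str) -> None:
        # A finished sequence returns all of its blocks to the free pool at once.
        self.free.extend(self.tables.pop(seq_id, []))
```

Because blocks are handed out on demand, a sequence wastes at most `block_size - 1` token slots in its last block, and a finished sequence frees its blocks immediately for the next request in a continuous batch. The `grow` failure path, where the pool runs dry mid-generation, is one plausible source of the edge cases a KV cache behavior eval would probe.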
Employees
~20 (early-stage; unverified)
Industry
AI Inference & Serving
Headquarters
San Francisco Bay Area, CA
Website
inferact.ai
Available eval packs for Inferact
3 packs ready to run.
Continuous Batching and PagedAttention KV Cache Behavior
22 graded scenarios covering edge cases, failure modes, and quality checks.
OpenAI-Compatible API Conformance
8 graded scenarios covering edge cases, failure modes, and quality checks.
Quantization, LoRA, and Multimodal Serving
12 graded scenarios covering edge cases, failure modes, and quality checks.
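A conformance scenario ultimately reduces to structural assertions on what the server returns. The sketch below shows one such check for a non-streaming `/v1/chat/completions` response, assuming the body has already been parsed from JSON into a dict and that choices arrive in index order. The function name `check_chat_completion` and the particular fields checked are illustrative choices, not Corsac's grader and not an exhaustive reading of the OpenAI specification.

```python
from typing import Any

# Finish reasons defined by the OpenAI Chat Completions API.
VALID_FINISH_REASONS = {"stop", "length", "tool_calls", "content_filter", "function_call"}


def check_chat_completion(resp: dict[str, Any]) -> list[str]:
    """Return a list of conformance problems; an empty list means every check passed."""
    problems: list[str] = []
    if resp.get("object") != "chat.completion":
        problems.append("object must be 'chat.completion'")
    # Top-level scalar fields and their expected types.
    for key, typ in (("id", str), ("created", int), ("model", str)):
        if not isinstance(resp.get(key), typ):
            problems.append(f"{key} missing or not {typ.__name__}")
    choices = resp.get("choices")
    if not isinstance(choices, list) or not choices:
        problems.append("choices must be a non-empty list")
        choices = []
    for i, choice in enumerate(choices):
        if not isinstance(choice, dict):
            problems.append(f"choices[{i}] must be an object")
            continue
        if choice.get("index") != i:
            problems.append(f"choices[{i}].index should be {i}")
        message = choice.get("message") or {}
        if message.get("role") != "assistant":
            problems.append(f"choices[{i}].message.role must be 'assistant'")
        if choice.get("finish_reason") not in VALID_FINISH_REASONS:
            problems.append(f"choices[{i}].finish_reason is not a recognized value")
    usage = resp.get("usage")
    if isinstance(usage, dict):
        p, c, t = (usage.get(k) for k in ("prompt_tokens", "completion_tokens", "total_tokens"))
        if not all(isinstance(v, int) for v in (p, c, t)):
            problems.append("usage token counts must be integers")
        elif t != p + c:
            # total_tokens should equal prompt_tokens + completion_tokens.
            problems.append("usage.total_tokens != prompt_tokens + completion_tokens")
    return problems
```

Returning a list of problems instead of raising on the first failure lets a single graded scenario report every deviation in one pass.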
Why evaluate Inferact AI
Inferact's AI features ship behind brand promises about accuracy, safety, and reliability. Buyers and integrators need to know those promises hold up under adversarial prompts, edge-case workflows, and the long tail of real customer inputs — not just the demo path.
The Corsac eval library for Inferact measures the four dimensions teams care about most when deploying AI inference and serving systems:
- Adversarial robustness — does the agent resist prompt injection, jailbreaks, and social-engineering attempts?
- Workflow quality — does it complete the task buyers were shown in the demo, on inputs that don't look like the demo?
- Safety gates — does it escalate or refuse when it should, and only then?
- Operator quality — does it preserve analyst trust by surfacing the right context at the right time?
Every eval pack above is hand-authored against Inferact's public product surface and runnable in Corsac with your own data.