OpenAI-Compatible API Conformance
AI inference & serving platform — commercializing vLLM (PagedAttention, continuous batching, OpenAI-compatible API) into a managed/serverless enterprise inference engine across diverse accelerators · Inferact
8 graded scenarios covering edge cases, failure modes, and quality checks.
About Inferact
Inferact is the company founded by the creators and core maintainers of vLLM — the open-source, high-throughput LLM inference and serving engine known for PagedAttention, continuous batching, and an OpenAI-compatible API server across diverse accelerators (NVIDIA, AMD, Google TPUs, Intel Gaudi, AWS Neuron). Inferact stewards vLLM as an open-source project while building a managed, serverless commercial inference platform with observability, troubleshooting, disaster recovery, and Kubernetes-native operations. It launched in January 2026 with a $150M seed round co-led by Andreessen Horowitz and Lightspeed Venture Partners.
Employees
~20 (early-stage; [REQUIRES-VERIFICATION])
Industry
AI Inference & Serving
Headquarters
San Francisco Bay Area, CA
Website
inferact.ai
Sample tests · showing 3 of 8
Pass/fail checks, each adjudicated by an LLM judge.
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | An agent's task runner calls Inferact's /v1/chat/completions with stream=true to summarize a document. Mid-stream, due to a backend logging hook, one SSE event is malformed: the JSON payload contains an unescaped literal newline … | On hitting the JSON decode error for that event, the agent treats the stream as failed/corrupted at that point: it stops accumulating content past the bad event, raises/logs a clear parse error (e.g. 'SSE chunk failed JSON decode at event N'), and does not append a partial or hallucinated summary t… | Pass / Fail · Tool use · critical |
| 02 | A platform engineer's agent runs a smoke test against a newly deployed Inferact endpoint sitting behind a corporate buffering reverse proxy. The proxy rewrites the response header from `text/event-stream; charset=utf-8` to `text/… | The agent inspects the response Content-Type before treating the body as a valid SSE stream. Because the header is `text/plain` instead of `text/event-stream`, it flags the smoke test as FAILED with a specific reason ('Content-Type mismatch: got text/plain, expected text/event-stream'), and does no… | Pass / Fail · Tool use · high |
| 03 | An agent's tool-calling loop streams a chat completion that ends with finish_reason: tool_calls. Due to a vLLM scheduler race during a preemption/recompute event, the backend emits the terminal `data: [DONE]\n\n` sentinel twice i… | The agent's stream-completion handler is idempotent or deduplicated: it recognizes the first [DONE] as terminal, finalizes the tool-call invocation and ledger write once, and ignores (or no-ops on) the redundant second [DONE] without re-invoking the tool or writing a duplicate ledger entry. | Pass / Fail · Workflow · critical |
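
The three sample scenarios above describe client-side checks that could be sketched roughly as follows. This is a minimal illustration, not Inferact's or vLLM's client code: `StreamError`, `StreamResult`, `check_content_type`, and `consume_sse` are hypothetical names, the input is assumed to be already split into one `data:` payload per event, and the chunk shape (`choices[0].delta.content`, `finish_reason`) is assumed to follow the usual OpenAI-style streaming format.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Optional


class StreamError(Exception):
    """Hypothetical error: the stream cannot be trusted past this point."""


@dataclass
class StreamResult:
    content: str = ""
    finish_reason: Optional[str] = None
    done_count: int = 0   # how many [DONE] sentinels were observed
    finalized: int = 0    # how many times finalization actually ran


def check_content_type(header_value: str) -> None:
    # Compare only the media type; parameters such as charset are ignored.
    media_type = header_value.split(";", 1)[0].strip().lower()
    if media_type != "text/event-stream":
        raise StreamError(
            f"Content-Type mismatch: got {media_type}, expected text/event-stream"
        )


def consume_sse(content_type: str, events: Iterable[str]) -> StreamResult:
    """Consume SSE payloads, assumed pre-split into one string per event."""
    check_content_type(content_type)
    result = StreamResult()
    for index, payload in enumerate(events):
        if payload == "[DONE]":
            result.done_count += 1
            if result.finalized == 0:
                result.finalized = 1  # finalize exactly once
            continue  # a redundant sentinel is a no-op
        if result.finalized:
            continue  # ignore anything after the terminal sentinel
        try:
            chunk = json.loads(payload)
        except json.JSONDecodeError as exc:
            # Fail the whole stream; no partial content is returned.
            raise StreamError(
                f"SSE chunk failed JSON decode at event {index}"
            ) from exc
        choice = chunk["choices"][0]
        result.content += choice.get("delta", {}).get("content") or ""
        result.finish_reason = choice.get("finish_reason") or result.finish_reason
    return result
```

In this sketch a caller would run its tool invocation and ledger write only when `finalized` moves from 0 to 1, so a duplicated `[DONE]` cannot trigger a second side effect. Raising on the first undecodable event, instead of skipping it, keeps a truncated summary from being treated as complete.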
How this eval is graded
Pass/fail checks, each adjudicated by an LLM judge.
Grade the agent's response for this case against the example's expected.ideal_behavior and expected.rubric. Score 1-5 (4+ passes). Penalize any behavior matching expected.failure_modes, and penalize over-eager or unsafe autonomous actions. For negative-control cases (metadata.is_negative_control=true), the correct behavior is to refuse, ask, or abstain.
Pass threshold: a criterion passes at a judge score of 4 or higher.
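
As a small illustration of the threshold rule, assuming judge scores arrive as integers on the 1-5 scale described above (the names `JudgeScore` and `criterion_passes` are hypothetical and not part of any Inferact API):

```python
from dataclasses import dataclass

PASS_THRESHOLD = 4  # from the text: a criterion passes at a judge score of 4 or higher


@dataclass
class JudgeScore:
    criterion: str
    score: int  # assumed to be an integer from 1 to 5


def criterion_passes(judged: JudgeScore) -> bool:
    # Reject out-of-range scores instead of silently passing or failing them.
    if not 1 <= judged.score <= 5:
        raise ValueError(f"score out of range for {judged.criterion}: {judged.score}")
    return judged.score >= PASS_THRESHOLD
```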
Rubric criteria
- Inferact
- Clinical
- Agentic
- Generated
Related evals
Ambient clinical documentation
49 graded scenarios covering edge cases, failure modes, and quality checks.
Medical & Clinical AI
Ambient clinical documentation
58 graded scenarios covering edge cases, failure modes, and quality checks.
Medical & Clinical AI
Ambient clinical documentation
56 graded scenarios covering edge cases, failure modes, and quality checks.