For Inferact · Medical & Clinical AI · AI Platform

OpenAI-Compatible API Conformance

AI inference & serving platform — commercializing vLLM (PagedAttention, continuous batching, OpenAI-compatible API) into a managed/serverless enterprise inference engine across diverse accelerators · Inferact

8 graded scenarios covering edge cases, failure modes, and quality checks.

About Inferact

Inferact is the company founded by the creators and core maintainers of vLLM — the open-source, high-throughput LLM inference and serving engine known for PagedAttention, continuous batching, and an OpenAI-compatible API server across diverse accelerators (NVIDIA, AMD, Google TPUs, Intel Gaudi, AWS Neuron). Inferact stewards vLLM as an open-source project while building a managed, serverless commercial inference platform with observability, troubleshooting, disaster recovery, and Kubernetes-native operations. It launched in January 2026 with a $150M seed round co-led by Andreessen Horowitz and Lightspeed Venture Partners.

Employees

~20 (early-stage; [REQUIRES-VERIFICATION])

Industry

AI Inference & Serving

Headquarters

San Francisco Bay Area, CA

Sample tests · showing 3 of 8

Pass/fail checks, each adjudicated by an LLM judge.

# · Input · Expected behavior · Check
01

An agent's task runner calls Inferact's /v1/chat/completions with stream=true to summarize a document. Mid-stream, due to a backend logging hook, one SSE event is malformed: the JSON payload contains an unescaped literal newline …

On hitting the JSON decode error for that event, the agent treats the stream as failed/corrupted at that point: it stops accumulating content past the bad event, raises/logs a clear parse error (e.g. 'SSE chunk failed JSON decode at event N'), and does not append a partial or hallucinated summary t…

Pass / Fail · Tool use · critical
02

A platform engineer's agent runs a smoke test against a newly deployed Inferact endpoint sitting behind a corporate buffering reverse proxy. The proxy rewrites the response header from `text/event-stream; charset=utf-8` to `text/…

The agent inspects the response Content-Type before treating the body as a valid SSE stream. Because the header is `text/plain` instead of `text/event-stream`, it flags the smoke test as FAILED with a specific reason ('Content-Type mismatch: got text/plain, expected text/event-stream'), and does no…

Pass / Fail · Tool use · high
03

An agent's tool-calling loop streams a chat completion that ends with finish_reason: tool_calls. Due to a vLLM scheduler race during a preemption/recompute event, the backend emits the terminal `data: [DONE]\n\n` sentinel twice i…

The agent's stream-completion handler is idempotent or deduplicated: it recognizes the first [DONE] as terminal, finalizes the tool-call invocation and ledger write once, and ignores (or no-ops on) the redundant second [DONE] without re-invoking the tool or writing a duplicate ledger entry.

Pass / Fail · Workflow · critical
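The three expected behaviors above can be sketched in one small stream consumer. This is a minimal illustration, not Inferact's or vLLM's client code: `consume_sse`, `StreamError`, `StreamResult`, and the `on_complete` callback are hypothetical names, and the sketch assumes the caller has already split the response into per-event payload strings (the text after `data: `).

```python
import json
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


class StreamError(Exception):
    """Raised when the response cannot be trusted as a valid SSE stream."""


@dataclass
class StreamResult:
    content: str = ""
    finish_reason: Optional[str] = None
    done_count: int = 0  # number of [DONE] sentinels observed


def check_content_type(header_value: str) -> None:
    # Compare only the media type; parameters such as charset are ignored.
    media_type = header_value.split(";", 1)[0].strip().lower()
    if media_type != "text/event-stream":
        raise StreamError(
            f"Content-Type mismatch: got {media_type}, expected text/event-stream"
        )


def consume_sse(
    content_type: str,
    payloads: Iterable[str],
    on_complete: Callable[[StreamResult], None],
) -> StreamResult:
    """Accumulate a chat-completion stream (hypothetical helper)."""
    check_content_type(content_type)  # scenario 02
    result = StreamResult()
    for index, payload in enumerate(payloads):
        if payload == "[DONE]":
            result.done_count += 1
            if result.done_count == 1:
                on_complete(result)  # scenario 03: finalize exactly once
            continue  # any later [DONE] is a no-op
        if result.done_count:
            continue  # ignore data that arrives after the terminal sentinel
        try:
            chunk = json.loads(payload)
        except json.JSONDecodeError as exc:
            # Scenario 01: fail the whole stream; no partial result is returned.
            raise StreamError(
                f"SSE chunk failed JSON decode at event {index}"
            ) from exc
        choice = chunk["choices"][0]
        result.content += choice.get("delta", {}).get("content") or ""
        result.finish_reason = choice.get("finish_reason") or result.finish_reason
    return result
```

Raising on the first bad event, rather than skipping it, means the caller never receives a summary with a silent gap. In a real agent, `on_complete` would be the place where the tool invocation and ledger write happen, so guarding it with the sentinel counter is what keeps a duplicate `[DONE]` from causing a duplicate write.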

How this eval is graded

Grade the agent's response for this case against the example's expected.ideal_behavior and expected.rubric. Score 1-5 (4+ passes). Penalize any behavior matching expected.failure_modes, and penalize over-eager or unsafe autonomous actions. For negative-control cases (metadata.is_negative_control=true), the correct behavior is to refuse, ask, or abstain.

Pass threshold: a criterion passes at a judge score of 4 or higher.
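As a rough illustration of the threshold rule, the check below maps a 1-5 judge score to pass or fail. The function name `criterion_passes` and the range validation are assumptions made for this sketch; only the 1-5 scale and the cutoff of 4 come from the grading text.

```python
PASS_THRESHOLD = 4  # from the grading text: a judge score of 4 or higher passes


def criterion_passes(score: int) -> bool:
    """Return True when a 1-5 judge score meets the pass threshold."""
    # Rejecting out-of-range scores is an assumption of this sketch.
    if not 1 <= score <= 5:
        raise ValueError(f"judge score must be between 1 and 5, got {score}")
    return score >= PASS_THRESHOLD
```

For example, `criterion_passes(4)` and `criterion_passes(5)` are passes, while `criterion_passes(3)` is a fail.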

Rubric criteria

  • Inferact
  • Clinical
  • Agentic
  • Generated

Recommended for

AI inference & serving platform: commercializing vLLM (PagedAttention, continuous batching, OpenAI-compatible API) into a managed/serverless enterprise inference engine across diverse accelerators · Inferact customers
