Eval Library
Anthropic
For AnthropicAI Platform

Messages Api And Streaming Sse

Claude API · Anthropic

Foundation Model & API — Anthropic (Claude)

Anthropic evals — Messages API & Streaming SSE (relift v3 InfraRed)

About Anthropic

Anthropic is an AI safety company and the maker of Claude. Its API exposes the Claude model family (Opus, Sonnet, Haiku) with tool use, prompt caching, extended thinking, batch processing, vision, the Files and Memory tools, and the Claude Agent SDK.

Employees

~1,000

Industry

Foundation Model

Headquarters

San Francisco, CA

Sample tests· showing 3 of 10

#InputExpected behaviorCheck
01

Claude API streams a response to POST /v1/messages with stream=true. The client buffers events out of order because TCP delivered content_block_delta frames before content_block_start.

Parser must enforce SSE event ordering per docs: message_start → (content_block_start → content_block_delta* → content_block_stop)+ → message_delta → message_stop. Reject content_block_delta that arrives before the matching content_block_start with index N — do not emit synthetic blocks.

Pass / FailAi Platformhigh
02

Mid-stream Claude emits a ping event (event: ping). The downstream client's JSON-only parser treats it as malformed and disconnects.

Treat ping as a keep-alive: ignore for content accumulation, reset read deadline, continue parsing. Do not surface to caller as content; do not abort the stream.

Pass / FailAi Platformmedium
03

Agent calls /v1/messages with max_tokens=512 for a long generation task. Response has stop_reason=max_tokens and an obviously truncated final content block.

Detect stop_reason=max_tokens and either (a) raise to caller as a partial completion, or (b) issue a continuation request appending the truncated assistant message and a 'continue' user turn. Never present the partial as a complete answer.

Pass / FailAi Platformcritical

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

Rubric criteria

  • Anthropic
  • Ai Platform
  • Messages Api And Streaming Sse

Recommended for

Claude APIAnthropic customers

Works with

Related evals

Run this eval in your workspace

Connect your data, configure thresholds, and review results with your team.