Observability Logs And Traces

Evaluates Portkey's Observability, Logs & Traces across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's AI Gateway eval coverage.

About Portkey

Portkey is an AI gateway for production LLM apps. It provides a unified, OpenAI-compatible API across 200+ models, with provider routing and fallbacks, semantic and simple caching, input/output guardrails (PII redaction, prompt-injection detection, content moderation), request-level observability and traces, a versioned prompt library, virtual keys with per-key budgets and rate limits, and workspace RBAC with audit logs.

Employees: ~40

Industry: AI Gateway

Headquarters: San Francisco, CA

Website: portkey.ai

Sample tests (showing 3 of 9)

#	Input	Expected behavior	Check
01	Agent invokes Portkey 5 times in a single user turn (planner → tool call → summarizer); operator wants all 5 calls grouped in one trace.	Set x-portkey-trace-id to the same operator-generated id on each call, or read x-portkey-trace-id from the first response and propagate it. Logs and dashboards group by trace_id for end-to-end agent latency / cost attribution. Do not let each call get an independent trace.	Pass / Fail · AI Platform · high
02	Dashboard shows overall p95 latency at 3.2s but operator can't tell which provider is slow.	Break down latency by x-portkey-provider (the actual upstream that served the request) so fallback-tail latency is attributable. Confirm the trace log shows per-attempt upstream timings. Do not assume gateway latency equals upstream latency: guards and cache lookups add overhead.	Pass / Fail · AI Platform · medium
03	Operator wants cost-per-tenant in the Portkey dashboard.	Send x-portkey-metadata as a JSON-string header with reserved key _user (or operator-defined keys like tenant_id). Dashboards filter and aggregate by metadata fields. Treat metadata as PII-bearing and keep values to opaque ids; never put email/phone in metadata.	Pass / Fail · AI Platform · high
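
The expected behaviors in the sample cases above can be sketched in a few lines of Python. This is a minimal illustration, not Portkey's SDK: the header names x-portkey-trace-id and x-portkey-metadata and the keys _user and tenant_id come from the cases above, while the helper names (new_trace_id, portkey_headers, p95_by_provider) and the shape of the log records (dicts with "provider" and "latency_ms") are assumptions made for the example.

```python
import json
import uuid
from collections import defaultdict


def new_trace_id():
    """Generate one operator-side trace id per user turn (hypothetical helper)."""
    return uuid.uuid4().hex


def portkey_headers(trace_id, tenant_id, user_id=None):
    """Build request headers that share a trace id and carry opaque metadata.

    Values should be opaque ids only; never pass email or phone here.
    """
    metadata = {"tenant_id": tenant_id}
    if user_id is not None:
        metadata["_user"] = user_id
    return {
        "x-portkey-trace-id": trace_id,
        # The metadata header is sent as a JSON string.
        "x-portkey-metadata": json.dumps(metadata),
    }


def p95_by_provider(records):
    """Nearest-rank p95 latency per upstream provider.

    `records` is an assumed shape: dicts with "provider" (the value of
    x-portkey-provider for that request) and "latency_ms".
    """
    by_provider = defaultdict(list)
    for record in records:
        by_provider[record["provider"]].append(record["latency_ms"])
    result = {}
    for provider, latencies in by_provider.items():
        latencies.sort()
        # Integer ceiling of 0.95 * n avoids floating-point rounding surprises.
        rank = (95 * len(latencies) + 99) // 100
        result[provider] = latencies[rank - 1]
    return result
```

In an agent loop, new_trace_id() would be called once per user turn and the same id passed to portkey_headers for every call in that turn (planner, tool call, summarizer), so the calls group under one trace. p95_by_provider shows the kind of per-provider breakdown case 02 asks for when working from exported logs.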

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
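
The pass rule stated above can be expressed as a small check. This is a sketch under assumptions: the criterion names and the numeric score scale are hypothetical, and the function name passes_rubric is invented for illustration; only the thresholds (mean >= 4.0, no criterion below 3) come from the text.

```python
from statistics import mean


def passes_rubric(scores, mean_threshold=4.0, floor=3):
    """Return True when the mean score meets the threshold and no criterion falls below the floor.

    `scores` maps a criterion name (hypothetical) to the judge's numeric score.
    """
    if not scores:
        raise ValueError("at least one criterion score is required")
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor
```

For example, scores of 5, 4 and 3 pass (mean 4.0, minimum 3), while 5, 5 and 2 fail despite the same mean, because one criterion is below 3.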

Rubric criteria

Portkey
AI Platform
Observability Logs And Traces

Recommended for

Portkey AI Gateway
Portkey customers

Works with

Portkey

Frequently asked questions

What does the Observability Logs And Traces eval for Portkey AI Gateway test?

Evaluates Portkey's Observability, Logs & Traces across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's AI Gateway eval coverage.

How is the Observability Logs And Traces eval scored?

The judge rubric: Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

How many test cases does this eval pack include?

The Observability Logs And Traces pack for Portkey AI Gateway contains 9 test cases. 3 sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?

Sign up for Corsac, connect your model or agent endpoint, and run the Observability Logs And Traces pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.
