For OpenRouter · AI Platform

Model Catalog Alias Stability

OpenRouter

LLM routing and aggregation — OpenRouter

Evaluates OpenRouter's Model Catalog & Alias Stability across 10 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's LLM routing and aggregation eval coverage.

About OpenRouter

OpenRouter is a unified LLM routing layer that gives developers access to hundreds of models through a single OpenAI-compatible API. It automatically routes requests to the best available provider, with fallback handling and transparent per-token pricing.

Employees: ~20
Industry: LLM Infrastructure
Headquarters: United States
Website: openrouter.ai

Sample tests · showing 3 of 10

#	Input	Expected behavior	Check
01	A cron job caches the models JSON for a routing agent; a new provider is added mid-day for a meta-llama slug.	Recommend periodic refresh with versioned-ETag or timestamped cache invalidation; warn that a stale cache risks require_parameters mismatch errors at runtime.	Pass / Fail · Catalog · medium
02	A config uses a moving-target slug in the style of anthropic/claude-3.5-sonnet:beta; the eval harness expects fixed tokenizer limits.	Warn that ~latest and similar aliases can retarget the underlying weights without a slug change or any notification; pin explicit dated slugs for regression-sensitive workloads.	Pass / Fail · Catalog · high
03	A cost optimizer appends :floor to openai/gpt-4o-mini to take the cheapest eligible provider path.	Describe :floor as a price-prioritized variant, parallel to the :nitro throughput variant, with pricing still listed in the GET /api/v1/models pricing fields.	Pass / Fail · Catalog · medium
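The refresh policy in case 01, the suffix handling in case 03, and the pinning concern in case 02 can be sketched in Python. This is a minimal sketch under stated assumptions, not OpenRouter's client: the fetch callable is assumed to return the parsed JSON of GET /api/v1/models as a `data` list whose entries carry `id` and `pricing`; stripping `:floor`/`:nitro` assumes the catalog lists only base slugs; and `ModelCatalogCache`, `ROUTING_SUFFIXES`, and `looks_like_moving_alias` are hypothetical names. It implements the timestamped (TTL) invalidation option, not the ETag option.

```python
import time
from typing import Callable, Optional

# Hypothetical fetcher: returns the parsed JSON body of GET /api/v1/models,
# assumed to look like {"data": [{"id": "...", "pricing": {...}}, ...]}.
Fetcher = Callable[[], dict]

# Assumed routing-variant suffixes layered on top of a base model slug.
ROUTING_SUFFIXES = (":floor", ":nitro")


class ModelCatalogCache:
    """Caches the model catalog and refetches it once the copy is older than ttl_seconds."""

    def __init__(
        self,
        fetch: Fetcher,
        ttl_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._models: dict = {}
        self._fetched_at: Optional[float] = None

    def _refresh_if_stale(self) -> None:
        # Timestamp-based invalidation: refetch on first use or once the TTL expires.
        now = self._clock()
        if self._fetched_at is None or now - self._fetched_at >= self._ttl:
            body = self._fetch()
            self._models = {m["id"]: m for m in body.get("data", [])}
            self._fetched_at = now

    def lookup(self, slug: str) -> Optional[dict]:
        """Return the catalog entry (including pricing) for slug, or None if absent.

        A routing suffix such as ":floor" is stripped first, on the assumption
        that the catalog lists only the base slug.
        """
        self._refresh_if_stale()
        base = slug
        for suffix in ROUTING_SUFFIXES:
            if base.endswith(suffix):
                base = base[: -len(suffix)]
                break
        return self._models.get(base)


def looks_like_moving_alias(slug: str) -> bool:
    """Hypothetical heuristic: flag slugs that name "latest" rather than a dated snapshot."""
    return "latest" in slug.lower()
```

In practice the fetcher would wrap an HTTP GET of the models endpoint. If that endpoint returns an ETag, a conditional request with If-None-Match is the other invalidation option case 01 names, and it avoids re-downloading an unchanged catalog; the TTL version shown here is simpler but bounds staleness only by the chosen interval.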

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric.


Recommended for

OpenRouter customers

Works with

OpenRouter

Related evals

AI Platform · Claude API

Evaluates Anthropic's Batch API across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

AI Platform · Claude API

Evaluates Anthropic's Extended Thinking across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

AI Platform · Claude API

Evaluates Anthropic's Files API & Citations across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

Frequently asked questions

What does the Model Catalog Alias Stability eval for OpenRouter test?

How is the Model Catalog Alias Stability eval scored?

An LLM judge grades each case using the rubric: Grade against expected.ideal_behavior and expected.rubric.

How many test cases does this eval pack include?

The Model Catalog Alias Stability pack for OpenRouter contains 10 test cases. Three sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?

Sign up for Corsac, connect your model or agent endpoint, and run the Model Catalog Alias Stability pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.
