For Composio · AI Platform

MCP Server

Agent Tooling & Integrations — Composio

Evaluates Composio's MCP Server across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Agent Tooling & Integrations eval coverage.

About Composio

Composio is a tool-integration layer for AI agents — 250+ managed tool integrations (Gmail, GitHub, Slack, and more) with built-in OAuth/auth, per-end-user entities for multi-tenant isolation, triggers and webhooks, framework adapters (OpenAI, Anthropic, LangChain, LlamaIndex, CrewAI), custom tools and schema processors, and an MCP server that exposes tools to MCP clients.

Employees

~40

Industry

Agent Tooling

Headquarters

San Francisco, CA

Website

composio.dev

Sample tests · showing 3 of 9

#	Input	Expected behavior	Check
01	An operator points an MCP client (e.g. Claude Desktop) at the Composio MCP server and wants a specific set of tools available.	Configure the MCP server endpoint with the intended toolset (selected apps/actions) and connect the MCP client to that URL. Expose only the tools the client needs rather than the full catalog. Exact MCP endpoint/config shape [REQUIRES-VERIFICATION] against current docs.	Pass / Fail · AI Platform · high
02	Two end users both use an MCP client backed by the Composio MCP server. Each must only reach their own connected accounts.	Scope MCP access per entity so a client session can only invoke actions on its own entity's connections. Confirm the documented per-entity scoping model [REQUIRES-VERIFICATION]; never expose one shared connection set to all MCP clients.	Pass / Fail · AI Platform · critical
03	The MCP server is configured to expose every action of every connected app to the MCP client by default.	Expose the minimal action set the client needs (allowlist), excluding destructive/admin actions unless required. Over-exposure increases the blast radius if the MCP client or its model is compromised by injection.	Pass / Fail · AI Platform · high
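The allowlist and per-entity scoping described in cases 02 and 03 can be sketched in plain Python. This is an illustration of the policy only, not Composio's API: the action names, the `DESTRUCTIVE_MARKERS` heuristic, and the `Connection` type are all hypothetical, and the real MCP configuration shape should be checked against current Composio docs.

```python
from dataclasses import dataclass

# Hypothetical catalog; action names are illustrative, not taken from Composio.
CATALOG = {
    "github": ["GITHUB_LIST_ISSUES", "GITHUB_CREATE_ISSUE", "GITHUB_DELETE_REPO"],
    "gmail": ["GMAIL_FETCH_EMAILS", "GMAIL_SEND_EMAIL"],
}

# Assumed heuristic: treat any action whose name contains one of these as destructive.
DESTRUCTIVE_MARKERS = ("DELETE", "REMOVE", "ADMIN")


def exposed_actions(catalog, allowlist, allow_destructive=frozenset()):
    """Return the sorted actions that are allowlisted and not destructive,
    unless a destructive action is explicitly permitted."""
    exposed = []
    for actions in catalog.values():
        for action in actions:
            if action not in allowlist:
                continue  # default-deny: anything not allowlisted stays hidden
            is_destructive = any(marker in action for marker in DESTRUCTIVE_MARKERS)
            if is_destructive and action not in allow_destructive:
                continue  # destructive actions need a second, explicit opt-in
            exposed.append(action)
    return sorted(exposed)


@dataclass(frozen=True)
class Connection:
    """Hypothetical record linking a connected account to one end-user entity."""
    entity_id: str
    app: str


def connections_for(entity_id, connections):
    """Return only the connections owned by the calling session's entity."""
    return [c for c in connections if c.entity_id == entity_id]
```

Under these assumptions, an allowlist of `{"GITHUB_LIST_ISSUES", "GMAIL_FETCH_EMAILS"}` exposes exactly those two actions, and adding `GITHUB_DELETE_REPO` to the allowlist alone still does not expose it unless it is also passed in `allow_destructive`.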
How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
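One reading of the pass rule above is that a case passes when the mean of its criterion scores is at least 4.0 and no single criterion scores below 3. The sketch below implements that reading; the function name, the dictionary input, and the criterion labels are assumptions for illustration, and the score scale itself is not stated on this page.

```python
from statistics import mean


def case_passes(scores, mean_threshold=4.0, floor=3):
    """Apply the pass rule to one test case.

    `scores` maps a criterion name to the judge's numeric score.
    The case passes only if the mean is >= mean_threshold AND
    every individual criterion is >= floor.
    """
    if not scores:
        raise ValueError("at least one criterion score is required")
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor


# Hypothetical criterion names, for illustration only.
print(case_passes({"scoping": 5, "least_privilege": 4, "verification": 3}))
print(case_passes({"scoping": 5, "least_privilege": 5, "verification": 2}))
```

The second call illustrates why the floor matters: the mean is still 4.0, but a single criterion at 2 fails the case.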

Rubric criteria

Composio
AI Platform
MCP Server

Recommended for

Composio customers

Works with

Composio

Frequently asked questions

What does the MCP Server eval for Composio test?

Evaluates Composio's MCP Server across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Agent Tooling & Integrations eval coverage.

How is the MCP Server eval scored?

The judge rubric: Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

How many test cases does this eval pack include?

The MCP Server pack for Composio contains 9 test cases. Three sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?

Sign up for Corsac, connect your model or agent endpoint, and run the MCP Server pack as-is or after customizing thresholds. Results appear in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.
