MCP Server
Agent Tooling & Integrations — Composio
Composio evals — MCP Server (relift v3 InfraRed)
About Composio
Composio is a tool-integration layer for AI agents. It provides 250+ managed tool integrations (Gmail, GitHub, Slack, and more) with built-in OAuth/auth, per-end-user entities for multi-tenant isolation, triggers and webhooks, framework adapters (OpenAI, Anthropic, LangChain, LlamaIndex, CrewAI), custom tools and schema processors, and an MCP server that exposes tools to MCP clients.
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | An operator points an MCP client (e.g. Claude Desktop) at the Composio MCP server and wants a specific set of tools available. | Configure the MCP server endpoint with the intended toolset (selected apps/actions) and connect the MCP client to that URL. Expose only the tools the client needs rather than the full catalog. Exact MCP endpoint/config shape [REQUIRES-VERIFICATION] against current docs. | Pass / Fail · AI Platform · high |
| 02 | Two end users both use an MCP client backed by the Composio MCP server. Each must only reach their own connected accounts. | Scope MCP access per entity so a client session can only invoke actions on its own entity's connections. Confirm the documented per-entity scoping model [REQUIRES-VERIFICATION]; never expose one shared connection set to all MCP clients. | Pass / Fail · AI Platform · critical |
| 03 | The MCP client lists available tools; the Composio MCP server must advertise each tool's name, description, and input schema. | Advertise tools over MCP with complete, valid input schemas and descriptions so the client's model can call them correctly. Ensure the schema matches what execute will accept; broken/empty schemas cause invalid calls. | Pass / Fail · AI Platform · medium |
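The check in test 03 can be partly automated. The sketch below is a minimal, standard-library example, assuming the server's tool listing has already been fetched and parsed into a list of dicts that use the MCP field names `name`, `description`, and `inputSchema`. The helper `find_schema_problems` and the sample tool entries are hypothetical illustrations, not part of Composio's API, and the check covers only basic shape, not full JSON Schema validity.

```python
from typing import Any


def find_schema_problems(tools: list[dict[str, Any]]) -> dict[str, list[str]]:
    """Return {tool name: [problems]} for advertised tools that look broken."""
    problems: dict[str, list[str]] = {}
    for index, tool in enumerate(tools):
        name = tool.get("name") or f"<unnamed tool #{index}>"
        found: list[str] = []
        if not tool.get("name"):
            found.append("missing name")
        if not str(tool.get("description") or "").strip():
            found.append("missing description")
        schema = tool.get("inputSchema")
        if not isinstance(schema, dict) or not schema:
            found.append("missing or empty inputSchema")
        else:
            if schema.get("type") != "object":
                found.append("inputSchema type is not 'object'")
            properties = schema.get("properties", {})
            if not isinstance(properties, dict):
                found.append("inputSchema properties is not an object")
                properties = {}
            # Every required argument should be described under properties.
            for required in schema.get("required", []):
                if required not in properties:
                    found.append(f"required field '{required}' not in properties")
        if found:
            problems[name] = found
    return problems


# Hypothetical tool entries, shaped like an MCP tool listing.
SAMPLE_TOOLS = [
    {
        "name": "example_send_email",
        "description": "Send an email.",
        "inputSchema": {
            "type": "object",
            "properties": {"to": {"type": "string"}},
            "required": ["to"],
        },
    },
    {"name": "example_broken_tool", "description": "", "inputSchema": {}},
]

if __name__ == "__main__":
    for tool_name, issues in find_schema_problems(SAMPLE_TOOLS).items():
        print(tool_name, "->", "; ".join(issues))
```

A check like this only catches structurally broken listings. Whether the advertised schema matches what execute will accept still has to be verified against the server itself.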
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
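As an illustration of the pass rule, here is a minimal sketch. It assumes each rubric criterion receives one numeric score and that the mean is taken across criteria; both the interpretation and the function name `eval_passes` are assumptions rather than the grader's documented implementation, and the criterion names in the example are placeholders.

```python
from statistics import mean


def eval_passes(
    scores: dict[str, float],
    mean_threshold: float = 4.0,
    floor: float = 3.0,
) -> bool:
    """Pass when the mean score meets the threshold and no criterion is below the floor."""
    if not scores:
        return False  # nothing graded, so nothing can pass
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor


if __name__ == "__main__":
    # Placeholder criterion names and scores.
    print(eval_passes({"criterion_a": 5, "criterion_b": 4, "criterion_c": 3}))
    print(eval_passes({"criterion_a": 5, "criterion_b": 5, "criterion_c": 2}))
```

The first call meets both conditions (mean 4.0, lowest score 3). The second has the same mean but fails because one criterion falls below 3.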
Rubric criteria
- Composio
- AI Platform
- MCP Server