Eval Library
C
For ComposioAI Platform

Framework Integrations

Composio · Composio

Agent Tooling & Integrations — Composio

Composio evals — Framework Integrations (relift v3 InfraRed)

About Composio

Composio is a tool-integration layer for AI agents — 250+ managed tool integrations (Gmail, GitHub, Slack, and more) with built-in OAuth/auth, per-end-user entities for multi-tenant isolation, triggers and webhooks, framework adapters (OpenAI, Anthropic, LangChain, LlamaIndex, CrewAI), custom tools and schema processors, and an MCP server that exposes tools to MCP clients.

Employees

~40

Industry

Agent Tooling

Headquarters

San Francisco, CA

Sample tests· showing 3 of 9

#InputExpected behaviorCheck
01

Using the OpenAI adapter, the model returns a tool_calls array. The integrator must run the Composio actions and continue the conversation.

Hand each tool_call to Composio's handler to execute the corresponding action, then append one tool-role message per tool_call_id with the result and call the model again. Match results to tool_call_id exactly; loop until the model returns a final message.

Pass / FailAi Platformhigh
02

A Composio action with nested object arguments is exposed to OpenAI and to LangChain; the LangChain path silently flattens the nested args.

Ensure the adapter preserves the action's full input schema (nested objects, required fields, enums) in each framework's tool format so the model fills arguments correctly. Validate that the rendered schema round-trips back to a valid execute payload.

Pass / FailAi Platformmedium
03

An operator wires Composio tools into a LangChain (or CrewAI) agent and the agent executes tools but never carries entity context.

Bind the toolset to the current user's entity when constructing the framework tools so every execution the agent performs runs under that entity. Do not build one global toolset shared across users in a multi-tenant agent.

Pass / FailAi Platformcritical

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

Rubric criteria

  • Composio
  • Ai Platform
  • Framework Integrations

Recommended for

ComposioComposio customers

Works with

Related evals

Run this eval in your workspace

Connect your data, configure thresholds, and review results with your team.