Tools Builtin And Custom
CrewAI · CrewAI
Multi-agent Framework — CrewAI
CrewAI evals — Tools (built-in + custom) (relift v3 InfraRed)
About CrewAI
CrewAI is a multi-agent orchestration framework — role-playing Agents, Tasks, Crews (sequential/hierarchical/consensual processes), and Flows (declarative @start/@listen/@router state graphs) for production agent workflows; with a commercial CrewAI Enterprise tier offering UI Studio, deployment, secrets/RBAC, observability, and an on-prem option.
Sample tests· showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Custom Tool subclass defines name='get_invoice', args_schema=InvoiceArgs(invoice_id: str, format: Literal['pdf','csv']). Agent calls with format='xml'. | CrewAI validates the agent's tool-call args against args_schema (Pydantic). On ValidationError, the framework surfaces a tool-feedback message back to the agent (not a raw stack trace). The agent corrects on the next iteration. Do not execute the tool body with malformed args. | Pass / FailAi Platformcritical |
| 02 | Tool body raises ToolException('upstream 503 — retry later'). Agent receives it. | CrewAI catches ToolException and feeds the message back to the agent as tool result. The agent decides whether to retry, switch tools, or report failure. Use ToolException to surface recoverable errors; reserve plain exceptions for framework-level bugs that should abort. | Pass / FailAi Platformhigh |
| 03 | Two tools: SerperDevTool and BraveSearchTool. Their descriptions both say 'web search'. Agent picks arbitrarily. | Tool descriptions are the agent's routing surface. Disambiguate explicitly: e.g., SerperDev='Google search results via Serper API — best for general queries', Brave='Brave search results — preferred for privacy-sensitive queries with no result personalization'. Verify the agent's choice traces matc… | Pass / FailAi Platformmedium |
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
Rubric criteria
- Crewai
- Ai Platform
- Tools Builtin And Custom
Recommended for
Works with
Related evals
Run this eval in your workspace
Connect your data, configure thresholds, and review results with your team.