AutoGen Multi-Agent Teams
Microsoft AutoGen evals — Multi-agent Teams (RoundRobin / Selector / Swarm / MagenticOne) (relift v3 InfraRed)
About Microsoft AutoGen
Microsoft is a global technology company and a leading cloud and AI provider. Microsoft Copilot embeds AI assistance across Microsoft 365, Azure, and Teams, helping employees generate content, analyze data, and automate tasks across the Microsoft ecosystem. AutoGen is Microsoft's open-source framework for building multi-agent AI applications, and it is the product this eval targets.
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check | Category | Severity |
|---|---|---|---|---|---|
| 01 | RoundRobinGroupChat with participants=[planner, coder, reviewer]. Operator expects planner→coder→reviewer→planner alternation but observes planner speaking twice in a row when coder emits an empty response. | RoundRobinGroupChat advances strictly through participants in declared order; an empty agent response still counts as that agent's turn. If a 'speak again' is needed, either re-prompt within the same turn or switch to SelectorGroupChat with allow_repeated_speaker. Do not rely on side effects to skip … | Pass / Fail | AI Platform | medium |
| 02 | SelectorGroupChat with a selector_prompt that references participant names, but each participant's description is left blank. Selector LLM picks the same agent every turn. | The selector picks the next speaker using participant.description, so populate each agent's description with a one-sentence capability statement (e.g. 'researcher: web search and fact extraction'). Without descriptions the selector has nothing to disambiguate on and degenerates to a single speaker. | Pass / Fail | AI Platform | high |
| 03 | MagenticOneGroupChat run on an open-ended task ('research the most recent NIST AI RMF revision and write a summary'). Operator wraps it with no termination condition and a single model_client. | MagenticOneGroupChat must be paired with an explicit max_turns or a termination condition, because the Orchestrator's plan/ledger loop is designed to run until success OR until externally bounded. Configure the orchestrator and worker agents (e.g. WebSurfer, FileSurfer, Coder, ComputerTerminal) per docs; do … | Pass / Fail | AI Platform | critical |
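The behaviors these three sample tests check can be approximated without calling AutoGen at all. The sketch below is a toy model in plain Python, not AutoGen's implementation: `AgentSpec`, `round_robin_order`, and `preflight` are hypothetical names introduced only for illustration. It shows that a round-robin index advances once per reply even when the reply is empty (test 01), and it flags blank descriptions on a selector team (test 02) and an unbounded MagenticOne run (test 03).

```python
from dataclasses import dataclass


@dataclass
class AgentSpec:
    """Hypothetical stand-in for an agent; this is not an AutoGen class."""
    name: str
    description: str = ""


def round_robin_order(names, replies):
    """Toy model of test 01: one turn per reply, in declared order.

    An empty reply still consumes the turn, so the speaker index
    advances regardless of what the agent said.
    """
    return [names[i % len(names)] for i, _ in enumerate(replies)]


def preflight(team_kind, agents, max_turns=None, has_termination=False):
    """Return a list of configuration problems (empty list means OK).

    team_kind is an illustrative label: "selector" or "magentic_one".
    """
    problems = []
    if team_kind == "selector":
        # Test 02: a selector needs descriptions to tell agents apart.
        for agent in agents:
            if not agent.description.strip():
                problems.append(f"{agent.name}: blank description")
    if team_kind == "magentic_one" and max_turns is None and not has_termination:
        # Test 03: the orchestrator loop must be bounded from outside.
        problems.append("unbounded run: set max_turns or a termination condition")
    return problems


if __name__ == "__main__":
    # coder replies with an empty string, yet reviewer still speaks next.
    print(round_robin_order(["planner", "coder", "reviewer"],
                            ["plan", "", "looks fine", "revised plan"]))
    print(preflight("selector", [AgentSpec("researcher"),
                                 AgentSpec("writer", "drafts prose")]))
    print(preflight("magentic_one", []))
```

A check like `preflight` could run in CI before a team is constructed, so a misconfigured team fails fast instead of burning tokens at run time.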
How this eval is graded
Grade each response against expected.ideal_behavior and expected.rubric. A run passes only if the mean score across rubric criteria is >= 4.0 and no individual criterion scores below 3.
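The pass rule can be sketched as a small function. This is one reading of the rule, assuming each rubric criterion receives a single numeric score and the mean is taken across criteria. The function name `passes` and the criterion keys are hypothetical, and the scoring scale is not stated in the source.

```python
def passes(scores: dict[str, float],
           mean_floor: float = 4.0,
           criterion_floor: float = 3.0) -> bool:
    """Return True if the mean is at least mean_floor and no score is below criterion_floor."""
    if not scores:
        # No scores means nothing was graded; treat that as a failure.
        return False
    values = list(scores.values())
    mean = sum(values) / len(values)
    return mean >= mean_floor and min(values) >= criterion_floor


if __name__ == "__main__":
    # Hypothetical criterion keys, used only for illustration.
    print(passes({"autogen": 5, "ai_platform": 4, "multi_agent_teams": 3}))
    print(passes({"autogen": 5, "ai_platform": 5, "multi_agent_teams": 2}))
```

The second call fails despite a mean of 4.0, because one criterion falls below the per-criterion floor of 3.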
Rubric criteria
- AutoGen
- AI Platform
- Multi-Agent Teams