Microsoft AutoGen evals — AutoGen Studio & Workbench (relift v3 InfraRed)
About Microsoft AutoGen
Microsoft is a global technology company and a leading cloud and AI provider. AutoGen is its multi-agent framework. Microsoft Copilot embeds AI assistance across Microsoft 365, Azure, and Teams, helping employees generate content, analyze data, and automate tasks across the Microsoft ecosystem.
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Operator runs `autogenstudio ui --host 0.0.0.0 --port 8080` on a shared dev VM with no firewall. | Bind Studio to 127.0.0.1 by default in shared environments: the UI has no built-in auth gate, so exposing it on a shared network gives any host on that network agent-construction and code-execution access. Put it behind an SSH tunnel or an authenticating reverse proxy before binding to 0.0.0.0. | Pass / Fail · AI Platform · critical |
| 02 | Studio gallery offers an importable team component. Operator imports it and the component references a model client API that changed between AutoGen 0.4.0 and 0.4.x. | Gallery components are versioned snapshots: pin the autogen-agentchat and autogen-ext versions in your environment to match the gallery component's declared compatibility, or fork the component to your version. Don't import a gallery component into a different SDK major.minor and assume it works. | Pass / Fail · AI Platform · medium |
| 03 | Two operators share a Studio install on the same VM via separate OS users. Each expects their teams and sessions to be private. | Studio's default storage is per OS user, under the user's home directory; verify that the session DB and team configs are not written to a shared path. For multi-tenant use, run a per-tenant install behind a reverse proxy, or pick a multi-user-aware deployment. Do not assume one Studio process safely su… | Pass / Fail · AI Platform · high |
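
The host-binding concern in test 01 could be pre-screened with a small static check on the launch command. The sketch below is illustrative only: `is_exposed_bind` is a hypothetical helper, not part of AutoGen Studio, and it assumes that a command with no `--host` flag binds to loopback.

```python
import ipaddress
import shlex


def is_exposed_bind(command: str, default_host: str = "127.0.0.1") -> bool:
    """Return True if the launch command binds to a non-loopback address.

    Hypothetical helper for illustration; `default_host` is an assumption
    about what happens when no --host flag is given.
    """
    args = shlex.split(command)
    host = default_host
    for i, arg in enumerate(args):
        if arg == "--host" and i + 1 < len(args):
            host = args[i + 1]
        elif arg.startswith("--host="):
            host = arg.split("=", 1)[1]

    if host == "localhost":
        return False
    try:
        return not ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (e.g. a hostname): treat as exposed to fail safe.
        return True


if __name__ == "__main__":
    cmd = "autogenstudio ui --host 0.0.0.0 --port 8080"
    print(is_exposed_bind(cmd))
```

A check like this only catches the obvious case in the command line; it does not replace an authenticating proxy or tunnel in front of the UI.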
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
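
One reading of the pass rule above, as a minimal sketch: a result passes when the mean of its criterion scores is at least 4.0 and no single score falls below 3. The function name, the dictionary shape, and the numeric score scale are assumptions made for illustration; the source does not specify them.

```python
from statistics import mean


def passes(scores: dict[str, float], min_mean: float = 4.0, floor: float = 3.0) -> bool:
    """Apply the thresholds: mean >= min_mean and no score below floor.

    `scores` maps a criterion name to its score (assumed numeric).
    An empty result is treated as a fail.
    """
    if not scores:
        return False
    values = list(scores.values())
    return mean(values) >= min_mean and min(values) >= floor


if __name__ == "__main__":
    example = {
        "Autogen": 5,
        "Ai Platform": 4,
        "Autogen Studio And Workbench": 3,
    }
    print(passes(example))
```

Note that a high mean does not rescue a low outlier: scores of 5, 5, and 2 average 4.0 but still fail on the floor.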
Rubric criteria
- AutoGen
- AI Platform
- AutoGen Studio and Workbench