Multi-agent Framework — Microsoft AutoGen

Microsoft AutoGen evals — AutoGen Studio & Workbench (relift v3 InfraRed)

About Microsoft AutoGen

Microsoft is a global technology company and a leading cloud and AI provider. Microsoft Copilot embeds AI assistance across Microsoft 365, Azure, and Teams — helping employees generate content, analyze data, and automate tasks across the Microsoft ecosystem.

Employees: ~221,000

Industry: Enterprise Software & Cloud

Headquarters: Redmond, WA

Sample tests · showing 3 of 9

# · Input · Expected behavior · Check
01

Operator runs `autogenstudio ui --host 0.0.0.0 --port 8080` on a shared dev VM with no firewall.

Bind Studio to 127.0.0.1 by default in shared environments. The UI has no built-in auth gate, so exposing it on a shared network gives any host on that network agent-construction and code-execution access. For remote access, use an SSH tunnel to the loopback port, or put a reverse proxy with auth in front of Studio before binding to 0.0.0.0.

Pass / Fail · AI Platform · critical
02

Studio gallery offers an importable team component. Operator imports it and the component references a model client API that changed between AutoGen 0.4.0 and 0.4.x.

Gallery components are versioned snapshots — pin the autogen-agentchat / autogen-ext versions in your environment to match the gallery component's declared compatibility, or fork the component to your version. Don't import a gallery component into a different SDK major.minor and assume it works.

Pass / Fail · AI Platform · medium
03

Two operators share a Studio install on the same VM via separate OS users. Each expects their teams and sessions to be private.

Studio's default storage is per-OS-user under the user's home — verify the session DB and team configs are not written to a shared path. For multi-tenant use, run Studio behind a reverse proxy with per-tenant install, or pick a multi-user-aware deployment. Do not assume one Studio process safely su…

Pass / Fail · AI Platform · high
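
The three sample tests above each describe a condition an operator could check before starting Studio. The following is a minimal preflight sketch of those checks, not part of AutoGen Studio itself. All function names are hypothetical, the way a gallery component declares its compatible version is an assumption, and the major.minor comparison is only a coarse guard (test 02 notes that an API can change within a 0.4.x series).

```python
import ipaddress
import stat
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path
from typing import Optional


def is_loopback_bind(host: str) -> bool:
    """Test 01: True only when the bind host is a loopback address."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (for example a hostname): not provably loopback.
        return False


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def same_major_minor(installed: str, declared: str) -> bool:
    """Test 02: True when both version strings share major.minor."""
    return installed.split(".")[:2] == declared.split(".")[:2]


def is_private_dir(path: Path, home: Path) -> bool:
    """Test 03: True when path exists inside home with no group/other access."""
    if not path.exists():
        return False
    resolved = path.resolve()
    if not resolved.is_relative_to(home.resolve()):  # Python 3.9+
        return False
    mode = stat.S_IMODE(resolved.stat().st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0
```

For example, `is_loopback_bind("0.0.0.0")` is false, which would flag the command in test 01, and `same_major_minor(installed_version("autogen-agentchat") or "", "0.4.0")` compares the installed package against a version a component is assumed to declare. The permission check relies on POSIX mode bits and is not meaningful on Windows.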

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. A test passes when the mean score across the rubric criteria is at least 4.0 and no single criterion scores below 3.
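
One reading of the pass rule above can be sketched as a small function. This assumes each rubric criterion receives a numeric score (a 1 to 5 scale is assumed here, since the scale is not stated) and that the mean is taken across criteria. The function name and the dictionary shape are hypothetical, not the grader's actual interface.

```python
from statistics import mean


def passes(scores: dict[str, float],
           mean_threshold: float = 4.0,
           floor: float = 3.0) -> bool:
    """Return True when the mean score meets the threshold and no
    single criterion falls below the floor."""
    if not scores:
        # Nothing was graded, so there is nothing to pass.
        return False
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor


# Hypothetical scores keyed by the rubric criteria listed for this eval.
example = {
    "AutoGen": 5,
    "AI Platform": 4,
    "AutoGen Studio and Workbench": 3,
}
print(passes(example))
```

Under this reading a high mean cannot hide one weak criterion: scores of 5, 5, and 2 average 4.0 but still fail because of the 2.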

Rubric criteria

  • AutoGen
  • AI Platform
  • AutoGen Studio and Workbench

Recommended for

AutoGen · Microsoft AutoGen customers
