For Microsoft AutoGen · AI Platform

Multi-agent Framework — Microsoft AutoGen

Microsoft AutoGen evals — Model Clients & Providers (relift v3 InfraRed)

About Microsoft AutoGen

Microsoft is a global technology company and a leading cloud and AI provider. Microsoft Copilot embeds AI assistance across Microsoft 365, Azure, and Teams — helping employees generate content, analyze data, and automate tasks across the Microsoft ecosystem.

Employees: ~221,000
Industry: Enterprise Software & Cloud
Headquarters: Redmond, WA

Sample tests (showing 3 of 9)

# · Input · Expected behavior · Check
01

Operator instantiates OpenAIChatCompletionClient(model='custom-llama-v1') against a self-hosted endpoint but omits model_info=. Tool calling silently fails.

For non-OpenAI-canonical model names, pass model_info={'vision': bool, 'function_calling': bool, 'json_output': bool, 'family': '...'} so AutoGen knows what capabilities the model exposes. Without model_info on unknown models, capability checks (tool calling, structured output) default conservative…

Pass / Fail · AI Platform · high
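For test 01, a minimal sketch of passing model_info for a self-hosted model, assuming the server exposes an OpenAI-compatible API. The base_url, the placeholder api_key, the "unknown" family value, and the helper missing_capabilities are illustrative assumptions, not AutoGen requirements. make_client imports autogen_ext lazily, so the rest of the file runs without it installed.

```python
from typing import Any, Dict, List

# Capability keys listed in the expected behavior for test 01.
REQUIRED_CAPABILITY_KEYS = ("vision", "function_calling", "json_output", "family")


def client_kwargs() -> Dict[str, Any]:
    """Constructor arguments for a self-hosted model with a non-canonical name."""
    return {
        "model": "custom-llama-v1",
        "base_url": "http://localhost:8000/v1",  # hypothetical OpenAI-compatible endpoint
        "api_key": "placeholder",  # hypothetical; use whatever your server expects
        # Set each flag to what the server actually supports (assumed values here).
        "model_info": {
            "vision": False,
            "function_calling": True,  # declare tool support explicitly
            "json_output": True,
            "family": "unknown",  # assumption: no known model family applies
        },
    }


def missing_capabilities(kwargs: Dict[str, Any]) -> List[str]:
    """Hypothetical pre-flight check: which capability keys were not declared?"""
    info = kwargs.get("model_info") or {}
    return [key for key in REQUIRED_CAPABILITY_KEYS if key not in info]


def make_client():
    """Build the real client; imported lazily so this file loads without autogen-ext."""
    from autogen_ext.models.openai import OpenAIChatCompletionClient

    return OpenAIChatCompletionClient(**client_kwargs())
```

Calling make_client() requires autogen-ext with its OpenAI extra (pip install "autogen-ext[openai]"). The pre-flight check is only a guard an eval harness might run; it does not replicate AutoGen's own validation.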
02

Operator wants per-token streaming in the UI, sets model_client_stream=True on the AssistantAgent, and iterates team.run_stream().

Set model_client_stream=True on the agent so model deltas surface as ModelClientStreamingChunkEvent in run_stream. UI must render chunk events as they arrive; final message arrives as a normal ChatMessage at end of stream. Don't ignore chunk events and wait for ChatMessage only — that loses the str…

Pass / Fail · AI Platform · medium
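For test 02, the key behavior is rendering chunk events as they arrive instead of waiting for the final message. The sketch below is pure Python so it runs without AutoGen: ChunkEvent and FinalMessage are hypothetical stand-ins for AutoGen's ModelClientStreamingChunkEvent and the final chat message, and fake_run_stream only simulates the relevant part of the stream.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, List, Union


# Hypothetical stand-ins for AutoGen's ModelClientStreamingChunkEvent and the
# final chat message; real code would import them from autogen_agentchat.messages.
@dataclass
class ChunkEvent:
    content: str


@dataclass
class FinalMessage:
    content: str


Event = Union[ChunkEvent, FinalMessage]


async def render_stream(stream: AsyncIterator[Event], ui: List[str]) -> str:
    """Render chunks as they arrive and return the final message text."""
    final_text = ""
    async for event in stream:
        if isinstance(event, ChunkEvent):
            ui.append(event.content)  # paint each delta immediately
        elif isinstance(event, FinalMessage):
            # The text is already on screen via the chunks; keep it for
            # history instead of rendering it a second time.
            final_text = event.content
    return final_text


async def fake_run_stream() -> AsyncIterator[Event]:
    """Simulates chunk and final-message events from team.run_stream()."""
    for delta in ("Hel", "lo", ", world"):
        yield ChunkEvent(delta)
    yield FinalMessage("Hello, world")


if __name__ == "__main__":
    screen: List[str] = []
    print(asyncio.run(render_stream(fake_run_stream(), screen)))
```

With AutoGen, the same loop would iterate team.run_stream(task=...) and test isinstance(event, ModelClientStreamingChunkEvent); the real stream also carries other items, such as the task message and the closing TaskResult, which need their own branches.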
03

Operator constructs AzureOpenAIChatCompletionClient with api_key= directly instead of using an Azure AD token provider.

Use azure_ad_token_provider with DefaultAzureCredential (or a workload-specific credential) so the client picks up managed-identity / workload-identity in production. Set api_version explicitly to a pinned preview/GA value. Hardcoded api_keys are fine for dev but a production smell.

Pass / Fail · AI Platform · high
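For test 03, a sketch of the token-provider setup, assuming autogen-ext and azure-identity are installed. Their imports are deferred into make_azure_client so the file loads without them; the endpoint, deployment, model, and api_version values are left to the caller. production_auth_issues is a hypothetical lint that mirrors the expected behavior and is not part of AutoGen.

```python
from typing import Any, Dict, List


def make_azure_client(endpoint: str, deployment: str, model: str, api_version: str):
    """Build the client with Azure AD (Entra ID) auth instead of an API key."""
    # Deferred imports: only needed when a real client is constructed.
    from autogen_ext.models.openai import AzureOpenAIChatCompletionClient
    from azure.identity import DefaultAzureCredential, get_bearer_token_provider

    # DefaultAzureCredential resolves managed / workload identity in production.
    token_provider = get_bearer_token_provider(
        DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
    )
    return AzureOpenAIChatCompletionClient(
        azure_deployment=deployment,
        model=model,
        api_version=api_version,  # pinned explicitly by the caller
        azure_endpoint=endpoint,
        azure_ad_token_provider=token_provider,
    )


def production_auth_issues(kwargs: Dict[str, Any]) -> List[str]:
    """Hypothetical lint over constructor kwargs; checks presence, not validity."""
    issues = []
    if kwargs.get("api_key"):
        issues.append("hardcoded api_key")
    if "azure_ad_token_provider" not in kwargs:
        issues.append("no azure_ad_token_provider")
    if not kwargs.get("api_version"):
        issues.append("api_version not pinned")
    return issues
```

The lint only confirms that an api_version is present; whether the value is a supported preview or GA version still has to be checked against Azure's documentation.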

How this eval is graded

Grade each response against expected.ideal_behavior and expected.rubric. A pass requires a mean rubric score of at least 4.0, with no criterion scoring below 3.
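One reading of the pass rule can be expressed as a small function, assuming each rubric criterion receives a single numeric score and the mean is taken across criteria. The score scale is not stated here, and treating an empty score set as a failure is an assumption.

```python
from statistics import mean
from typing import Mapping


def case_passes(scores: Mapping[str, float]) -> bool:
    """Pass only if the mean score is >= 4.0 and no criterion scores below 3."""
    if not scores:
        return False  # assumption: nothing graded counts as a failure
    values = list(scores.values())
    return mean(values) >= 4.0 and min(values) >= 3
```

Both conditions matter: scores of 5, 5, and 2 average exactly 4.0 but still fail because one criterion falls below 3.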

Rubric criteria

  • AutoGen
  • AI Platform
  • Model Clients And Providers

Recommended for

AutoGen · Microsoft AutoGen customers
