Docker evals — Docker Desktop & Extensions (relift v3 InfraRed)

About Docker

Docker is a container platform comprising Docker Engine, Docker Desktop, the Docker Hub registry, Docker Build Cloud for managed cloud builders, Docker Scout for image vulnerability scanning and supply-chain policy, Docker Compose for multi-container development, and Docker Model Runner for local LLM inference. Millions of developers and tens of thousands of enterprises ship containerized software with Docker.

Employees

~600

Industry

Developer Infrastructure

Headquarters

Palo Alto, CA

Sample tests · showing 3 of 9

# · Input · Expected behavior · Check
01

Developer reports 'docker compose up' fails halfway with OOM. Desktop is set to 4 GB RAM, 2 CPU; the stack runs 12 services.

Increase Desktop resources via Settings → Resources → Advanced (or settings.json: memoryMiB, cpus). 8-16 GB is typical for multi-service dev. Verify with 'docker info', whose CPUs and Total Memory lines should reflect the new limits. Also confirm the host has headroom, since Desktop allocates from host RAM.

Pass / Fail · Ai Platform · medium
02

IT wants to enforce 'analytics off' and 'auto-update disabled' across the fleet of Docker Desktop installs.

Deploy a Settings Management admin-settings.json file via MDM to the documented OS-specific path (e.g., /Library/Application Support/com.docker.docker/admin-settings.json on macOS). Settings flagged locked: true are enforced, and the UI greys them out. Confirm enforcement via 'docker info' and the UI.

Pass / Fail · Ai Platform · high
03

On macOS, bind-mounted source directory shows slow file ops; node_modules watcher fires twice per save.

Switch to VirtioFS in Settings → General → File sharing (macOS 12.5+) for better performance than legacy osxfs or gRPC FUSE. Restart Docker Desktop. Verify in 'docker info'. Caveat: VirtioFS has tradeoffs (case sensitivity and other edge cases), so test it against the project's own file-access patterns.

Pass / Fail · Ai Platform · medium
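For sample test 02, the admin-settings.json payload could look like the output of the Python sketch below. The key names configurationFileVersion, analyticsEnabled, and disableUpdate are assumptions drawn from Docker's Settings Management documentation and should be checked against the current reference before deployment. build_admin_settings is a hypothetical helper, not part of any Docker tooling.

```python
import json


def build_admin_settings(analytics_enabled: bool = False,
                         disable_update: bool = True) -> str:
    """Return admin-settings.json content with both settings locked.

    Key names are assumptions; verify them against the Settings
    Management reference for the Docker Desktop version in use.
    """
    settings = {
        # Schema version field (assumed value).
        "configurationFileVersion": 2,
        # "locked": True means the user cannot change the value in the UI.
        "analyticsEnabled": {"locked": True, "value": analytics_enabled},
        "disableUpdate": {"locked": True, "value": disable_update},
    }
    return json.dumps(settings, indent=2)


if __name__ == "__main__":
    # Print the payload; an MDM tool would write it to the OS-specific path.
    print(build_admin_settings())
```

Writing the returned string to the OS-specific path is left to the MDM tool.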

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. A pass requires a mean score across criteria of at least 4.0, with no single criterion scoring below 3.
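The threshold rule above can be sketched in Python as follows. The function name passes, the shape of the scores dictionary, and the sample scores are illustrative assumptions; the source does not state the scoring scale or how per-criterion scores are produced.

```python
from statistics import mean


def passes(scores: dict[str, float],
           min_mean: float = 4.0,
           min_score: float = 3.0) -> bool:
    """Return True when the mean is >= min_mean and no score is < min_score."""
    if not scores:
        raise ValueError("at least one criterion score is required")
    values = list(scores.values())
    return mean(values) >= min_mean and min(values) >= min_score


if __name__ == "__main__":
    # Hypothetical scores keyed by the rubric criteria names.
    example = {
        "Docker": 5,
        "Ai Platform": 4,
        "Docker Desktop And Extensions": 3,
    }
    print(passes(example))
```

A single low criterion fails the test even when the mean clears the threshold, which is why the floor is checked separately from the mean.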

Rubric criteria

  • Docker
  • Ai Platform
  • Docker Desktop And Extensions

Recommended for

Docker (Engine, Hub, Build Cloud, Scout, Desktop, Model Runner) · Docker customers
