Docker Desktop And Extensions
Docker evals — Docker Desktop & Extensions (relift v3 InfraRed)
About Docker
Docker is the container platform: Docker Engine, Docker Desktop, the Docker Hub registry, Docker Build Cloud for managed cloud builders, Docker Scout for image vulnerability scanning and supply-chain policy, Docker Compose for multi-container development, and Docker Model Runner for local LLM inference. Millions of developers and tens of thousands of enterprises ship containerized software with Docker.
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Developer reports 'docker compose up' fails halfway with OOM. Desktop is set to 4 GB RAM, 2 CPU; the stack runs 12 services. | Increase Desktop resources via Settings → Resources → Advanced (or settings.json: memoryMiB, cpus). 8-16 GB is typical for multi-service dev. Verify via 'docker info' showing the new limit. Also confirm the host has headroom, since Desktop allocates from host RAM. | Pass / Fail · Ai Platform · medium |
| 02 | IT wants to enforce 'analytics off' and 'auto-update disabled' across the fleet of Docker Desktop installs. | Deploy a Settings Management admin-settings.json via MDM to the documented OS-specific path (e.g., /Library/Application Support/com.docker.docker/admin-settings.json on macOS). Settings flagged locked: true are enforced and the UI greys them out. Confirm enforcement via 'docker info' and the UI. | Pass / Fail · Ai Platform · high |
| 03 | On macOS, bind-mounted source directory shows slow file ops; node_modules watcher fires twice per save. | Switch to VirtioFS in Settings → General → File sharing (macOS 12.5+) for better performance than legacy osxfs / gRPC FUSE. Restart Docker Desktop. Verify in 'docker info'. Caveat: VirtioFS has tradeoffs (case sensitivity, specific edge cases); test the project's pattern. | Pass / Fail · Ai Platform · medium |
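
As an illustration of test 02, the following is a minimal sketch of how an admin-settings.json payload with locked settings might be generated before handing it to an MDM tool. The key names `analyticsEnabled`, `disableUpdate`, and `configurationFileVersion` are assumptions based on Docker's Settings Management documentation and should be checked against the current reference; the helper names `build_admin_settings` and `locked_keys` are hypothetical.

```python
import json


def build_admin_settings() -> dict:
    """Return a Settings Management payload that locks two settings.

    Key names are assumptions; verify them against Docker's
    admin-settings.json reference before deploying.
    """
    return {
        "configurationFileVersion": 2,  # assumed schema version
        # "locked": True means the value is enforced and greyed out in the UI.
        "analyticsEnabled": {"locked": True, "value": False},
        "disableUpdate": {"locked": True, "value": True},
    }


def locked_keys(settings: dict) -> list:
    """List the setting names whose entry carries locked: true."""
    return sorted(
        name
        for name, entry in settings.items()
        if isinstance(entry, dict) and entry.get("locked") is True
    )


if __name__ == "__main__":
    payload = build_admin_settings()
    # Print the JSON that the MDM tool would place at the OS-specific path.
    print(json.dumps(payload, indent=2))
    print("locked:", locked_keys(payload))
```

The sketch only builds and prints the JSON; writing it to the system path is left to the MDM tooling, which usually needs elevated permissions.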
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
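
The pass rule above can be sketched as a small function. This reads the rule as "the mean across criteria is at least 4.0 and no single criterion scores below 3", which is one plausible interpretation; the 1 to 5 score scale, the function name `passes`, and the sample scores are assumptions for illustration only.

```python
from statistics import mean


def passes(scores: dict, mean_threshold: float = 4.0, floor: float = 3.0) -> bool:
    """Apply the pass rule: mean >= mean_threshold and no score below floor.

    `scores` maps a rubric criterion name to its score
    (assumed here to be on a 1 to 5 scale).
    """
    if not scores:
        raise ValueError("at least one criterion score is required")
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor


if __name__ == "__main__":
    # Hypothetical scores keyed by the rubric criteria listed on this page.
    sample = {
        "Docker": 5,
        "Ai Platform": 4,
        "Docker Desktop And Extensions": 3,
    }
    print(passes(sample))
```

Under this reading, a single criterion scored below 3 fails the test even when the mean stays at or above 4.0.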
Rubric criteria
- Docker
- Ai Platform
- Docker Desktop And Extensions