Docker Build Cloud
Docker evals — Docker Build Cloud (relift v3 InfraRed)
About Docker
Docker is the container platform — Docker Engine, Docker Desktop, Docker Hub registry, Docker Build Cloud for managed cloud builders, Docker Scout for image vulnerability scanning and supply-chain policy, Docker Compose for multi-container dev, and Docker Model Runner for local LLM inference. Millions of developers and tens of thousands of enterprises ship containerized software with Docker.
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | An operator wants to run Docker Build Cloud builds from CI. CI currently uses the default 'docker' driver, which builds only for the runner's architecture. | Authenticate to Docker Hub first ('docker login'), then run 'docker buildx create --driver cloud <org>/<builder-name> --use'. Subsequent 'docker buildx build' invocations stream the build to the cloud builder endpoint; arm64 and amd64 stages run natively on dedicated hardware (no QEMU). | Pass / Fail · Ai Platform · high |
| 02 | A team builds Python images with native dependencies for both linux/amd64 and linux/arm64. A local QEMU build for ARM takes 12 min; a native ARM stage from CI takes ~2 min. | Run 'docker buildx build --builder <cloud> --platform linux/amd64,linux/arm64 --push'. Build Cloud executes amd64 stages on amd64 hosts and arm64 stages on arm64 hosts, with no emulation. Net speedup is workload-dependent [REQUIRES-VERIFICATION on the 6x figure]. | Pass / Fail · Ai Platform · medium |
| 03 | Developers A and B run the same build with identical dependencies. A's build is cold; B, building later, gets cache hits because both share the cloud builder. | Build Cloud maintains a per-builder cache shared by all users of that builder. A and B both target the same <org>/<builder>, so B's overlapping layers are imported from cache. Verify via the CACHED lines in the build output. Cache eviction policy is provider-managed. | Pass / Fail · Ai Platform · medium |
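
The commands in tests 01 to 03 can be sketched as a small Python helper that only composes argument lists and never invokes Docker itself. This is an illustrative sketch, not part of the eval: the function names, the `acme/default` builder, the `cloud-acme-default` local builder name, and the `acme/app:latest` tag are hypothetical placeholders, and the CACHED check is a simple heuristic over BuildKit's text output.

```python
import shlex


def create_cloud_builder_cmd(org: str, builder: str) -> list[str]:
    """argv for test 01: register a cloud builder and select it with --use."""
    return ["docker", "buildx", "create", "--driver", "cloud",
            f"{org}/{builder}", "--use"]


def multiarch_build_cmd(builder: str, tag: str,
                        platforms=("linux/amd64", "linux/arm64"),
                        context: str = ".") -> list[str]:
    """argv for test 02: multi-platform build pushed to a registry.

    `tag` and `context` are assumptions; the eval text leaves them out.
    """
    return ["docker", "buildx", "build",
            "--builder", builder,
            "--platform", ",".join(platforms),
            "--tag", tag,
            "--push", context]


def count_cached_steps(build_output: str) -> int:
    """Heuristic for test 03: count output lines with a standalone CACHED token."""
    return sum(1 for line in build_output.splitlines()
               if "CACHED" in line.split())


if __name__ == "__main__":
    # Print the commands instead of running them; 'docker login' must come first.
    print(shlex.join(create_cloud_builder_cmd("acme", "default")))
    print(shlex.join(multiarch_build_cmd("cloud-acme-default", "acme/app:latest")))
```

Passing one of these lists to `subprocess.run(cmd, check=True)` would execute it. Keeping composition separate from execution lets a CI job log or review the exact command line first.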
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. A pass requires a mean score of at least 4.0 across the rubric criteria, with no single criterion scoring below 3.
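
The pass rule can be sketched as a short check. This sketch assumes the mean is taken across rubric criteria for a single response and that scores are numeric; the score scale, the function name `passes`, and the constant names are assumptions rather than part of the eval definition.

```python
from statistics import mean

MEAN_THRESHOLD = 4.0  # minimum mean score, from the grading rule
FLOOR = 3.0           # no individual criterion may fall below this


def passes(scores: dict[str, float]) -> bool:
    """Return True when the mean is >= 4.0 and every criterion is >= 3."""
    if not scores:
        raise ValueError("no rubric scores supplied")
    values = list(scores.values())
    return mean(values) >= MEAN_THRESHOLD and min(values) >= FLOOR


if __name__ == "__main__":
    example = {"Docker": 5, "Ai Platform": 4, "Docker Build Cloud": 3}
    print(passes(example))
```

Under this reading, one criterion scoring 2 fails the response even if the other scores pull the mean up to 4.0.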
Rubric criteria
- Docker
- Ai Platform
- Docker Build Cloud