For Docker · AI Platform

Dockerfile And Image Build

Docker (Engine, Hub, Build Cloud, Scout, Desktop, Model Runner) · Docker

Container Platform — Docker

Docker evals — Dockerfile & Image Build (relift v3 InfraRed)

About Docker

Docker is the container platform — Docker Engine, Docker Desktop, Docker Hub registry, Docker Build Cloud for managed cloud builders, Docker Scout for image vulnerability scanning and supply-chain policy, Docker Compose for multi-container dev, and Docker Model Runner for local LLM inference. Millions of developers and tens of thousands of enterprises ship containerized software with Docker.

Employees

~600

Industry

Developer Infrastructure

Headquarters

Palo Alto, CA

Sample tests · showing 3 of 9

# · Input · Expected behavior · Check
01

Agent writes a Dockerfile with a single FROM golang:1.22 stage that runs 'go build', COPYs the binary into /app, and sets CMD. The final image is 1.2 GB and ships the entire Go toolchain to production.

Use a multi-stage build. Start with FROM golang:1.22 AS build and compile with CGO_ENABLED=0 so the binary is statically linked. Then use FROM gcr.io/distroless/static-debian12 or scratch as the final stage and COPY --from=build /out/app /app. Neither final base ships a libc. The final image contains only the binary and a minimal runtime. Verify the size reduction with 'docker image ls'.

Pass / Fail · AI Platform · high
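Below is a minimal sketch of the multi-stage layout test 01 expects. It assumes the following:

- A single Go module sits at the build-context root, with both 'go.mod' and 'go.sum' present.
- The 'main' package is in that same directory.
- The paths '/src' and '/out/app' are illustrative choices, not requirements.

```dockerfile
# syntax=docker/dockerfile:1

# Build stage: full Go toolchain; nothing from it ships except the binary.
FROM golang:1.22 AS build
WORKDIR /src
# Copy module files first so the dependency download stays cached
# until go.mod or go.sum changes (assumes go.sum exists).
COPY go.mod go.sum ./
RUN go mod download
COPY . .
# CGO_ENABLED=0 yields a statically linked binary, which the
# libc-free distroless "static" base requires.
RUN CGO_ENABLED=0 go build -o /out/app .

# Final stage: only the compiled binary plus distroless runtime files.
FROM gcr.io/distroless/static-debian12
COPY --from=build /out/app /app
# Exec form is required: the distroless image has no shell.
CMD ["/app"]
```

To compare sizes, build with 'docker build -t app .' and check the result with 'docker image ls app'. Swapping the final FROM for 'scratch' works the same way for a static binary. The trade-off is that scratch drops the CA certificates, timezone data, and user entries that the distroless image provides.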
02

Agent needs an NPM_TOKEN during 'npm install' but not in the final image. Current Dockerfile uses ARG NPM_TOKEN and ENV NPM_TOKEN=$NPM_TOKEN.

Use a BuildKit secret mount: RUN --mount=type=secret,id=npm,target=/root/.npmrc npm ci. At build time, pass --secret id=npm,src=$HOME/.npmrc. The token never lands in a layer, and 'docker history' shows no secret. The current approach leaks the token in two ways: ARG values are visible in 'docker history', and ENV values persist in the final image.

Pass / Fail · AI Platform · critical
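Below is a sketch of the secret-mount pattern from test 02. It assumes the following:

- The Node.js project has a 'package-lock.json', which 'npm ci' requires.
- The build host has an '.npmrc' that holds the registry token.
- The container runs as root, so '/root/.npmrc' is npm's user config.
- 'server.js' is a hypothetical entry point.
- The 'node:20-slim' tag is an illustrative choice.

```dockerfile
# syntax=docker/dockerfile:1

FROM node:20-slim
WORKDIR /app
COPY package.json package-lock.json ./
# No ARG or ENV for NPM_TOKEN. The secret is mounted only for this RUN
# step, is never written to a layer, and does not show in `docker history`.
# Build with:
#   docker build --secret id=npm,src=$HOME/.npmrc -t web .
RUN --mount=type=secret,id=npm,target=/root/.npmrc npm ci

# Assumes .dockerignore excludes .npmrc and node_modules, so this COPY
# cannot pull the token file or host modules into the image.
COPY . .
CMD ["node", "server.js"]
```

The '.dockerignore' entry matters. Without it, a plain 'COPY . .' would copy a project-level '.npmrc' into the image and undo the protection the secret mount provides.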
03

Every rebuild of a Python project reinstalls the same 200 wheels, because pip's cache lives in a layer that source edits invalidate.

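Add RUN --mount=type=cache,target=/root/.cache/pip pip install -r requirements.txt. The cache mount persists across builds on the same builder, outside any image layer, so reinstalls hit the wheel cache even when surrounding layers change. Copying requirements.txt before the application source also ensures that the install step reruns only when dependencies change. This requires BuildKit, the default builder in Docker Desktop and in Docker Engine 23.0 and later. Adding '# syntax=docker/dockerfile:1' pins an up-to-date Dockerfile frontend.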

Pass / Fail · AI Platform · medium
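Below is a sketch for test 03. It assumes the following:

- 'requirements.txt' sits at the context root.
- The container runs as the default root user, so pip's cache directory is '/root/.cache/pip'.
- 'main.py' is a hypothetical entry point.
- The 'python:3.12-slim' tag is an illustrative choice.

```dockerfile
# syntax=docker/dockerfile:1

FROM python:3.12-slim
WORKDIR /app
# Copy only the dependency manifest first: the install layer below is
# invalidated only when requirements.txt changes, not on source edits.
COPY requirements.txt .
# The cache mount keeps pip's download and wheel cache on the builder
# between builds; its contents are never stored in an image layer.
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt
# Source edits invalidate only the layers from here on.
COPY . .
CMD ["python", "main.py"]
```

The two techniques complement each other. Layer ordering skips the install entirely when only source code changes. When requirements.txt does change, the cache mount means only new or changed packages are downloaded or built.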

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. A test passes when the mean criterion score is at least 4.0 and no single criterion scores below 3.

Rubric criteria

  • Docker
  • AI Platform
  • Dockerfile And Image Build

Recommended for

Docker (Engine, Hub, Build Cloud, Scout, Desktop, Model Runner) · Docker customers
