Dockerfile And Image Build
Docker evals — Dockerfile & Image Build (relift v3 InfraRed)
About Docker
Docker is a container platform that includes Docker Engine, Docker Desktop, the Docker Hub registry, Docker Build Cloud (managed cloud builders), Docker Scout (image vulnerability scanning and supply-chain policy), Docker Compose (multi-container development), and Docker Model Runner (local LLM inference). Millions of developers and tens of thousands of enterprises ship containerized software with Docker.
Sample tests (showing 3 of 9)
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Agent writes a Dockerfile with a single FROM golang:1.22 stage that runs 'go build' then COPYs the binary into /app and sets CMD. Final image is 1.2 GB containing the entire Go toolchain in production. | Use a multi-stage build: FROM golang:1.22 AS build (compile), then FROM gcr.io/distroless/static-debian12 or scratch as the final stage, COPY --from=build /out/app /app. Final image contains only the binary and minimal runtime. Verify via 'docker image ls' size delta. | Pass / Fail · Ai Platform · high |
| 02 | Agent needs an NPM_TOKEN during 'npm install' but not in the final image. Current Dockerfile uses ARG NPM_TOKEN and ENV NPM_TOKEN=$NPM_TOKEN. | Use BuildKit secret mount: RUN --mount=type=secret,id=npm,target=/root/.npmrc npm ci. Pass --secret id=npm,src=$HOME/.npmrc at build time. Token never lands in a layer and 'docker history' shows no secret. ARG values are visible in 'docker history' and ENV values persist in the final image. | Pass / Fail · Ai Platform · critical |
| 03 | Python project rebuilds reinstall the same 200 wheels on every layer change because pip's cache lives in a layer that's invalidated by source edits. | Add RUN --mount=type=cache,target=/root/.cache/pip pip install -r requirements.txt. The cache mount persists across builds outside any layer, so reinstalls hit the wheel cache even when surrounding layers change. Requires BuildKit (# syntax=docker/dockerfile:1). | Pass / Fail · Ai Platform · medium |
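
The expected behavior for test 01 could look roughly like the Dockerfile below. This is a minimal sketch, not the eval's reference answer. It assumes a Go module with `go.mod` and `go.sum` at the repository root and the main package in the root directory. The `/src` working directory is an illustrative choice, and the `/out/app` path follows the test description.

```dockerfile
# syntax=docker/dockerfile:1

# Build stage: carries the full Go toolchain and is used only to compile.
FROM golang:1.22 AS build
WORKDIR /src

# Copy module files first so dependency downloads stay cached across source edits.
# Assumes go.mod and go.sum both exist at the repository root.
COPY go.mod go.sum ./
RUN go mod download

# Copy the source and compile. CGO_ENABLED=0 produces a statically linked
# binary, which the distroless "static" base (or scratch) requires.
COPY . .
RUN CGO_ENABLED=0 go build -o /out/app .

# Final stage: only the compiled binary on a minimal runtime base.
FROM gcr.io/distroless/static-debian12
COPY --from=build /out/app /app
CMD ["/app"]
```

Building with `docker build -t app:slim .` (the tag is hypothetical) and then running `docker image ls app` should show the final image without the Go toolchain. The actual size depends on the binary.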
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. A response passes when its mean score across the rubric criteria is at least 4.0 and no individual criterion scores below 3.
Rubric criteria
- Docker
- Ai Platform
- Dockerfile And Image Build