Docker Scout
Docker evals — Docker Scout (relift v3 InfraRed)
About Docker
Docker is the container platform — Docker Engine, Docker Desktop, Docker Hub registry, Docker Build Cloud for managed cloud builders, Docker Scout for image vulnerability scanning and supply-chain policy, Docker Compose for multi-container dev, and Docker Model Runner for local LLM inference. Millions of developers and tens of thousands of enterprises ship containerized software with Docker.
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Agent wants the vulnerability list for myorg/api:v1.2.3 to triage before a release. | Run 'docker scout cves myorg/api:v1.2.3'. Output lists CVEs by severity with affected package, fixed-in version, and base image. Add the '--only-severity critical,high' flag to filter. Numeric CVE counts and timing are point-in-time; vulnerability DBs update daily. | Pass / Fail · Ai Platform · high |
| 02 | Org has policy 'no-fixable-critical-cves' enabled. New image fails the policy with 3 fixable critical CVEs in libssl. | Run 'docker scout policy myorg/api:v1.2.3 --org myorg' to see policy results. Failing policy means the image violates the bar (e.g., fixable criticals exist). Resolve by bumping the base image (per 'scout recommendations') or updating affected packages and re-scanning. Do not merge until policy pas… | Pass / Fail · Ai Platform · critical |
| 03 | PR pipeline needs a sub-minute scan summary before letting devs proceed. | Run 'docker scout quickview myorg/api:pr-1234'. Returns a compact summary: vuln counts by severity for base + image, recommended base, and policy status. Suitable for PR comments. Detailed triage uses 'cves' or 'policy'. | Pass / Fail · Ai Platform · medium |
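As a rough illustration, the three commands in the sample tests could be assembled by a small helper like the one below. This is a sketch, not part of the eval: `scout_command` and `run_scout` are hypothetical names, and the helper does not check that a given flag is valid for a given subcommand. Only the subcommands and flags quoted in the table are used.

```python
import shlex
import subprocess
from typing import List, Optional, Sequence


def scout_command(
    subcommand: str,
    image: str,
    only_severity: Optional[Sequence[str]] = None,
    org: Optional[str] = None,
) -> List[str]:
    """Build an argument list for a `docker scout` subcommand (hypothetical helper)."""
    cmd = ["docker", "scout", subcommand]
    if only_severity:
        # --only-severity takes a comma-separated list, e.g. critical,high
        cmd += ["--only-severity", ",".join(only_severity)]
    if org:
        cmd += ["--org", org]
    cmd.append(image)
    return cmd


def run_scout(cmd: List[str]) -> subprocess.CompletedProcess:
    """Run the command; assumes the Docker CLI with Scout is installed and logged in."""
    return subprocess.run(cmd, capture_output=True, text=True, check=False)


# The three sample tests, expressed as argument lists (nothing is executed here).
triage = scout_command("cves", "myorg/api:v1.2.3", only_severity=["critical", "high"])
policy = scout_command("policy", "myorg/api:v1.2.3", org="myorg")
summary = scout_command("quickview", "myorg/api:pr-1234")

for cmd in (triage, policy, summary):
    print(shlex.join(cmd))
```

Building an argument list rather than a shell string avoids quoting problems when image tags come from CI variables; `run_scout` is left uncalled because its output depends on the registry and the current vulnerability database.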
How this eval is graded
Grade each response against expected.ideal_behavior and expected.rubric. A pass requires a mean criterion score >= 4.0 and no criterion scored below 3.
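One plausible reading of this rule can be sketched as follows. The function name `passes`, the score dictionary, and the example scores are all hypothetical, and the sketch assumes the mean is taken across criteria within a single test; the page does not state the scoring scale or how scores are aggregated.

```python
from statistics import mean
from typing import Mapping


def passes(
    scores: Mapping[str, float],
    mean_threshold: float = 4.0,
    floor: float = 3.0,
) -> bool:
    """Return True if the mean score meets the threshold and no criterion is below the floor."""
    if not scores:
        return False  # assumption: an ungraded test does not pass
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor


# Hypothetical scores keyed by the rubric criteria names.
strong = {"Docker": 5, "Ai Platform": 4, "Docker Scout": 4}
one_weak = {"Docker": 5, "Ai Platform": 5, "Docker Scout": 2}
low_mean = {"Docker": 4, "Ai Platform": 4, "Docker Scout": 3}

print(passes(strong), passes(one_weak), passes(low_mean))
```

The second example shows why both conditions matter: its mean is exactly 4.0, yet it fails because one criterion falls below 3.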
Rubric criteria
- Docker
- Ai Platform
- Docker Scout