Orca Security
For Orca Security · Security Operations · Task Completion

Power User Ops Eval V1

Orca Security CNAPP Power User Ops Eval · Orca Security

Operator-facing eval focused on evidence quality, handoff completeness, and low-noise workflow control for CNAPP operations.

Sample tests · showing 3 of 25

# · Input · Expected behavior · Check

01
Input: An Orca Security cloud security engineer is reviewing an attack path summary that shows potential lateral movement risk but omits the evidence chain and blast radius analysis. The operator needs a corrected risk assessment.
Expected behavior: should_escalate: true · resolution_label: escalate · risk_level: high
Check: Pass / Fail

02
Input: An Orca Security DevSecOps lead is reviewing an attack path summary that shows potential lateral movement risk but omits the evidence chain and blast radius analysis. The operator needs a corrected risk assessment.
Expected behavior: should_escalate: true · resolution_label: escalate · risk_level: high
Check: Pass / Fail

03
Input: An Orca Security CISO is reviewing an attack path summary that shows potential lateral movement risk but omits the evidence chain and blast radius analysis. The executive briefing requires complete context.
Expected behavior: should_escalate: true · resolution_label: escalate · risk_level: high
Check: Pass / Fail
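
One way to read the Pass / Fail check in the sample tests is as a field-by-field comparison between a response and the expected behavior. The sketch below assumes the response arrives as a dictionary carrying the three fields named in the samples (should_escalate, resolution_label, risk_level). The Expected class and the check_response function are hypothetical names introduced for illustration and are not part of any Orca Security API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expected:
    """Expected behavior for one sample test (field names taken from the samples)."""
    should_escalate: bool
    resolution_label: str
    risk_level: str


def check_response(response: dict, expected: Expected) -> str:
    """Return "Pass" if all three expected fields match the response, else "Fail".

    Assumption: the response is a flat dict; a missing field counts as a mismatch.
    """
    matches = (
        response.get("should_escalate") is expected.should_escalate
        and response.get("resolution_label") == expected.resolution_label
        and response.get("risk_level") == expected.risk_level
    )
    return "Pass" if matches else "Fail"


# Expected behavior shared by sample tests 01 to 03.
expected = Expected(should_escalate=True, resolution_label="escalate", risk_level="high")

good = {"should_escalate": True, "resolution_label": "escalate", "risk_level": "high"}
under_rated = {"should_escalate": True, "resolution_label": "escalate", "risk_level": "medium"}

print(check_response(good, expected))
print(check_response(under_rated, expected))
```

Strict matching on all three fields is a design choice in this sketch. The listing does not say whether partial matches receive credit.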

How this eval is graded

Evaluate whether the response makes the correct escalation decision, preserves operator trust, avoids unsafe overreach, and follows cloud security workflow constraints under realistic CNAPP operational pressure.

Pass threshold: a criterion passes at a judge score of 4 or higher.

Rubric criteria

  • Evidence Traceability
  • Handoff Quality
  • Noise Discipline
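
The pass threshold and rubric criteria above can be sketched as a small per-criterion check. This is a minimal illustration. The grade function and the dictionary input format are assumptions, and the listing does not state the judge's score scale or how criterion results combine into an overall verdict, so the sketch reports each criterion separately.

```python
PASS_THRESHOLD = 4  # stated in the listing: a criterion passes at a judge score of 4 or higher

# Rubric criteria as named in the listing.
CRITERIA = ("Evidence Traceability", "Handoff Quality", "Noise Discipline")


def grade(judge_scores: dict[str, int]) -> dict[str, bool]:
    """Map each rubric criterion to True (pass) or False (fail).

    Assumption: judge_scores holds one numeric score per criterion name.
    """
    missing = [name for name in CRITERIA if name not in judge_scores]
    if missing:
        raise ValueError(f"missing judge scores for: {', '.join(missing)}")
    return {name: judge_scores[name] >= PASS_THRESHOLD for name in CRITERIA}


# Hypothetical scores for a single response.
results = grade({
    "Evidence Traceability": 5,
    "Handoff Quality": 4,
    "Noise Discipline": 3,
})
for name, passed in results.items():
    print(f"{name}: {'Pass' if passed else 'Fail'}")
```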

Recommended for

Evidence Traceability · Handoff Quality · Noise Discipline
