
Eval directory
Evals for Sendoso
6 evaluation packs covering adversarial robustness, safety gates, workflow quality, and operator-level checks for Sendoso AI products.
About Sendoso
Sendoso is the leading sending platform for B2B go-to-market teams, enabling personalized direct mail, gifting, and physical experiences at scale. Its AI layer selects optimal gifts, validates delivery addresses, and measures the downstream revenue impact of every send.
Available eval packs for Sendoso
6 packs ready to run.
Smart Delivery Address Ranking V1
Eval for Sendoso Smart Delivery ranking candidate addresses and refusing overconfident sends when office, home, and hybrid-work signals disagree.
Smart Delivery Failure Detection V1
Eval for Sendoso Smart Delivery detecting silent wrong-address outcomes and distinguishing data, process, and logistics failures after a gift is marked delivered.
Smart Delivery Preflight Holds V1
Task CompletionEval for Sendoso Smart Delivery deciding when to pause a shipment, request confirmation, or use a fallback path before a gift is sent to the wrong place.
Smart Send Interest Match V1
Eval for Sendoso Smart Send choosing gifts from public interest signals without overfitting noisy social activity or violating policy.
Smart Send Outcome Learning V1
Eval for Sendoso Smart Send interpreting weak post-send feedback without over-claiming that a gift recommendation succeeded or failed.
Smart Send Signal Guardrails V1
Eval for Sendoso Smart Send deciding when to abstain, downgrade confidence, or route to review because public signals are stale, sparse, or unsafe.
Why eval Sendoso AI
Sendoso's AI features ship behind brand promises about accuracy, safety, and reliability. Buyers and integrators need to know those promises hold up under adversarial prompts, edge-case workflows, and the long tail of real customer inputs — not just the demo path.
The Corsac eval library for Sendoso measures four dimensions teams care about most when deploying customer support agents:
- Adversarial robustness — does the agent resist prompt injection, jailbreaks, and social-engineering attempts?
- Workflow quality— does it complete the task buyers were shown in the demo, on inputs that don't look like the demo?
- Safety gates — does it escalate or refuse when it should, and only then?
- Operator quality — does it preserve analyst trust by surfacing the right context at the right time?
Every eval pack above is hand-authored against Sendoso's public product surface and runnable in Corsac with your own data.