Evals for Cohere
8 evaluation packs covering adversarial robustness, safety gates, workflow quality, and operator-level checks for Cohere AI products.
About Cohere
Cohere builds enterprise foundation models and the tools around them — the Command model family, best-in-class Rerank and Embed endpoints, and grounded retrieval-augmented generation with inline citations — deployable across major clouds and private VPCs.
Available eval packs for Cohere
8 packs ready to run.
Chat Api And Streaming
Cohere evals — Chat API & Streaming (relift v3 InfraRed)
Command Models And Versioning
Cohere evals — Command Models & Versioning (relift v3 InfraRed)
Embed
Cohere evals — Embed (relift v3 InfraRed)
Fine Tuning And Customization
Cohere evals — Fine-tuning & Customization (relift v3 InfraRed)
Rag And Grounded Generation
Cohere evals — RAG & Grounded Generation (relift v3 InfraRed)
Rerank
Cohere evals — Rerank (relift v3 InfraRed)
Safety Deployment And Governance
Cohere evals — Safety, Deployment & Governance (relift v3 InfraRed)
Tool Use And Function Calling
Tool Selection
Cohere evals — Tool Use / Function Calling (relift v3 InfraRed)
Why eval Cohere AI
Cohere's AI features ship behind brand promises about accuracy, safety, and reliability. Buyers and integrators need to know those promises hold up under adversarial prompts, edge-case workflows, and the long tail of real customer inputs — not just the demo path.
The Corsac eval library for Cohere measures the four dimensions teams care about most when deploying AI platform agents:
- Adversarial robustness — does the agent resist prompt injection, jailbreaks, and social-engineering attempts?
- Workflow quality — does it complete the task buyers were shown in the demo, on inputs that don't look like the demo?
- Safety gates — does it escalate or refuse when it should, and only then?
- Operator quality — does it preserve analyst trust by surfacing the right context at the right time?
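The "when it should, and only then" wording in the safety-gates item implies two separate error types: refusing inputs that were fine, and letting through inputs that should have been stopped. One way to make that concrete is to score refusals with precision and recall. The sketch below is illustrative only; `GateCase` and `gate_scores` are hypothetical names and do not describe how Corsac scores its packs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateCase:
    """One labelled test input (hypothetical structure, for illustration)."""
    should_refuse: bool  # label: the correct behaviour for this input
    did_refuse: bool     # observation: what the agent actually did


def gate_scores(cases):
    """Return precision and recall of the agent's refusals.

    precision: of the refusals the agent made, how many were warranted
               ("only then").
    recall:    of the inputs that warranted refusal, how many were refused
               ("when it should").
    """
    tp = sum(c.should_refuse and c.did_refuse for c in cases)
    fp = sum((not c.should_refuse) and c.did_refuse for c in cases)
    fn = sum(c.should_refuse and (not c.did_refuse) for c in cases)
    # Assumption: with no refusals made (or none required), the
    # corresponding score is treated as perfect rather than undefined.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return {"precision": precision, "recall": recall}


# Made-up sample data: two correct refusals, one missed refusal,
# one correct pass-through, one unnecessary refusal.
cases = [
    GateCase(should_refuse=True, did_refuse=True),
    GateCase(should_refuse=True, did_refuse=False),
    GateCase(should_refuse=False, did_refuse=False),
    GateCase(should_refuse=False, did_refuse=True),
    GateCase(should_refuse=True, did_refuse=True),
]
scores = gate_scores(cases)
print(scores)
```

Reporting both numbers keeps an agent that refuses everything (high recall, low precision) from looking as good as one that refuses selectively.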
Every eval pack above is hand-authored against Cohere's public product surface and runnable in Corsac with your own data.