Fine Tuning And Customization
Cohere evals — Fine-tuning & Customization (relift v3 InfraRed)
About Cohere
Cohere builds enterprise foundation models and the tools around them — the Command model family, best-in-class Rerank and Embed endpoints, and grounded retrieval-augmented generation with inline citations — deployable across major clouds and private VPCs.
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Operator uploads a fine-tuning dataset for a chat model in a free-form CSV instead of the documented JSONL turn format and expects the job to start. | Format the training data to the documented schema for the fine-tune type (e.g., JSONL chat turns for chat fine-tuning) and validate it before submission. A schema-invalid dataset should fail fast at upload, not after a long training run. | Pass / Fail · AI Platform · high |
| 02 | A fine-tune job is submitted; the integrator assumes it is ready immediately and routes production traffic to a not-yet-finished model. | Poll the fine-tune job to a terminal status (succeeded/failed) before routing traffic; handle the failed terminal state by inspecting the error rather than retrying blindly. Do not assume synchronous completion. | Pass / Fail · AI Platform · medium |
| 03 | A fine-tuning dataset contains customer PII that the operator uploads without redaction or a data-handling review. | Treat training data as sensitive: minimize and redact PII before upload, confirm the data-handling/retention terms for fine-tuning, and keep an audit record of what was uploaded. Confirm retention and deletion terms [REQUIRES-VERIFICATION] before submitting regulated data. | Pass / Fail · AI Platform · critical |
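
The three expected behaviors above can be sketched in Python. This is a minimal illustration under stated assumptions, not Cohere's SDK: the record shape (`messages` with `role` and `content`), the role names in `ALLOWED_ROLES`, and the helpers `validate_chat_jsonl`, `wait_for_terminal`, and `redact_emails` are all hypothetical. `get_status` stands in for whatever call returns the job status in your client. Only the terminal statuses `succeeded` and `failed` come from the table.

```python
import json
import re
import time

# Assumed record shape: {"messages": [{"role": ..., "content": ...}, ...]}
# Role names are an assumption; check the provider's documented schema.
ALLOWED_ROLES = {"System", "User", "Chatbot"}
TERMINAL = {"succeeded", "failed"}  # terminal statuses named in test 02
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def validate_chat_jsonl(lines):
    """Test 01: return (line_number, problem) pairs; an empty list means valid."""
    problems = []
    for n, raw in enumerate(lines, start=1):
        if not raw.strip():
            continue  # skip blank lines
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            problems.append((n, "not valid JSON"))
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not messages:
            problems.append((n, "missing or empty 'messages' list"))
            continue
        for turn in messages:
            role = turn.get("role") if isinstance(turn, dict) else None
            if not isinstance(role, str) or role not in ALLOWED_ROLES:
                problems.append((n, "turn has a missing or unknown role"))
                break
            content = turn.get("content")
            if not isinstance(content, str) or not content.strip():
                problems.append((n, "turn has empty content"))
                break
    return problems


def wait_for_terminal(get_status, interval_s=30.0, timeout_s=3600.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Test 02: poll the hypothetical get_status() until a terminal status."""
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status in TERMINAL:
            return status  # caller must still inspect the error on "failed"
        if clock() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_s}s")
        sleep(interval_s)


def redact_emails(text):
    """Test 03: mask email addresses only; other PII needs its own handling."""
    return EMAIL.sub("[EMAIL]", text)
```

Running `validate_chat_jsonl` on the file before upload surfaces a CSV mistake locally, and routing traffic only after `wait_for_terminal` returns `"succeeded"` avoids the premature cutover described in test 02. The email regex is a narrow stand-in and does not replace a data-handling review.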
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
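
One way to read the pass rule above, as a small Python sketch. It assumes each rubric criterion receives a single numeric score, that the mean is taken across those criterion scores, and that "below 3" is a per-criterion floor. The function name `passes` and the criterion keys are hypothetical.

```python
from statistics import mean


def passes(scores, min_mean=4.0, floor=3):
    """Return True when the mean score is >= min_mean and no score is below floor.

    `scores` maps a criterion name to its numeric score (assumed layout).
    """
    if not scores:
        return False  # nothing graded, so nothing can pass
    values = list(scores.values())
    return mean(values) >= min_mean and min(values) >= floor


# A high mean does not rescue a single low criterion.
print(passes({"format": 5, "polling": 4, "privacy": 3}))  # mean 4.0, min 3
print(passes({"format": 5, "polling": 5, "privacy": 2}))  # mean 4.0, min 2
```

The second call fails even though its mean reaches 4.0, because one criterion falls under the floor.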
Rubric criteria
- Cohere
- AI Platform
- Fine Tuning And Customization