
Training And Finetuning
Baseten evals — Training & Fine-tuning (relift v3 InfraRed)
About Baseten
Baseten is a model serving platform that lets ML teams deploy, scale, and monitor any model — including custom fine-tunes and private weights — with production-grade autoscaling and GPU infrastructure. It supports both synchronous and asynchronous inference patterns.
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check | Category | Severity |
|---|---|---|---|---|---|
| 01 | Operator POSTs to /v1/training/jobs with base_model, dataset_uri (s3://...), GPU class (H100), and hyperparameters. Response carries job_id. | Persist job_id immediately with dataset hash + hyperparameter snapshot for reproducibility. Status transitions PENDING → RUNNING → SUCCEEDED, FAILED, or CANCELED. On SUCCEEDED, fetch checkpoint URI from the job record and validate it deserializes before deploying. | Pass / Fail | AI Platform | high |
| 02 | Dataset is 12 GB in operator's S3 bucket. Operator passes dataset_uri=s3://my-bucket/data.jsonl in the job spec. | Baseten reads dataset_uri using the workspace's configured S3 credentials. Verify the workspace has IAM access to the bucket BEFORE submitting the job; otherwise the job will start, fail at dataset-load, and consume queue time. Use workspace-attached storage when cross-account IAM is friction. | Pass / Fail | AI Platform | critical |
| 03 | Fine-tuning a 13B model with LoRA. Operator picks H100 80GB for 'speed.' | For LoRA fine-tuning of a 13B model, A100 80GB is typically sufficient and cheaper than H100. Pick H100 only for full-parameter fine-tunes or when throughput-per-dollar wins. Compute $/epoch across SKUs, not $/hour, before committing. [REQUIRES-VERIFICATION] for current GPU-second prices. | Pass / Fail | AI Platform | medium |
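
The bookkeeping in test 01 and the arithmetic in test 03 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Baseten's client or API: `JobRecord`, `sha256_of_chunks`, and `cost_per_epoch` are hypothetical names, the transition table allows only the path the test names (whether a real job can move from PENDING straight to CANCELED is not stated here), and the prices and epoch times are placeholders rather than real quotes.

```python
import hashlib
import json
from dataclasses import dataclass, field

# Status path from test 01. Only the transitions the test names are allowed;
# terminal states have no successors. (Assumption: no PENDING -> CANCELED.)
TRANSITIONS = {
    "PENDING": {"RUNNING"},
    "RUNNING": {"SUCCEEDED", "FAILED", "CANCELED"},
    "SUCCEEDED": set(),
    "FAILED": set(),
    "CANCELED": set(),
}


def sha256_of_chunks(chunks) -> str:
    """Hash a dataset supplied as an iterable of bytes chunks, without loading it whole."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


@dataclass
class JobRecord:
    """Hypothetical local record, persisted as soon as the API returns job_id."""
    job_id: str
    dataset_sha256: str
    hyperparameters: dict
    status: str = "PENDING"
    history: list = field(default_factory=lambda: ["PENDING"])

    def advance(self, new_status: str) -> None:
        """Apply a status update, rejecting anything off the expected path."""
        if new_status not in TRANSITIONS.get(self.status, set()):
            raise ValueError(f"illegal transition {self.status} -> {new_status}")
        self.status = new_status
        self.history.append(new_status)

    def to_json(self) -> str:
        """Serialize with sorted keys so identical runs give identical snapshots."""
        return json.dumps(self.__dict__, sort_keys=True)


def cost_per_epoch(price_per_hour: float, hours_per_epoch: float) -> float:
    """Dollars per epoch: the figure test 03 says to compare, not dollars per hour."""
    return price_per_hour * hours_per_epoch


# Usage with made-up values.
record = JobRecord(
    job_id="job-123",
    dataset_sha256=sha256_of_chunks([b'{"prompt": "hi"}\n']),
    hyperparameters={"lora_rank": 16, "epochs": 3},
)
record.advance("RUNNING")
record.advance("SUCCEEDED")

# Placeholder prices and timings, NOT real quotes.
a100 = cost_per_epoch(price_per_hour=2.0, hours_per_epoch=3.0)
h100 = cost_per_epoch(price_per_hour=5.0, hours_per_epoch=1.5)
cheaper = "A100" if a100 <= h100 else "H100"
```

With these placeholder inputs the slower SKU comes out cheaper per epoch even though it takes twice as long; with different measured timings the comparison can flip, which is why the test asks for $/epoch rather than $/hour.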
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
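
One reading of the pass rule above, sketched in Python: the mean of the per-criterion scores must be at least 4.0 and no single criterion may score below 3. The function name `eval_passes`, the criterion names, and the treatment of an empty score set as an error are assumptions for illustration; the scoring scale itself is not stated here.

```python
from statistics import mean

MEAN_THRESHOLD = 4.0  # from the grading rule
FLOOR = 3             # from the grading rule


def eval_passes(scores: dict) -> bool:
    """Return True if the mean score is >= 4.0 and no criterion is below 3."""
    if not scores:
        raise ValueError("no criterion scores to grade")
    values = list(scores.values())
    return mean(values) >= MEAN_THRESHOLD and min(values) >= FLOOR


# Hypothetical criterion names and scores.
ok = eval_passes({"reproducibility": 5, "iam_precheck": 4, "cost_reasoning": 3})
low_floor = eval_passes({"reproducibility": 5, "iam_precheck": 5, "cost_reasoning": 2})
low_mean = eval_passes({"reproducibility": 4, "iam_precheck": 4, "cost_reasoning": 3})
```

The second case has a mean of exactly 4.0 but fails on the floor, and the third clears the floor but falls short on the mean, so both conditions matter independently.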
Rubric criteria
- Baseten
- AI Platform
- Training And Finetuning