
Truss And Model Packaging
AI Model Serving — Baseten
Baseten evals — Truss & Model Packaging (relift v3 InfraRed)
About Baseten
Baseten is a model serving platform that lets ML teams deploy, scale, and monitor any model — including custom fine-tunes and private weights — with production-grade autoscaling and GPU infrastructure. It supports both synchronous and asynchronous inference patterns.
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Operator scaffolds a new Truss with `truss init` and edits config.yaml to declare the model. They omit model_metadata.example_model_input. | config.yaml must declare model_metadata (including example_model_input for the in-product playground), python_version, requirements (pinned), and resources.accelerator. Missing example_model_input causes the playground to render without a usable form. Validate config.yaml with `truss config validat… | Pass / Fail · AI Platform · high |
| 02 | Operator deploys a 13B-parameter LLM with resources.accelerator: A10G (24GB). | Choose an accelerator whose VRAM fits the model weights plus KV cache headroom: 13B parameters at FP16 (2 bytes each) need ~26GB for weights alone, which already exceeds the A10G's 24GB. Select A100 (40/80GB) or H100. Verify VRAM headroom at max_seq_len, not just at steady state. [REQUIRES-VERIFICATION] for current accelerator SKUs and per-class VRAM. | Pass / Fail · AI Platform · critical |
| 03 | Operator pushes a Truss with model_metadata: {tags: ['llm', 'production']}. Downstream tooling filters by tag. | Treat tags as searchable metadata only; do not encode runtime semantics into tag values. Use the deployment environments system for prod/dev separation, not a 'production' tag. Tags are useful for ownership, modality, and license. | Pass / Fail · AI Platform · medium |
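
Tests 01 and 02 lend themselves to a quick pre-deploy check. The following is a minimal Python sketch, not part of Truss or the Baseten API. The helper names (`missing_fields`, `unpinned`, `weights_gb`, `kv_cache_gb`, `fits`) are hypothetical, the config is assumed to be already parsed into a dict, and the layer count and hidden size (40 and 5120) are assumed values for a typical 13B decoder with standard multi-head attention. The VRAM figures are the ones quoted in the table and still need verification against current accelerator SKUs.

```python
# Hypothetical pre-deploy checks; field names come from the test table above,
# everything else (helper names, model shape) is an assumption for illustration.
REQUIRED_PATHS = [
    ("model_metadata", "example_model_input"),
    ("python_version",),
    ("requirements",),
    ("resources", "accelerator"),
]


def missing_fields(config: dict) -> list[str]:
    """Return dotted paths of required fields absent from a parsed config."""
    missing = []
    for path in REQUIRED_PATHS:
        node = config
        for key in path:
            if not isinstance(node, dict) or key not in node:
                missing.append(".".join(path))
                break
            node = node[key]
    return missing


def unpinned(requirements: list[str]) -> list[str]:
    """Return requirement strings that lack an exact '==' pin."""
    return [r for r in requirements if "==" not in r]


def weights_gb(params_billion: float, bytes_per_param: int = 2) -> float:
    """Weight memory in GB (1e9 bytes); FP16 uses 2 bytes per parameter."""
    return params_billion * bytes_per_param


def kv_cache_gb(n_layers: int, hidden_size: int, max_seq_len: int,
                batch_size: int = 1, bytes_per_value: int = 2) -> float:
    """KV cache in GB: one K and one V vector per layer per token."""
    values = 2 * n_layers * hidden_size * max_seq_len * batch_size
    return values * bytes_per_value / 1e9


def fits(vram_gb: float, params_billion: float, n_layers: int,
         hidden_size: int, max_seq_len: int) -> bool:
    """True if weights plus KV cache at max_seq_len fit in the given VRAM."""
    need = weights_gb(params_billion) + kv_cache_gb(n_layers, hidden_size, max_seq_len)
    return need <= vram_gb


if __name__ == "__main__":
    # Illustrative config with the omission described in test 01.
    config = {
        "model_metadata": {"tags": ["llm"]},
        "python_version": "py311",
        "requirements": ["torch==2.3.0", "transformers"],
        "resources": {"accelerator": "A10G"},
    }
    print("missing:", missing_fields(config))
    print("unpinned:", unpinned(config["requirements"]))
    for name, vram in [("A10G", 24), ("A100-40GB", 40), ("A100-80GB", 80)]:
        print(name, "fits 13B FP16 at 4096 tokens:", fits(vram, 13, 40, 5120, 4096))
```

The estimate ignores activations, framework overhead, and fragmentation, so a result that only just fits should be treated as a failure of the headroom check rather than a pass.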
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
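
One way to read the pass rule above is sketched below in Python. This is an interpretation, not the platform's grader: it assumes each rubric criterion receives a single numeric score, that the mean is taken across criteria, and that the floor of 3 applies to every individual criterion. The function name `passes` and the score scale are hypothetical.

```python
from statistics import mean


def passes(scores: dict[str, float], mean_threshold: float = 4.0, floor: float = 3) -> bool:
    """Pass when the mean score meets the threshold and no criterion is below the floor."""
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor


if __name__ == "__main__":
    # Illustrative scores keyed by the rubric criteria listed on this page.
    example = {"Baseten": 5, "AI Platform": 4, "Truss and Model Packaging": 3}
    print(passes(example))
```

Under this reading, a high average cannot compensate for a single criterion scored below 3.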
Rubric criteria
- Baseten
- AI Platform
- Truss And Model Packaging