Deployment Guide

Best practices for deploying models to production at scale.

Choosing instance types

Upbox auto-selects an instance type based on your model size and latency requirements. To pin specific hardware, override the selection with `upbox deploy --gpu a100`.
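The auto-selection logic itself is internal to the platform, but conceptually it maps model size and latency budget to a hardware tier. A minimal sketch of that idea in Python; the thresholds, tier names, and function name are illustrative, not Upbox's actual values:

```python
def pick_gpu(model_size_gb: float, max_latency_ms: float) -> str:
    """Toy heuristic mapping model size and latency budget to a GPU tier.

    The thresholds and tier names are illustrative only; they are not
    Upbox's real selection rules.
    """
    if model_size_gb > 40 or max_latency_ms < 20:
        return "a100"   # large models or tight latency budgets
    if model_size_gb > 10:
        return "a10g"
    return "t4"         # small models with a relaxed latency budget


print(pick_gpu(70, 50))   # -> a100
print(pick_gpu(5, 200))   # -> t4
```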

Cold start optimization

Upbox eliminates cold starts with pre-warmed pools. For latency-critical apps, enable `always-on` mode to guarantee sub-10ms response times.
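The pool mechanics can be pictured as workers that pay their startup cost before any traffic arrives, so acquiring one at request time is near-instant. A self-contained Python sketch of that pattern (the class and helper names are hypothetical, not part of any Upbox SDK):

```python
import queue
import time


class WarmPool:
    """Illustrative pre-warmed worker pool: initialization cost is paid
    up front, so checkout on the request path is near-instant."""

    def __init__(self, size: int, init_worker):
        self._pool = queue.Queue()
        for _ in range(size):
            self._pool.put(init_worker())  # startup cost paid before traffic

    def acquire(self):
        return self._pool.get()  # no initialization on the request path

    def release(self, worker):
        self._pool.put(worker)


def slow_init():
    time.sleep(0.05)  # stand-in for model load / container boot
    return object()


pool = WarmPool(size=2, init_worker=slow_init)
start = time.perf_counter()
worker = pool.acquire()
elapsed_ms = (time.perf_counter() - start) * 1000  # a fraction of a millisecond
pool.release(worker)
```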

Batch inference

Process large datasets efficiently with batch endpoints. Upload data via `upbox batch create`, monitor progress, and download results when complete.
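Before uploading with `upbox batch create`, large datasets are typically split into fixed-size chunks client-side. A minimal sketch of that chunking step; the helper below is hypothetical and not part of the Upbox CLI:

```python
def batches(records, batch_size):
    """Yield fixed-size chunks of a dataset; the last chunk may be smaller."""
    for i in range(0, len(records), batch_size):
        yield records[i:i + batch_size]


data = list(range(10))
chunks = list(batches(data, 4))
# chunks == [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```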

GPU sharing

Run multiple small models on a single GPU with fractional GPU allocation, which can reduce costs by up to 80% for lightweight models.
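The 80% figure follows directly from the arithmetic of sharing: five lightweight models each taking a fifth of a GPU pay a fifth of the price. A quick illustration (the hourly rate is a made-up number, not Upbox pricing):

```python
# Illustrative cost arithmetic for fractional GPU allocation.
GPU_HOURLY = 2.00  # hypothetical full-GPU price per hour, not Upbox pricing


def per_model_cost(fraction: float) -> float:
    """Cost of renting `fraction` of a GPU for one hour."""
    return GPU_HOURLY * fraction


dedicated = per_model_cost(1.0)   # one model per dedicated GPU
shared = per_model_cost(0.2)      # five lightweight models share one GPU
savings = 1 - shared / dedicated  # -> 0.8, the "up to 80%" figure
```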

Custom runtimes

Bring your own Docker image with `upbox deploy --image your-registry/model:tag`. Useful for custom dependencies or specialized inference code.
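A custom image usually wraps your model behind a small request handler that the platform routes traffic to. A minimal sketch of such an entrypoint; the function names and JSON shape are illustrative assumptions, not a documented Upbox contract:

```python
import json


def load_model():
    """Stand-in for loading weights baked into your custom image."""
    return {"scale": 2}


MODEL = load_model()  # loaded once at startup, not per request


def predict(payload: str) -> str:
    """Hypothetical handler: parse a JSON request, run the model, return JSON."""
    inputs = json.loads(payload)["inputs"]
    outputs = [x * MODEL["scale"] for x in inputs]
    return json.dumps({"outputs": outputs})


print(predict('{"inputs": [1, 2, 3]}'))  # -> {"outputs": [2, 4, 6]}
```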

Environment variables

Inject secrets and configuration via environment variables. Store sensitive values in Upbox Secrets and reference them in your deployment config.