Deployment Guide
Best practices for deploying models to production at scale.
Choosing instance types
Upbox auto-selects an instance type based on your model size and latency requirements. Override the selection with `upbox deploy --gpu a100` when you need specific hardware.
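As a sketch, the two modes might look like this. The `--gpu a100` flag comes from the text above; everything else (deploying with no flags to get auto-selection) is an assumption about the CLI's defaults, not confirmed behavior:

```shell
# Let Upbox pick an instance type automatically (assumed default behavior)
upbox deploy

# Pin the deployment to A100 GPUs (flag documented above)
upbox deploy --gpu a100
```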
Cold start optimization
Upbox eliminates cold starts with pre-warmed instance pools. For latency-critical apps, enable `always-on` mode to guarantee sub-10ms response times.
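A hedged sketch of enabling the mode: the docs above name `always-on` mode but not how it is set, so the `--always-on` flag here is a hypothetical illustration:

```shell
# Hypothetical flag: keep a warm replica serving at all times so no
# request ever pays a cold-start penalty
upbox deploy --always-on
```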
Batch inference
Process large datasets efficiently with batch endpoints. Upload data via `upbox batch create`, monitor progress, and download results when complete.
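A possible end-to-end batch workflow, as a sketch: `upbox batch create` is documented above, but the `--input` flag, the `status`/`download` subcommands, and the job-id handling are assumptions for illustration:

```shell
# Create a batch job from a local dataset (subcommand from the docs;
# the --input flag is a hypothetical illustration)
upbox batch create --input inputs.jsonl

# Hypothetical follow-up commands for monitoring and retrieving results
upbox batch status <job-id>
upbox batch download <job-id> --output results.jsonl
```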
GPU sharing
Run multiple small models on a single GPU with fractional GPU allocation. Reduce costs by up to 80% for lightweight models.
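As a sketch of how fractional allocation might be requested (the `--gpu-fraction` flag is a hypothetical illustration; the docs above only describe the capability):

```shell
# Hypothetical: request a quarter of a GPU for a lightweight model,
# letting up to four such deployments share one physical device
upbox deploy --gpu a100 --gpu-fraction 0.25
```

Packing four quarter-GPU models onto one device instead of four dedicated GPUs is where savings on the order of the 80% figure above would come from.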
Custom runtimes
Bring your own Docker image with `upbox deploy --image your-registry/model:tag`. Useful for custom dependencies or specialized inference code.
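A typical flow for deploying a custom runtime, sketched below. The `docker build` and `docker push` steps are standard Docker commands; the `--image` flag is documented above, and the registry name is a placeholder:

```shell
# Build an image containing your dependencies and inference code,
# push it to your registry, then point the deployment at it
docker build -t your-registry/model:tag .
docker push your-registry/model:tag
upbox deploy --image your-registry/model:tag
```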
Environment variables
Inject secrets and configuration via environment variables. Store sensitive values in Upbox Secrets and reference them in your deployment config.
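A hedged sketch of the pattern described above. The docs say sensitive values live in Upbox Secrets and are referenced from the deployment config, but the exact commands and reference syntax shown here are assumptions:

```shell
# Hypothetical: store a secret once, then reference it at deploy time
# so the plaintext value never appears in your deployment config
upbox secrets set API_KEY "value"
upbox deploy --env API_KEY=@secret:API_KEY
```

Referencing a named secret rather than inlining the value keeps credentials out of shell history and version control.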