Deployment Guide

Best practices for deploying models to production at scale.

Choosing instance types

Upbox auto-selects an instance type based on your model size and latency requirements. To pin specific hardware, override the selection with `upbox deploy --gpu a100`.
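The auto-selection logic itself is internal to the platform, but conceptually it maps model size and latency budget to a hardware tier. A minimal sketch of that idea in Python; the thresholds, tier names, and function name are illustrative, not Upbox's actual values:

```python
def pick_gpu(model_size_gb: float, max_latency_ms: float) -> str:
    """Toy heuristic mapping model size and latency budget to a GPU tier.

    The thresholds and tier names are illustrative only; they are not
    Upbox's real selection rules.
    """
    if model_size_gb > 40 or max_latency_ms < 20:
        return "a100"   # large models or tight latency budgets
    if model_size_gb > 10:
        return "a10g"
    return "t4"         # small models with a relaxed latency budget


print(pick_gpu(70, 50))   # -> a100
print(pick_gpu(5, 200))   # -> t4
```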

Cold start optimization

Upbox eliminates cold starts with pre-warmed pools. For latency-critical apps, enable `always-on` mode to guarantee sub-10ms response times.
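The pool mechanics can be pictured as workers that pay their startup cost before any traffic arrives, so acquiring one at request time is near-instant. A self-contained Python sketch of that pattern (the class and helper names are hypothetical, not part of any Upbox SDK):

```python
import queue
import time


class WarmPool:
    """Illustrative pre-warmed worker pool: initialization cost is paid
    up front, so checkout on the request path is near-instant."""

    def __init__(self, size: int, init_worker):
        self._pool = queue.Queue()
        for _ in range(size):
            self._pool.put(init_worker())  # startup cost paid before traffic

    def acquire(self):
        return self._pool.get()  # no initialization on the request path

    def release(self, worker):
        self._pool.put(worker)


def slow_init():
    time.sleep(0.05)  # stand-in for model load / container boot
    return object()


pool = WarmPool(size=2, init_worker=slow_init)
start = time.perf_counter()
worker = pool.acquire()
elapsed_ms = (time.perf_counter() - start) * 1000  # a fraction of a millisecond
pool.release(worker)
```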

Batch inference

Process large datasets efficiently with batch endpoints. Upload data via `upbox batch create`, monitor progress, and download results when complete.
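Before uploading with `upbox batch create`, large datasets are typically split into fixed-size chunks client-side. A minimal sketch of that chunking step; the helper below is hypothetical and not part of the Upbox CLI:

```python
def batches(records, batch_size):
    """Yield fixed-size chunks of a dataset; the last chunk may be smaller."""
    for i in range(0, len(records), batch_size):
        yield records[i:i + batch_size]


data = list(range(10))
chunks = list(batches(data, 4))
# chunks == [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```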

GPU sharing

Run multiple small models on a single GPU with fractional GPU allocation, which can reduce costs by up to 80% for lightweight models.
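The 80% figure follows directly from the arithmetic of sharing: five lightweight models each taking a fifth of a GPU pay a fifth of the price. A quick illustration (the hourly rate is a made-up number, not Upbox pricing):

```python
# Illustrative cost arithmetic for fractional GPU allocation.
GPU_HOURLY = 2.00  # hypothetical full-GPU price per hour, not Upbox pricing


def per_model_cost(fraction: float) -> float:
    """Cost of renting `fraction` of a GPU for one hour."""
    return GPU_HOURLY * fraction


dedicated = per_model_cost(1.0)   # one model per dedicated GPU
shared = per_model_cost(0.2)      # five lightweight models share one GPU
savings = 1 - shared / dedicated  # -> 0.8, the "up to 80%" figure
```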

Custom runtimes

Bring your own Docker image with `upbox deploy --image your-registry/model:tag`. Useful for custom dependencies or specialized inference code.
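A custom image usually wraps your model behind a small request handler that the platform routes traffic to. A minimal sketch of such an entrypoint; the function names and JSON shape are illustrative assumptions, not a documented Upbox contract:

```python
import json


def load_model():
    """Stand-in for loading weights baked into your custom image."""
    return {"scale": 2}


MODEL = load_model()  # loaded once at startup, not per request


def predict(payload: str) -> str:
    """Hypothetical handler: parse a JSON request, run the model, return JSON."""
    inputs = json.loads(payload)["inputs"]
    outputs = [x * MODEL["scale"] for x in inputs]
    return json.dumps({"outputs": outputs})


print(predict('{"inputs": [1, 2, 3]}'))  # -> {"outputs": [2, 4, 6]}
```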

Environment variables

Inject secrets and configuration via environment variables. Store sensitive values in Upbox Secrets and reference them in your deployment config.