API Reference
REST endpoints for deploying, managing, and querying your models.
Deployments API
POST `/api/v1/deployments` with your model binary or registry URL to create a new deployment. Returns a deployment ID and live endpoint URL within seconds.
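As a minimal sketch, here is how such a request could be assembled with the Python standard library. The host `api.example.com`, the deployment ID, and the payload field names (`name`, `registry_url`) are illustrative assumptions; consult the OpenAPI spec for the real schema.

```python
import json
import urllib.request

# Hypothetical host and payload fields -- check /api/v1/openapi.json for exact names.
BASE = "https://api.example.com/api/v1"

body = json.dumps({
    "name": "sentiment-classifier",                      # illustrative model name
    "registry_url": "registry.example.com/sentiment:1.0" # or upload a model binary
}).encode()

req = urllib.request.Request(
    f"{BASE}/deployments",
    data=body,
    headers={
        "Authorization": "Bearer <API_KEY>",
        "Content-Type": "application/json",
    },
    method="POST",
)
# urllib.request.urlopen(req) would return the deployment ID and endpoint URL.
```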
Inference API
POST to your model endpoint with JSON input. Responses include predictions, latency metrics, and request IDs for tracing. Batch inference is supported via `/batch`.
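An inference call could look like the sketch below. The endpoint URL, deployment ID, and input schema are assumptions for illustration; your actual endpoint comes from the create-deployment response.

```python
import json
import urllib.request

# Hypothetical endpoint and input shape; the real values come from your deployment.
endpoint = "https://api.example.com/api/v1/deployments/dep_123/predict"

payload = {"inputs": [{"text": "great product, fast shipping"}]}
req = urllib.request.Request(
    endpoint,
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": "Bearer <API_KEY>",
        "Content-Type": "application/json",
    },
    method="POST",
)
# A successful response would carry predictions, latency metrics, and a
# request ID for tracing. For batch inference, POST a list of inputs to
# the corresponding `/batch` endpoint instead.
```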
Models API
List all deployed models via `GET /api/v1/models`. Filter by status, framework, or creation date. Each model includes version history and traffic metrics.
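A sketch of a filtered listing request; the query parameter names (`status`, `framework`) mirror the filters described above but should be confirmed against the OpenAPI spec.

```python
import urllib.parse
import urllib.request

# Assumed filter parameter names; verify against /api/v1/openapi.json.
params = urllib.parse.urlencode({"status": "active", "framework": "pytorch"})
req = urllib.request.Request(
    f"https://api.example.com/api/v1/models?{params}",
    headers={"Authorization": "Bearer <API_KEY>"},
)
# urllib.request.urlopen(req) would return the matching models, each with
# version history and traffic metrics.
```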
Versions API
Manage model versions with `GET /api/v1/models/{id}/versions`. Promote, roll back, or delete versions. Traffic splitting between versions is configurable.
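A traffic split might be configured as sketched below. The `PATCH .../traffic` path, the payload shape, and the model ID are all assumptions made for illustration, not the documented contract.

```python
import json
import urllib.request

# Hypothetical path and payload for a canary split: 90% of traffic to v3,
# 10% to the new v4. Check /api/v1/openapi.json for the real field names.
split = {"traffic": {"v3": 90, "v4": 10}}

req = urllib.request.Request(
    "https://api.example.com/api/v1/models/mdl_123/traffic",
    data=json.dumps(split).encode(),
    headers={
        "Authorization": "Bearer <API_KEY>",
        "Content-Type": "application/json",
    },
    method="PATCH",
)
assert sum(split["traffic"].values()) == 100  # weights should cover all traffic
```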
Metrics API
Query real-time and historical metrics via `GET /api/v1/metrics`. Includes latency percentiles, throughput, error rates, and GPU utilization.
Webhooks API
Subscribe to deployment events with `POST /api/v1/webhooks`. Get notified of deployment successes, failures, scaling events, and drift alerts.
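A subscription request could be built as follows. The event names (`deployment.succeeded` and so on) and the receiver URL are illustrative; the subscribable event set is defined in the OpenAPI spec.

```python
import json
import urllib.request

# Hypothetical event names and receiver URL, shown for illustration only.
sub = {
    "url": "https://hooks.example.com/upbox",
    "events": ["deployment.succeeded", "deployment.failed", "model.drift"],
}
req = urllib.request.Request(
    "https://api.example.com/api/v1/webhooks",
    data=json.dumps(sub).encode(),
    headers={
        "Authorization": "Bearer <API_KEY>",
        "Content-Type": "application/json",
    },
    method="POST",
)
```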
Rate limits
Inference endpoints are not rate limited; scaling is handled automatically. Management APIs allow 1,000 requests per minute per API key.
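Inference traffic needs no client-side throttling, but management scripts that fan out many calls can trip the per-key limit. A minimal retry-with-backoff sketch (generic Python, not an official SDK helper):

```python
import time

# Retry a management call with exponential backoff when it is rate limited
# (HTTP 429). `call` is any function that performs one request and returns
# its HTTP status code.
def with_backoff(call, max_retries=5, base_delay=0.5):
    status = call()
    attempt = 0
    while status == 429 and attempt < max_retries:
        time.sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
        status = call()
        attempt += 1
    return status
```

If the API returns a `Retry-After` header on 429 responses, honoring it is preferable to a fixed backoff schedule.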
Authentication
All requests require an `Authorization: Bearer <API_KEY>` header. API keys can be scoped to specific models or environments (staging, production).
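A common pattern is to read the key from an environment variable rather than hard-coding it. The variable name `UPBOX_API_KEY` and the host below are illustrative, not documented, names.

```python
import os
import urllib.request

# Illustrative environment variable name; use whatever your deployment
# tooling provides. Falls back to a placeholder for local experimentation.
api_key = os.environ.get("UPBOX_API_KEY", "<API_KEY>")

req = urllib.request.Request(
    "https://api.example.com/api/v1/models",
    headers={"Authorization": f"Bearer {api_key}"},
)
```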
SDKs
Official SDKs available for Python (`pip install upbox`), TypeScript (`npm install @upbox/sdk`), and Go (`go get github.com/upbox/sdk-go`).
OpenAPI spec
Download the full OpenAPI 3.0 specification at `/api/v1/openapi.json` for code generation and API exploration tools.