API Reference

REST endpoints for deploying, managing, and querying your models.

Deployments API

POST `/api/v1/deployments` with your model binary or registry URL to create a new deployment. Returns a deployment ID and live endpoint URL within seconds.
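A minimal sketch of composing that request with the standard library, without sending it. The host is a placeholder, and the payload field names (`name`, `registry_url`) are illustrative assumptions; the source specifies only the endpoint path and the bearer-token header.

```python
import json
import urllib.request

API_KEY = "YOUR_API_KEY"  # placeholder; see Authentication below

# Hypothetical request body: the docs say a registry URL is accepted,
# but the exact field names are assumptions.
payload = {
    "name": "sentiment-v1",
    "registry_url": "registry.example.com/models/sentiment:1",
}

req = urllib.request.Request(
    "https://api.example.com/api/v1/deployments",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)
# A real call would then be: resp = urllib.request.urlopen(req)
```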

Inference API

POST to your model endpoint with JSON input. Responses include predictions, latency metrics, and request IDs for tracing. Batch inference is supported via `/batch`.
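The single and batch cases differ only in the URL suffix, so one helper can build both. The host, model path, and the `inputs` field name are assumptions; only the POST-with-JSON shape and the `/batch` suffix come from the docs.

```python
import json
import urllib.request

def build_inference_request(endpoint_url, api_key, inputs, batch=False):
    """Build (but do not send) a single or batch inference request."""
    url = endpoint_url + ("/batch" if batch else "")
    return urllib.request.Request(
        url,
        data=json.dumps({"inputs": inputs}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Single prediction vs. a batch of inputs against the same endpoint.
single = build_inference_request(
    "https://models.example.com/m/abc123", "KEY", {"text": "hello"})
batch = build_inference_request(
    "https://models.example.com/m/abc123", "KEY",
    [{"text": "a"}, {"text": "b"}], batch=True)
```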

Models API

List all deployed models via `GET /api/v1/models`. Filter by status, framework, or creation date. Each model includes version history and traffic metrics.
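A sketch of composing the listing URL with filters. The query parameter names (`status`, `framework`) mirror the filters the docs mention, but the exact wire names are assumptions.

```python
import urllib.parse

base = "https://api.example.com/api/v1/models"  # host is a placeholder

# Filter active PyTorch models; values are illustrative.
params = {"status": "active", "framework": "pytorch"}
url = base + "?" + urllib.parse.urlencode(params)
```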

Versions API

Manage model versions with `GET /api/v1/models/{id}/versions`. Promote, rollback, or delete versions. Traffic splitting between versions is configurable.
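A hypothetical traffic-split configuration between a stable version and a canary. The field names and the endpoint for applying the split are assumptions; the docs state only that traffic splitting between versions is configurable.

```python
import json

# Assumed payload shape: a list of (version, weight) pairs whose
# weights sum to 100% of traffic.
traffic_split = {
    "splits": [
        {"version": "v3", "weight": 90},  # current stable
        {"version": "v4", "weight": 10},  # canary
    ]
}
assert sum(s["weight"] for s in traffic_split["splits"]) == 100
body = json.dumps(traffic_split)
```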

Metrics API

Query real-time and historical metrics via `GET /api/v1/metrics`. Includes latency percentiles, throughput, error rates, and GPU utilization.
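A sketch of composing a metrics query URL. The parameter names (`metric`, `percentile`, `from`, `to`) are illustrative assumptions; the docs name the available metric families but not the query syntax.

```python
import urllib.parse

# Query p99 latency over a one-day window (values are examples).
params = urllib.parse.urlencode({
    "metric": "latency",
    "percentile": "p99",
    "from": "2024-01-01T00:00:00Z",
    "to": "2024-01-02T00:00:00Z",
})
url = f"https://api.example.com/api/v1/metrics?{params}"
```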

Webhooks API

Subscribe to deployment events with `POST /api/v1/webhooks`. Get notified on deployment success, failures, scaling events, and drift alerts.
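A hypothetical subscription body for that endpoint. The event-name strings are assumptions modeled on the event types the docs list (success, failure, scaling, drift); check the OpenAPI spec for the canonical names.

```python
import json

subscription = {
    "url": "https://ops.example.com/hooks/upbox",  # your receiver
    "events": [
        "deployment.succeeded",
        "deployment.failed",
        "scaling.triggered",
        "drift.alert",
    ],
}
body = json.dumps(subscription)
```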

Rate limits

Inference endpoints are not rate limited; scaling is handled automatically. Management APIs allow 1,000 requests per minute per API key.
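If a management call does hit the limit, a standard response is HTTP 429; a minimal retry-with-exponential-backoff sketch follows. `call` stands in for any function that performs the request and returns a status code, and real code would also honor a `Retry-After` header if the API sends one (the docs don't say).

```python
import time

def with_backoff(call, max_retries=3, base_delay=1.0):
    """Retry `call` on HTTP 429, doubling the delay each attempt."""
    for attempt in range(max_retries + 1):
        status = call()
        if status != 429:  # not rate limited: return immediately
            return status
        time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
    return status  # still 429 after exhausting retries
```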

Authentication

All requests require an `Authorization: Bearer <API_KEY>` header. API keys can be scoped to specific models or environments (staging, production).
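The header shape is the same for every request regardless of key scope; a tiny helper keeps it in one place. The key value is of course a placeholder.

```python
def auth_headers(api_key):
    """Return the bearer-token header required on every request."""
    return {"Authorization": f"Bearer {api_key}"}
```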

SDKs

Official SDKs available for Python (`pip install upbox`), TypeScript (`npm install @upbox/sdk`), and Go (`go get github.com/upbox/sdk-go`).

OpenAPI spec

Download the full OpenAPI 3.0 specification at `/api/v1/openapi.json` for code generation and API exploration tools.
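A sketch of fetching the spec for use with codegen or exploration tools. Only the `/api/v1/openapi.json` path comes from the docs; the host is a placeholder, and the actual network call is left commented out.

```python
import urllib.request

req = urllib.request.Request(
    "https://api.example.com/api/v1/openapi.json")

# Real fetch would be:
# import json
# spec = json.load(urllib.request.urlopen(req))
# print(spec["info"]["version"])
```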