December 20, 2024 · 9 min read

Building a 200+ Location Edge Network for AI Inference

The architecture decisions behind our global edge network that delivers sub-50ms latency anywhere on Earth.

[Global network visualization]

The Physics of Latency

Light travels at about 200,000 km/s through fiber-optic cable, roughly two-thirds of its speed in a vacuum. New York to Singapore is roughly 15,000 km one way, so a round trip through fiber costs about 150ms of latency just from physics, before your model even starts computing. The only way to beat physics is to be closer to your users.
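The arithmetic is simple enough to sketch directly; the only input is the one-way fiber distance:

```python
# Light in fiber travels at ~200,000 km/s, about two-thirds of c in vacuum.
FIBER_SPEED_KM_PER_S = 200_000

def fiber_rtt_ms(one_way_km: float) -> float:
    """Physics-only round-trip time in milliseconds over fiber."""
    return 2 * one_way_km / FIBER_SPEED_KM_PER_S * 1000

# New York -> Singapore is roughly 15,000 km each way.
print(fiber_rtt_ms(15_000))  # -> 150.0
```

Real-world latency is higher still: fiber rarely follows the great-circle path, and routers add their own delay. 150ms is the floor, not the ceiling.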


Edge Location Selection

We've deployed inference infrastructure in 200+ locations across every continent except Antarctica. Each location is selected based on population density, network connectivity, and proximity to major cloud regions. The result: 95% of the world's internet users are within 50ms of an Upbox edge node.
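Site selection can be thought of as a scoring problem. The weights and fields below are purely illustrative, not Upbox's actual criteria; the point is that population served, connectivity, and cloud-region proximity trade off against each other:

```python
# Hypothetical scoring sketch for ranking candidate edge sites.
# All weights and example numbers are invented for illustration.
def site_score(population_m: float, peering_partners: int,
               km_to_cloud_region: float) -> float:
    return (population_m * 1.0          # users within reach, in millions
            + peering_partners * 0.5    # richness of network connectivity
            - km_to_cloud_region * 0.01)  # penalty for distance to a cloud hub

candidates = {
    "frankfurt": site_score(45.0, 900, 10),
    "perth": site_score(2.1, 40, 3000),
}
best = max(candidates, key=candidates.get)  # -> "frankfurt"
```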


Model Replication Strategy

Not every model needs to be everywhere. We automatically analyze your traffic patterns and replicate models to the regions where they're needed. A model serving mostly European users doesn't need to be in São Paulo. This intelligent replication keeps costs down while maintaining global performance.
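One way to picture the replication decision is a simple threshold on each region's share of a model's traffic; the 5% cutoff here is an assumed knob, not a documented one:

```python
from collections import Counter

# Sketch: replicate a model to any region carrying at least `min_share`
# of its recent requests. Region names and threshold are illustrative.
def regions_to_replicate(request_regions: list[str],
                         min_share: float = 0.05) -> set[str]:
    counts = Counter(request_regions)
    total = sum(counts.values())
    return {r for r, n in counts.items() if n / total >= min_share}

# A mostly-European model: São Paulo (sa-east) never crosses the threshold.
traffic = ["eu-west"] * 90 + ["eu-central"] * 8 + ["sa-east"] * 2
print(regions_to_replicate(traffic))  # -> {'eu-west', 'eu-central'}
```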


Anycast Routing

When a request hits our network, it's automatically routed to the nearest healthy edge location using anycast BGP routing. This happens at the network layer, before any application logic. If an edge location goes down, traffic automatically flows to the next nearest location. No DNS propagation delays, no manual failover.
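Anycast itself lives in BGP, not application code, but its effect can be modeled in a few lines: every request lands at the nearest location that is still announcing the prefix, so withdrawing an unhealthy site shifts traffic instantly. Distances below are illustrative:

```python
# Simplified model of anycast behavior: route to the nearest healthy site.
# In reality this selection is done by BGP path selection in the network,
# not by any application-layer lookup.
def route(distances_ms: dict[str, float], healthy: set[str]) -> str:
    reachable = {loc: d for loc, d in distances_ms.items() if loc in healthy}
    return min(reachable, key=reachable.get)

dist = {"nyc": 5.0, "chi": 18.0, "lax": 60.0}
print(route(dist, {"nyc", "chi", "lax"}))  # -> nyc
print(route(dist, {"chi", "lax"}))         # -> chi (nyc down; no DNS change)
```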


Edge-to-Origin Optimization

For models that can't be fully replicated to the edge (like 100B+ parameter models), we use a hybrid approach. Lightweight preprocessing happens at the edge, heavy computation happens in regional hubs, and results are cached at the edge. This gives you edge-like latency even for massive models.
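The hybrid path can be sketched as: normalize and validate at the edge, check an edge cache, and only then pay the round trip to a regional hub. `run_in_hub` is a stand-in for the real hub RPC, and the cache here is a plain dict rather than a production cache:

```python
import hashlib

# Illustrative edge cache keyed by a normalized prompt hash.
edge_cache: dict[str, str] = {}

def run_in_hub(prompt: str) -> str:
    # Hypothetical stand-in for the heavy compute in a regional hub.
    return f"completion for: {prompt}"

def serve(prompt: str) -> str:
    # Lightweight preprocessing at the edge: normalize, then hash.
    key = hashlib.sha256(prompt.strip().lower().encode()).hexdigest()
    if key in edge_cache:
        return edge_cache[key]      # edge-latency cache hit
    result = run_in_hub(prompt)     # one hub round trip on a miss
    edge_cache[key] = result        # cache the result back at the edge
    return result

print(serve("Hi"))   # miss: goes to the hub
print(serve("hi "))  # normalized hit: served from the edge
```

Repeated or near-duplicate prompts never leave the edge, which is where the "edge-like latency for massive models" claim comes from.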