Flat Datacenter Networks at Scale at Amazon
Posted by tanelpoder 21 hours ago
Comments
Comment by epistasis 4 hours ago
> The results were striking: compared to traditional fat-tree networks, RNG (Resilient Network Graphs) uses 69% fewer routers, delivers 33% higher throughput, cuts network power by 40%, and lowers operating costs by 27%. In early 2026, RNG became the default design for most newly built Amazon data centers globally.
> For cabling, they developed the ShuffleBox—a passive optical device whose internal wiring combined with randomized ShuffleBox-to-ShuffleBox cabling yields “quasi-random” graphs that behave like truly random graphs.
This is pretty incredible: random network layouts that have better properties on average...
I'm really curious about the long tail of performance though. What is the worst case scenario here? And are there some better case scenarios? Uniformity in Clos networks is pretty great, but many loads don't need uniformity, and if these RNG-based networks have non-uniformity, perhaps that has operational characteristics that can be helpful or harmful.
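One rough way to get a feel for the average-versus-worst-case question is to build a small random regular graph and compare the mean hop count with the diameter. This is only a toy sketch in plain Python: the pairing-model generator, the function names, and the sizes (64 switches, 4 network ports each) are my own assumptions for illustration, not anything taken from the RNG design or the preprint.

```python
import random
from collections import deque

def random_regular_graph(n, d, seed=0):
    """Pairing-model sketch: shuffle n*d port stubs and pair them up,
    retrying from scratch whenever a self-loop or duplicate link appears."""
    if (n * d) % 2 or d >= n:
        raise ValueError("need n*d even and d < n")
    rng = random.Random(seed)
    while True:
        stubs = [node for node in range(n) for _ in range(d)]
        rng.shuffle(stubs)
        adj = {node: set() for node in range(n)}
        ok = True
        for a, b in zip(stubs[::2], stubs[1::2]):
            if a == b or b in adj[a]:
                ok = False  # not a simple graph, try a fresh shuffle
                break
            adj[a].add(b)
            adj[b].add(a)
        if ok:
            return adj

def hop_counts(adj, src):
    """Breadth-first search: hop count from src to every reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def path_stats(adj):
    """Return (mean hops, worst-case hops) over all reachable ordered pairs."""
    total = pairs = worst = 0
    for src in adj:
        for dst, hops in hop_counts(adj, src).items():
            if dst != src:
                total += hops
                pairs += 1
                worst = max(worst, hops)
    return total / pairs, worst

if __name__ == "__main__":
    # Hypothetical sizes: 64 switches with 4 network-facing ports each.
    for seed in range(5):
        mean_hops, worst_hops = path_stats(random_regular_graph(64, 4, seed))
        print(f"seed={seed} mean={mean_hops:.2f} worst={worst_hops}")
```

Running it over several seeds shows how far the worst pair sits from the average in each sample. It says nothing about throughput under load or about failures, which is the part I'm actually curious about.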
Comment by UltraSane 3 hours ago
Comment by epistasis 2 hours ago
I think Section 9 and Figures 13 and 14 in the arXiv preprint sort of address this, but they don't mention anything about accounting for real-world failures in fat trees. I haven't had a chance to read it all, though...
Comment by dhruvrrp 42 minutes ago
Edit: Answering my own question, Jellyfish proved theoretically that random networks can be better, and this is a working implementation based on that idea, one that solves the problems with creating and operating random networks.
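For anyone who hasn't read the Jellyfish paper, here is my rough sketch of the construction as I understand it: keep linking random pairs of switches that both have free ports and aren't already neighbors, and when a switch gets stuck with two or more free ports, splice it into a randomly chosen existing link. The function name, parameters, and sizes are hypothetical, `ports` counts only network-facing ports, and this is not how Amazon's RNG or the ShuffleBox wiring is actually generated.

```python
import random

def jellyfish(n_switches, ports, seed=0):
    """Toy Jellyfish-style random graph. Returns {switch: set(neighbors)}."""
    rng = random.Random(seed)
    adj = {s: set() for s in range(n_switches)}

    def free(s):
        return ports - len(adj[s])

    # Phase 1: join random non-adjacent switch pairs that both have free ports.
    while True:
        open_sw = [s for s in adj if free(s) > 0]
        pairs = [(a, b) for i, a in enumerate(open_sw)
                 for b in open_sw[i + 1:] if b not in adj[a]]
        if not pairs:
            break
        a, b = rng.choice(pairs)
        adj[a].add(b)
        adj[b].add(a)

    # Phase 2: a switch left with >= 2 free ports removes a random existing
    # link (x, y) and connects itself to both x and y instead.
    for p in adj:
        while free(p) >= 2:
            links = [(x, y) for x in adj for y in adj[x]
                     if x < y and p not in (x, y)
                     and x not in adj[p] and y not in adj[p]]
            if not links:
                break
            x, y = rng.choice(links)
            adj[x].discard(y)
            adj[y].discard(x)
            adj[p].update((x, y))
            adj[x].add(p)
            adj[y].add(p)
    return adj

if __name__ == "__main__":
    topo = jellyfish(n_switches=20, ports=4, seed=42)
    for switch, neighbors in sorted(topo.items()):
        print(switch, sorted(neighbors))
```

The splice step leaves the degrees of `x` and `y` unchanged while using two of the stuck switch's ports, so the same trick also works for adding a new switch to an existing network.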
Comment by socketcluster 4 hours ago
It's interesting to see it being done at the data centre level as well.
Comment by trumpdong 41 seconds ago
Comment by protocolture 1 hour ago
Comment by wmf 1 hour ago
Comment by jeffbee 1 hour ago
Comment by jsolson 53 minutes ago
Comment by fdr 3 hours ago
Comment by kev009 4 hours ago
Comment by wofo 2 hours ago
Comment by mattclarkdotnet 2 hours ago
Comment by jeffbee 2 hours ago
Comment by mino 3 hours ago
Comment by jeffbee 1 hour ago
Comment by cyberax 2 hours ago
It's not cheap, and it's limited to `us-east-1`, but it's at least _possible_ now via AWS Interconnect: https://aws.amazon.com/interconnect/lastmile/pricing/
Comment by wmf 2 hours ago
Comment by tanelpoder 21 hours ago