An Analysis of the Network in ByteDance's 10,000-GPU Cluster

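According to publicly disclosed information, as of September 2023 ByteDance had built a cluster of more than 10,000 NVIDIA Ampere-architecture GPUs and is currently building a Hopper-architecture cluster. The Ampere architecture mainly comprises the A100 and A800 chips; the newer Hopper architecture mainly comprises the H100 and H800 chips.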

In the paper published by ByteDance and Peking University, the network topology is described mainly in the following section:

Network topology. Our datacenter network is built with high-performance switches based on Broadcom Tomahawk 4 chips. The total bandwidth of each Tomahawk chip is 25.6Tbps with 64×400Gbps ports. Three layers of switches are connected in a CLOS-like topology to connect more than 10,000 GPUs. For switches at each layer, the bandwidth percentage between downlink and uplink is 1:1. That is, 32 ports are used as downlink and 32 ports are used as uplink. The network provides high bandwidth with a small diameter. Every node can communicate with other nodes within a limited number of hops.
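The figures in this paragraph can be checked with a little arithmetic. The sketch below confirms that 64 ports at 400 Gbps give 25.6 Tbps, and estimates how many endpoints a three-layer, 1:1 network of such switches could reach. The paper only says "CLOS-like" and does not give the exact wiring, so the k³/4 bound of a textbook three-tier fat-tree is used here as an assumption, not as the authors' actual design; the function names are illustrative.

```python
def switch_bandwidth_tbps(ports: int, port_gbps: int) -> float:
    """Aggregate switch bandwidth in Tbps for `ports` ports of `port_gbps` each."""
    return ports * port_gbps / 1000


def fat_tree_max_endpoints(radix: int) -> int:
    """Upper bound on endpoints in a classic three-tier fat-tree.

    Assumption: every switch has `radix` ports, split evenly between
    downlinks and uplinks (1:1), which gives the textbook bound radix^3 / 4.
    """
    return radix ** 3 // 4


if __name__ == "__main__":
    # Tomahawk 4 figures quoted in the paper: 64 ports x 400 Gbps.
    print("switch bandwidth (Tbps):", switch_bandwidth_tbps(64, 400))
    # Upper bound on 400G endpoints under the fat-tree assumption.
    print("max 400G endpoints:", fat_tree_max_endpoints(64))
```

Under this assumption the bound is well above the 10,000 GPUs mentioned in the paper, which is consistent with three layers being enough.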

Reducing ECMP hashing conflicts. We carefully design the network topology and schedule network traffic to reduce ECMP hashing conflicts. First, at the top-of-rack (ToR) switch level, one 400G downlink port is split into two 200G downlink ports with specific AOC cables. The conflict probability is reduced as the bandwidth of each uplink is double of that of a downlink. Second, eight 200G NICs on the server is connected to eight different switches in a multi-rail way. The number of GPU servers connected by the same sets of ToR switches can reach 64. And we strategically schedule the data-intensive nodes from our training tasks to operate under the
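The excerpt above is cut off in the source. The two numeric claims it does make can be illustrated with a small sketch. First, 32 downlink ports of 400G split in two give 64 ports of 200G per ToR switch, and a server that puts one NIC on each of eight ToR switches occupies one port on each, which yields the figure of 64 servers. Second, the effect of the split on hash conflicts is modeled in a deliberately simplified way: each flow is assumed to be hashed uniformly and independently onto one of 32 uplinks, and the flow counts are hypothetical. Real ECMP hashes on packet header fields, so this is a rough model and not the paper's own analysis.

```python
from math import comb


def servers_per_tor_group(downlink_ports: int = 32, split: int = 2) -> int:
    """Servers reachable under one set of ToR switches.

    Each 400G downlink is split into `split` 200G ports, and each server
    takes exactly one 200G port on every ToR switch in the set.
    """
    return downlink_ports * split


def expected_excess_fraction(num_flows: int, flow_gbps: float,
                             num_uplinks: int, uplink_gbps: float) -> float:
    """Expected share of offered traffic that exceeds uplink capacity.

    Assumption: every flow is full rate and is hashed uniformly and
    independently to one uplink, so the flow count per uplink is binomial.
    """
    p = 1.0 / num_uplinks
    excess_per_link = 0.0
    for count in range(num_flows + 1):
        prob = comb(num_flows, count) * p ** count * (1 - p) ** (num_flows - count)
        excess_per_link += prob * max(0.0, count * flow_gbps - uplink_gbps)
    return num_uplinks * excess_per_link / (num_flows * flow_gbps)


if __name__ == "__main__":
    print("servers per ToR group:", servers_per_tor_group())
    # Same offered load (6.4 Tbps) in both cases, hypothetical flow counts.
    unsplit = expected_excess_fraction(16, 400, 32, 400)  # 400G flows on 400G uplinks
    split = expected_excess_fraction(32, 200, 32, 400)    # 200G flows on 400G uplinks
    print("excess share, 400G downlinks:", round(unsplit, 3))
    print("excess share, 200G downlinks:", round(split, 3))
```

In this model a 400G uplink absorbs two colliding 200G flows without loss, so overload needs at least three flows on the same uplink, which is why the split case shows a smaller excess share for the same offered load.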
