Facebook parent company Meta is building a massive AI supercomputer based on Nvidia DGX A100 systems. Once complete, the machine should be the most powerful of its kind.
Meta plans to build the world’s most powerful AI supercomputer with its AI Research SuperCluster (RSC). When finished, the system will combine 16,000 Nvidia A100 GPUs with 4,000 AMD Epyc Rome 7742 processors. Each compute node is an Nvidia DGX A100 system with two CPUs and eight GPUs, for a total of 2,000 nodes. The nodes are tied together with an Nvidia Quantum InfiniBand fabric offering 200 Gb/s of bandwidth.
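The headline totals follow directly from the per-node configuration. The short Python sketch below simply multiplies the figures quoted above; the constants come from this article, not from an official spec sheet.

```python
# Back-of-the-envelope totals for the full RSC build-out, using the
# per-node figures quoted above: 2,000 DGX A100 nodes, each with
# 8 GPUs and 2 CPUs. Illustrative only.

NODES = 2_000
GPUS_PER_NODE = 8
CPUS_PER_NODE = 2

total_gpus = NODES * GPUS_PER_NODE   # 16,000 Nvidia A100 GPUs
total_cpus = NODES * CPUS_PER_NODE   # 4,000 AMD Epyc 7742 CPUs

print(f"GPUs: {total_gpus:,}, CPUs: {total_cpus:,}")
```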
Meta expects RSC to provide five exaflops of mixed-precision (FP16 and FP32) computing power. That would put the cluster roughly in exascale territory, although the comparison is loose: Nvidia and Facebook’s parent company quote AI-specific mixed-precision figures rather than the general FP64 Linpack benchmark that officially determines a system’s place in the Top500. How the system would rank there is unclear.
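The five-exaflop figure is plausible as a peak estimate: Nvidia lists roughly 312 teraflops of dense FP16 Tensor Core throughput per A100, and multiplying that by 16,000 GPUs lands close to five exaflops. The sketch below is a back-of-the-envelope upper bound, not a benchmark result.

```python
# Rough check of the "five exaflops" claim: 16,000 A100 GPUs at the
# published peak of ~312 TFLOPS dense FP16 Tensor Core throughput each.
# Real mixed-precision throughput depends on the workload, so this is
# an upper-bound estimate.

A100_FP16_TFLOPS = 312   # peak dense FP16 Tensor Core TFLOPS per A100
TOTAL_GPUS = 16_000

peak_pflops = TOTAL_GPUS * A100_FP16_TFLOPS / 1_000
print(f"~{peak_pflops:,.0f} petaflops (~{peak_pflops / 1_000:.1f} exaflops)")
# -> ~4,992 petaflops, i.e. roughly 5 exaflops
```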
Today, RSC already exists in a first phase with 760 DGX A100 nodes, good for 1,895 petaflops of AI computing performance. Meta will expand the supercomputer to its full configuration in the coming months. Mark Zuckerberg’s company hopes to use the supercomputer to train very sophisticated AI models that can support real-time translation, among other things. Furthermore, Meta plans to use the Research SuperCluster for AR-related research.
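Scaling the phase-one figure linearly to the planned 2,000-node build gives a projection in line with the five-exaflop goal. The sketch below assumes ideal linear scaling with node count, which real workloads rarely achieve.

```python
# Linear scale-up projection: if 760 DGX A100 nodes deliver roughly
# 1,895 petaflops of AI compute, the planned 2,000-node system should
# land near the 5-exaflop target, ignoring interconnect and software
# overheads.

PHASE1_NODES, PHASE1_PFLOPS = 760, 1_895
FULL_NODES = 2_000

projected_pflops = PHASE1_PFLOPS * FULL_NODES / PHASE1_NODES
print(f"~{projected_pflops:,.0f} petaflops (~{projected_pflops / 1_000:.1f} exaflops)")
# -> ~4,987 petaflops, consistent with Meta's five-exaflop goal
```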
Meta relies on Nvidia partner Penguin Computing to build the supercomputer; the partner handles the rollout of the entire infrastructure. Penguin will eventually equip the cluster with one exabyte of high-speed storage delivering a bandwidth of 16 terabytes per second, built on storage technology from Pure Storage.