Itdaily - The race to the megawatt: why we are cramming more and more computing power into a single rack

Server racks are evolving at a breakneck pace from a few dozen kilowatts to a full megawatt per cabinet. HPE explains why that extreme density is not a goal in itself, but a physical necessity for modern AI and HPC workloads.

HPE says it constantly receives questions from customers about why a rack must consume 400 kilowatts or more. The answer is simple: proximity. The closer the computing nodes and the network are to each other, the faster the system performs as a whole. This aligns perfectly with the mantra of Cray founder Seymour Cray. He always emphasized that it is not the speed of individual components that counts, but the speed of the system.

Cray systems achieve low latency and high performance by coupling everything very tightly and using copper interconnects rather than, for example, InfiniBand. This keeps costs lower and performance high. The same principle now applies to AI configurations such as an Nvidia Vera Rubin NVL72, where an all-to-all topology requires next-generation GPUs drawing two to three kilowatts each to be placed as close together as possible.
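A rough back-of-the-envelope sketch shows how quickly those numbers add up. It assumes 72 GPUs per rack (inferred here from the "NVL72" name, not stated by HPE) and counts only the GPUs, leaving out CPUs, network switches, and power conversion losses. The function name is purely illustrative.

```python
def gpu_power_kw(gpu_count: int, kw_per_gpu: float) -> float:
    """Total GPU power draw in kilowatts (GPUs only, no other components)."""
    return gpu_count * kw_per_gpu


# Assumption: 72 GPUs per rack, each drawing two to three kilowatts.
GPUS_PER_RACK = 72
low_kw = gpu_power_kw(GPUS_PER_RACK, 2.0)
high_kw = gpu_power_kw(GPUS_PER_RACK, 3.0)

print(f"GPU power per rack: {low_kw:.0f} to {high_kw:.0f} kW")
```

Under these assumptions, the GPUs alone account for 144 to 216 kilowatts before any other component in the rack is counted.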

Heat has to go somewhere

That tight coupling comes at a price. Components packed this closely together become thermally limited: more and more heat has to leave an ever smaller space. Air cooling is no longer feasible at these densities, and liquid cooling can no longer be an afterthought. Chip designers must consider how the whole unit will be cooled as early as the circuit board and chip design phase. That is a fundamentally different design philosophy from the past, when a generic heat sink was enough to dissipate the heat.

The big question is where the limit lies. No one knows exactly. A one-megawatt rack is already looming on the horizon, more than double the power of the densest racks today. What comes after that, whether two megawatts or five, depends on what the chips require. HPE emphasizes that no one designs a five-megawatt rack for the sake of the design itself; there must be a concrete benefit to packing things so closely together.
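To give a sense of what a megawatt of heat means for liquid cooling, the sketch below applies the standard heat balance Q = m × c × ΔT. It assumes plain water as the coolant, a temperature rise of 10 kelvin across the rack, and that all heat ends up in the liquid. None of these figures come from HPE, and real systems often use water-glycol mixtures with different properties.

```python
WATER_SPECIFIC_HEAT_J_PER_KG_K = 4186.0  # approximate value for liquid water
WATER_DENSITY_KG_PER_L = 1.0             # approximate value for liquid water


def coolant_flow_l_per_min(heat_kw: float, delta_t_k: float) -> float:
    """Water flow needed to carry away heat_kw at a given temperature rise.

    Rearranges Q = m_dot * c_p * delta_T to solve for the mass flow m_dot,
    then converts kilograms per second to liters per minute.
    """
    mass_flow_kg_per_s = heat_kw * 1000.0 / (WATER_SPECIFIC_HEAT_J_PER_KG_K * delta_t_k)
    return mass_flow_kg_per_s / WATER_DENSITY_KG_PER_L * 60.0


# Assumption: a one-megawatt rack with a 10 K rise in coolant temperature.
flow = coolant_flow_l_per_min(heat_kw=1000.0, delta_t_k=10.0)
print(f"Required flow: {flow:.0f} L/min")
```

Under these assumptions, a single one-megawatt rack needs roughly 1,400 liters of water per minute, which illustrates why CDUs and manifolds have to be part of the design from the start.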

Furthermore, chipmakers are deliberately holding back. According to HPE, they could technically design a five-megawatt GPU, but no one on earth would have the technology to cool it, so no one would buy it. AMD, Nvidia, and server builders all work with the same suppliers of coolant distribution units (CDUs) and manifolds.

As a result, there is a growing collaborative dialogue across the supply chain, so that everyone knows in advance where the market needs to be in two years. In the past, a jump in density on the scale of the megawatt rack would have come as a surprise that the entire chain had to scramble to catch up with. Today, these leaps are planned together, precisely so that no link in the chain falls behind.

Power and cooling go hand in hand

Increasing density is never about cooling alone. HPE notes that customers usually come knocking when they have thermal problems. However, as soon as you offer an 80-kilowatt rack, it often turns out that the same customer cannot even get that much power delivered.

Power and cooling must grow at the same pace. An extreme shortage of either will block progress regardless. That is why the race to the megawatt cabinet is just as much a story about power supply, with the transition to 800-volt DC as one of the next major infrastructure interventions.
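A simple calculation illustrates why the supply voltage matters at this scale. The sketch below uses I = P / V for DC power and ignores conversion losses. The 48-volt figure is an assumed comparison point for a low-voltage in-rack busbar, not a number from HPE.

```python
def dc_current_amps(power_kw: float, voltage_v: float) -> float:
    """Current in amperes needed to deliver power_kw at voltage_v (I = P / V)."""
    return power_kw * 1000.0 / voltage_v


RACK_POWER_KW = 1000.0  # the one-megawatt rack discussed in the article

# Assumption: 48 V stands in for a low-voltage busbar; 800 V is the DC level named above.
for voltage in (48.0, 800.0):
    amps = dc_current_amps(RACK_POWER_KW, voltage)
    print(f"{voltage:.0f} V DC: {amps:,.0f} A")
```

At 800 volts, a megawatt requires 1,250 amperes. At the assumed 48 volts, the same power would require more than 20,000 amperes, and with it far heavier copper conductors.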