How Google Lies About the Power of Its Latest Chips, Compared to El Capitan

Google claims that its new Ironwood-based inference clusters in the cloud are 24 times more powerful than the world’s most powerful supercomputer. The claim is demonstrably and completely wrong.

At Google Cloud Next, Google announces its Ironwood accelerator, and it doesn’t shy away from misleading marketing. The powerful chip is optimized for inference, offering an alternative to Nvidia’s GPUs. Google bundles the chips within its cloud platform in clusters, or pods, of 9,216 Ironwood chips, and then claims that such a pod effortlessly outperforms the world’s most powerful exascale system.

Bold Claim

Google’s announcement blog currently features a video stating that the Ironwood cluster is 24 times more powerful than the most powerful supercomputer available today. Further down, the text literally repeats that an Ironwood pod would be 24 times more powerful than El Capitan.

Google spreads the speed claim broadly.

Unfortunately, the comparison is so clearly off the mark that it’s hard to believe Google isn’t intentionally misleading. Let’s look at the details.

ExaFlops

Google claims that its Ironwood cluster has a computing power of 42.5 ExaFlops. Flops stands for floating point operations per second. For this figure, Google doesn’t use a universal benchmark but its own test, which measures computing power in FP8: numbers stored in eight bits. The figure represents how many calculations with such 8-bit numbers the system can perform per second.

The cloud provider then looks at El Capitan’s performance as published on the Top500 list. That list states that El Capitan clocks in at 1.7 ExaFlops, measured with the standardized Linpack benchmark that is universally used to compare the computing power of supercomputers.

FP8 vs. FP64

Linpack works with FP64 values: numbers stored in 64 bits. FP64 numbers are far more precise than the heavily rounded FP8 numbers. For illustration: in the common FP8 E4M3 format, π rounds to 3.25, while in FP64 the value is 3.141592653589793. Try doubling FP8-π in your head, and now do the same for FP64-π. One of the two calculations is significantly easier than the other.
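The rounding effect is easy to demonstrate. The sketch below naively enumerates the positive normal values of the OCP E4M3 variant of FP8 (1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits) and picks the closest one; which value π lands on depends on the FP8 variant used.

```python
def round_to_fp8_e4m3(x: float) -> float:
    """Round a positive float to the nearest normal FP8 E4M3 value.

    Naive sketch: enumerate all positive normal E4M3 values and pick the
    closest (the top exponent, which carries the NaN encoding, is skipped
    for simplicity).
    """
    candidates = [
        (1 + m / 8) * 2.0 ** (e - 7)  # value = (1 + mantissa/8) * 2^(e - bias)
        for e in range(1, 15)         # stored exponent of normal numbers
        for m in range(8)             # 3-bit mantissa
    ]
    return min(candidates, key=lambda v: abs(v - x))

print(round_to_fp8_e4m3(3.141592653589793))  # 3.25
```

The nearest representable values around π are 3.0 and 3.25, a quarter apart; FP64 resolves the same number to roughly sixteen significant digits.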

For AI workloads, precision is not of utmost importance. Speed is more relevant. That’s why the workloads rely on numbers that take up less memory, like FP8. For other workloads, such a lack of precision can cause problems. In any case, a calculation with FP8 is not equivalent to a calculation with FP64.

Under the hood, accelerators are equipped with optimized systems to handle floating point numbers of variable precision. An FP64 number is eight times as wide as an FP8 number, but it’s not enough to simply multiply El Capitan’s 1.7 ExaFlops by eight. Even if we do, we get 13.6 ExaFlops, and Google’s Ironwood pod still comes out ahead, though by a factor of roughly three, not 24.
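This back-of-the-envelope scaling can be written out; the figures are those quoted above, and the ×8 factor is a crude bit-width ratio, not a real conversion between benchmarks.

```python
el_capitan_fp64_exaflops = 1.7    # Linpack result, Top500 list
ironwood_pod_fp8_exaflops = 42.5  # Google's own FP8 figure

# Naively scale the FP64 score by the 64/8 bit-width ratio.
naive_fp8_estimate = el_capitan_fp64_exaflops * 8
ratio = ironwood_pod_fp8_exaflops / naive_fp8_estimate

print(naive_fp8_estimate)   # 13.6 ExaFlops
print(round(ratio, 2))      # ~3.13: nowhere near a factor of 24
```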

The Real Power of El Capitan

In practice, variables such as specific hardware and memory bandwidth play a role. El Capitan is equipped with 43,808 AMD Instinct MI300A accelerators. Based on AMD’s specifications, these are each good for 1.96 PetaFlops of FP8 computing power (and even more in certain scenarios, such as with sparsity). Conservatively estimated, El Capitan’s AI accelerators together deliver at least 85 ExaFlops of FP8 computing power. That’s more than double that of the Google Ironwood pod.
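The aggregate estimate follows directly from the chip count and the per-chip spec quoted above:

```python
accelerators = 43_808          # AMD Instinct MI300A units in El Capitan
fp8_petaflops_per_chip = 1.96  # AMD spec sheet, dense FP8

# 1,000 PetaFlops = 1 ExaFlops
total_fp8_exaflops = accelerators * fp8_petaflops_per_chip / 1000
print(round(total_fp8_exaflops, 1))  # ~85.9 ExaFlops of FP8, on paper
```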

This makes it clear that Google hasn’t suddenly leapfrogged the world’s exascale systems. Even these figures make for a complicated comparison, because they’re not based on a measurement with a standardized test. Ironwood is not optimized for FP64 and would probably not score very well on the Linpack benchmark. We would avoid a direct comparison ourselves, but Google apparently doesn’t.

Not Faster but Slower

In any case, the 9,216-chip Google Ironwood cluster is not 24 times faster than El Capitan. In the most optimistic case, where we try to compare the computing power of the El Capitan hardware in FP8 for which Ironwood is optimized, Google’s cluster is barely half as performant.
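Putting the comparison in Ironwood’s own preferred FP8 terms, with the article’s figures (the 85 ExaFlops for El Capitan being the conservative estimate above):

```python
ironwood_pod_fp8_exaflops = 42.5  # Google's own FP8 figure
el_capitan_fp8_exaflops = 85.0    # conservative estimate from AMD specs
chips_ironwood = 9_216
chips_el_capitan = 43_808

print(ironwood_pod_fp8_exaflops / el_capitan_fp8_exaflops)  # 0.5: half as fast
print(round(chips_el_capitan / chips_ironwood, 2))          # 4.75x more chips
```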

This makes sense: Google incorporates 9,216 Ironwood chips in its HPC solution, while El Capitan combines 43,808 of AMD’s latest accelerators in a custom-built system.

We don’t know how Google decided it was a good idea to compare an FP8 ExaFlops value from its own test with an FP64 value from another test. We have asked Google for a response. A complete layman could understandably make this mistake: after all, it says ExaFlops in both places. However, anyone somewhat familiar with the subject should understand that comparing FP64 and FP8 this way is comparing apples and oranges.

Perhaps the marketing team asked Google Gemini for advice.