Google Launches Ironwood: Powerful Yet Efficient Cloud Chip for Inference

Google introduces Ironwood, an in-house chip built around AI tensor cores and optimized for inference in the data center. Ironwood thus serves as an alternative to Nvidia chips.

At Google Cloud Next, Google is showing off the Ironwood chip. Ironwood is the seventh generation of Google's in-house Tensor Processing Unit (TPU). TPUs are chips built around tensor cores and optimized for AI workloads.

Ironwood was developed specifically with inference in mind. The chip can load and run large AI models. For that purpose, each Ironwood chip carries 192 GB of High Bandwidth Memory (HBM) on board, six times more than its predecessor Trillium. HBM bandwidth per chip rises from 4.5 TBps on Trillium to 7.2 TBps on Ironwood, a 1.6x increase.
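A quick back-of-the-envelope check of those memory figures; the Trillium capacity of 32 GB below is simply derived from the stated six-times factor, not a number quoted in this article:

```python
# Back-of-the-envelope comparison of the published HBM figures.
trillium_hbm_gb = 32.0      # assumed: 192 GB / 6, per the "six times more" claim
ironwood_hbm_gb = 192.0     # per chip, as stated by Google
trillium_bw_tbps = 4.5      # HBM bandwidth per chip
ironwood_bw_tbps = 7.2

print(f"Capacity gain:  {ironwood_hbm_gb / trillium_hbm_gb:.1f}x")    # 6.0x
print(f"Bandwidth gain: {ironwood_bw_tbps / trillium_bw_tbps:.1f}x")  # 1.6x
```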

Clusters

Google will make the chips available in the cloud as clusters, which come in two sizes. The entry-level version already combines 256 Ironwood chips: essentially an HPC cluster that can support inference at scale.

Those who need more can turn to ‘pods’ of 9,216 Ironwood chips. In that configuration, Google claims the system delivers 42.5 ExaFlops (FP8). Google also claims the pod is faster than El Capitan, but that comparison does not hold up: measured at comparable precision, the world’s fastest supercomputer is at least twice as performant. That said, we assume a 9,216-chip Ironwood cluster is exceptionally powerful for AI inference at scale.
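For a sense of scale, the pod-level figure implies a per-chip throughput that follows from simple division; this is a rough calculation, not an official per-chip specification:

```python
# Rough per-chip FP8 throughput implied by the pod-level figure.
pod_exaflops_fp8 = 42.5      # Google's claim for a 9,216-chip pod
chips_per_pod = 9_216

per_chip_tflops = pod_exaflops_fp8 * 1e6 / chips_per_pod  # 1 ExaFlop = 1e6 TFLOPS
print(f"~{per_chip_tflops:,.0f} TFLOPS FP8 per Ironwood chip")  # prints roughly 4,600
```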

Optimization

Ironwood not only supports modern LLMs but is also optimized for more classic AI workloads. Think of recommendation engines: the systems that generate personalized suggestions in web shops. The built-in SparseCore is an accelerator aimed specifically at such tasks.
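To illustrate the kind of operation involved: recommendation models spend much of their time on sparse embedding lookups, gathering a handful of rows from very large tables, which is the access pattern an accelerator like SparseCore targets. Below is a minimal, hypothetical sketch of such a lookup in plain NumPy; the table size, embedding dimension, and item IDs are invented for illustration and this is not SparseCore code:

```python
import numpy as np

# Hypothetical embedding table: 1 million items, 64-dimensional embeddings.
rng = np.random.default_rng(0)
embedding_table = rng.standard_normal((1_000_000, 64), dtype=np.float32)

# A user's sparse interaction history: a few item IDs out of a million.
item_ids = np.array([42, 31_337, 987_654])

# The "sparse" step: gather just those rows from the large table...
user_embeddings = embedding_table[item_ids]   # shape (3, 64)

# ...and pool them into one dense user vector for the downstream model.
user_vector = user_embeddings.mean(axis=0)    # shape (64,)
print(user_vector.shape)
```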

Google also highlights Ironwood’s efficiency. The chip delivers 29.3 flops per watt, compared to 14.6 for Trillium: roughly twice as efficient. Google has for some time not publicly disclosed who manufactures its TPUs or on which node. Given the efficiency, it is presumably a modern process node at a major manufacturer such as TSMC.

Custom Chips

All major cloud players are working on their own chips. Google was early with its Tensor accelerators, but AWS now also has a large portfolio. Think of the Graviton CPUs, the Trainium training chips, and the Inferentia inference accelerators. For Microsoft, it’s Maia.

With their own (AI) chips, cloud providers can differentiate their offerings more clearly from those of competitors. They also reduce their dependence on Nvidia’s very expensive chips, which are not always readily available. Ironwood is thus both a selling point for Google with customers and insurance against Nvidia’s dominance.