Google introduces Trillium TPU v6: improved performance for AI models

Google has announced Trillium, the latest generation of its Tensor Processing Units (TPUs) for Google Cloud customers. Trillium improves performance for both training and inference while reducing power consumption and costs.

At its App Dev & Infrastructure Summit last week, Google announced Trillium, its sixth-generation TPU and a clear step forward in performance. Compared to the previous TPU v5e, Trillium delivers more than four times better training performance and up to three times higher inference throughput. It is also 67 percent more energy efficient and doubles both High Bandwidth Memory (HBM) capacity and Interchip Interconnect (ICI) bandwidth, making the sixth generation well suited to demanding AI workloads. Trillium is available as a preview for Google Cloud customers.
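For readers who want to try the preview, the sketch below shows roughly how to confirm that a JAX program sees the TPU chips on a Cloud TPU VM. It assumes a Trillium VM with the jax[tpu] package installed; that setup detail is an assumption based on how earlier TPU generations are used, not something spelled out in the announcement.

    # Minimal sketch: confirm JAX can see the TPU chips on a Cloud TPU VM.
    # Assumes a Trillium TPU VM with jax[tpu] installed (an assumption, not
    # a detail from the announcement).
    import jax
    import jax.numpy as jnp

    devices = jax.devices()              # TPU chips visible to this host
    print(len(devices), "devices:", devices)

    # Run a small matrix multiply to confirm the TPU backend works.
    x = jnp.ones((1024, 1024))
    y = jax.jit(jnp.dot)(x, x)
    print(y.shape, y.dtype)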

Language models

The enhancements allow larger AI models, such as large language models (LLMs) and computationally intensive diffusion models, to be trained and deployed more efficiently. Google specifically lists models such as Gemma 2, Llama and Stable Diffusion XL as applications that benefit from the new TPU architecture.

The doubled HBM capacity lets Trillium hold larger models, with more weights and larger key-value caches, on a single chip, which contributes to more efficient use of resources. Per-chip compute has also grown substantially, with peak performance 4.7 times higher than the previous generation.
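A rough back-of-the-envelope calculation shows why the extra HBM matters for key-value caches. The model dimensions below are illustrative assumptions, not figures from Google; the point is that cache size grows linearly with sequence length and batch size, so doubling per-chip memory directly extends how much context a single chip can serve.

    # Sketch: memory footprint of a transformer key-value cache.
    # All dimensions are illustrative assumptions, not Google figures.
    num_layers = 32         # transformer layers
    num_kv_heads = 8        # key/value attention heads
    head_dim = 128          # dimension per head
    seq_len = 8192          # cached context length (tokens)
    batch_size = 16         # concurrent sequences
    bytes_per_value = 2     # bf16

    # Keys and values are cached per layer, per head, per token.
    kv_cache_bytes = (2 * num_layers * num_kv_heads * head_dim
                      * seq_len * batch_size * bytes_per_value)
    print(f"KV cache: {kv_cache_bytes / 2**30:.1f} GiB")   # ~16 GiB here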

Scalability and cost advantages

Trillium is designed with high scalability in mind. A single pod links up to 256 chips, and deployments can scale to hundreds of pods, forming a building-scale supercomputer connected by the 13 petabit-per-second Jupiter data center network. Google's Multislice software provides near-linear scaling for heavy workloads, making the TPU suitable for complex, large-scale training scenarios.
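As an illustration of how such a slice is driven from user code, the JAX sketch below shards a toy computation across whatever TPU chips are visible and lets the compiler partition the work. Multislice itself, which spans multiple pods, is configured when the TPUs are provisioned; this only shows the single-slice building block and uses made-up array sizes.

    # Sketch: data-parallel sharding over the TPU chips in one slice with JAX.
    import jax
    import jax.numpy as jnp
    from jax.experimental import mesh_utils
    from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

    # Build a 1-D mesh over all visible chips and shard the batch across it.
    devices = mesh_utils.create_device_mesh((jax.device_count(),))
    mesh = Mesh(devices, axis_names=("data",))
    sharding = NamedSharding(mesh, P("data"))

    # A toy batch, split evenly over the chips along the leading axis.
    batch = jax.device_put(jnp.ones((1024, 512)), sharding)

    @jax.jit
    def step(x):
        # XLA partitions the matmul and the reduction across the chips.
        return jnp.tanh(x @ x.T).mean()

    print(step(batch))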

In addition to the performance improvements, Google also highlights Trillium's cost-effectiveness. The new TPU offers nearly 1.8 times more performance per dollar than TPU v5e, and nearly twice the performance per dollar of TPU v5p. That makes Trillium a cost-effective choice for customers who need powerful, scalable infrastructure for large-scale AI training and inference.

With these innovations, Google hopes to usher in a new era for applications that require heavy AI models. Trillium is now available in preview to Google Cloud users.