Nvidia is launching its own Vera processor as part of Vera Rubin. With this, the AI specialist is singing a similar tune to the one it sang at CES, although the GTC 2026 performance adds a few extra verses.
Nvidia is using GTC 2026 to launch Vera Rubin. Anyone who followed the Nvidia news at CES might be experiencing feelings of déjà vu: Nvidia already introduced Rubin and Vera back in January. Although that announcement was fairly complete, including the introduction of the integrated Vera Rubin NVL72 systems, the focus was primarily on the new Rubin GPU. At GTC, the Vera CPU is getting extra time in the spotlight.
Successor to Grace
Vera succeeds Nvidia’s Grace CPU. That chip supported two generations of Nvidia AI servers, starting with Hopper and followed by Blackwell. The ARM-based Grace chip was also part of the Grace Hopper and Grace Blackwell ‘superchips’, which combined CPU and GPU.
Vera is also an ARM chip. The processor is built from 88 Olympus compute cores, developed by Nvidia itself. The cores support simultaneous multithreading, allowing Vera to handle 176 threads. Vera has a new, efficient memory subsystem and supports LPDDR5X at 1.2 TB/s. System memory tops out at 1.5 TB, three times more than its predecessor, Grace.
Faster and more efficient (for AI)
Nvidia claims that Vera is 50% faster and twice as efficient as x86 CPUs. ARM does indeed have an efficiency advantage, and Nvidia has optimized the chip for the AI workloads it typically benchmarks against. In the scenarios Vera is intended for, that seems a plausible claim.
The Vera CPU was developed with AI workloads in mind. Compilers, runtime engines, analytics pipelines, and agent-based workloads run optimally on the architecture of its compute cores. Vera primarily plays a conducting role: ultimately, it is the (Nvidia) GPUs that handle the bulk of the AI work.
Strong together
To that end, as previously announced, Vera is paired with the new Rubin GPU. The connection is made via the NVLink C2C interconnect, which offers 1.8 TB/s of bandwidth.
Vera and Rubin can be found together in the Vera Rubin NVL72 systems, which combine 72 Rubin GPUs with 36 Vera CPUs. Vera and Rubin are, in turn, supported by several other chips developed by Nvidia itself, including the ConnectX-9 SuperNICs and BlueField-4 DPUs. Nvidia is proud that the NVL72 racks are filled with self-built and optimized chips.
Nvidia CEO Jensen Huang calls Vera Rubin a generational leap, though in the AI era, we seem to change generations more often than our underwear. In any case, Vera Rubin is once again significantly faster than Grace Blackwell. Nvidia states that a large LLM with a mixture of experts architecture can be trained with Vera Rubin using only a quarter of the GPUs compared to the previous generation. Inference gets an even bigger boost: throughput is ten times higher, and the cost per token is ten times lower.
CPU cabinet
Nvidia is also introducing a Vera CPU cabinet, containing 265 liquid-cooled Vera CPUs. This should be relevant for large-scale AI factories, where (tens of) thousands of agent-based workloads run simultaneously. According to Nvidia, such a CPU rack can drive up to 22,500 CPU-driven environments, and do so with a small footprint. We don't know exactly how much power such a CPU cabinet draws, but it's safe to assume that deploying one requires a highly specialized data center.
Vera CPUs are currently rolling off the production line. The first Vera-based systems will appear in the second half of this year. Nvidia already shared this in January, when the focus was on Rubin.
All major server manufacturers and hyperscalers are embracing the Vera CPU (and Rubin GPU), Nvidia reiterated at its own conference. Not all of them will exclusively showcase massive NVL72 systems. HPE, for instance, is announcing servers today built around a more modest Nvidia reference architecture: Nvidia HGX Rubin NVL8, with eight GPUs and, of course, a Vera CPU inside.
Impact
Nvidia’s systems for AI inference and training are the fuel for the AI hype. All major AI developers, from OpenAI to Meta to Google, are hunting for as many powerful systems as possible to build their new models. Vera uses LPDDR5X memory, while the Rubin GPUs that Vera must drive require about 288 GB of HBM4 memory each.
Per server, a Vera Rubin system would require an estimated 1,152 TB of SSD capacity to adequately feed the chips. The impressive systems Nvidia is announcing at GTC are thus directly responsible for the shortage of RAM and NAND memory, and for the rising prices of laptops and smartphones.
While Nvidia’s innovation supports the faster and more efficient development of increasingly capable AI models and inference data centers, it also drives up the cost and limits the availability of classic IT equipment for traditional businesses and consumers worldwide.
The fact that Vera Rubin is more efficient than Grace Blackwell is not a tangible advantage in that regard. It’s not as if customers will order fewer systems and demand for memory will decrease. The hunger for more AI training and inference capacity remains insatiable for the time being.
