Intel Shoots “Granite Rapids” Xeon 6 Into The Datacenter

Intel has been talking about its “Granite Rapids” Xeon 6 processors for so long that it would be easy to forget that they have not yet been formally announced.

But today, the high end of the “Granite Rapids” server CPU lineup makes its debut, several weeks before AMD is widely expected to announce its “Turin” sixth generation Epyc processors. And while we think AMD will continue to make market share gains, the combination of Granite Rapids plus the “Sierra Forest” Xeon 6 chips announced in June of this year will help Intel slow its CPU market share losses in the datacenter, even if it doesn’t reverse the trend.

And honestly, given the chip manufacturing process lead that AMD still has thanks to its partnership with Taiwan Semiconductor Manufacturing Co and Intel’s own woes with its foundry operations, this is the best you can expect.

As we have pointed out many times, there are design wins and supply wins, and while prior generations of Xeons were clearly only supply wins, it is fair to say that both Sierra Forest and Granite Rapids are starting to get some design wins even if what Intel is selling is still due mostly to supply wins.

The chiplet package and architecture of the E-core and P-core variants of the Xeon 6 chips, short for “efficiency” and “performance” in the Intel lingo, were divulged way back at Hot Chips 2023, for which you can read our coverage here, and our deep dive into Sierra Forest from this summer, Intel Brings A Big Fork To A Server CPU Knife Fight, fills in many of the gaps in the Xeon 6 technology and strategy. So without much fuss, we are going to just jump into the Granite Rapids lineup and the roadmap for future Xeon 6 chips early next year.

We will, of course, do an architectural deep dive on Granite Rapids subsequent to this initial story. And we will do a review of the competitive analysis Intel has done pitting Granite Rapids against the current fourth generation “Genoa” Epyc 9004 chips from November 2022, the “Bergamo” Epyc 97X4 chips (which have their core counts cranked like Sierra Forest) from June 2023, and the impending “Turin” Epycs that are expected soon. (The AMD Advancing AI 2024 event in San Francisco on October 10 is a good guess when Turin will be unveiled.)

The Granite Rapids processors are based on the “Redwood Cove” P-core, an update to the “Golden Cove” core used in Sapphire Rapids and Emerald Rapids. The Redwood Cove core offers from 5 percent to 7 percent more instructions per clock (IPC) on integer workloads compared to the Golden Cove core, which is a nominal increase but an increase nonetheless. We are taking the midpoint of 6 percent higher IPC for our comparisons to prior generations of Xeons. And we were cautioned not to focus too much on this commonly used metric. (We don’t think we do, by the way, but it has its uses.)

“I did give a little lecture recently that there is too much focus on IPC,” Ronak Singhal, senior Intel Fellow and chief architect of the Xeon 6 line, tells The Next Platform. “Specifically, if my internal team comes to me and offers me a core with 5 percent IPC and a core with 15 percent IPC, which is better for Xeon? The answer is it depends on other parameters, particularly power. If the 5 percent IPC option costs me 0 percent more power but the 15 percent IPC option costs 30 percent more power, then on average the two options are about the same in a power-constrained world and one is likely less complex. So, while everyone likes discussing IPC, we really need to talk about power-constrained performance. I say this all because the core in Granite Rapids focuses more on power reduction in many ways than IPC uplift.”
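To make Singhal’s point concrete, here is a rough back-of-envelope sketch, ours and not Intel’s, that assumes performance scales with IPC times frequency and that power scales roughly with the cube of frequency once voltage is scaled along with it. Under a fixed power budget, the core with the bigger IPC gain but the bigger power cost has to clock down, and the two options land in about the same place:

```python
# Back-of-envelope comparison of two hypothetical core options under a fixed
# power budget. Assumptions (ours, not Intel's): performance ~ IPC * frequency,
# and power ~ frequency^3 once voltage is scaled along with frequency.

def power_constrained_perf(ipc_gain: float, power_cost: float) -> float:
    """Relative performance once frequency is trimmed to fit the old power budget."""
    # Frequency must drop so that power_cost * freq_scale^3 == 1.0
    freq_scale = (1.0 / power_cost) ** (1.0 / 3.0)
    return ipc_gain * freq_scale

# Option A: +5 percent IPC at no extra power
option_a = power_constrained_perf(ipc_gain=1.05, power_cost=1.00)

# Option B: +15 percent IPC at 30 percent more power
option_b = power_constrained_perf(ipc_gain=1.15, power_cost=1.30)

print(f"Option A: {option_a:.3f}X")  # ~1.050X
print(f"Option B: {option_b:.3f}X")  # ~1.053X -- about the same, as Singhal says
```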

Fair enough, and it makes sense. Look at it this way. If you took two Emerald Rapids CPUs (which means four chiplets) and kept them at the Intel 7 process (really a 10 nanometer process), you would create a 112-core compute complex that would weigh in at over 700 watts and would be twice the socket size. If you took the same two Emerald Rapids CPUs (again, four chiplets) and shrunk them to Intel 3 (some say akin to a 5 nanometer process, others say more like a 3 nanometer process), you could double the performance just due to the process shrink, but the power would probably still be close to 700 watts, which is 2X compared to the original chip.

With Granite Rapids, however, Intel boosted the core count by 2.3X to 128 cores from 56 cores with these two prior P-core processors, and the power only went up to 500 watts for the top bin part, an increase of only 1.4X.
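Put as arithmetic, and assuming a 350 watt top bin for the prior P-core Xeons as the baseline (that wattage is our assumption; the core counts and the 500 watt Granite Rapids figure are from above), the scaling looks like this:

```python
# Generational scaling, using the figures cited above. The 350 watt baseline
# for the prior top-bin P-core Xeon is our assumption; the rest is from the text.

prior_cores, prior_watts = 56, 350        # prior P-core top bin (assumed wattage)
gr_cores, gr_watts = 128, 500             # Granite Rapids UCC top bin

core_scaling = gr_cores / prior_cores     # ~2.29X, the "2.3X" cited above
power_scaling = gr_watts / prior_watts    # ~1.43X, the "1.4X" cited above
cores_per_watt_gain = core_scaling / power_scaling  # ~1.6X more cores per watt

print(f"Cores: {core_scaling:.2f}X, power: {power_scaling:.2f}X, "
      f"cores per watt: {cores_per_watt_gain:.2f}X")
```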

The situation is a bit more complex than that, of course, because the Granite Rapids and Sierra Forest chips use a mix of Intel 3 and Intel 7 processes for the multiple chiplets in the package. With Sapphire Rapids and Emerald Rapids, Intel kept I/O and memory controllers on the same chiplets as the compute cores. But with Sierra Forest and Granite Rapids, the I/O and memory dies are separated from the compute cores, and implemented in different processes, like this:

There are four different P-core compute die and I/O die combinations in the Xeon 6 family, one of which – the top-end Ultra Core Count, or UCC variant – is being introduced today.

Granite Rapids Xeon 6 variants with fewer compute tiles (two for the Extreme Core Count, or XCC, variant and one for the High Core Count, or HCC, variant), as well as one with a smaller compute tile and two I/O dies, called the Low Core Count (LCC) variant, are coming down the pike sometime in 2025.

Here is what the core die packages look like:

The Granite Rapids UCC package announced today is called the Xeon 6 6900P, and it supports DDR5 memory that runs at up to 6.4 GHz and multiplexed rank DIMM (MRDIMM) memory that can push that up to 8.8 GHz. Thanks to the two I/O dies, which are constant across the UCC, XCC, HCC, and LCC variants, any of these chips can plug directly into any “Birch Stream” platform, which also supports Sierra Forest and its follow-on, “Clearwater Forest,” due sometime next year in Intel’s 18A (1.8 nanometer) process.

The Granite Rapids package supports up to 96 PCI-Express 5.0 lanes, which can also run the CXL 2.0 coherent memory protocol. The packages also have up to 504 MB of L3 cache, which is ginormous compared to what Intel normally does.

As far as we know, there is not a variant of the Granite Rapids chips announced today that supports four-socket and eight-socket servers, which is a shame. The same was true of the Sierra Forest Xeon 6 (which we expected given its use case), and of the prior fifth generation “Emerald Rapids” Xeon SP v5 chips launched in December 2023, which were a broader Xeon SP product line and which could have had extended NUMA clustering. You have to go back to the “Sapphire Rapids” Xeon SP v4 chips from January 2023 to get a CPU from Intel that can support four-way and eight-way NUMA.

By the way, with six UltraPath Interconnect NUMA links running at 24 GT/sec, there is no technical reason why Intel and its OEM and ODM partners cannot make a NUMA machine with more than two sockets with these Granite Rapids chips. That is plenty of oomph and enough links, for sure.

Intel has not divulged the number of cores on the Granite Rapids compute tiles, but depending on what you think Intel’s yield is for its Intel 3 process, it would be reasonable to guess 48 cores or 45 cores. For the UCC variant with 128 cores, you have to yield an uneven number across those dies to make it work out. (We hate when things do not divide evenly, or even worse, do not divide by 2.) Each compute die has four DDR5 memory controllers, for a total of twelve, like most high-end CPUs have today, and with MRDIMM memory, the effective bandwidth is 2.3X higher on Granite Rapids than on Emerald Rapids.
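That 2.3X figure checks out on the back of an envelope if you pit twelve channels of MRDIMM memory at 8,800 MT/sec against eight DDR5 channels at 5,600 MT/sec on Emerald Rapids (the Emerald Rapids memory speed is our assumption here):

```python
# Peak theoretical memory bandwidth per socket, in GB/sec.
# Each DDR5/MRDIMM channel moves 8 bytes per transfer.
BYTES_PER_TRANSFER = 8

def peak_bandwidth_gbs(channels: int, mts: int) -> float:
    return channels * mts * BYTES_PER_TRANSFER / 1000.0

emerald_rapids = peak_bandwidth_gbs(channels=8, mts=5600)    # ~358 GB/sec (assumed speed)
granite_rapids = peak_bandwidth_gbs(channels=12, mts=8800)   # ~845 GB/sec with MRDIMMs

print(f"Emerald Rapids: {emerald_rapids:.0f} GB/sec")
print(f"Granite Rapids: {granite_rapids:.0f} GB/sec")
print(f"Ratio: {granite_rapids / emerald_rapids:.2f}X")      # ~2.36X, the 2.3X cited above
```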

Here is a nice summary chart showing the differences between the Xeon 6 P-core and E-core variants:

Even though the P-core and E-core variants of the Xeon 6 processors are using the same I/O dies, it is clear that not all of their features are activated in the E-core versions. You will note that for single socket designs, there are somehow 136 PCI-Express 5.0 lanes available with a P-core 6700 series chip. The virtual memory addressing is much lower on the E-core chips, which stands to reason since these will only be used in machines with one or two sockets and not up to eight or more. The E-cores have different vector math units and only the P-core has AMX matrix units. The chart shows that there are P-core Xeon 6 chips coming that support four and eight sockets.

And that leads us to the SKU stack for Granite Rapids, which is pretty modest at a mere five different variations. Take a gander:

Singhal said in briefings ahead of the launch that Google and Amazon Web Services were getting custom Xeon 6 processors for their fleets, and we imagine others are as well.

And for comparison’s sake, here is the table for the Sierra Forest Xeon 6 SKUs, also modest at only seven different models:

And here is the monster table for the Emerald Rapids SKUs from last year:

As always, our relative performance figures reckon the performance of any given Xeon model against the “Nehalem” Xeon E5540 processor from 2009, which had four cores running at 2.53 GHz and 8 MB of L3 cache in an 80 watt thermal envelope. To reckon relative performance, we multiply the number of cores times the clock speed for each model times the cumulative increase in IPC for each generation.

Given this cumulative IPC, which we have tracked diligently expressly for this purpose, the Redwood Cove core delivers 2.42X more integer performance than the Nehalem core from fifteen years ago. That’s pretty good architectural enhancement. The number of cores with Granite Rapids has increased by a factor of 32X compared to Nehalem, but the clock speed for all those cores is down 21 percent even as the power consumed is up by a factor of 6.25X.
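For those who want to follow the reckoning, here is a minimal sketch of that relative performance math, using the Nehalem E5540 baseline described above and the Granite Rapids top bin figures; the 2.0 GHz all-core clock for the top bin is our assumption, implied by the 21 percent figure:

```python
# Relative performance reckoning, as described above: cores * clock * cumulative
# IPC gain, normalized to the 2009 "Nehalem" Xeon E5540 baseline.

# Baseline: Xeon E5540 (4 cores, 2.53 GHz, 80 watts), IPC factor of 1.0 by definition.
base_cores, base_ghz, base_watts, base_ipc = 4, 2.53, 80, 1.00

# Granite Rapids top bin: 128 cores, 500 watts, Redwood Cove at 2.42X Nehalem IPC.
# The 2.0 GHz clock is our assumption, implied by the "down 21 percent" figure.
gr_cores, gr_ghz, gr_watts, gr_ipc = 128, 2.0, 500, 2.42

base_perf = base_cores * base_ghz * base_ipc
gr_perf = gr_cores * gr_ghz * gr_ipc

print(f"Relative performance: {gr_perf / base_perf:.1f}X")    # ~61X the Nehalem baseline
print(f"Relative power: {gr_watts / base_watts:.2f}X")        # 6.25X
print(f"Perf per watt gain: {(gr_perf / base_perf) / (gr_watts / base_watts):.1f}X")
```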

That’s the chip business for you.

You will notice one other important thing in the Granite Rapids table above: The prices are in bold red italics. That means Intel did not release prices for the Granite Rapids Xeon 6 chips. Which we obviously do not approve of. A price list provides a ceiling, something people can negotiate down from, and at volume they most certainly do.

Nature abhors a vacuum, and so do our children, and so we have estimated the prices for the Granite Rapids chips to the best of our ability based on past Xeon SP pricing. We think these are the most expensive datacenter CPUs Intel has put out in the Xeon family. (Itanium doesn’t count, that was different.) And if you find out what the prices are, do share and we will, too.

One last thing. There is still more to come early next year, and this chart above will remind you of it.
