AMD launched its fifth-gen EPYC ‘Turin’ processors here in San Francisco at its Advancing AI 2024 event, whipping the covers off the deep-dive details of its new Zen 5-powered server CPU family for enterprise, AI, and cloud use cases. We also ran some of our own benchmarks in preparation for our review but decided to share a preview of the impressive results below.
AMD has unified its standard scale-up-optimized models with full-fat Zen 5 cores and its scale-out-optimized models with dense Zen 5c cores into a single stack under the EPYC 9005 Turin banner, and it is making several impressive performance claims against Intel’s competing Xeon processors.
AMD claims that its flagship 192-core EPYC 9965 is 2.7X faster than Intel’s competing flagship, the Xeon Platinum 8592+, with notable speed-up claims including 4X faster video transcoding, 3.9X faster performance in HPC applications, and up to 1.6X the performance per core in virtualized environments. AMD also announced its new high-frequency 5GHz EPYC 9575F, which it claims is up to 28% faster than Zen 4 EPYC models when used to accelerate AI GPU workloads.
We’ll break down the product stack and features and then work our way to the benchmarks.
Specs and Pricing
Notably, AMD isn’t introducing X-series models with stacked L3 cache this generation, instead relying on its Genoa-X lineup for now. AMD says the X-series might get an upgrade every other generation, though that remains under consideration.
AMD’s new series scales from eight cores up to the $14,813 192-core/384-thread EPYC 9965, a 500W behemoth that uses TSMC’s 3nm node and dense Zen 5c cores for the ultimate in compute density. AMD also has five other Zen 5c-powered models for high-density applications, with core counts of 96, 128, 144, and 160.
There are standard models as well, with Zen 5 cores fabbed on the 4nm node that top out at 128 cores and 256 threads with the $12,984 EPYC 9755. This stack has a total of 22 models that begin at a mere eight cores — a new small-core level for AMD that it created in response to customer demand. AMD also has four single-socket “P” series models interspersed throughout its product stack.
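For a rough sense of how the two flagships compare on list price, dividing each chip’s price by its core count gives a simple per-core figure. This is a back-of-the-envelope sketch using only the list prices quoted above; it ignores platform, memory, and software licensing costs, and the `price_per_core` helper is our own hypothetical name.

```python
# Back-of-the-envelope list price per core, using the prices quoted above.
# price_per_core is a hypothetical helper for illustration only.

def price_per_core(list_price_usd: float, cores: int) -> float:
    """Return list price divided by core count, in USD per core."""
    return list_price_usd / cores

skus = {
    "EPYC 9965 (192 Zen 5c cores)": (14_813, 192),
    "EPYC 9755 (128 Zen 5 cores)": (12_984, 128),
}

for name, (price, cores) in skus.items():
    print(f"{name}: ${price_per_core(price, cores):.2f} per core")
```

That works out to roughly $77 per core for the 192-core Zen 5c flagship and roughly $101 per core for the 128-core Zen 5 model. Per-core price is only one input, though; per-core performance matters too.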
AMD’s standard Zen 5 lineup now includes new high-frequency SKUs that top out at 5.0 GHz, a new high watermark for AMD’s data center CPU lineup that will maximize performance in GPU orchestration workloads. AMD has a total of five F-series models for various levels of performance and core counts.
Fifth-gen EPYC ‘Turin’ 9005 Series features
The standard Zen 5 models employ up to 16 4nm CCDs (Core Complex Dies, aka chiplets). These are paired with a large central I/O die, and each CCD offers eight CPU cores. The Zen 5c models employ up to 12 3nm CCDs with 16 Zen 5c cores per chiplet, paired with the same I/O die.
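The chiplet counts above account for the flagship core counts directly. This short sketch multiplies the maximum CCD count by the cores per CCD for each family and doubles the result for two-way SMT; the configuration labels are ours, not AMD part names.

```python
# Maximum core and thread counts implied by the chiplet configurations above.
# The dictionary keys are our own labels, not AMD part names.

CONFIGS = {
    "Zen 5 (4nm CCDs)": {"max_ccds": 16, "cores_per_ccd": 8},
    "Zen 5c (3nm CCDs)": {"max_ccds": 12, "cores_per_ccd": 16},
}

THREADS_PER_CORE = 2  # SMT: two hardware threads per core

def max_cores(max_ccds: int, cores_per_ccd: int) -> int:
    """Cores available when every CCD position is populated."""
    return max_ccds * cores_per_ccd

for name, cfg in CONFIGS.items():
    cores = max_cores(cfg["max_ccds"], cfg["cores_per_ccd"])
    print(f"{name}: {cores} cores / {cores * THREADS_PER_CORE} threads")
```

This yields 128 cores/256 threads for Zen 5 and 192 cores/384 threads for Zen 5c, matching the EPYC 9755 and EPYC 9965.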
AMD claims a 17% increase in IPC for the EPYC 9005 series, born of the new Zen 5 architecture. Zen 5 also brings the notable addition of a full 512-bit datapath for AVX-512, though users also have the option to run the chips in a ‘double-pumped’ AVX-512 mode that issues 512-bit instructions as two 256-bit operations, lowering power requirements and improving efficiency in some workloads.
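As a purely conceptual model, splitting one 512-bit operation on 16 single-precision lanes into two 256-bit halves of eight lanes each produces the same result, just over two passes. This Python sketch illustrates the idea with plain lists standing in for registers; it says nothing about how Zen 5 actually schedules micro-ops, and the function names are hypothetical.

```python
# Conceptual model of "double-pumping": a 512-bit vector add performed either in
# one full-width pass or as two 256-bit halves. Lists stand in for registers.

FLOAT32_BITS = 32
LANES_512 = 512 // FLOAT32_BITS  # 16 single-precision lanes
LANES_256 = 256 // FLOAT32_BITS  # 8 single-precision lanes

def add_full_width(a, b):
    """One pass over all 16 lanes (models a native 512-bit datapath)."""
    assert len(a) == len(b) == LANES_512
    return [x + y for x, y in zip(a, b)]

def add_double_pumped(a, b):
    """Two passes of 8 lanes each (models issuing the op as two 256-bit halves)."""
    assert len(a) == len(b) == LANES_512
    result = []
    for start in (0, LANES_256):  # low half, then high half
        half_a = a[start:start + LANES_256]
        half_b = b[start:start + LANES_256]
        result.extend(x + y for x, y in zip(half_a, half_b))
    return result

a = [float(i) for i in range(LANES_512)]
b = [2.0 * i for i in range(LANES_512)]
assert add_full_width(a, b) == add_double_pumped(a, b)
print("Both paths produce identical results over", LANES_512, "lanes")
```

The architectural result is identical either way; on real hardware the difference lies in throughput and power, which is the trade-off described above.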
With the exception of the flagship 192-core model, all Turin processors can drop into existing SP5 server platforms. The 192-core model uses the same SP5 socket, but it requires special power accommodations, so it needs a newer motherboard. This marks the second generation of SP5-compatible EPYC chips, following the previous-generation Genoa, which also uses the platform.
This meshes well with AMD’s strategy to speed time to market and reduce upgrade friction for its customers and OEM partners. For reference, the first three generations (Naples, Rome, and Milan) also shared a common platform.
TDPs span from 155W to 500W, with the highest-power models often utilizing new dense watercoolers that resemble standard AIO coolers — the radiator is integrated inside the chassis, as pictured in our sample Turin server above (we have a review in progress).
Every Turin model supports 12 channels of DDR5 memory, with up to 12TB of memory capacity per server (6TB per socket). AMD originally spec’d Turin at DDR5-6000 but has now raised that to DDR5-6400 for qualified platforms. AMD’s platform only supports one DIMM per channel (1DPC).
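To put the memory speeds in context, theoretical peak bandwidth per socket is the channel count times the transfer rate times 8 bytes per transfer, since a DDR5 channel carries 64 data bits (two 32-bit subchannels). This is a back-of-the-envelope sketch: real sustained bandwidth is lower, and the helper name is ours.

```python
# Theoretical peak DRAM bandwidth per socket:
#   channels * transfer rate (MT/s) * bytes per transfer
# A DDR5 channel carries 64 data bits (two 32-bit subchannels) = 8 bytes/transfer.
# Real sustained bandwidth is lower; this is an upper bound only.

BYTES_PER_TRANSFER = 8
CHANNELS_PER_SOCKET = 12

def peak_bandwidth_gbs(mt_per_s: int, channels: int = CHANNELS_PER_SOCKET) -> float:
    """Peak bandwidth in GB/s (decimal gigabytes)."""
    return channels * mt_per_s * 1e6 * BYTES_PER_TRANSFER / 1e9

for speed in (6000, 6400):
    print(f"DDR5-{speed}: {peak_bandwidth_gbs(speed):.1f} GB/s per socket")
```

Moving from DDR5-6000 to DDR5-6400 raises the theoretical ceiling by about 6.7%, from 576 GB/s to 614.4 GB/s per socket.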
Each CPU provides 128 PCIe 5.0 lanes in single-socket servers, while dual-socket configurations offer 160 PCIe lanes in total. AMD also supports CXL 2.0 (caveats apply).
Our Turin vs Intel Granite Rapids Benchmarks
We ran short on time before traveling to the event, so we couldn’t finish all of our tests; testing for the 192-core model isn’t done yet. However, we have plenty of our own results to share below ahead of our review, which will be posted with the complete results in the coming days. We also have full benchmark results for the 128-core EPYC 9755 and the 64-core EPYC 9575F.
We’re presenting our benchmarks without analysis as this was a last-minute addition, but stay tuned for more — I’ll also add the missing 192-core entries here once that testing is complete. Here’s a preview of our results in key areas, but we also have the analysis of AMD’s benchmark numbers further below.
AI Benchmarks
Note that these AI benchmarks were run without dedicated optimizations that could improve performance for both lineups. Also, many of these tests are not optimized for AMX (Advanced Matrix Extensions), which would give Intel an advantage in some tests.
The ever-changing landscape of the explosively expanding AI universe makes it exceedingly challenging to characterize performance in such a way that it is meaningful to the average data center application. Additionally, batch sizes and other tested parameters will vary in real deployments.
As such, treat these benchmarks as a rough guide; they are not optimized to the level we would expect in actual deployments. That said, some data centers and enterprises will deploy off-the-shelf AI models with little tuning, so while these results serve as a general litmus test of performance, the relative positioning of the contenders will vary with the models employed.
HPC and Scalability Benchmarks
Compilation Benchmarks
Rendering Benchmarks
Encoding Benchmarks
AMD’s Fifth-gen EPYC ‘Turin’ 9005 Series general-purpose benchmarks
AMD shared a series of benchmarks to solidify its performance claims, but as with all vendor-provided benchmarks, you should look for third-party verification. As you saw above, we are currently testing an EPYC Turin server, so stay tuned for our full suite of benchmarks and analysis.
We’ve included AMD’s test notes in an album at the end of the article. AMD made all of its comparisons against Intel’s fifth-gen Xeon, though Intel recently began shipping its Xeon 6 ‘Granite Rapids’ lineup. AMD says it hasn’t been able to secure those systems for testing yet, so keep in mind that these benchmarks aren’t against Intel’s current flagship.
AMD claims a new world record in the industry-standard SPEC CPU 2017 integer throughput benchmark with the EPYC 9965, with a 2.7X advantage over Intel’s fifth-gen flagship. AMD also claims a 1.4X advantage in per-core performance, which is key to effectively utilizing expensive software licenses that often cost more than the CPU itself — a core value proposition for AMD’s Turin. In fact, AMD claims 60% more performance at the same licensing costs.
Naturally, AMD also included a spate of benchmarks in general compute workloads like video transcoding, business apps, database, and rendering workloads, with a 4X, 2.3X, 3.9X, and 3X advantage over fifth-gen Xeon, respectively. AMD also provided plenty of HPC benchmarks that you can see in the above album.
AMD’s Fifth-gen EPYC ‘Turin’ 9005 Series AI benchmarks
AMD shared plenty of benchmarks to back up its assertion that Turin is the best choice for the full range of AI workloads, with those workloads falling into three different buckets.
Intel has held a distinct advantage in AI inference workloads that fully saturate the CPU and leverage its AMX instructions. However, AMD claims that Turin changes that equation, with its 192-core EPYC 9965 delivering 3.0X to 3.8X faster AI inference across a range of workloads.
Many AI implementations rely on the CPU to orchestrate GPU AI workloads, keeping the GPUs fed as they handle the heavy inference and training work. AMD claims advantages ranging from 1.08X to 1.2X here with its new high-frequency 5GHz EPYC 9575F. AMD also shared a list of Nvidia’s recommended CPU pairings for its HGX and MGX systems, along with AMD’s own optimum pairings for its MI300X systems.
AMD also argues that Intel’s AMX advantage only applies to workloads that fully saturate the CPU with AI throughput, whereas most AI workloads run in mixed environments where general-purpose compute work is also active. Here, AMD claims advantages in a range of mixed general-purpose and AI workloads running concurrently on the CPU, including a claimed doubling of performance per dollar over Intel’s fifth-gen Xeon.
Conclusion
AMD notes that many of its customers are keeping their existing servers for longer periods of time now, with some keeping servers deployed for five or even six years. However, the company points out that you can take 1000 older Xeon Platinum 8280 servers and consolidate that down to 131 Turin servers, yielding up to 68% less power consumption with up to 87% fewer servers.
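The 87% figure follows directly from AMD’s server counts, as this quick check shows. The 68% power figure depends on per-server power assumptions that AMD’s claim doesn’t spell out here, so we don’t attempt to reproduce it; the helper name is ours.

```python
# Check AMD's consolidation claim: 1,000 Xeon Platinum 8280 servers -> 131 Turin servers.
# The server counts come from AMD's claim; the helper is ours.

def percent_reduction(before: float, after: float) -> float:
    """Percentage drop from `before` to `after`."""
    return (before - after) / before * 100

old_servers = 1000
new_servers = 131
print(f"Server reduction: {percent_reduction(old_servers, new_servers):.1f}%")
print(f"Consolidation ratio: {old_servers / new_servers:.2f} old servers per Turin server")
```

That is an 86.9% reduction, which rounds to AMD’s “up to 87% fewer servers,” or roughly 7.6 older servers replaced by each Turin server.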
AMD held roughly 2% of data center revenue share back in 2017 when it launched the first-gen EPYC Naples chips, but it has since grown to an impressive 34% of revenue share in the first half of this year on the strength of its fourth-gen Genoa ($2.8 billion last quarter alone). Much of that success stems not only from performance and pricing advantages but also from on-time, predictable execution, a mantra AMD has repeated incessantly since the first-gen launch.
AMD says the intervening years have seen it deliver six times more cores and 11 times more performance than its first-gen Naples, gains that Turin naturally extends. It also touts a double-digit IPC increase (~14%) with each successive generation.
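Per-generation IPC gains compound rather than add. If each of the four generational steps from Naples to Turin delivered a uniform ~14% uplift (a simplification of AMD’s average figure, since it claims 17% for Zen 5 alone), the cumulative IPC gain would be about 1.69X. The helper name below is ours.

```python
# Compound a per-generation IPC uplift across several generational steps.
# Assumes a uniform uplift per step, which is a simplification.

def compounded_uplift(per_step: float, steps: int) -> float:
    """Cumulative speedup factor after `steps` generations of `per_step` uplift."""
    return (1 + per_step) ** steps

GENERATIONS = ["Naples", "Rome", "Milan", "Genoa", "Turin"]
steps = len(GENERATIONS) - 1  # four generational transitions

print(f"Compounded IPC uplift: {compounded_uplift(0.14, steps):.2f}x")
print(f"Simple (additive) sum:  {1 + 0.14 * steps:.2f}x")
```

IPC is only part of AMD’s 11X claim; the sixfold increase in core count that AMD also cites contributes far more.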
Those generational improvements have built up to an exceptionally impressive lineup in Turin. As you can see above, we have both Granite Rapids and Turin systems in-house, and we will share further analysis of our own test results in our review soon. Below are the test notes from AMD’s slides.