AMD 3D V-Cache Benchmarks Show Mixed Results For Milan-X CPUs
Yesterday, tech review site Chips and Cheese released a review of AMD’s new 3D V-Cache technology, showcasing its performance compared to previous generations of Zen processors. Chips and Cheese used AMD’s EPYC server processors for the comparison, including the 3D V-Cache-enabled EPYC 7V73X (Milan-X) and the vanilla Zen 3 EPYC 7763 (Milan).
3D V-Cache is a new technology developed by AMD that allows L3 cache to be stacked vertically, which can significantly increase the size of the cache while using very little additional die area. AMD has already demonstrated impressive performance gains with the new technology, as the larger cache keeps the CPU cores fed with data more consistently.
Comparing Zen 3 with and without 3D V-Cache, Chips and Cheese noticed that the EPYC 7V73X with 3D V-Cache performed just a hair worse than the vanilla Zen 3 EPYC 7763 when the test didn’t use more L3 cache than the 7763 had to offer. The latency difference was three to four cycles, a necessary trade-off for the larger cache.
However, once the test outgrew the 7763’s L3 cache, the 7V73X’s monstrous cache gave it significantly lower latency than the 7763, an advantage that lasted until the 3D V-Cache itself filled up. Interestingly, the 7V73X also had slightly lower memory latency than the 7763.
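Latency curves like these are commonly produced with a pointer-chasing test: an array is filled with a random chain of indices, and each load depends on the result of the previous one, so the hardware cannot overlap or prefetch the accesses. The Python sketch below only illustrates that access pattern. It is not Chips and Cheese’s test code, the function names are hypothetical, and interpreter overhead swamps actual cache latency, so real measurements need a compiled language and cycle-accurate timing.

```python
import random
import time


def build_chase_array(n, seed=0):
    """Return a list where following next_index[i] visits all n slots in one cycle.

    Uses Sattolo's algorithm, which yields a random permutation made of a
    single cycle, so the chase cannot get trapped in a short loop that
    would fit in a smaller cache than intended.
    """
    rng = random.Random(seed)
    next_index = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i, never i itself
        next_index[i], next_index[j] = next_index[j], next_index[i]
    return next_index


def chase(next_index, steps):
    """Follow the chain for `steps` dependent loads and return the final index."""
    i = 0
    for _ in range(steps):
        i = next_index[i]
    return i


def time_per_step_ns(n, steps=200_000):
    """Rough wall-clock time per dependent load, in nanoseconds.

    In Python this mostly measures interpreter overhead, not cache latency.
    """
    next_index = build_chase_array(n)
    start = time.perf_counter()
    chase(next_index, steps)
    return (time.perf_counter() - start) / steps * 1e9
```

Sweeping `n` from a few kilobytes’ worth of elements up to hundreds of megabytes is what produces the familiar staircase: latency steps up each time the working set spills out of one cache level and into the next.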
Adding Zen 1 and Zen 2 EPYC chips, such as the 7551 and 7452, into the mix gives an even better picture of how well-engineered AMD’s 3D V-Cache chips are. Chips and Cheese noted that the change to the L3 cache from Zen 1 to Zen 2 cost about five additional cycles of latency. The move from Zen 2’s two 16MB chunks of L3 cache to a unified L3 cache on Zen 3 then added an even higher seven to eight cycles of latency.
Meanwhile, AMD’s move from Zen 3 to Zen 3 with 3D V-Cache, which triples the size of the L3 cache, costs just three to four cycles of latency, the smallest penalty of any of these transitions.
Chips and Cheese’s chart showed that all Zen generations had near-identical L1 and L2 cache latency. In the L3 cache, however, the latency gap between generations grew larger as L3 cache usage increased, especially between Zen 3 and Zen 3 with 3D V-Cache.
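To put those cycle counts side by side, here is a minimal sketch that tabulates the penalties quoted above and converts them to nanoseconds. The cycle ranges come from the figures in this article; the 3.0 GHz clock is purely an assumption for illustration, not a measured value for any of these chips, and the helper names are hypothetical.

```python
# L3 latency penalties in core cycles, as (low, high) ranges quoted in the article.
L3_PENALTY_CYCLES = {
    "Zen 1 -> Zen 2": (5, 5),
    "Zen 2 -> Zen 3": (7, 8),
    "Zen 3 -> Zen 3 with 3D V-Cache": (3, 4),
}

ASSUMED_CLOCK_GHZ = 3.0  # hypothetical clock speed, chosen only for illustration


def cycles_to_ns(cycles, clock_ghz):
    """Convert core cycles to nanoseconds: one cycle lasts 1 / clock_ghz ns."""
    return cycles / clock_ghz


def midpoint(low_high):
    """Midpoint of a (low, high) range."""
    low, high = low_high
    return (low + high) / 2


def smallest_penalty(penalties):
    """Return the transition whose midpoint penalty is lowest."""
    return min(penalties, key=lambda name: midpoint(penalties[name]))


if __name__ == "__main__":
    for name, cycle_range in L3_PENALTY_CYCLES.items():
        mid = midpoint(cycle_range)
        ns = cycles_to_ns(mid, ASSUMED_CLOCK_GHZ)
        print(f"{name}: ~{mid} cycles, ~{ns:.2f} ns at {ASSUMED_CLOCK_GHZ} GHz")
    print("Smallest penalty:", smallest_penalty(L3_PENALTY_CYCLES))
```

At any plausible server clock speed, a three-to-four-cycle penalty works out to roughly a nanosecond, which is small next to the cost of a trip to main memory.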
Bandwidth
In the bandwidth results, Chips and Cheese found that AMD’s 3D V-Cache-equipped 7V73X saw only about a 25% increase in bytes per cycle in the single-threaded cache bandwidth test.
Chips and Cheese suspected that clock speeds may drop once the CPU hits bigger workloads that take advantage of the L3 cache, which would explain the difference.
Another strange result came from the single-CCD cache bandwidth test, where the 7V73X showed a bandwidth deficit of about 12.5% compared to the standard 7763. Chips and Cheese suspected this is to keep power in check, given the 64 cores loaded up on both chips. That makes sense, as 3D V-Cache takes up more space and requires slightly more power, making CPU cooling somewhat more complex.
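A clock speed change matters here because bytes per cycle is a derived figure: measured throughput divided by the clock speed the core is assumed to be running at. The sketch below shows that arithmetic with made-up numbers. None of the values are Chips and Cheese’s measurements, and the function names are hypothetical.

```python
def bandwidth_gb_per_s(bytes_per_cycle, clock_ghz):
    """Throughput in GB/s: bytes per cycle times billions of cycles per second."""
    return bytes_per_cycle * clock_ghz


def bytes_per_cycle(bandwidth_gbps, clock_ghz):
    """Inverse of the above: bytes moved per core cycle at a given clock."""
    return bandwidth_gbps / clock_ghz


if __name__ == "__main__":
    # Hypothetical example: the test measures 96 GB/s of cache throughput.
    measured_gbps = 96.0
    assumed_clock_ghz = 3.5  # clock the reviewer assumes the core is holding
    actual_clock_ghz = 3.0   # clock the core really ran at under load

    print(bytes_per_cycle(measured_gbps, assumed_clock_ghz))  # understated figure
    print(bytes_per_cycle(measured_gbps, actual_clock_ghz))   # true figure
```

If the core quietly drops its clock under load while the calculation still uses the higher clock, the reported bytes per cycle comes out lower than what the cache actually delivers on each cycle.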
Interestingly, this same phenomenon also happened on AMD’s EPYC 7452 chip based on the Zen 2 microarchitecture. The EPYC 7763 Zen 3 CPU was the only chip that performed equally in both the single CCD bandwidth test and single-threaded bandwidth test.
For those wondering about Zen 1, its cache bandwidth didn’t come close to its Zen 2 and Zen 3 counterparts; the EPYC 7551 delivered less than half the bandwidth for the vast majority of the test. Only in the middle and later stages of the test did it come close to catching up.
Conclusion
So what does all this data mean in terms of real-world performance? Chips and Cheese ran several benchmarks, including Gem5, libx264 4K transcoding, 7-Zip, and more. Only in Gem5 did 3D V-Cache make a significant difference to performance. The rest of the results were lackluster, with a barely noticeable performance benefit of around 5% in favor of the 3D V-Cache chip.
Chips and Cheese’s preliminary results suggest that 3D V-Cache’s impact isn’t as significant as AMD has suggested it would be. However, it’ll require more in-depth testing to pass judgment. Also, we can’t forget that this is 3D V-Cache on AMD’s EPYC server processors, so 3D V-Cache’s behavior on its consumer counterparts may vary.
For one, the 7V73X is a monster chip with a whopping 64 cores, so it is sensitive to thermal and power limits and will throttle CPU cores quickly if needed. The added cache compounds this by increasing the CPU’s power and heat dissipation requirements.
Another factor is server workloads, which by their nature tend to be more compute-heavy than latency-sensitive. 3D V-Cache will only prove helpful if the cores are not the bottleneck, that is, if they are not running threads that take a significant amount of time to process.
In the consumer space, we see chips with substantially lower core counts, which lowers the power requirements and allows the cores to stretch their legs with only minor clock speed deviation. Power and thermals are less of a problem in the PC DIY space, where competent cooling systems and motherboards with robust power delivery offer far more headroom than the CPU will ever require.
Apps in the consumer space are generally far less compute-intensive, so cache latency plays a more critical role. This is especially true in video games, where CPUs are rarely loaded up to 100%, but having lower latency means pre-rendered frames make it to the GPU faster, decreasing input lag and increasing frame rates.