The elephant in the room is how long the errata list is going to be.
Especially after the recent debacle with overvolting its CPUs to self-destruction, I wish they'd focus on stability and correctness a bit more.
Comparison against Ryzen AI 300: https://www.phoronix.com/review/core-ultra-7-lunar-lake-linu...
there's never a time I've been glad an article used Excel's 3D surface plot lol
So it's a huge step over Crestmont, but in practice you can't tell?
The article suggests that it's due to the relatively large difference in cache architecture:
I suspect Skymont would indeed provide double digit percentage gains given identical cache setups. However, giving Crestmont a 24 MB L3 and a 100 MHz clock speed advantage seems to be enough to cancel out Skymont’s improved architecture.
Performance-wise, Skymont seems to be at its best in high IPC workloads with a small cache footprint. For example, Skymont beats Crestmont by 20.8% in 548.exchange2, a workload that fits in Zen 4’s 32 KB L1D cache.
However, if a workload is really cache unfriendly, Skymont’s ability to pull more memory bandwidth can show through. I suspect that’s what happens in Y-Cruncher and 549.fotonik3d, as both are very memory bandwidth bound on other architectures. There, Skymont posts huge gains.
A long time ago, I was planning (and got some initial setup going) to have the nodes of a distributed application, running on a cluster of diverse architectures, benchmark the different types of machine and automatically look for better performance per dollar.
Ah, sounds fun and sounds like something BOINC[1] might be good for.
Perhaps one could even use the data already submitted to existing projects to compare performance across various platforms.
[1]: https://boinc.berkeley.edu/
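For what it's worth, the performance-per-dollar comparison described above could be sketched roughly like this. It's only a minimal illustration, assuming each node reports one benchmark score and that you know its hourly cost. The names (NodeResult, perf_per_dollar, rank_nodes) and all the numbers are made up for the example; none of this comes from BOINC or any real measurement.

```python
from dataclasses import dataclass


@dataclass
class NodeResult:
    name: str           # hypothetical node label
    score: float        # benchmark throughput, higher is better
    hourly_cost: float  # cost of running the node, in dollars per hour


def perf_per_dollar(result: NodeResult) -> float:
    """Return the benchmark score per dollar-hour for one node."""
    if result.hourly_cost <= 0:
        raise ValueError("hourly_cost must be positive")
    return result.score / result.hourly_cost


def rank_nodes(results: list[NodeResult]) -> list[NodeResult]:
    """Sort nodes from best to worst performance per dollar."""
    return sorted(results, key=perf_per_dollar, reverse=True)


# Placeholder values for illustration only, not measurements.
EXAMPLE_RESULTS = [
    NodeResult("node-a", score=100.0, hourly_cost=0.50),
    NodeResult("node-b", score=150.0, hourly_cost=1.00),
    NodeResult("node-c", score=60.0, hourly_cost=0.20),
]

if __name__ == "__main__":
    for node in rank_nodes(EXAMPLE_RESULTS):
        print(f"{node.name}: {perf_per_dollar(node):.1f} score per dollar-hour")
```

Note that the fastest node in absolute terms (node-b here) isn't necessarily the one that comes out on top, which is the whole point of normalizing by cost.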
If Arrow Lake has Skymont attached to the ring we'll see its full performance.