I have a Zen4 7940HS system on which I have installed CachyOS, an Arch-based Linux distribution with a focus on performance. In particular, the Why CachyOS? page cites an optimized scheduler as well as packages compiled for specific architectures. Rather than shipping only packages compiled for the lowest-common-denominator x86-64 baseline, CachyOS provides optimized packages for different "levels". The "-v3" level targets Intel Haswell and AMD Excavator or newer CPUs, and the "-v4" level additionally enables AVX-512.
The July 2024 release notes highlight the addition of a Zen4-optimized repository:
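To see which level a given machine can use, a minimal sketch is to compare the CPU flags that Linux reports in `/proc/cpuinfo` against the feature set of each level. The flag sets and the helper names `cpu_flags` and `highest_level` below are my own illustration, not something CachyOS ships, and the sets are an approximation of the level definitions using the kernel's flag names.

```python
# Approximate feature sets for the x86-64 levels, using /proc/cpuinfo flag names.
# Assumption: LZCNT is reported as "abm" and OSXSAVE is approximated by "xsave".
V3_FLAGS = {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"}
V4_FLAGS = V3_FLAGS | {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"}


def cpu_flags(path="/proc/cpuinfo"):
    """Return the set of CPU flags from the first 'flags' line, or an empty set."""
    try:
        with open(path) as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass  # not Linux, or the file is unreadable
    return set()


def highest_level(flags):
    """Return the highest level whose (approximate) feature set is fully present."""
    if V4_FLAGS <= flags:
        return "x86-64-v4"
    if V3_FLAGS <= flags:
        return "x86-64-v3"
    return "below x86-64-v3"


if __name__ == "__main__":
    print(highest_level(cpu_flags()))
```

On a Zen4 part such as the 7940HS I would expect this to report the v4 level, since Zen4 supports AVX-512, but the script only reflects what the kernel exposes.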
> This is our 8th release this year, and we are very proud to announce a new optimized repository. Starting with this release, we are providing a Zen4 optimized repository. This repository will be automatically used at new installation for Zen4 and Zen5 CPUs, to provide the best performance.
>
> The znver4 target provides a bunch of extra avx512 extensions and also other instructions. Here you can find a list of the additional used instructions by the compiler compared to the x86-64-v4 target:
>
> abm, adx, aes, avx512bf16, avx512bitalg, avx512ifma, avx512vbmi, avx512vbmi2, avx512vnni, avx512vpopcntdq, clflushopt, clwb, clzero, fsgsbase, gfni, mwaitx, pclmul, pku, prfchw, rdpid, rdrnd, rdseed, sha, sse4a, vaes, vpclmulqdq, wbnoinvd, xsavec, xsaveopt, xsaves
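This seemed intriguing, so I decided to run a somewhat random collection of Phoronix tests using this repository, comparing CachyOS with the Zen4 repository against Ubuntu 22.04 on the same machine. A summary table follows. In the Ratio column, values above 1.0 mean CachyOS was faster, whether the metric is higher-is-better or lower-is-better.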
| Metric | Direction | CachyOS Zen4 | Ubuntu 22.04 | Ratio |
|---|---|---|---|---|
| coremark | higher | 410390 iterations/sec | 438579 iterations/sec | 0.936 |
| build-linux-kernel | lower | 114.035 seconds | 116.25 seconds | 1.019 |
| openssl: SHA256 | higher | 13254334897 / second | 13420827833 / second | 0.988 |
| openssl: SHA512 | higher | 4539223377 / second | 4413779170 / second | 1.028 |
| openssl: RSA4096 | higher | 5949.0 sign/s | 5713.1 sign/s | 1.041 |
| openssl: ChaCha20 | higher | 56376095723 byte/second | 55209170900 byte/second | 1.021 |
| openssl: AES-128-GCM | higher | 108245069000 byte/second | 106316322953 byte/second | 1.018 |
| openssl: AES-256-GCM | higher | 93611141897 byte/second | 91688082637 byte/second | 1.021 |
| openssl: ChaCha20-Poly1305 | higher | 40096535620 byte/second | 39278150087 byte/second | 1.021 |
| phpbench | higher | 2243593 score | 1055625 score | 2.125 |
| ospray: particle_volume/ao | higher | 3.73062 / second | 3.64172 / second | 1.024 |
| ospray: particle_volume/scivis | higher | 3.685 / second | 3.62516 / second | 1.017 |
| ospray: particle_volume/pathtracer | higher | 120.966 / second | 121.287 / second | 0.997 |
| ospray: gravity/ao | higher | 2.62208 / second | 2.79344 / second | 0.939 |
| ospray: gravity/scivis | higher | 2.66757 / second | 2.76412 / second | 0.965 |
| ospray: gravity/pathtracer | higher | 3.58901 / second | 3.45087 / second | 1.040 |
| rawtherapee | lower | 51.331 seconds | 51.189 seconds | 0.997 |
| namd: ATPase | higher | 1.30961 / day | 1.26995 / day | 1.031 |
| namd: STMV | higher | 0.39164 / day | 0.37621 / day | 1.048 |
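The Ratio column depends on the Direction column: for higher-is-better metrics it is CachyOS divided by Ubuntu, and for lower-is-better metrics it is the inverse. A small sketch of that calculation, using three rows from the table (the function name `speedup` is just for illustration):

```python
def speedup(direction, cachyos, ubuntu):
    """Return a ratio where values above 1.0 mean CachyOS was faster."""
    if direction == "higher":   # e.g. iterations/sec: more is better
        return cachyos / ubuntu
    if direction == "lower":    # e.g. seconds: less is better
        return ubuntu / cachyos
    raise ValueError(f"unknown direction: {direction!r}")


# (metric, direction, CachyOS Zen4, Ubuntu 22.04), copied from the table above
rows = [
    ("coremark", "higher", 410390, 438579),
    ("build-linux-kernel", "lower", 114.035, 116.25),
    ("phpbench", "higher", 2243593, 1055625),
]

for name, direction, cachyos, ubuntu in rows:
    print(f"{name}: {speedup(direction, cachyos, ubuntu):.3f}")
# prints:
# coremark: 0.936
# build-linux-kernel: 1.019
# phpbench: 2.125
```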
The benchmarks selected were a subset of those from this Phoronix article. That article compares the various CachyOS repositories against each other, while I am comparing against Ubuntu. Overall, the improvement was smaller than I expected or hoped. Some particular items I will note from the table above:
- One additional difference is the compiler: CachyOS has the newer gcc 14.1, while Ubuntu 22.04 has gcc 11.4.
- The coremark benchmark compiles the coremark code with gcc -O2. I am not sure why it became slower; the different compiler version may be a factor.
- The build-linux-kernel benchmark measures the time to compile the kernel itself. I was pleasantly surprised to see this faster, as my guess would have been that compile speed could have slowed.
- The various OpenSSL benchmarks likely depend on the specific underlying instructions each algorithm uses, and it is again nice to see all but SHA256 slightly faster.
- Phpbench is the clear outlier, with a roughly 2x performance improvement.
- Ospray is mixed, with three benchmarks faster and three slower.
- rawtherapee is very slightly slower.
- namd also shows a small improvement of a few percent.
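One way to condense the Ratio column into a single number is a geometric mean, which is the usual choice for averaging ratios. The sketch below uses the ratios exactly as printed in the table and also reports the value with phpbench excluded, since that one outlier dominates. The `geomean` helper and the dictionary layout are my own illustration.

```python
import math


def geomean(values):
    """Geometric mean: exp of the average of the logs."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Ratio column from the table above (values above 1.0 favor CachyOS)
ratios = {
    "coremark": 0.936, "build-linux-kernel": 1.019,
    "openssl SHA256": 0.988, "openssl SHA512": 1.028,
    "openssl RSA4096": 1.041, "openssl ChaCha20": 1.021,
    "openssl AES-128-GCM": 1.018, "openssl AES-256-GCM": 1.021,
    "openssl ChaCha20-Poly1305": 1.021, "phpbench": 2.125,
    "ospray pv/ao": 1.024, "ospray pv/scivis": 1.017,
    "ospray pv/pathtracer": 0.997, "ospray gravity/ao": 0.939,
    "ospray gravity/scivis": 0.965, "ospray gravity/pathtracer": 1.040,
    "rawtherapee": 0.997, "namd ATPase": 1.031, "namd STMV": 1.048,
}

overall = geomean(ratios.values())
without_php = geomean(v for k, v in ratios.items() if k != "phpbench")
print(f"all benchmarks:    {overall:.3f}")
print(f"without phpbench:  {without_php:.3f}")
```

Comparing the two printed values shows how much of the overall gain comes from phpbench alone.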
Overall, it is nice to have one system running CachyOS as a continuously updated system. Occasionally it is slightly more difficult to get benchmarks to run there than on Ubuntu, presumably because Ubuntu tends to be the default choice that benchmarks are prepared for. So I don't expect to shift everything over to CachyOS.
