Performance analysis, tools and experiments

An eclectic collection

  • Overview
  • Blog
  • Workloads
    • cpu2017
      • 500.perlbench_r
      • 502.gcc_r
      • 503.bwaves_r
      • 505.mcf_r
      • 507.cactuBSSN_r
      • 508.namd_r
      • 510.parest_r
      • 511.povray_r
      • 519.lbm_r
      • 520.omnetpp_r
      • 521.wrf_r
      • 523.xalancbmk_r
      • 525.x264_r
      • 526.blender_r
      • 527.cam4_r
      • 531.deepsjeng_r
      • 538.imagick_r
      • 541.leela_r
      • 544.nab_r
      • 548.exchange2_r
      • 549.fotonik3d_r
      • 554.roms_r
      • 557.xz_r
    • geekbench
    • lmbench
    • passmark
    • pbbs
    • phoronix
      • ai-benchmark
      • aircrack-ng
      • amg
      • aobench
      • aom-av1
      • apache
      • apache-iotdb
      • appleseed
      • arrayfire
      • askap
      • asmfish
      • astcenc
      • avifenc
      • basis
      • blake2
      • blogbench
      • blender
      • blosc
      • bork
      • botan
      • brl-cad
      • build-apache
      • build-clash
      • build-eigen
      • build-erlang
      • build-ffmpeg
      • build-gcc
      • build-gdb
      • build-gem5
      • build-godot
      • build-imagemagick
      • build-linux-kernel
      • build-llvm
      • build-mesa
      • build-mplayer
      • build-nodejs
      • build-php
      • build-python
      • build-wasmer
      • build2
      • bullet
      • byte
      • cachebench
      • cassandra
      • clickhouse
      • clomp
      • cloverleaf
      • cockroach
      • compilebench
      • compress-7zip
      • compress-gzip
      • compress-lz4
      • compress-pbzip2
      • compress-rar
      • compress-xz
      • compress-zstd
      • core-latency
      • coremark
      • cp2k
      • cpp-perf-bench
      • cpuminer-opt
      • crafty
      • c-ray
      • cryptopp
      • cryptsetup
      • ctx-clock
      • cython-bench
      • dacapobench
      • daphne
      • darktable
      • dav1d
      • dbench
      • deepsparse
      • deepspeech
      • dolfyn
      • draco
      • dragonflydb
      • duckdb
      • easywave
      • ebizzy
      • embree
      • encode-flac
      • encode-mp3
      • encode-opus
      • encode-wavpack
      • espeak
      • etcpak
      • faiss
      • fast-cli
      • ffmpeg
      • ffte
      • fftw
      • fhourstones
      • financebench
      • furmark
      • gcrypt
      • gegl
      • gimp
      • git
      • glibc-bench
      • gmpbench
      • gnupg
      • gnuradio
      • go-benchmark
      • gpaw
      • graph500
      • graphics-magick
      • gromacs
      • hackbench
      • hadoop
      • heffte
      • helsing
      • himeno
      • hmmer
      • hpcg
      • incompact3d
      • indigobench
      • inkscape
      • ipc-benchmark
      • java-jmh
      • java-scimark2
      • john-the-ripper
      • jpegxl
      • jpegxl-decode
      • kvazaar
      • kripke
      • lammps
      • lczero
      • libraw
      • libreoffice
      • libxsmm
      • liquid-dsp
      • llama.cpp
      • llamafile
      • lulesh
      • lzbench
      • mbw
      • memcached
      • minibude
      • minife
      • mnn
      • mpcbench
      • m-queens
      • mrbayes
      • mutex
      • namd
      • mt-dgemm
      • ncnn
      • neat
      • nettle
      • nginx
      • ngspice
      • node-octane
      • node-web-tooling
      • npb
      • n-queens
      • numpy
      • nwchem
      • oidn
      • onednn
      • octave-benchmark
      • onnx
      • opencv
      • openfoam
      • openjpeg
      • openssl
      • openradioss
      • openscad
      • openvino
      • openvkl
      • ospray
      • ospray-studio
      • palabos
      • parboil
      • pennant
      • perl-benchmark
      • pgbench
      • phpbench
      • pjsip
      • polybench-c
      • polyhedron
      • povray
      • primesieve
      • pybench
      • pyhpc
      • pyperformance
      • pytorch
      • quadray
      • qe
      • qmcpack
      • quantlib
      • quicksilver
      • ramspeed
      • rav1e
      • rawtherapee
      • rbenchmark
      • redis
      • renaissance
      • rnnoise
      • rocksdb
      • rodinia
      • rsvg
      • schbench
      • scikit-learn
      • scimark2
      • scylladb
      • securemark
      • selenium
      • simdjson
      • smallpt
      • smhasher
      • spark
      • spark-tpcds
      • speedb
      • specfem3d
      • sqlite
      • srsran
      • stargate
      • stockfish
      • stream
      • stress-ng
      • svt-av1
      • svt-hevc
      • svt-vp9
      • sudokut
      • synthmark
      • sysbench
      • tensorflow
      • tensorflow-lite
      • tesseract
      • tjbench
      • tnn
      • toybrot
      • tscp
      • ttsiod-renderer
      • tungsten
      • uvg266
      • vkpeak
      • vpxenc
      • v-ray
      • vvenc
      • webp
      • webp2
      • whisper.cpp
      • whisperfile
      • wireguard
      • x264
      • x265
      • xmrig
      • xnnpack
      • y-cruncher
      • z3
    • stream
  • Tools
    • Compilers
    • likwid
    • perf
    • trace-cmd and kernelshark
    • wspy
  • Experiments
    • Histograms
    • clustering
    • Adding summary statistics for all benchmarks

Tag Archives: phoronix

phoronix – Ryzen AI HX 370 vs Ryzen 7840 HS

Posted on October 10, 2024 by mev (updated October 18, 2024)

As a follow-up comparison of the Ryzen AI HX 370 processor and the Ryzen 7840 HS, this posting looks at some Phoronix benchmarks.

I’ve run more than 200 Phoronix benchmarks, analyzed them using performance counters, and grouped the results into clusters. I use these clusters to guide the choice of benchmarks, trying to pick one from each cluster. Where a benchmark didn’t run easily on Ubuntu 24.04, I skipped to another benchmark rather than debug the original issue. The cluster list from September 2024 is below:

Cluster 0 (10 entries): 505.mcf_r 531.deepsjeng_r appleseed asmfish avifenc blender ospray primesieve stockfish v-ray
Cluster 1 (3 entries): 520.omnetpp_r amg compress-xz
Cluster 2 (7 entries): 500.perlbench_r 525.x264_r 544.nab_r brl-cad quicksilver smallpt vvenc
Cluster 3 (10 entries): aom-av1 cp2k neat openradioss qmcpack srsran svt-av1 vpxenc x264 x265
Cluster 4 (14 entries): 538.imagick_r 548.exchange2_r astcenc basis coremark cpuminer-opt kvazaar mrbayes quantlib rav1e rays1bench toybrot uvg266 webp2
Cluster 5 (10 entries): blake2 build-apache build-clash build-eigen octave-benchmark openjpeg selenium tscp tungsten vkpeak
Cluster 6 (19 entries): build2 build-ffmpeg build-gcc build-gdb build-gem5 build-godot build-linux-kernel build-llvm build-mesa build-mplayer build-nodejs build-wasmer hackbench helsing mnn rocksdb scylladb speedb stress-ng
Cluster 7 (5 entries): bork byte openscad phpbench sudokut
Cluster 8 (10 entries): aobench compress-lz4 crafty fhourstones git gnupg lammps lzbench tjbench webp
Cluster 9 (11 entries): apache-iotdb compress-zstd core-latency cpp-perf-bench draco encode-wavpack fftw jpegxl polybench-c polyhedron z3
Cluster 10 (10 entries): botan cachebench cryptsetup gcrypt glibc-bench gnuradio java-scimark2 nettle simdjson synthmark
Cluster 11 (7 entries): duckdb inkscape libreoffice node-web-tooling numpy perl-benchmark rsvg
Cluster 12 (15 entries): bullet compress-pbzip2 cython-bench encode-flac encode-mp3 encode-opus etcpak ffmpeg hmmer libraw node-octane pyperformance rnnoise scimark2 stargate
Cluster 13 (7 entries): build-python compress-gzip compress-rar dacapobench gegl hadoop spark-tpcds
Cluster 14 (10 entries): 508.namd_r 511.povray_r aircrack-ng c-ray graphics-magick java-jmh namd povray rodinia svt-hevc
Cluster 15 (7 entries): askap hpcg incompact3d onnx parboil pytorch whisperfile
Cluster 16 (15 entries): 503.bwaves_r 507.cactuBSSN_r 510.parest_r 519.lbm_r 521.wrf_r 549.fotonik3d_r 554.roms_r cloverleaf easywave kripke mt-dgemm ncnn stream tensorflow xmrig
Cluster 17 (8 entries): darktable deepsparse ffte llama.cpp llamafile npb openfoam palabos
Cluster 18 (4 entries): 541.leela_r compress-7zip m-queens n-queens
Cluster 19 (7 entries): clomp deepspeech heffte himeno lulesh ngspice ramspeed
Cluster 20 (11 entries): 523.xalancbmk_r ai-benchmark libxsmm minibude oidn onednn openvino quadray tensorflow-lite xnnpack y-cruncher
Cluster 21 (4 entries): 502.gcc_r 527.cam4_r ebizzy faiss
Cluster 22 (5 entries): blosc dragonflydb mbw minife pjsip
Cluster 23 (4 entries): john-the-ripper openssl qe specfem3d
Cluster 24 (6 entries): arrayfire build-erlang build-imagemagick build-php nginx rbenchmark
Cluster 25 (11 entries): cryptopp dolfyn espeak gmpbench mpcbench mutex nwchem pybench securemark smhasher spark
Cluster 26 (12 entries): apache cassandra cockroach compilebench ctx-clock dbench fast-cli ipc-benchmark memcached pgbench sqlite wireguard
Cluster 27 (11 entries): clickhouse daphne dav1d gimp indigobench jpegxl-decode opencv pyhpc renaissance schbench svt-vp9
Cluster 28 (9 entries): 526.blender_r 557.xz_r embree graph500 lczero openvkl ospray-studio sysbench ttsiod-renderer
Cluster 29 (8 entries): financebench gpaw gromacs liquid-dsp pennant rawtherapee tnn whisper.cpp
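
The selection rule described above (one benchmark per cluster, skipping the SPEC CPU2017 entries since this posting looks at Phoronix tests, and moving on when a benchmark won't run) can be sketched in a few lines of Python. The function name, the "first usable entry in listed order" preference and the failure set are all hypothetical; the post does not say how the particular benchmark in each cluster was chosen.

```python
import re

# SPEC CPU2017 rate benchmarks in the cluster list look like "505.mcf_r".
SPEC_NAME = re.compile(r"^\d{3}\.\w+_r$")

def pick_representatives(clusters, failed):
    """Pick one Phoronix benchmark per cluster.

    clusters: dict of cluster id -> benchmark names in preference order
              (here simply the listed order, which is an assumption).
    failed:   names that did not run cleanly; they are skipped in favour
              of the next candidate.
    Returns a dict of cluster id -> chosen name, or None if nothing is left.
    """
    chosen = {}
    for cid, names in clusters.items():
        usable = [n for n in names if not SPEC_NAME.match(n) and n not in failed]
        chosen[cid] = usable[0] if usable else None
    return chosen

clusters = {
    1: ["520.omnetpp_r", "amg", "compress-xz"],
    18: ["541.leela_r", "compress-7zip", "m-queens", "n-queens"],
}
# Hypothetical: pretend amg failed to run on Ubuntu 24.04.
print(pick_representatives(clusters, failed={"amg"}))
```

With amg marked as failing, this prints {1: 'compress-xz', 18: 'compress-7zip'}.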

Below is a summary of the selected benchmarks, followed by some observations.

cluster | benchmark | ratio (>1 = HX 370 faster) | 7840 metric | HX 370 metric | 7840 on_cpu | HX 370 on_cpu | 7840 retire | HX 370 retire | 7840 frontend | HX 370 frontend | 7840 backend | HX 370 backend | 7840 speculation | HX 370 speculation
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
0 | ospray | 1.58 | 3.87314 / second | 6.07719 / second | 14.46 | 21.28 | 29.3% | 30.7% | 27.3% | 11.8% | 41.1% | 54.2% | 2.3% | 2.4%
1 | compress-xz | 0.96 | 28.665 seconds | 29.736 seconds | 11.04 | 12.45 | 8.2% | 7.3% | 10.2% | 17.3% | 76.5% | 68.2% | 5.1% | 7.1%
2 | quicksilver | 1.41 | 12610000 fom | 1776333 fom | 15.38 | 19.9 | 49.8% | 15.9% | 6.9% | 15.9% | 38.9% | 59.5% | 4.4% | 2.7%
3 | x265 | 1.65 | 13.79 frames/second | 22.81 frames/second | 7.72 | 11.62 | 35.0% | 26.9% | 14.3% | 22.5% | 48.0% | 47.4% | 2.7% | 3.0%
4 | coremark | 1.37 | 411227 iterations/sec | 561065 iterations/sec | 11.98 | 14.43 | 45.7% | 37.0% | 39.7% | 42.0% | 14.2% | 20.2% | 0.3% | 0.8%
5 | build-eigen | 0.77 | 63.356 seconds | 82.516 seconds | 0.93 | 0.94 | 25.2% | 20.4% | 50.5% | 52.8% | 18.6% | 21.9% | 5.6% | 4.8%
6 | build-gcc | 1.06 | 1038.166 seconds | 976.243 seconds | 9.98 | 10.91 | 24.1% | 18.3% | 51.5% | 60.0% | 19.7% | 18.2% | 4.7% | 3.1%
7 | phpbench | 0.77 | 1159425 score | 900908 score | 0.80 | 0.83 | 61.2% | 48.6% | 23.0% | 30.1% | 15.0% | 20.1% | 0.8% | 1.1%
8 | lzbench | 0.58 | 192 MB/s | 111 MB/s | 0.80 | 0.82 | 34.1% | 22.7% | 26.3% | 36.5% | 21.5% | 21.2% | 18.1% | 19.4%
9 | compress-zstd | 1.01 | 1534.8 MB/s | 1556.6 MB/s | 4.23 | 3.45 | 21.4% | 18.3% | 9.5% | 17.8% | 62.8% | 55.7% | 6.3% | 0.2%
10 | simdjson | 0.79 | 5.58 GB/s | 4.41 GB/s | 0.93 | 0.94 | 50.4% | 42.7% | 13.1% | 27.0% | 33.2% | 28.0% | 3.3% | 1.5%
11 | perl-benchmark | 0.78 | 0.068363375 seconds | 0.08713901 seconds | 0.93 | 0.92 | 43.0% | 35.5% | 41.8% | 41.7% | 11.1% | 18.0% | 4.2% | 4.6%
12 | ffmpeg | 0.99 | 252.66 fps | 251.11 fps | 3.67 | 2.61 | 32.3% | 29.1% | 18.4% | 30.3% | 29.0% | 33.8% | 5.6% | 6.7%
13 | compress-gzip | 0.69 | 28.116 seconds | 40.597 seconds | 0.96 | 0.95 | 19.9% | 15.1% | 26.4% | 29.1% | 42.0% | 43.0% | 11.7% | 12.7%
14 | povray | 1.34 | 38.681 seconds | 28.778 seconds | 13.32 | 18.83 | 31.8% | 40.1% | 3.5% | 16.3% | 25.5% | 41.5% | 1.3% | 2.0%
15 | whisperfile | 1.11 | 54.13398 seconds | 48.57337 seconds | 7.44 | 10.81 | 20.0% | 15.2% | 2.2% | 15.5% | 77.3% | 68.9% | 0.3% | 0.3%
16 | easywave | 1.26 | 8.809 seconds | 7.005 seconds | 14.60 | 20.53 | 4.5% | 4.8% | 3.1% | 15.1% | 83.6% | 74.6% | 0.1% | 0.1%
17 | darktable | 1.34 | 5.711 seconds | 4.267 seconds | 3.42 | 5.50 | 27.9% | 19.1% | 7.2% | 15.2% | 63.5% | 60.9% | 1.3% | 1.0%
18 | compress-7zip | 1.01 | 76676 MIPS | 77409 MIPS | 12.03 | 17.27 | 21.5% | 13.0% | 38.6% | 53.5% | 29.1% | 19.7% | 10.8% | 13.8%
19 | himeno | 1.07 | 4447 MFLOPS | 4769 MFLOPS | 0.91 | 0.91 | 26.4% | 33.3% | 2.5% | 2.7% | 71.0% | 63.7% | 0.2% | 0.3%
20 | minibude | 1.36 | 537.395 GFInst/s | 733.427 GFInst/s | 15.36 | 20.51 | 19.8% | 18.7% | 0.3% | 1.6% | 79.8% | 79.0% | 0.1% | 0.4%
21 | ebizzy | 0.18 | 774839 records/s | 140179 records/s | 12.87 | 19.82 | 7.3% | 0.6% | 35.3% | 63.1% | 57.3% | 36.3% | 0.0% | 0.0%
22 | pjsip | 0.79 | 4613 responses/sec | 3665 responses/sec | 2.40 | 2.23 | 12.2% | 11.3% | 38.4% | 33.9% | 48.4% | 51.3% | 1.1% | 1.1%
23 | openssl | 1.63 | 15219867520 bytes/s | 17696663040 bytes/s | 15.51 | 23.25 | 46.5% | 33.4% | 4.9% | 13.3% | 48.7% | 53.2% | 0.0% | 0.0%
24 | build-php | 1.16 | 67.052 seconds | 65.354 seconds | 8.30 | 10.20 | 20.8% | 15.1% | 50.4% | 57.0% | 24.8% | 24.1% | 3.9% | 3.4%
25 | pybench | 0.84 | 554 ms | 663 ms | 0.75 | 0.79 | 70.1% | 63.9% | 15.9% | 17.0% | 11.4% | 17.0% | 2.6% | 2.1%
26 | dbench | 3.74 | 687.037 MB/s | 2573 MB/s | 1.05 | 2.06 | 19.4% | 22.2% | 70.0% | 38.3% | 9.9% | 37.5% | 0.7% | 0.9%
27 | indigobench | 1.40 | 2.090 samples/sec | 2.917 samples/sec | 14.14 | 21.25 | 25.8% | 19.9% | 14.8% | 29.3% | 54.0% | 44.9% | 5.4% | 5.4%
28 | lczero | 1.41 | 108 nodes/sec | 152 nodes/sec | 13.23 | 18.34 | 16.8% | 14.3% | 4.4% | 3.8% | 78.7% | 81.6% | 0.1% | 0.1%
29 | rawtherapee | 1.05 | 54.194 seconds | 51.600 seconds | 7.71 | 10.19 | 29.0% | 18.5% | 12.6% | 27.1% | 57.0% | 44.8% | 1.5% | 1.3%

The first observation is that almost all single-threaded benchmarks (on_cpu close to 1) run faster on the 7840 than on the Strix Point HX 370. In contrast, the largest differences appear among the benchmarks with the largest number of on_cpu threads.
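
The ratio column and the single- vs. multi-threaded observation can be reproduced from the raw metrics. A minimal Python sketch using a handful of rows copied from the table: the ratio is oriented so that >1 means the HX 370 is faster (inverting it for lower-is-better metrics such as seconds), and rows are split on the 7840 on_cpu value. The 1.5 threshold and the geometric-mean summary are illustrative choices, not from the post.

```python
from math import prod

# (benchmark, 7840 value, HX 370 value, higher_is_better, 7840 on_cpu),
# copied from a few rows of the table above.
ROWS = [
    ("compress-xz", 28.665, 29.736, False, 11.04),
    ("x265", 13.79, 22.81, True, 7.72),
    ("povray", 38.681, 28.778, False, 13.32),
    ("lczero", 108, 152, True, 13.23),
    ("build-eigen", 63.356, 82.516, False, 0.93),
    ("phpbench", 1159425, 900908, True, 0.80),
    ("lzbench", 192, 111, True, 0.80),
    ("simdjson", 5.58, 4.41, True, 0.93),
]

def speedup(v7840, vhx, higher_is_better):
    """HX 370 speedup over the 7840; >1 means the HX 370 is faster."""
    return vhx / v7840 if higher_is_better else v7840 / vhx

def geomean(values):
    values = list(values)
    return prod(values) ** (1.0 / len(values))

def split_by_threads(rows, single_thread_max=1.5):
    """Geometric-mean speedup for (mostly single-threaded, multi-threaded) rows."""
    single = [speedup(a, b, hib) for _, a, b, hib, on_cpu in rows
              if on_cpu <= single_thread_max]
    multi = [speedup(a, b, hib) for _, a, b, hib, on_cpu in rows
             if on_cpu > single_thread_max]
    return geomean(single), geomean(multi)

single, multi = split_by_threads(ROWS)
print(f"single-threaded geomean: {single:.2f}, multi-threaded geomean: {multi:.2f}")
```

With these eight rows it prints geometric means of about 0.72 for the single-threaded group and 1.32 for the multi-threaded group.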

There are two outliers that deserve a second look:

  • ebizzy is over 5x faster on the 7840 than on the HX 370. This benchmark runs quickly, so I need to make sure it is running correctly in both cases. I don’t see similar ratios for the two SPEC CPU2017 benchmarks (502.gcc_r and 527.cam4_r) that are also part of this cluster.
  • dbench runs over 3x faster on the HX 370 than on the 7840, and its on_cpu is almost twice as high (2.06 vs. 1.05). Again, it would be useful to understand whether some other influence is affecting this benchmark; perhaps it is measuring something else.
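
For outliers like these, a cheap first check is whether repeated runs on each machine agree with one another before comparing the machines. A minimal sketch, assuming the per-run results have been collected; the function name and the 5% threshold are hypothetical choices, not Phoronix Test Suite behaviour, and the sample numbers are synthetic.

```python
import statistics

def run_to_run_spread(results, max_cv=0.05):
    """Summarize repeated runs of one benchmark on one machine.

    results: per-run scores (at least two).
    max_cv:  coefficient-of-variation threshold above which the runs are
             treated as suspect; 5% is an arbitrary illustrative value.
    Returns (mean, cv, consistent).
    """
    if len(results) < 2:
        raise ValueError("need at least two runs to measure spread")
    mean = statistics.mean(results)
    cv = statistics.stdev(results) / mean
    return mean, cv, cv <= max_cv

# Synthetic numbers, only to show the two outcomes.
print(run_to_run_spread([100.0, 101.0, 99.0]))   # low spread
print(run_to_run_spread([100.0, 150.0, 50.0]))   # high spread
```

Only when both machines pass a check like this does a ratio such as 0.18 or 3.74 say much about the cores themselves.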
Posted in experiment | Tagged 7840HS, performance counters, phoronix, Ryzen AI 9 HX 370 | Leave a reply

Ryzen AI, Zen5, article and laptop

Posted on July 28, 2024 by mev

Zen5 mobile processors have been released.

I had ordered an ASUS Zenbook S16 laptop with a Ryzen AI 9 365 processor, and it arrived today. Full tech specifications are at the link but include:

  • Ryzen AI 365 processor with 10 cores (four Zen 5 and six Zen 5c), 20 threads, base clock 2.0 GHz, boost clock 5.0 GHz, 10 MB of L2 and 24 MB of L3.
  • RDNA 3.5 integrated graphics
  • 24 GB of DDR5 memory
  • 1 TB NVMe

So far I have only run Windows 11 Home and have not tried to install Linux. Part of the reason is that I also want to try the “AI PC” features, so I am cautious: changing to Linux would be mostly a one-way proposition, and the 1 TB disk is not particularly large for a dual boot.

I did try three “lighter-weight” ways to run some Linux workloads:

  • Oracle VirtualBox can install Ubuntu 24.04, but I saw strange hangs with what I tried.
  • VMware Player downloads seem to be down, pointing to a Cloudflare page. I can try this later.
  • WSL does install with Ubuntu 24.04. I was able to install the Phoronix Test Suite and got a CoreMark score of 413542.

I haven’t yet done much other testing. Some of it might involve a more complete Linux install, but I’ll wait a little to see my options (e.g. a mini-PC, this laptop, etc.).

Around the time my laptop arrived, laptops also appear to have arrived at Phoronix, who published an article with Linux-based testing of an ASUS laptop. Their tests used the Ryzen AI 9 HX 370 processor, so 12 cores instead of 10. The article is here.

A few interesting things I will note from the Phoronix tests and my own testing:

  • Coremark scores
    • Ryzen AI 365 (my laptop with WSL) – 413542
    • Ryzen 7840 HS (my mini-PC) – 464076
    • Ryzen AI 370 (Phoronix laptop) – 426538
    • Ryzen 7840 HS (Phoronix laptop) – 443276
  • Overall, across the set of benchmarks the Phoronix article picks, their Ryzen AI 370 is ~10% faster than their Ryzen 7840HS
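
Putting the CoreMark numbers above side by side as percentages makes the comparison easier to read. A small Python sketch; note that my 365 result was measured under WSL, so it is not a like-for-like comparison with the native Linux results.

```python
# CoreMark scores listed above (iterations/sec, higher is better).
SCORES = {
    "Ryzen AI 365 (my laptop, WSL)": 413542,
    "Ryzen 7840 HS (my mini-PC)": 464076,
    "Ryzen AI 370 (Phoronix laptop)": 426538,
    "Ryzen 7840 HS (Phoronix laptop)": 443276,
}

def percent_difference(new, old):
    """How much faster (+) or slower (-) `new` is than `old`, in percent."""
    return 100.0 * (new / old - 1.0)

mine = percent_difference(SCORES["Ryzen AI 365 (my laptop, WSL)"],
                          SCORES["Ryzen 7840 HS (my mini-PC)"])
theirs = percent_difference(SCORES["Ryzen AI 370 (Phoronix laptop)"],
                            SCORES["Ryzen 7840 HS (Phoronix laptop)"])
print(f"365 vs 7840 HS (mine):     {mine:+.1f}%")
print(f"370 vs 7840 HS (Phoronix): {theirs:+.1f}%")
```

This prints -10.9% and -3.8%: on CoreMark alone, both new laptops trail the 7840 HS they are compared with, even though the wider Phoronix selection favors the 370.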

My own interest is more in exploring the unique features of the Zen5 vs. Zen5c cores, in seeing how topdown core performance varies for different subsystems such as branch prediction or execution units, and in understanding how the extra two (365) or four (370) cores contribute to workloads. So I won’t necessarily run many other workloads, since they appear at least as dependent on other factors (e.g. power, memory speed) and are more a system-level than a core-level comparison. Nevertheless, it is still interesting to see this data start to come out.

There is also an AnandTech review. The full review is at the link, but some things caught my attention:

  • The Zenbook S16 is configured to run the Ryzen AI 9 HX 370 at just 17W, which partly explains the differences with the mini-PC running at higher power. “ASUS has taken what’s nominally a 28W chip and dialed it down to 17W for it’s out-of-the-box experience”. There are other modes with a higher TDP that consume more power. Testing was done at 28W.
  • The mobile chips have a 256-bit SIMD datapath, so expect AVX-512 code to run faster on desktop parts than on equivalent mobile processors.
  • Core-to-core latencies are shown; this looks like a useful tool, and the differences they show versus the Ryzen 9 7940HS are rather intriguing.
  • They run SPECrate under WSL, but with only 1 core. This looks like a potentially interesting way to probe the different core types.
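
On Linux, one way to try the same single-core probing on a chosen core type is to restrict a benchmark's CPU affinity. A rough Python sketch, assuming the cpufreq sysfs interface is present (it is often missing under WSL or in VMs) and that Zen 5 and Zen 5c cores report different `cpuinfo_max_freq` values, which depends on the frequency driver and is not guaranteed. This is not how AnandTech ran their tests; `cpus_by_max_freq` and `run_pinned` are hypothetical helper names.

```python
import os
import subprocess
from collections import defaultdict
from pathlib import Path

def cpus_by_max_freq(sysfs="/sys/devices/system/cpu"):
    """Group logical CPU numbers by cpufreq/cpuinfo_max_freq (kHz), highest first.

    Returns an empty dict when cpufreq is not exposed (e.g. often under WSL).
    """
    groups = defaultdict(list)
    for cpu_dir in Path(sysfs).glob("cpu[0-9]*"):
        max_freq = cpu_dir / "cpufreq" / "cpuinfo_max_freq"
        if max_freq.is_file():
            groups[int(max_freq.read_text())].append(int(cpu_dir.name[3:]))
    return {freq: sorted(cpus) for freq, cpus in sorted(groups.items(), reverse=True)}

def run_pinned(cmd, cpu):
    """Run cmd restricted to a single logical CPU (Linux only).

    The affinity is set in the child just before exec, so the parent is unaffected.
    """
    return subprocess.run(
        cmd,
        preexec_fn=lambda: os.sched_setaffinity(0, {cpu}),
        capture_output=True,
        text=True,
        check=True,
    )

if __name__ == "__main__":
    for freq_khz, cpus in cpus_by_max_freq().items():
        print(f"{freq_khz / 1e6:.2f} GHz max: CPUs {cpus}")
```

If the groups do separate, running the same benchmark pinned to one CPU from each group gives a rough per-core-type comparison.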

Posted in hardware | Tagged phoronix, Ryzen AI 365, wsl, Zen5 | Leave a reply

CachyOS optimized packages

Posted on July 15, 2024 by mev

I have a Zen4 7940HS system where I have installed CachyOS. This is an Arch-based Linux distribution with a focus on performance. In particular, the Why CachyOS? page cites an optimized scheduler as well as packages compiled for particular architectures. Rather than only providing packages compiled to run on the lowest-common-denominator architecture, CachyOS has optimized packages for different “levels”. The “-v3” level targets processors from Intel Haswell and AMD Excavator onward (AVX2 and related instructions), and the “-v4” level enables use of AVX-512.
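
A rough way to see which of these levels a machine supports is to compare the `flags` line of /proc/cpuinfo against the feature lists that define each x86-64 level. A Python sketch; the flag names are the Linux /proc/cpuinfo spellings (e.g. `abm` for LZCNT, `pni` for SSE3, `lahf_lm` for LAHF/SAHF), and it ignores the OS-support (OSXSAVE) requirement, so treat it as an approximation rather than CachyOS's own detection logic.

```python
# Approximate /proc/cpuinfo flags required for each x86-64 microarchitecture
# level; each level also requires every level before it.
LEVELS = [
    ("x86-64-v2", {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"}),
    ("x86-64-v3", {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"}),
    ("x86-64-v4", {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"}),
]

def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":   # skips e.g. the separate "vmx flags" line
            return set(value.split())
    return set()

def highest_level(flags):
    """Highest level whose flags (and all lower levels' flags) are present."""
    level = "x86-64 (baseline)"
    for name, required in LEVELS:
        if not required <= flags:
            break
        level = name
    return level

if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        print(highest_level(cpu_flags(f.read())))
```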

The July 2024 release notes highlight the addition of a Zen4-optimized repository:

This is our 8th release this year, and we are very proud to announce a new optimized repository. Starting with this release, we are providing a Zen4 optimized repository. This repository will be automatically used at new installation for Zen4 and Zen5 CPUs, to provide the best performance.

The znver4 target provides a bunch of extra avx512 extensions and also other instructions. Here you can find a list of the additional used instructions by the compiler compared to the x86-64-v4 target: abm, adx, aes, avx512bf16, avx512bitalg, avx512ifma, avx512vbmi, avx512vbmi2, avx512vnni, avx512vpopcntdq, clflushopt, clwb, clzero, fsgsbase, gfni, mwaitx, pclmul, pku, prfchw, rdpid, rdrnd, rdseed, sha, sse4a, vaes, vpclmulqdq, wbnoinvd, xsavec, xsaveopt, xsaves

This seemed intriguing, so I decided to try a somewhat random collection of Phoronix tests using this repository, comparing performance running CachyOS with the Zen4 repository against Ubuntu 22.04. A summary table follows:

Metric | Direction | CachyOS Zen4 | Ubuntu 22.04 | Ratio (>1 = CachyOS better)
---|---|---|---|---
coremark | higher | 410390 iterations/sec | 438579 iterations/sec | 0.936
build-linux-kernel | lower | 114.035 seconds | 116.25 seconds | 1.019
openssl: SHA256 | higher | 13254334897 / second | 13420827833 / second | 0.988
openssl: SHA512 | higher | 4539223377 / second | 4413779170 / second | 1.028
openssl: RSA4096 | higher | 5949.0 sign/s | 5713.1 sign/s | 1.041
openssl: ChaCha20 | higher | 56376095723 byte/second | 55209170900 byte/second | 1.021
openssl: AES-128-GCM | higher | 108245069000 byte/second | 106316322953 byte/second | 1.018
openssl: AES-256-GCM | higher | 93611141897 byte/second | 91688082637 byte/second | 1.021
openssl: ChaCha20-Poly1305 | higher | 40096535620 byte/second | 39278150087 byte/second | 1.021
phpbench | higher | 2243593 score | 1055625 score | 2.125
ospray: particle_volume/ao | higher | 3.73062 / second | 3.64172 / second | 1.024
ospray: particle_volume/scivis | higher | 3.685 / second | 3.62516 / second | 1.017
ospray: particle_volume/pathtracer | higher | 120.966 / second | 121.287 / second | 0.997
ospray: gravity/ao | higher | 2.62208 / second | 2.79344 / second | 0.939
ospray: gravity/scivis | higher | 2.66757 / second | 2.76412 / second | 0.965
ospray: gravity/pathtracer | higher | 3.58901 / second | 3.45087 / second | 1.040
rawtherapee | lower | 51.331 seconds | 51.189 seconds | 0.997
namd: ATPase | higher | 1.30961 / day | 1.26995 / day | 1.031
namd: STMV | higher | 0.39164 / day | 0.37621 / day | 1.048
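
Because the ratio column is already oriented so that >1 favours CachyOS, a geometric mean gives a one-number summary and shows how much the phpbench outlier moves it. A sketch in Python; the geometric mean is a common convention for averaging ratios, not something computed in the original comparison.

```python
from math import exp, log

# Ratio column from the table above (>1 means CachyOS Zen4 did better).
RATIOS = {
    "coremark": 0.936, "build-linux-kernel": 1.019,
    "openssl: SHA256": 0.988, "openssl: SHA512": 1.028, "openssl: RSA4096": 1.041,
    "openssl: ChaCha20": 1.021, "openssl: AES-128-GCM": 1.018,
    "openssl: AES-256-GCM": 1.021, "openssl: ChaCha20-Poly1305": 1.021,
    "phpbench": 2.125,
    "ospray: particle_volume/ao": 1.024, "ospray: particle_volume/scivis": 1.017,
    "ospray: particle_volume/pathtracer": 0.997, "ospray: gravity/ao": 0.939,
    "ospray: gravity/scivis": 0.965, "ospray: gravity/pathtracer": 1.040,
    "rawtherapee": 0.997, "namd: ATPase": 1.031, "namd: STMV": 1.048,
}

def geomean(values):
    """Geometric mean computed via the mean of logarithms."""
    values = list(values)
    return exp(sum(log(v) for v in values) / len(values))

everything = geomean(RATIOS.values())
without_php = geomean(v for name, v in RATIOS.items() if name != "phpbench")
print(f"all {len(RATIOS)} tests: {everything:.3f}")
print(f"without phpbench: {without_php:.3f}")
```

This gives roughly 1.048 across all 19 tests and roughly 1.008 without phpbench, i.e. under a 1% average gain once the outlier is set aside.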

The benchmarks selected were a subset of those from this Phoronix article. That article compares the various CachyOS repositories against each other, while I am comparing against Ubuntu. Overall, the improvement was smaller than I expected/hoped. Some particular items I will note from the table above:

  • One additional difference is that CachyOS has a newer gcc (version 14.1), while Ubuntu 22.04 has gcc version 11.4.
  • The coremark benchmark compiles the CoreMark code with gcc -O2. I am not sure why it became slower; perhaps it is related to the compiler?
  • The build-linux-kernel test measures the time to compile the kernel itself. I was pleasantly surprised to see this faster, as my guess would have been that the newer compiler might compile more slowly.
  • The various OpenSSL benchmarks may depend on specific underlying instructions, and again it is nice to see most of them slightly faster.
  • Phpbench is the particular outlier, with a 2x performance improvement.
  • Ospray is mixed, with a few benchmarks faster and a few slower.
  • rawtherapee is just slightly slower.
  • namd also shows a small improvement.

Overall, it is nice to have one system running CachyOS as a continuously updated system. Occasionally it is slightly more difficult to get benchmarks to run than on Ubuntu, presumably because Ubuntu tends to be the default choice. So I don’t expect to shift everything over to CachyOS.

Posted in experiment | Tagged benchmarks, cachyos, phoronix | Leave a reply

graphics-magick sharpen, compiler improvements

Posted on March 19, 2024 by mev

This Phoronix article (https://www.phoronix.com/review/nvidia-gh200-compilers) compares GCC 13.2 with Clang 17.0.2 on an ARM platform. In the attached discussion, the improvement for the graphics-magick sharpen benchmark particularly stands out. So I thought I would see if I could see a …

Posted in experiment | Tagged benchmarks, compiler, phoronix | Leave a reply

200 phoronix tests

Posted on March 4, 2024 by mev

I have now added more than 200 Phoronix tests. There were a little over 10 benchmark articles in February. I seem to already have almost all the benchmarks when an article comes out and only needed to add one or two for some …

Posted in experiment | Tagged benchmarks, phoronix | Leave a reply

phoronix – January 2024

Posted on February 2, 2024 by mev

Phoronix has published its roundup of benchmark/performance/review articles (https://www.phoronix.com/news/January-2024-Highlights), which included 10 articles with reviews and benchmarks. I’ve been keeping up with the CPU workloads listed and now have more than 130 workloads in total. I haven’t added GPU/graphics tests because I haven’t developed …

Posted in experiment, website | Tagged benchmarks, phoronix | Leave a reply

50 phoronix workloads…

Posted on January 14, 2024 by mev

I am now up to 50 Phoronix workloads, as summarized on the workloads page. For each one I have a graph and some pages of information. My general idea is to take the benchmark-based Phoronix articles and see if I …

Posted in experiment | Tagged benchmarks, phoronix | Leave a reply


Archives

  • November 2024
  • October 2024
  • September 2024
  • July 2024
  • June 2024
  • March 2024
  • February 2024
  • January 2024
  • December 2023
  • February 2023

Tags

7840HS bad data benchmarks cachyos cluster compiler coremark cpu2017 data fabric getrusage gnuplot i5-13500H icache ipc kernel l3 metrics namd opcache perf performance counters perf_event_open phoronix Ryzen AI 9 HX 370 Ryzen AI 365 scaling stream threshold topdown tree virtualization website wsl Zen5

Recent Posts

  • Virtualization comparisons
  • Updating to a new kernel and graphics driver
  • SPEC CPU2017 Ryzen AI HX 370 vs. Ryzen 7840 HS
  • phoronix – Ryzen AI HX 370 vs Ryzen 7840 HS
  • New Ryzen AI 9 HX 370 machine
©2026 - Performance analysis, tools and experiments - Weaver Xtreme Theme