Performance analysis, tools and experiments

An eclectic collection

  • Overview
  • Blog
  • Workloads
    • cpu2017
      • 500.perlbench_r
      • 502.gcc_r
      • 503.bwaves_r
      • 505.mcf_r
      • 507.cactuBSSN_r
      • 508.namd_r
      • 510.parest_r
      • 511.povray_r
      • 519.lbm_r
      • 520.omnetpp_r
      • 521.wrf_r
      • 523.xalancbmk_r
      • 525.x264_r
      • 526.blender_r
      • 527.cam4_r
      • 531.deepsjeng_r
      • 538.imagick_r
      • 541.leela_r
      • 544.nab_r
      • 548.exchange2_r
      • 549.fotonik3d_r
      • 554.roms_r
      • 557.xz_r
    • geekbench
    • lmbench
    • passmark
    • pbbs
    • phoronix
      • ai-benchmark
      • aircrack-ng
      • amg
      • aobench
      • aom-av1
      • apache
      • apache-iotdb
      • appleseed
      • arrayfire
      • askap
      • asmfish
      • astcenc
      • avifenc
      • basis
      • blake2
      • blogbench
      • blender
      • blosc
      • bork
      • botan
      • brl-cad
      • build-apache
      • build-clash
      • build-eigen
      • build-erlang
      • build-ffmpeg
      • build-gcc
      • build-gdb
      • build-gem5
      • build-godot
      • build-imagemagick
      • build-linux-kernel
      • build-llvm
      • build-mesa
      • build-mplayer
      • build-nodejs
      • build-php
      • build-python
      • build-wasmer
      • build2
      • bullet
      • byte
      • cachebench
      • cassandra
      • clickhouse
      • clomp
      • cloverleaf
      • cockroach
      • compilebench
      • compress-7zip
      • compress-gzip
      • compress-lz4
      • compress-pbzip2
      • compress-rar
      • compress-xz
      • compress-zstd
      • core-latency
      • coremark
      • cp2k
      • cpp-perf-bench
      • cpuminer-opt
      • crafty
      • c-ray
      • cryptopp
      • cryptsetup
      • ctx-clock
      • cython-bench
      • dacapobench
      • daphne
      • darktable
      • dav1d
      • dbench
      • deepsparse
      • deepspeech
      • dolfyn
      • draco
      • dragonflydb
      • duckdb
      • easywave
      • ebizzy
      • embree
      • encode-flac
      • encode-mp3
      • encode-opus
      • encode-wavpack
      • espeak
      • etcpak
      • faiss
      • fast-cli
      • ffmpeg
      • ffte
      • fftw
      • fhourstones
      • financebench
      • furmark
      • gcrypt
      • gegl
      • gimp
      • git
      • glibc-bench
      • gmpbench
      • gnupg
      • gnuradio
      • go-benchmark
      • gpaw
      • graph500
      • graphics-magick
      • gromacs
      • hackbench
      • hadoop
      • heffte
      • helsing
      • himeno
      • hmmer
      • hpcg
      • incompact3d
      • indigobench
      • inkscape
      • ipc-benchmark
      • java-jmh
      • java-scimark2
      • john-the-ripper
      • jpegxl
      • jpegxl-decode
      • kvazaar
      • kripke
      • lammps
      • lczero
      • libraw
      • libreoffice
      • libxsmm
      • liquid-dsp
      • llama.cpp
      • llamafile
      • lulesh
      • lzbench
      • mbw
      • memcached
      • minibude
      • minife
      • mnn
      • mpcbench
      • m-queens
      • mrbayes
      • mutex
      • namd
      • mt-dgemm
      • ncnn
      • neat
      • nettle
      • nginx
      • ngspice
      • node-octane
      • node-web-tooling
      • npb
      • n-queens
      • numpy
      • nwchem
      • oidn
      • onednn
      • octave-benchmark
      • onnx
      • opencv
      • openfoam
      • openjpeg
      • openssl
      • openradioss
      • openscad
      • openvino
      • openvkl
      • ospray
      • ospray-studio
      • palabos
      • parboil
      • pennant
      • perl-benchmark
      • pgbench
      • phpbench
      • pjsip
      • polybench-c
      • polyhedron
      • povray
      • primesieve
      • pybench
      • pyhpc
      • pyperformance
      • pytorch
      • quadray
      • qe
      • qmcpack
      • quantlib
      • quicksilver
      • ramspeed
      • rav1e
      • rawtherapee
      • rbenchmark
      • redis
      • renaissance
      • rnnoise
      • rocksdb
      • rodinia
      • rsvg
      • schbench
      • scikit-learn
      • scimark2
      • scylladb
      • securemark
      • selenium
      • simdjson
      • smallpt
      • smhasher
      • spark
      • spark-tpcds
      • speedb
      • specfem3d
      • sqlite
      • srsran
      • stargate
      • stockfish
      • stream
      • stress-ng
      • svt-av1
      • svt-hevc
      • svt-vp9
      • sudokut
      • synthmark
      • sysbench
      • tensorflow
      • tensorflow-lite
      • tesseract
      • tjbench
      • tnn
      • toybrot
      • tscp
      • ttsiod-renderer
      • tungsten
      • uvg266
      • vkpeak
      • vpxenc
      • v-ray
      • vvenc
      • webp
      • webp2
      • whisper.cpp
      • whisperfile
      • wireguard
      • x264
      • x265
      • xmrig
      • xnnpack
      • y-cruncher
      • z3
    • stream
  • Tools
    • Compilers
    • likwid
    • perf
    • trace-cmd and kernelshark
    • wspy
  • Experiments
    • Histograms
    • clustering
    • Adding summary statistics for all benchmarks

Yearly Archives: 2024

Virtualization comparisons

Posted on November 13, 2024 by mev

Installing and reinstalling operating systems is easier if I maintain several virtual machines, one per configuration. While this lets me compare VMs and OSs against each other, there is also the question of how the virtual environment compares with the host environment. So I’ve created a few configurations I can use for these comparisons. In particular:

Name                     Threads  Memory  Notes
boulder                  16       32GB    Host: 7840HS, Zen4; ubuntu 24.04
niwot                    24       32GB    Host: HX 370, Zen5; ubuntu 24.04
boulder-ubuntu           8        16GB    ubuntu 24.04 guest
boulder-cachyos          8        16GB    cachyos guest
boulder “constrained”    8        32GB    host with taskset --cpu-list
niwot-ubuntu             12       16GB    ubuntu 24.04 guest
niwot-cachyos            12       16GB    cachyos guest
niwot “constrained”      12       32GB    host with taskset --cpu-list

Since I can’t dedicate the entire machine to the VM, I instead bind the VM so that it runs one thread per (hyper-threaded) host core, and I give it half the memory. I can then compare this against a host “constrained” configuration that runs on those same cores, e.g.

taskset --cpu-list 0-15:2 phoronix-test-suite batch-run coremark
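To make the stride notation in that command concrete, here is a minimal Python sketch that expands a taskset-style CPU list into the individual logical CPU numbers. The helper name expand_cpu_list is hypothetical (it is not part of taskset), and the sketch assumes that sibling hyper-threads are numbered adjacently (0/1, 2/3, ...), so that taking every second logical CPU gives one thread per core. That numbering is an assumption about the host and is worth checking on each machine.

```python
def expand_cpu_list(spec: str) -> list[int]:
    """Expand a taskset-style CPU list such as "0-15:2" into CPU numbers.

    Hypothetical helper for illustration only. Supports comma-separated
    entries of the form N, N-M, or N-M:S (S is the stride).
    """
    cpus = set()
    for part in spec.split(","):
        rng, _, stride = part.partition(":")
        first, _, last = rng.partition("-")
        start = int(first)
        end = int(last) if last else start
        step = int(stride) if stride else 1
        cpus.update(range(start, end + 1, step))
    return sorted(cpus)


print(expand_cpu_list("0-15:2"))  # every second logical CPU from 0 to 15
```

Under the same numbering assumption, the analogous list for a 24-thread host would be 0-23:2, which expands to 12 logical CPUs.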

The first benchmark I pick for such a comparison is coremark.

Name                     Threads  Score
boulder                  16       412415
niwot                    24       563857
boulder-ubuntu           8        296640
boulder-cachyos          8        310674
boulder “constrained”    8        317576
niwot-ubuntu             12       356810
niwot-cachyos            12       369503
niwot “constrained”      12       401518

The first thing to note is that the 7840 “constrained” configuration runs at 77% of the full host configuration (317576/412415), while the 370 “constrained” configuration runs at 71% (401518/563857), so running one thread per core retains a somewhat smaller share of full-host performance on the 370.

The next thing to notice is that the Ubuntu virtual machine on the 7840 reaches 93% of constrained (296640/317576), while on the 370 it reaches 89% (356810/401518). The net effect is that the full-host benchmark is 1.37x faster on the 370 than on the 7840, but the Ubuntu virtual machine is only 1.20x faster. CachyOS is faster than Ubuntu as a guest: it reaches 98% of the constrained host on the 7840 and 92% on the 370.
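The ratios discussed above can be recomputed directly from the coremark table. The following is a small Python sketch of that arithmetic; the dictionary keys (for example "boulder-constrained") and the ratio helper are labels chosen here for illustration, and the scores are copied from the table.

```python
# coremark scores copied from the table above; key names are illustrative
scores = {
    "boulder": 412415,
    "niwot": 563857,
    "boulder-ubuntu": 296640,
    "boulder-cachyos": 310674,
    "boulder-constrained": 317576,
    "niwot-ubuntu": 356810,
    "niwot-cachyos": 369503,
    "niwot-constrained": 401518,
}


def ratio(numerator: str, denominator: str) -> float:
    """Score of one configuration relative to another."""
    return scores[numerator] / scores[denominator]


for host in ("boulder", "niwot"):
    constrained = f"{host}-constrained"
    print(f"{host}: constrained / full host = {ratio(constrained, host):.1%}")
    for guest in ("ubuntu", "cachyos"):
        vm = f"{host}-{guest}"
        print(f"{host}: {guest} guest / constrained = {ratio(vm, constrained):.1%}")

print(f"full host, niwot / boulder = {ratio('niwot', 'boulder'):.2f}x")
print(f"ubuntu guest, niwot / boulder = {ratio('niwot-ubuntu', 'boulder-ubuntu'):.2f}x")
```

Keeping the scores in one place makes it easy to add further benchmarks later and to split the comparison into "constrained vs. full host" and "guest vs. constrained" as separate ratios.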

This is only one benchmark, so it will be useful to cross-check how well these trends hold for other workloads. I can probably also split this into two separate comparisons: how the “constrained” configuration compares with the full system, and then how much overhead virtualization adds.

Posted in experiment | Tagged coremark, virtualization

Updating to a new kernel and graphics driver

Posted on November 12, 2024 by mev

My Ryzen AI 9 HX 370 system was repeatedly crashing the display. The system would still be up, but dmesg told me there were CPU lockups.

Phoronix reviews of the HX 370 suggested their laptop also saw these crashes and that they updated to a newer version of the Linux kernel and Mesa graphics. So I found this page, which described the “mainline” package for retrieving and installing kernels.

That worked well until I tried “perf” and it didn’t find a perf package for my kernel. So I learned another trick. If you type “sudo apt install linux-tools-” and then press Tab, it will try to autocomplete. More usefully, it will give you a list of available tools packages. Once that set of perf tools was installed, I did the same thing with “sudo apt install linux-image-” and picked the corresponding kernel.

Now, to update Mesa, I found this page, which gave me the instructions for getting the latest Mesa drivers. With both installed, I will now see whether this helps with my kernel/driver crashes.

Posted in hardware | Tagged kernel, Ryzen AI 9 HX 370

SPEC CPU2017 Ryzen AI HX 370 vs. Ryzen 7840 HS

Posted on October 10, 2024 by mev (updated October 11, 2024)

As a follow-up to my previous post looking at the Ryzen AI HX 370, I have also done some SPEC CPU2017 experiments. My general idea is to compare the two processors, with a few caveats:

  • I have used a configuration file roughly based on AMD server configuration files and the AMD AOCC compiler. However, because I am not trying to publish the absolute best results for the hardware (and haven’t tuned to do so), I will report relative comparisons rather than absolute numbers.
  • I expect AMD to release a new version of AOCC for the Zen5 core. I didn’t have it when I did these comparisons, and I like using the same flags on both systems, so these comparisons used the same flags for both the Zen4 and Zen5 systems.
  • SPEC CPU2017 guidelines call for 2 GB of memory per copy. My Ryzen 370 system has 24 hardware threads and only 32 GB of memory, so I expect some benchmarks might run out of memory at 24 copies. For this reason, and to get an overall comparison, I’ve done two runs:
    • A 16-copy run on both systems. This uses all (hyperthreaded) cores on the Ryzen 7840 HS and, on the 370, a mix of hyperthreaded Zen5 cores and non-hyperthreaded Zen5C cores.
    • A 24-copy run on the Ryzen 370 system.

Relative results are shown in the tables below. This gives me some opportunities to drill a little deeper on why some benchmarks have larger gains than others.

Overall the differences between 16 threads and 24 threads are interesting. Using 24 threads mostly helps the intrate benchmarks, with the geomean going from +12% to +21% and every benchmark improving vs. the 7840. For fprate, using 24 threads gives more mixed results: the geomean is slightly lower than with 16 threads (+26% vs. +30%). In both cases, the individual benchmarks also differ.

Benchmark          16-thread  24-thread
500.perlbench_r    1.12       1.24
502.gcc_r          1.17       1.15
505.mcf_r          1.09       1.21
520.omnetpp_r      1.07       1.16
523.xalancbmk_r    1.35       1.23
525.x264_r         1.19       1.31
531.deepsjeng_r    1.11       1.18
541.leela_r        0.94       1.07
548.exchange2_r    1.24       1.38
557.xz_r           0.96       1.16
geomean            1.12       1.21

My 16-thread intrate comparisons range from -6% to +35%, with a geometric mean of +12%.

Benchmark          16-thread  24-thread
503.bwaves_r       1.11       1.09
507.cactuBSSN_r    1.30       1.25
508.namd_r         1.22       1.34
510.parest_r       1.53       1.10
511.povray_r       1.19       1.30
519.lbm_r          1.63       1.59
521.wrf_r          1.32       1.17
526.blender_r      1.24       1.27
527.cam4_r         1.61       1.45
538.imagick_r      1.19       1.32
544.nab_r          1.19       1.31
549.fotonik3d_r    1.11       1.09
554.roms_r         1.43       1.15
geomean            1.30       1.26

My 16-thread fprate comparisons range from +11% to +63%, with a geometric mean of +30%.
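As a cross-check of the geomean rows, the geometric mean of the per-benchmark ratios can be recomputed with a few lines of Python. This is only a sketch of the arithmetic: the list names are chosen here for illustration, the values are copied from the two tables in table order, and statistics.geometric_mean needs Python 3.8 or later.

```python
from statistics import geometric_mean

# Relative scores (HX 370 vs. 7840 HS) copied from the tables above,
# in table order; variable names are illustrative.
intrate_16 = [1.12, 1.17, 1.09, 1.07, 1.35, 1.19, 1.11, 0.94, 1.24, 0.96]
intrate_24 = [1.24, 1.15, 1.21, 1.16, 1.23, 1.31, 1.18, 1.07, 1.38, 1.16]
fprate_16 = [1.11, 1.30, 1.22, 1.53, 1.19, 1.63, 1.32, 1.24, 1.61, 1.19,
             1.19, 1.11, 1.43]
fprate_24 = [1.09, 1.25, 1.34, 1.10, 1.30, 1.59, 1.17, 1.27, 1.45, 1.32,
             1.31, 1.09, 1.15]

for label, ratios in [
    ("intrate 16-thread", intrate_16),
    ("intrate 24-thread", intrate_24),
    ("fprate 16-thread", fprate_16),
    ("fprate 24-thread", fprate_24),
]:
    # The geometric mean is the usual way to average ratios like these.
    print(f"{label}: geomean = {geometric_mean(ratios):.2f}, "
          f"min = {min(ratios):.2f}, max = {max(ratios):.2f}")
```

The geometric mean is used rather than the arithmetic mean because the inputs are ratios, so a benchmark that is 2x faster and one that is 2x slower cancel out.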

Posted in experiment, hardware | Tagged 7840HS, cpu2017, Ryzen AI 9 HX 370, Zen5

phoronix – Ryzen AI HX 370 vs Ryzen 7840 HS

Posted on October 10, 2024 by mev (updated October 18, 2024)

As a follow-up comparison of the Ryzen AI HX 370 processor against the Ryzen 7840 HS, this posting looks at some Phoronix benchmarks. I’ve run more than 200 Phoronix benchmarks in analysis using performance counters. I use these clusters to … Continue reading →

Posted in experiment | Tagged 7840HS, performance counters, phoronix, Ryzen AI 9 HX 370

New Ryzen AI 9 HX 370 machine

Posted on October 8, 2024 by mev (updated October 10, 2024)

I have a new AMD performance machine for experiments. The processor is a Ryzen AI 9 HX 370 in a Beelink SER9 mini-PC. Following are some of the major parameters in comparison with my Ryzen 7840HS comparison machine. Following are the … Continue reading →

Posted in experiment, hardware | Tagged 7840HS, coremark, Ryzen AI 9 HX 370, stream, Zen5

Coremark scaling 7840HS

Posted on September 27, 2024 by mev

The following chart shows the Phoronix test suite coremark value when running from 1 to 16 cores. Graphically it looks as follows. The question is what causes the inflection points on the graph? The scaling from 1-8 cores decreases only … Continue reading →

Posted in experiment | Tagged 7840HS, coremark, scaling

wsl and performance counters?

Posted on July 30, 2024 by mev

I have seen some references that it might be possible to have performance counters in WSL, like this page. If I type Then WSL tells me This seems both encouraging and discouraging. Encouraging that it references a standard version of … Continue reading →

Posted in experiment | Tagged performance counters, wsl, Zen5

Ryzen AI, Zen5, article and laptop

Posted on July 28, 2024 by mev

Zen5 mobile processors have been released. I had ordered an ASUS Zenbook S16 laptop with a Ryzen AI 9 365 processor and it arrived today. Full tech specifications are at the link but include: So far I have only run Windows … Continue reading →

Posted in hardware | Tagged phoronix, Ryzen AI 365, wsl, Zen5

CachyOS optimized packages

Posted on July 15, 2024 by mev

I have a Zen4 7940HS system where I have installed CachyOS. This is an Arch-based Linux OS with a focus on performance. In particular, the Why CachyOS? page cites an optimized scheduler as well as packages compiled for the particular … Continue reading →

Posted in experiment | Tagged benchmarks, cachyos, phoronix

clustering

Posted on June 8, 2024 by mev

Following are affinity clusters of similar benchmarks based on on_cpu (# of cores used) and topdown metrics (retirement, frontend stall, backend stall, speculation). Experimented some and settled on the following approach for clustering. Attributes of interest First question was “clustering … Continue reading →

Posted in experiment | Tagged benchmarks, cluster
©2026 - Performance analysis, tools and experiments - Weaver Xtreme Theme