
Performance analysis, tools and experiments

An eclectic collection

  • Overview
  • Blog
  • Workloads
    • cpu2017
      • 500.perlbench_r
      • 502.gcc_r
      • 503.bwaves_r
      • 505.mcf_r
      • 507.cactuBSSN_r
      • 508.namd_r
      • 510.parest_r
      • 511.povray_r
      • 519.lbm_r
      • 520.omnetpp_r
      • 521.wrf_r
      • 523.xalancbmk_r
      • 525.x264_r
      • 526.blender_r
      • 527.cam4_r
      • 531.deepsjeng_r
      • 538.imagick_r
      • 541.leela_r
      • 544.nab_r
      • 548.exchange2_r
      • 549.fotonik3d_r
      • 554.roms_r
      • 557.xz_r
    • geekbench
    • lmbench
    • passmark
    • pbbs
    • phoronix
      • ai-benchmark
      • aircrack-ng
      • amg
      • aobench
      • aom-av1
      • apache
      • apache-iotdb
      • appleseed
      • arrayfire
      • askap
      • asmfish
      • astcenc
      • avifenc
      • basis
      • blake2
      • blogbench
      • blender
      • blosc
      • bork
      • botan
      • brl-cad
      • build-apache
      • build-clash
      • build-eigen
      • build-erlang
      • build-ffmpeg
      • build-gcc
      • build-gdb
      • build-gem5
      • build-godot
      • build-imagemagick
      • build-linux-kernel
      • build-llvm
      • build-mesa
      • build-mplayer
      • build-nodejs
      • build-php
      • build-python
      • build-wasmer
      • build2
      • bullet
      • byte
      • cachebench
      • cassandra
      • clickhouse
      • clomp
      • cloverleaf
      • cockroach
      • compilebench
      • compress-7zip
      • compress-gzip
      • compress-lz4
      • compress-pbzip2
      • compress-rar
      • compress-xz
      • compress-zstd
      • core-latency
      • coremark
      • cp2k
      • cpp-perf-bench
      • cpuminer-opt
      • crafty
      • c-ray
      • cryptopp
      • cryptsetup
      • ctx-clock
      • cython-bench
      • dacapobench
      • daphne
      • darktable
      • dav1d
      • dbench
      • deepsparse
      • deepspeech
      • dolfyn
      • draco
      • dragonflydb
      • duckdb
      • easywave
      • ebizzy
      • embree
      • encode-flac
      • encode-mp3
      • encode-opus
      • encode-wavpack
      • espeak
      • etcpak
      • faiss
      • fast-cli
      • ffmpeg
      • ffte
      • fftw
      • fhourstones
      • financebench
      • furmark
      • gcrypt
      • gegl
      • gimp
      • git
      • glibc-bench
      • gmpbench
      • gnupg
      • gnuradio
      • go-benchmark
      • gpaw
      • graph500
      • graphics-magick
      • gromacs
      • hackbench
      • hadoop
      • heffte
      • helsing
      • himeno
      • hmmer
      • hpcg
      • incompact3d
      • indigobench
      • inkscape
      • ipc-benchmark
      • java-jmh
      • java-scimark2
      • john-the-ripper
      • jpegxl
      • jpegxl-decode
      • kvazaar
      • kripke
      • lammps
      • lczero
      • libraw
      • libreoffice
      • libxsmm
      • liquid-dsp
      • llama.cpp
      • llamafile
      • lulesh
      • lzbench
      • mbw
      • memcached
      • minibude
      • minife
      • mnn
      • mpcbench
      • m-queens
      • mrbayes
      • mutex
      • namd
      • mt-dgemm
      • ncnn
      • neat
      • nettle
      • nginx
      • ngspice
      • node-octane
      • node-web-tooling
      • npb
      • n-queens
      • numpy
      • nwchem
      • oidn
      • onednn
      • octave-benchmark
      • onnx
      • opencv
      • openfoam
      • openjpeg
      • openssl
      • openradioss
      • openscad
      • openvino
      • openvkl
      • ospray
      • ospray-studio
      • palabos
      • parboil
      • pennant
      • perl-benchmark
      • pgbench
      • phpbench
      • pjsip
      • polybench-c
      • polyhedron
      • povray
      • primesieve
      • pybench
      • pyhpc
      • pyperformance
      • pytorch
      • quadray
      • qe
      • qmcpack
      • quantlib
      • quicksilver
      • ramspeed
      • rav1e
      • rawtherapee
      • rbenchmark
      • redis
      • renaissance
      • rnnoise
      • rocksdb
      • rodinia
      • rsvg
      • schbench
      • scikit-learn
      • scimark2
      • scylladb
      • securemark
      • selenium
      • simdjson
      • smallpt
      • smhasher
      • spark
      • spark-tpcds
      • speedb
      • specfem3d
      • sqlite
      • srsran
      • stargate
      • stockfish
      • stream
      • stress-ng
      • svt-av1
      • svt-hevc
      • svt-vp9
      • sudokut
      • synthmark
      • sysbench
      • tensorflow
      • tensorflow-lite
      • tesseract
      • tjbench
      • tnn
      • toybrot
      • tscp
      • ttsiod-renderer
      • tungsten
      • uvg266
      • vkpeak
      • vpxenc
      • v-ray
      • vvenc
      • webp
      • webp2
      • whisper.cpp
      • whisperfile
      • wireguard
      • x264
      • x265
      • xmrig
      • xnnpack
      • y-cruncher
      • z3
    • stream
  • Tools
    • Compilers
    • likwid
    • perf
    • trace-cmd and kernelshark
    • wspy
  • Experiments
    • Histograms
    • clustering
    • Adding summary statistics for all benchmarks

Monthly Archives: February 2024

oops…

Posted on February 24, 2024 by mev. Updated February 29, 2024.

I was surprised by the narrow range of the opcache hit/miss rate and the related icache metrics.

As it turns out, there is an obvious explanation. The metrics were being measured as expected, but my addition to the script was always running an invalid phoronix-test-suite configuration, so I was consistently measuring an invalid run. I have now restarted the (many) runs and expect to eventually have more interesting metrics.

After updating and rerunning tests, I now see the following distributions which make a lot more sense than what I saw before…

Posted in experiment | Tagged bad data, icache, opcache

Histograms

Posted on February 11, 2024 by mev.

I now have the ability to create summary histograms characterizing the workloads. These are (re)generated as I update performance reports; the values below include ~170 workloads. Walking through the histograms and what they describe…
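
The bucketing behind summary histograms like these can be sketched as below. This is a minimal illustration, not the actual generation script: the `histogram` helper and the IPC values are invented for this example.

```python
# Bucket per-workload metric values into fixed-width histogram bins,
# e.g. as input for a gnuplot boxes plot.
from collections import Counter

def histogram(values, bin_width):
    """Count values into fixed-width bins keyed by the bin's lower edge."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return sorted(counts.items())

# e.g. IPC values for a handful of workloads (made up for illustration)
ipc = [0.4, 0.9, 1.1, 1.4, 1.5, 1.6, 2.1, 2.3, 3.0]
for edge, n in histogram(ipc, 0.5):
    print(f"{edge:4.1f}  {n}")
```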

Most of the runs are fairly quick, though a few benchmarks run for up to several hours. This is elapsed time, which typically covers three runs of the workload; I then repeat each benchmark ~6 times while collecting different sets of metrics.

The distribution of workloads shows a small number of single-threaded workloads, a cluster around the number of cores without hyperthreading, and then some that use as many cores as possible.

The number of page faults has a few outliers that are interesting for their own analysis: octave-benchmark, gimp, lulesh, openjpeg, tungsten… are these bringing file data into memory and operating on it? There is a similar story with context switches for stress-ng, wireguard, and compress-rar, which I assume are all more interrupt-driven than CPU-bound.

IPC shows a range that is lower than I expected; presumably some of these workloads cannot take full advantage of the core's execution resources.

The picture is similar for GHz, which I calculate as cycles divided by seconds. For some of those on the low end, the behavior resembles stream – waiting on memory traffic or a similar bottleneck? I assume others are power limited. Given how dynamic power management is, the combination of IPC and GHz is probably more important – perhaps try an X/Y scatter plot of the two variables?
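
As a concrete example of the calculation, assuming counter totals for one workload run (the values below are invented, not measured):

```python
# Derive effective GHz and IPC from raw counter totals, as described above:
# GHz = cycles / elapsed-seconds, IPC = instructions / cycles.

def derived_metrics(instructions, cycles, seconds):
    ghz = cycles / seconds / 1e9   # average effective frequency over the run
    ipc = instructions / cycles    # retired instructions per cycle
    return ghz, ipc

# Hypothetical totals: 9.6e11 instructions, 6.0e11 cycles, 150 s elapsed
ghz, ipc = derived_metrics(instructions=9.6e11, cycles=6.0e11, seconds=150.0)
print(f"{ghz:.2f} GHz, {ipc:.2f} IPC")   # → 4.00 GHz, 1.60 IPC
```

Note that if the cycle counts are aggregated across threads, dividing by wall-clock seconds inflates the result; normalizing by on-CPU time instead would give a per-thread frequency.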

Retirement rate, as a percent of available slots, shows more of a bell curve.

Frontend stalls show a diminishing distribution; the workloads at the high end might be a subset worth diving into more deeply.

Backend stalls are more of a bell curve, with at least a minimal amount for every workload and a small subset with a very high percentage.

Speculative stalls are low for most workloads with a small number of outliers

Float density has a peak at the low end for code with little floating point, with the rest spread across a distribution.

Both the opcache and icache miss rates surprise me, mostly in how narrow the range of miss rates is. It seems these don't contribute to frontend stalls by themselves as much as other factors, e.g. the TLB? Separately, is the miss rate the right metric, or is there a more distilled one?

The picture is similar for the icache miss rates.

The L2 cache density (accesses per 1000 instructions) shows which benchmarks make use of the L2.

Branch miss rates have a similar distribution as frontend stalls with most having a low miss rate and then a tail of a few benchmarks with higher miss rates.

Branch density shows how branchy the code is, measured as retired branches per 1000 instructions.
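
The per-1000-instruction densities used here reduce to a one-liner; the counter values below are made up for illustration:

```python
# Events per kilo-instruction, the normalization used for branch density
# and the L2 density above.

def density_per_ki(event_count, instructions):
    """Events (branches, L2 accesses, ...) per 1000 retired instructions."""
    return 1000.0 * event_count / instructions

# A workload retiring 1.3e10 branches over 1.0e11 instructions:
print(density_per_ki(1.3e10, 1.0e11))   # → 130.0
```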

SMT contention is the number of slots going to the “other” core of a hyperthread pair. The large bar on the left reflects both single-threaded workloads and MPI workloads pinned to physical cores.

There is a similar set of Intel 13500H benchmark plots. I won’t include them here because they reflect similar profiles (fortunately).

Overall, the histograms provide a nice summary of a population of workloads (phoronix); it would also be interesting to compare/contrast with different workload sets such as SPEC. It could also be interesting to aggregate the subset of benchmarks used in a specific article, or to dive deeper on the outliers to understand what drives them and how best to optimize. So many different avenues opened from this…

Posted in experiment, website | Tagged benchmarks, gnuplot, website

Adding summary statistics for all benchmarks

Posted on February 10, 2024 by mev. Updated February 11, 2024.

After adding general parsing of measurement statistics, I can now also create a statistical summary across all ~170 benchmarks, as shown below. This lets me see, for example, the minimum IPC, maximum IPC, mean IPC, and standard deviation. This then indicates whether a particular workload is “low” or “high” in a metric, and by how much.

The statistics below come from the workload runs, with AMD metrics first followed by Intel metrics. For example, the table shows the following for the topdown metrics:

  • Retirement rates range from 0.8% to 76.2% with a mean of 32.3% and a standard deviation of 15.6%. A retirement rate over 64.5% would be two standard deviations above the mean.
  • Frontend stalls range from 0.1% to 73% with a mean of 22.5% and a standard deviation of 17.5%.
  • Backend stalls range from 4.1% to 97.1% with a mean of 41.6% and a standard deviation of 21.3%.
  • Speculative stalls range from 0% to 21.2% with a mean of 3.56% and a standard deviation of 3.95%.
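
A sketch of how such a summary and the two-standard-deviation flagging might be computed; the sample retirement-rate values here are invented, not the measured data:

```python
# Summary statistics across workloads plus n-sigma outlier flagging.
from statistics import mean, stdev

def summarize(values):
    """min/max/mean/stddev summary for one metric across workloads."""
    return {"count": len(values), "min": min(values), "max": max(values),
            "mean": mean(values), "stddev": stdev(values)}

def outliers(values, n_sigma=2.0):
    """Values more than n_sigma sample standard deviations from the mean."""
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) > n_sigma * s]

# Hypothetical retirement rates (%) for eight workloads
retire = [10.0, 25.0, 30.0, 31.0, 33.0, 35.0, 40.0, 76.0]
print(summarize(retire)["mean"])   # → 35.0
print(outliers(retire))            # → [76.0]
```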

These numbers are recalculated as the reports are regenerated, but with ~170 workloads included they are a good first overview of how the workloads behave on my AMD 7840.

Some next steps include flagging the outliers in the metrics and seeing how I can create histograms for the different fields below.

AMD metrics:

metric              count  min     max       median    mean      stddev
elapsed             174    2.5     8.8e+03   554       1.25e+03  1.62e+03
on_cpu              174    0       16        7.26      7.31      5.53
inblock             172    0       5.47e+06  0         3.19e+04  4.16e+05
onblock             172    0.46    4e+05     133       9.97e+03  3.54e+04
page-fault          174    4.33    1.24e+05  2.36e+03  1.14e+04  2.04e+04
context-switch      174    1.51    5.12e+04  75        2.18e+03  7.73e+03
IPC                 174    0.03    4.63      1.44      1.64      0.88
GHz                 174    0       4.62      1.98      1.91      1.39
retire-rate         174    0.8     76.2      29.2      32.3      15.6
frontend-stall      174    0.1     73        19.1      22.5      17.5
backend-stall       174    4.1     97.1      36.8      41.6      21.3
spec-stall          174    0       21.2      2.7       3.56      3.95
retire-ucode        174    0       1         0         0.0736    0.131
retire-fastpath     174    0.7     76.2      24.7      27.8      14.5
float-density       174    0.0136  766       7.5       133       154
frontend-latency    174    0.1     58.3      9.1       13.6      12.6
frontend-bandwidth  174    0       28.7      5.5       6.06      4.86
opcache-miss        87     2.45    4.75      3.85      3.8       0.401
icache-miss         87     8.2     9.6       8.5       8.57      0.294
backend-cpu         174    0.7     64        9.3       12.3      10.9
backend-memory      174    0.4     95.5      19.9      23.5      17.4
amd-l2-miss         174    0.08    59.8      16.3      17.3      11.7
amd-l2-density      174    0.036   470       38.2      49.7      58.9
spec-branch         174    0       21.1      2.1       3.07      3.8
spec-pipeline       172    0       1.4       0         0.11      0.205
branch-miss         174    0.01    14.8      1.96      2.79      2.99
branch-density      174    8.68    276       125       130       61.9
branch-cond         174    5.66    271       92.7      98.4      48.9
branch-ind          174    0.003   29.8      3.07      4.53      5.4
smt-contention      174    0       45.4      12.5      13.3      13.1

Intel metrics:

metric              count  min     max       median    mean      stddev
elapsed             169    1.49    1.44e+04  750       1.71e+03  2.45e+03
on_cpu              169    0       15.8      9.05      7.76      5.6
inblock             166    0       8.53e+04  53.9      2.33e+03  9.24e+03
onblock             166    0.37    4.25e+05  38.1      9.06e+03  3.59e+04
page-fault          169    4.05    1.17e+05  1.73e+03  1.02e+04  1.91e+04
context-switch      169    1.81    9.27e+04  72.6      2.44e+03  1e+04
IPC                 169    0.05    5.54      1.82      2         0.975
GHz                 169    0       3.06      1.46      1.3       0.885
retire-rate         169    3.5     87.6      42.8      44.3      14.7
frontend-stall      169    1       50.4      18.3      19.4      11.1
backend-stall       169    1.3     93.5      23.8      27.6      18.1
spec-stall          169    0       46.7      6.8       9.16      8.55
retire-ucode        169    0       16.7      3         3.37      2.4
retire-fastpath     169    2.3     83.4      39.5      40.9      14.1
frontend-latency    169    0.3     35.1      9.5       10.3      6.63
frontend-bandwidth  169    0.4     25.8      8.4       9.09      6
backend-cpu         169    0.6     67.7      10.4      13.8      11
backend-memory      169    0       90.4      10.1      13.8      14.1
l1-stall            77     0       24.4      5.2       5.45      4.88
l2-stall            77     0       57.1      8.6       9.01      9.09
l3-stall            77     0       35        2.6       4.07      5.51
dram-stall          77     0       49        6         8.86      10.6
store-stall         77     0       28.3      0.9       1.56      3.53
intel-l2-miss       169    0.59    92.9      26.8      28.2      17.1
intel-l2-density    169    0.029   370       26.9      35.7      45.2
spec-branch         169    0       46.7      6.2       8.71      8.58
spec-pipeline       169    0       6.2       0.3       0.456     0.701
branch-miss         169    0       20.4      1.03      1.81      2.58
branch-density      169    6.24    275       126       127       60.8
branch-cond         169    6.24    275       126       127       60.8
branch-ind          169    0.035   83        21.1      22.6      17
Posted in experiment, website | Tagged benchmarks, metrics

Creating a more automated performance table

Posted on February 5, 2024 by mev.

I have been maintaining a table of various performance metrics by hand – both on the website and separately in Google Sheets. In addition to the extra work required, I also tend to include only some of the columns. So … Continue reading →

Posted in experiment, website | Tagged metrics

phoronix – January 2024

Posted on February 2, 2024 by mev.

Phoronix has published its roundup of benchmark/performance/review articles – https://www.phoronix.com/news/January-2024-Highlights Included were 10 articles with reviews and benchmarks. I’ve been keeping up with the CPU workloads listed and now have >130 workloads total. I haven’t added GPU/graphics tests because I haven’t developed … Continue reading →

Posted in experiment, website | Tagged benchmarks, phoronix
