Performance analysis, tools and experiments

An eclectic collection

clustering

Posted on June 8, 2024 by mev

The following are affinity clusters of similar benchmarks, based on on_cpu (# of cores used) and the topdown metrics (retirement, frontend stall, backend stall, speculation):

Cluster 0 (16 entries): 500.perlbench_r 508.namd_r 511.povray_r 525.x264_r 544.nab_r brl-cad c-ray john-the-ripper namd openssl povray qe quicksilver specfem3d svt-hevc vvenc
Cluster 1 (23 entries): 503.bwaves_r 507.cactuBSSN_r 510.parest_r 519.lbm_r 520.omnetpp_r 521.wrf_r 549.fotonik3d_r 554.roms_r ai-benchmark cloverleaf easywave kripke libxsmm minibude mt-dgemm ncnn oidn onednn openvino stream tensorflow xmrig y-cruncher
Cluster 2 (4 entries): 541.leela_r compress-7zip m-queens n-queens
Cluster 3 (5 entries): aobench compress-lz4 gnupg lammps lzbench
Cluster 4 (7 entries): 538.imagick_r 548.exchange2_r avifenc helsing hmmer rays1bench uvg266
Cluster 5 (6 entries): build-eigen build-python compress-gzip hadoop inkscape tscp
Cluster 6 (8 entries): 505.mcf_r 531.deepsjeng_r appleseed asmfish blender primesieve stockfish v-ray
Cluster 7 (14 entries): build2 build-erlang build-ffmpeg build-gcc build-gdb build-gem5 build-godot build-imagemagick build-linux-kernel build-llvm build-mesa build-mplayer build-php build-wasmer
Cluster 8 (10 entries): apache cassandra compilebench ctx-clock dbench fast-cli ipc-benchmark memcached sqlite wireguard
Cluster 9 (8 entries): cp2k graphics-magick qmcpack rodinia smallpt stargate vpxenc x264
Cluster 10 (6 entries): blosc clickhouse core-latency daphne gimp mbw
Cluster 11 (6 entries): arrayfire dragonflydb nginx pgbench pjsip rbenchmark
Cluster 12 (5 entries): espeak phpbench pybench securemark smhasher
Cluster 13 (7 entries): apache-iotdb encode-wavpack fftw gcrypt java-scimark2 polybench-c synthmark
Cluster 14 (12 entries): 523.xalancbmk_r 526.blender_r 527.cam4_r 557.xz_r embree graph500 lczero openvkl ospray-studio quadray sysbench tensorflow-lite
Cluster 15 (10 entries): amg askap heffte hpcg incompact3d onnx openfoam parboil pytorch ramspeed
Cluster 16 (9 entries): compress-zstd cpp-perf-bench draco indigobench jpegxl jpegxl-decode polyhedron scimark2 z3
Cluster 17 (12 entries): clomp darktable deepsparse deepspeech ffte himeno llama.cpp llamafile lulesh ngspice npb palabos
Cluster 18 (6 entries): 502.gcc_r build-nodejs ebizzy faiss minife mnn
Cluster 19 (5 entries): botan cachebench glibc-bench gnuradio nettle
Cluster 20 (8 entries): blake2 build-apache build-clash octave-benchmark openjpeg selenium tungsten vkpeak
Cluster 21 (5 entries): mpcbench node-octane openscad pyperformance rav1e
Cluster 22 (7 entries): bork byte libreoffice numpy perl-benchmark rsvg sudokut
Cluster 23 (9 entries): compress-rar dacapobench duckdb ffmpeg gegl node-web-tooling pyhpc renaissance spark-tpcds
Cluster 24 (11 entries): aircrack-ng astcenc basis coremark cpuminer-opt java-jmh kvazaar mrbayes quantlib toybrot webp2
Cluster 25 (6 entries): dav1d opencv ospray schbench svt-av1 svt-vp9
Cluster 26 (9 entries): cryptopp dolfyn encode-flac gmpbench mutex nwchem rnnoise simdjson spark
Cluster 27 (6 entries): cockroach hackbench rocksdb scylladb speedb stress-ng
Cluster 28 (13 entries): aom-av1 financebench gpaw gromacs liquid-dsp neat openradioss pennant rawtherapee srsran tnn whisper.cpp x265
Cluster 29 (11 entries): bullet compress-pbzip2 crafty cython-bench encode-mp3 encode-opus etcpak fhourstones git libraw webp

I experimented a bit and settled on the following approach for clustering.

Attributes of interest

The first question was “clustering based on what?”. Below is a more complete set of metrics, ranging from the amount of I/O to floating-point density to counter-based instrumentation. These have rather different ranges, though they can be normalized using the mean and standard deviation. I first experimented with a set of 10 metrics (on_cpu, retire, frontend, backend, speculation, IPC, GHz, float-density, branch-density, smt-contention) before settling on just the first five.

Some of these metrics are correlated, which is why I figured it wouldn’t add value to use both the AMD and Intel measurements. Others, such as the I/O metrics, are largely orthogonal and likely make sense in a broader context, but I haven’t studied them enough to characterize them clearly.

metric              count  min    max       median    mean      stddev
elapsed             247    3.53   8.76e+03  532       1.15e+03  1.53e+03
on_cpu              247    0      15.9      6.34      6.68      5.42
inblock             247    0      4.46e+05  0         3.65e+03  3.92e+04
onblock             247    0.46   8.97e+05  131       1.57e+04  7.91e+04
page-fault          247    2.6    1.31e+05  2.1e+03   1.17e+04  2.1e+04
context-switch      247    2.88   4.77e+04  66        2.49e+03  7.58e+03
IPC                 247    0.01   4.59      1.51      1.64      0.888
GHz                 247    0      5.23      1.87      1.82      1.4
retire-rate         247    0.7    76.2      29.2      32.1      16
frontend-stall      247    0.1    72.8      16.3      21.7      18
backend-stall       247    4.1    97.4      38.2      42.9      21.9
spec-stall          247    0      21.3      2         3.28      3.78
retire-ucode        247    0      1.2       0         0.0769    0.141
retire-fastpath     247    0.7    76.2      25.5      27.7      14.5
float-density       247    0.016  676       62.9      129       151
frontend-latency    247    0.1    57.8      8.4       13.4      13.6
frontend-bandwidth  247    0      31.7      4.8       5.83      4.8
opcache-miss        247    0      65.9      6.1       13.4      15.1
icache-miss         247    0.1    69        13.4      16.8      11.5
backend-cpu         247    0.7    64        9.4       12.8      11.4
backend-memory      247    0.2    95.9      19.9      24.6      18.1
amd-l2-miss         247    0.05   67.5      17.1      18.3      13
amd-l2-density      247    0.022  4.29e+04  35.1      229       2.72e+03
spec-branch         247    0      21.2      1.7       2.82      3.6
spec-pipeline       247    0      2         0         0.113     0.249
branch-miss         247    0      14.8      1.85      2.72      2.99
branch-density      247    4.9    317       128       130       61.8
branch-cond         247    4.5    311       92.3      98.3      48.8
branch-ind          247    0.003  28.7      2.85      4.44      5.12
smt-contention      247    0      48.4      9.6       12.7      13.3

metric              count  min    max       median    mean      stddev
elapsed             238    16     1.69e+04  655       1.66e+03  2.5e+03
on_cpu              238    0      15.8      6.77      7         5.54
inblock             238    0.01   3.86e+05  65.8      6.31e+03  3.54e+04
onblock             238    0.37   6.35e+05  19.9      1.1e+04   5.18e+04
page-fault          238    4.07   1.21e+05  1.68e+03  1.06e+04  2.02e+04
context-switch      238    2.23   6.44e+04  62.4      2.49e+03  8.92e+03
IPC                 238    0.01   5.53      1.89      2.02      1.01
GHz                 238    0      3.04      1.32      1.2       0.88
retire-rate         238    3.7    87.3      43.2      43.4      15.4
frontend-stall      238    0.5    52        15.9      17.8      11
backend-stall       238    1.3    95.3      26.2      30.9      20.2
spec-stall          238    0      46.7      6.1       8.38      8.3
retire-ucode        238    0      16.7      2.9       3.26      2.26
retire-fastpath     238    2.4    83.2      39.6      40.2      14.9
frontend-latency    238    0.3    39.2      8.6       9.57      6.73
frontend-bandwidth  238    0.1    25.8      7.3       8.2       5.84
backend-cpu         238    0.6    76.1      11.5      15.5      12.4
backend-memory      238    0.3    89.9      10.7      15.4      16.1
l1-stall            238    0      29.6      4.1       5.53      5.66
l2-stall            238    0      58.8      7.1       7.84      8.18
l3-stall            238    0      35        2.3       3.98      5.42
dram-stall          238    0      86.9      3.3       7.46      11.7
store-stall         238    0      28.3      0.8       1.7       3.11
intel-l2-miss       238    0.35   92.7      29.1      30.1      18.2
intel-l2-density    238    0.019  2.12e+04  21.9      126       1.37e+03
spec-branch         238    0      46.7      5.7       7.95      8.3
spec-pipeline       238    0      6.2       0.3       0.427     0.652
branch-miss         238    0      20.4      0.83      1.57      2.32
branch-density      238    6.24   320       129       129       61
branch-cond         238    6.24   320       129       129       61
branch-ind          238    0.027  83        20.8      22.1      17.4
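
Because the metrics above span very different ranges, the mean/standard-deviation normalization mentioned earlier matters before any distance is computed. Here is a minimal sketch of that step, assuming each benchmark is a row of raw metric values. The function name `zscore_columns` and the sample numbers are hypothetical and are not taken from the measured data.

```python
import statistics

def zscore_columns(rows):
    """Rescale each metric column to mean 0 and standard deviation 1.

    rows: list of equal-length lists of floats, one row per benchmark.
    """
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Population standard deviation; a constant column would give 0,
    # so fall back to 1.0 to avoid dividing by zero.
    stddevs = [statistics.pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stddevs)]
        for row in rows
    ]

# Hypothetical (on_cpu, retire, frontend, backend, speculation) rows.
sample = [
    [15.9, 30.0, 10.0, 55.0, 5.0],
    [1.0, 60.0, 20.0, 15.0, 5.0],
    [8.0, 45.0, 15.0, 35.0, 5.0],
]
normalized = zscore_columns(sample)
```

After this step a metric such as on_cpu (0 to about 16) and a stall percentage (0 to about 100) contribute on a comparable scale, so no single metric dominates the distance.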

Clustering algorithm

After a web search, I ended up with a variation of Lloyd’s algorithm. It is relatively straightforward and settles fairly quickly on a set of clusters. In summary, it goes through the following steps:

  • Start with a set of randomly chosen cluster points. I picked every Nth benchmark as my starting points.
  • Assign each benchmark to the closest cluster point. I used a simple N-dimensional Euclidean distance: sqrt(distance1^2 + distance2^2 + distance3^2 + … + distanceN^2).
  • Recompute each cluster point as the center of the points assigned to that cluster.
  • Repeat the last two steps until the assignments converge on a stable set of clusters.

This took ~8 iterations when I tried it on ~240 phoronix tests along with 23 SPEC CPU 2017 benchmarks.
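
The steps above can be sketched roughly as follows. This is an illustration under assumptions, not the code used for the clusters in this post: the names `distance` and `lloyd`, the `max_iterations` cap, the empty-cluster handling, and the toy points are all hypothetical. It does follow the description in seeding with every Nth point rather than random picks.

```python
import math

def distance(a, b):
    """Euclidean distance between two N-dimensional points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def lloyd(points, k, max_iterations=100):
    """Cluster points into k groups; returns (assignments, centers, iterations)."""
    # Seed with every Nth point (assumes k <= len(points)).
    step = max(1, len(points) // k)
    centers = [list(points[i * step]) for i in range(k)]
    assignments = None
    for iteration in range(1, max_iterations + 1):
        # Assignment step: each point joins its closest center.
        new_assignments = [
            min(range(k), key=lambda c: distance(p, centers[c])) for p in points
        ]
        if new_assignments == assignments:
            return assignments, centers, iteration  # converged: nothing moved
        assignments = new_assignments
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments, centers, max_iterations

# Toy 2-D data with two obvious groups (hypothetical values).
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
assignments, centers, iterations = lloyd(points, k=2)
```

On the toy data the first pass already puts the three low points in one cluster and the three high points in the other, and the second pass confirms that nothing changed. With real benchmark data the input would presumably be the normalized metric vectors rather than raw values.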

With these clusters, I can spread a benchmark analysis across a wide range of different benchmarks, and I can also identify similar benchmarks that might substitute for each other. For example, cluster #7 in the list above collects a set of build benchmarks, while cluster #1 looks like a set of backend-bound benchmarks running on all cores.

Posted in experiment | Tagged benchmarks, cluster
