Performance analysis, tools and experiments

An eclectic collection


Tag Archives: benchmarks

CachyOS optimized packages

Posted on July 15, 2024 by mev

I have a Zen4 7940HS system where I have installed CachyOS, an Arch-based Linux distribution with a focus on performance. In particular, the Why CachyOS? page cites an optimized scheduler as well as packages compiled for the particular architecture. Rather than shipping packages compiled for the lowest-common-denominator architecture, CachyOS provides optimized packages at different “levels”: the “-v3” level targets microarchitectures newer than Intel Haswell or AMD Excavator, and the “-v4” level additionally enables use of AVX-512.

The July 2024 release notes highlight the addition of a Zen4 optimized repository:

This is our 8th release this year, and we are very proud to announce a new optimized repository. Starting with this release, we are providing a Zen4 optimized repository. This repository will be automatically used at new installation for Zen4 and Zen5 CPUs, to provide the best performance.

The znver4 target provides a bunch of extra avx512 extensions and also other instructions. Here you can find a list of the additional instructions used by the compiler compared to the x86-64-v4 target: abm, adx, aes, avx512bf16, avx512bitalg, avx512ifma, avx512vbmi, avx512vbmi2, avx512vnni, avx512vpopcntdq, clflushopt, clwb, clzero, fsgsbase, gfni, mwaitx, pclmul, pku, prfchw, rdpid, rdrnd, rdseed, sha, sse4a, vaes, vpclmulqdq, wbnoinvd, xsavec, xsaveopt, xsaves
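To sanity-check whether a given machine can actually use these extensions, one can compare the CPU's reported feature flags against this list. The sketch below is a minimal, hedged example: the /proc/cpuinfo flag spellings differ from the compiler's option names (e.g. avx512vpopcntdq appears as "avx512_vpopcntdq"), and the mapping used here is my own best guess rather than an authoritative table.

```python
# Hedged sketch: check which znver4-oriented ISA extensions the running CPU
# reports. The flag names below are /proc/cpuinfo spellings as best I know
# them, not an authoritative list.
ZNVER4_FLAG_NAMES = {
    "avx512_bf16", "avx512_bitalg", "avx512ifma", "avx512vbmi",
    "avx512_vbmi2", "avx512_vnni", "avx512_vpopcntdq",
    "gfni", "vaes", "vpclmulqdq", "sha_ni", "rdpid",
}

def missing_extensions(cpuinfo_text, wanted=ZNVER4_FLAG_NAMES):
    """Return the subset of `wanted` flags absent from a /proc/cpuinfo dump."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        # Each logical CPU repeats a "flags : ..." line; union them all.
        if line.startswith("flags"):
            flags.update(line.split(":", 1)[1].split())
    return wanted - flags

# Typical use on Linux:
# print(missing_extensions(open("/proc/cpuinfo").read()))
```

On a CPU that lacks any of the listed flags, the -v4 and znver4 packages would not be usable.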

This seemed intriguing, so I decided to try a somewhat random collection of Phoronix tests using this repository. I compared performance running CachyOS with the Zen4 repository against Ubuntu 22.04. A summary table follows:

Metric | Direction | CachyOS Zen4 | Ubuntu 22.04 | Ratio
coremark | higher | 410390 iterations/sec | 438579 iterations/sec | 0.936
build-linux-kernel | lower | 114.035 seconds | 116.25 seconds | 1.019
openssl: SHA256 | higher | 13254334897 / second | 13420827833 / second | 0.988
openssl: SHA512 | higher | 4539223377 / second | 4413779170 / second | 1.028
openssl: RSA4096 | higher | 5949.0 sign/s | 5713.1 sign/s | 1.041
openssl: ChaCha20 | higher | 56376095723 byte/second | 55209170900 byte/second | 1.021
openssl: AES-128-GCM | higher | 108245069000 byte/second | 106316322953 byte/second | 1.018
openssl: AES-256-GCM | higher | 93611141897 byte/second | 91688082637 byte/second | 1.021
openssl: ChaCha20-Poly1305 | higher | 40096535620 byte/second | 39278150087 byte/second | 1.021
phpbench | higher | 2243593 score | 1055625 score | 2.125
ospray: particle_volume/ao | higher | 3.73062 / second | 3.64172 / second | 1.024
ospray: particle_volume/scivis | higher | 3.685 / second | 3.62516 / second | 1.017
ospray: particle_volume/pathtracer | higher | 120.966 / second | 121.287 / second | 0.997
ospray: gravity/ao | higher | 2.62208 / second | 2.79344 / second | 0.939
ospray: gravity/scivis | higher | 2.66757 / second | 2.76412 / second | 0.965
ospray: gravity/pathtracer | higher | 3.58901 / second | 3.45087 / second | 1.040
rawtherapee | lower | 51.331 seconds | 51.189 seconds | 0.997
namd: ATPase | higher | 1.30961 / day | 1.26995 / day | 1.031
namd: STMV | higher | 0.39164 / day | 0.37621 / day | 1.048

The benchmarks selected were a subset of those from this Phoronix article. That article compares the various CachyOS repositories against each other, while I am comparing against Ubuntu. Overall the improvement was smaller than I expected or hoped. Some particular items I will note from the table above:

  • One additional difference is the compiler: CachyOS ships gcc 14.1 while Ubuntu 22.04 has gcc 11.4.
  • The coremark benchmark compiles the coremark code with gcc -O2. I am not sure why it became slower; perhaps it is related to the newer compiler.
  • The build-linux-kernel test measures the time to compile the kernel itself. I was pleasantly surprised to see this faster, as my guess would have been that compile speed might have slowed.
  • The various OpenSSL benchmarks are likely dominated by the underlying crypto instructions, and again it is nice to see them slightly faster.
  • Phpbench is the particular outlier with a 2x performance improvement.
  • Ospray is mixed, with a few benchmarks faster and a few slower.
  • rawtherapee is just a slight bit slower.
  • namd shows a small improvement.

Overall, it is nice to have one system running CachyOS as a dynamically updated system. Occasionally it is slightly more difficult to get benchmarks to run than on Ubuntu, presumably because Ubuntu tends to be the default choice. So I don’t expect to shift everything over to CachyOS.

Posted in experiment | Tagged benchmarks, cachyos, phoronix | Leave a reply

clustering

Posted on June 8, 2024 by mev

Following are affinity clusters of similar benchmarks, based on on_cpu (number of cores used) and topdown metrics (retirement, frontend stall, backend stall, speculation):

Cluster 0 (16 entries): 500.perlbench_r 508.namd_r 511.povray_r 525.x264_r 544.nab_r brl-cad c-ray john-the-ripper namd openssl povray qe quicksilver specfem3d svt-hevc vvenc
Cluster 1 (23 entries): 503.bwaves_r 507.cactuBSSN_r 510.parest_r 519.lbm_r 520.omnetpp_r 521.wrf_r 549.fotonik3d_r 554.roms_r ai-benchmark cloverleaf easywave kripke libxsmm minibude mt-dgemm ncnn oidn onednn openvino stream tensorflow xmrig y-cruncher
Cluster 2 (4 entries): 541.leela_r compress-7zip m-queens n-queens
Cluster 3 (5 entries): aobench compress-lz4 gnupg lammps lzbench
Cluster 4 (7 entries): 538.imagick_r 548.exchange2_r avifenc helsing hmmer rays1bench uvg266
Cluster 5 (6 entries): build-eigen build-python compress-gzip hadoop inkscape tscp
Cluster 6 (8 entries): 505.mcf_r 531.deepsjeng_r appleseed asmfish blender primesieve stockfish v-ray
Cluster 7 (14 entries): build2 build-erlang build-ffmpeg build-gcc build-gdb build-gem5 build-godot build-imagemagick build-linux-kernel build-llvm build-mesa build-mplayer build-php build-wasmer
Cluster 8 (10 entries): apache cassandra compilebench ctx-clock dbench fast-cli ipc-benchmark memcached sqlite wireguard
Cluster 9 (8 entries): cp2k graphics-magick qmcpack rodinia smallpt stargate vpxenc x264
Cluster 10 (6 entries): blosc clickhouse core-latency daphne gimp mbw
Cluster 11 (6 entries): arrayfire dragonflydb nginx pgbench pjsip rbenchmark
Cluster 12 (5 entries): espeak phpbench pybench securemark smhasher
Cluster 13 (7 entries): apache-iotdb encode-wavpack fftw gcrypt java-scimark2 polybench-c synthmark
Cluster 14 (12 entries): 523.xalancbmk_r 526.blender_r 527.cam4_r 557.xz_r embree graph500 lczero openvkl ospray-studio quadray sysbench tensorflow-lite
Cluster 15 (10 entries): amg askap heffte hpcg incompact3d onnx openfoam parboil pytorch ramspeed
Cluster 16 (9 entries): compress-zstd cpp-perf-bench draco indigobench jpegxl jpegxl-decode polyhedron scimark2 z3
Cluster 17 (12 entries): clomp darktable deepsparse deepspeech ffte himeno llama.cpp llamafile lulesh ngspice npb palabos
Cluster 18 (6 entries): 502.gcc_r build-nodejs ebizzy faiss minife mnn
Cluster 19 (5 entries): botan cachebench glibc-bench gnuradio nettle
Cluster 20 (8 entries): blake2 build-apache build-clash octave-benchmark openjpeg selenium tungsten vkpeak
Cluster 21 (5 entries): mpcbench node-octane openscad pyperformance rav1e
Cluster 22 (7 entries): bork byte libreoffice numpy perl-benchmark rsvg sudokut
Cluster 23 (9 entries): compress-rar dacapobench duckdb ffmpeg gegl node-web-tooling pyhpc renaissance spark-tpcds
Cluster 24 (11 entries): aircrack-ng astcenc basis coremark cpuminer-opt java-jmh kvazaar mrbayes quantlib toybrot webp2
Cluster 25 (6 entries): dav1d opencv ospray schbench svt-av1 svt-vp9
Cluster 26 (9 entries): cryptopp dolfyn encode-flac gmpbench mutex nwchem rnnoise simdjson spark
Cluster 27 (6 entries): cockroach hackbench rocksdb scylladb speedb stress-ng
Cluster 28 (13 entries): aom-av1 financebench gpaw gromacs liquid-dsp neat openradioss pennant rawtherapee srsran tnn whisper.cpp x265
Cluster 29 (11 entries): bullet compress-pbzip2 crafty cython-bench encode-mp3 encode-opus etcpak fhourstones git libraw webp

I experimented some and settled on the following approach for clustering.

Attributes of interest

The first question was “clustering based on what?”. Following is a more complete set of metrics, ranging anywhere from the amount of I/O to floating-point density to counter-based instrumentation. These have rather different ranges, though they can be normalized using the mean and standard deviation. I experimented first using a set of 10 metrics (on_cpu, retire, frontend, backend, speculation, IPC, GHz, float-density, branch-density, smt-contention) before settling on just the first five.

Some of these metrics are correlated, which is why I figured it wouldn’t add value to use both the AMD and Intel counters. Others, such as the I/O metrics, are largely orthogonal and likely make sense in a broader context, but I haven’t studied them enough to characterize them clearly.

metric | count | min | max | median | mean | stddev
elapsed | 247 | 3.53 | 8.76e+03 | 532 | 1.15e+03 | 1.53e+03
on_cpu | 247 | 0 | 15.9 | 6.34 | 6.68 | 5.42
inblock | 247 | 0 | 4.46e+05 | 0 | 3.65e+03 | 3.92e+04
onblock | 247 | 0.46 | 8.97e+05 | 131 | 1.57e+04 | 7.91e+04
page-fault | 247 | 2.6 | 1.31e+05 | 2.1e+03 | 1.17e+04 | 2.1e+04
context-switch | 247 | 2.88 | 4.77e+04 | 66 | 2.49e+03 | 7.58e+03
IPC | 247 | 0.01 | 4.59 | 1.51 | 1.64 | 0.888
GHz | 247 | 0 | 5.23 | 1.87 | 1.82 | 1.4
retire-rate | 247 | 0.7 | 76.2 | 29.2 | 32.1 | 16
frontend-stall | 247 | 0.1 | 72.8 | 16.3 | 21.7 | 18
backend-stall | 247 | 4.1 | 97.4 | 38.2 | 42.9 | 21.9
spec-stall | 247 | 0 | 21.3 | 2 | 3.28 | 3.78
retire-ucode | 247 | 0 | 1.2 | 0 | 0.0769 | 0.141
retire-fastpath | 247 | 0.7 | 76.2 | 25.5 | 27.7 | 14.5
float-density | 247 | 0.0166 | 766 | 2.91 | 29 | 151
frontend-latency | 247 | 0.1 | 57.8 | 8.4 | 13.4 | 13.6
frontend-bandwidth | 247 | 0 | 31.7 | 4.8 | 5.83 | 4.8
opcache-miss | 247 | 0 | 65.9 | 6.1 | 13.4 | 15.1
icache-miss | 247 | 0.1 | 69 | 13.4 | 16.8 | 11.5
backend-cpu | 247 | 0.7 | 64 | 9.4 | 12.8 | 11.4
backend-memory | 247 | 0.2 | 95.9 | 19.9 | 24.6 | 18.1
amd-l2-miss | 247 | 0.05 | 67.5 | 17.1 | 18.3 | 13
amd-l2-density | 247 | 0.022 | 4.29e+04 | 35.1 | 229 | 2.72e+03
spec-branch | 247 | 0 | 21.2 | 1.7 | 2.82 | 3.6
spec-pipeline | 247 | 0 | 2 | 0 | 0.113 | 0.249
branch-miss | 247 | 0 | 14.8 | 1.85 | 2.72 | 2.99
branch-density | 247 | 4.9 | 317 | 128 | 130 | 61.8
branch-cond | 247 | 4.5 | 311 | 92.3 | 98.3 | 48.8
branch-ind | 247 | 0.003 | 28.7 | 2.85 | 4.44 | 5.12
smt-contention | 247 | 0 | 48.4 | 9.6 | 12.7 | 13.3
elapsed | 238 | 16 | 1.69e+04 | 655 | 1.66e+03 | 2.5e+03
on_cpu | 238 | 0 | 15.8 | 6.77 | 7 | 5.54
inblock | 238 | 0.01 | 3.86e+05 | 65.8 | 6.31e+03 | 3.54e+04
onblock | 238 | 0.37 | 6.35e+05 | 19.9 | 1.1e+04 | 5.18e+04
page-fault | 238 | 4.07 | 1.21e+05 | 1.68e+03 | 1.06e+04 | 2.02e+04
context-switch | 238 | 2.23 | 6.44e+04 | 62.4 | 2.49e+03 | 8.92e+03
IPC | 238 | 0.01 | 5.53 | 1.89 | 2.02 | 1.01
GHz | 238 | 0 | 3.04 | 1.32 | 1.2 | 0.88
retire-rate | 238 | 3.7 | 87.3 | 43.2 | 43.4 | 15.4
frontend-stall | 238 | 0.5 | 52 | 15.9 | 17.8 | 11
backend-stall | 238 | 1.3 | 95.3 | 26.2 | 30.9 | 20.2
spec-stall | 238 | 0 | 46.7 | 6.1 | 8.38 | 8.3
retire-ucode | 238 | 0 | 16.7 | 2.9 | 3.26 | 2.26
retire-fastpath | 238 | 2.4 | 83.2 | 39.6 | 40.2 | 14.9
frontend-latency | 238 | 0.3 | 39.2 | 8.6 | 9.57 | 6.73
frontend-bandwidth | 238 | 0.1 | 25.8 | 7.3 | 8.2 | 5.84
backend-cpu | 238 | 0.6 | 76.1 | 11.5 | 15.5 | 12.4
backend-memory | 238 | 0.3 | 89.9 | 10.7 | 15.4 | 16.1
l1-stall | 238 | 0 | 29.6 | 4.1 | 5.53 | 5.66
l2-stall | 238 | 0 | 58.8 | 7.1 | 7.84 | 8.18
l3-stall | 238 | 0 | 35 | 2.3 | 3.98 | 5.42
dram-stall | 238 | 0 | 86.9 | 3.3 | 7.46 | 11.7
store-stall | 238 | 0 | 28.3 | 0.8 | 1.7 | 3.11
intel-l2-miss | 238 | 0.35 | 92.7 | 29.1 | 30.1 | 18.2
intel-l2-density | 238 | 0.019 | 2.12e+04 | 21.9 | 126 | 1.37e+03
spec-branch | 238 | 0 | 46.7 | 5.7 | 7.95 | 8.3
spec-pipeline | 238 | 0 | 6.2 | 0.3 | 0.427 | 0.652
branch-miss | 238 | 0 | 20.4 | 0.83 | 1.57 | 2.32
branch-density | 238 | 6.24 | 320 | 129 | 129 | 61
branch-cond | 238 | 6.24 | 320 | 129 | 129 | 61
branch-ind | 238 | 0.027 | 83 | 20.8 | 22.1 | 17.4

Clustering algorithm

After a web search, I ended up with a variation of Lloyd’s algorithm. It is relatively straightforward and settles fairly quickly on a set of clusters. As a summary, it goes through the following steps:

  • Start with a set of randomly chosen cluster points. I picked every Nth benchmark as my starting points.
  • Assign each benchmark to the closest cluster point, using the simple N-dimensional Euclidean distance sqrt(distance1^2 + distance2^2 + distance3^2 + … + distanceN^2).
  • Recompute each cluster point as the center of the points assigned to that cluster.
  • Iterate the last two steps until the assignment converges on a set of clusters.

This took ~8 iterations when I tried it on ~240 Phoronix tests along with 23 SPEC CPU 2017 benchmarks.
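The steps above can be sketched in a few lines of Python. This is a minimal illustration, not my actual tooling: it z-scores each metric column as described earlier, seeds the centers with every Nth point, and iterates assignment and centroid updates until the assignment stops changing.

```python
import math

def zscore(rows):
    # Normalize each metric column by its mean and standard deviation so that
    # metrics with very different ranges contribute comparably to distance.
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

def dist(a, b):
    # Simple N-dimensional Euclidean distance.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def lloyd(points, k, max_iters=100):
    # Seed the centers with every Nth point ("every Nth benchmark" above).
    step = max(1, len(points) // k)
    centers = [points[i * step] for i in range(k)]
    assign = None
    for _ in range(max_iters):
        # Assign each point to its nearest center.
        new = [min(range(k), key=lambda c: dist(p, centers[c]))
               for p in points]
        if new == assign:      # no point changed cluster: converged
            break
        assign = new
        # Move each center to the centroid of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members)
                              for col in zip(*members)]
    return assign
```

With the benchmark metric vectors as `points`, `lloyd(zscore(points), k=30)` would produce a cluster index per benchmark, analogous to the 30 clusters listed above.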

Now with these clusters, I can both use them to spread a benchmark analysis across a wide range of different benchmarks, and also think of similar benchmarks that might substitute for each other. For example, cluster #7 from the list above collects a set of build benchmarks, while cluster #1 looks like a set of backend-bound benchmarks running on all cores.

Posted in experiment | Tagged benchmarks, cluster | Leave a reply

cpu2017

Posted on June 6, 2024 by mev

I have reached the point of diminishing returns for Phoronix tests: 240 workloads analyzed, plus another ~30 workloads that are skipped, mostly GPU-centric tests. These 270+ tests fully cover the 56 Phoronix benchmark articles so far this year. It has also become increasingly rare for a new article to reference an uncharacterized Phoronix test. When that happens, I will add it to my analysis, but I don’t see much point in going through other, often obsolete, workload examples. So I expect this count to creep up slowly.

SPEC CPU is interesting both as a workload in its own right and as a study of performance counters. Three general issues kept me from jumping full-bore into adding SPEC CPU until now:

  • The suite is expensive, ~$1000
  • My point is to characterize it as a workload, not to create hardware measurements. SPEC has both detailed reporting rules and an emphasis on publishing SPEC numbers to measure and compare hardware. While I would like the code to be somewhat optimized, I am not trying for the absolute highest scores, so I will refrain from publishing specific numbers from my “estimates” and will use compiler options I generally find reasonable rather than searching for optimal ones.
  • SPEC CPU is a good measure of processor, memory and compiler. For these measurements, I created config files using the AMD AOCC compiler suite.

SPEC CPU has both rate and speed benchmarks. The rate benchmarks maximize throughput, running multiple copies, typically one per logical core. The speed benchmarks minimize latency, sometimes running a single copy but now also using OpenMP where it makes sense. I have concentrated first on the rate benchmarks. Looking at their profiles, I see some commonality between them and many Phoronix benchmarks, along with occasional variation.
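For reference, a SPEC CPU 2017 config file collects the compiler and flag choices in named sections. The fragment below is only a hedged sketch of the shape of such a file for AOCC; the label, flags, and copies value are illustrative, not my exact configuration.

```text
# Sketch of a SPEC CPU 2017 config fragment for AOCC (illustrative values).
%define label aocc-zen4

default:
   CC       = clang
   CXX      = clang++
   FC       = flang
   OPTIMIZE = -O3 -march=znver4

intrate,fprate:
   copies = 16   # rate runs: typically one copy per logical core
```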

benchmark | status | elapsed | on_cpu | inblock | onblock | page-fault | context-switch | IPC | GHz | retire-rate | frontend-stall | backend-stall | spec-stall | retire-ucode | retire-fastpath | float-density | frontend-latency | frontend-bandwidth | opcache-miss | icache-miss | backend-cpu | backend-memory | amd-l2-miss | amd-l2-density | spec-branch | spec-pipeline | branch-miss | branch-density | branch-cond | branch-ind | smt-contention
500.perlbench_rCPU1272.53115.750.02542.18389.43710.7211.584.1033.319.744.72.30.125.216.8038.86.25.642.53.130.811.6215.7361.70.10.63184.569132.59412.99623.9
502.gcc_rCPU1280.59715.700.004845.6126600.97110.9480.604.3912.725.659.81.90.09.67.94312.66.915.229.73.142.529.3760.2711.40.01.99222.751170.0475.59423.8
503.bwaves_rCPU4632.33215.850.004.72565.44511.0530.134.472.31.496.30.10.02.2260.2371.20.29.76.56.886.549.20132.9880.00.01.3819.17214.9751.3983.0
505.mcf_rCPU706.48815.730.0015.391495.20210.4970.984.0619.926.943.99.30.016.30.68713.18.90.416.74.331.616.3353.4017.60.04.98169.000147.3120.01718.2
507.cactuBSSN_rCPU609.92215.690.00495.301135.26013.4910.234.214.34.890.90.10.04.155.4723.61.147.741.89.178.510.76476.4670.00.00.6448.78733.4033.7343.6
508.namd_rCPU705.98315.810.0045.01239.91811.3851.833.7451.34.840.93.00.031.3395.6222.20.80.218.817.27.71.3563.0101.80.04.4026.58724.2580.02139.1
510.parest_rCPU4550.51715.800.0016.88213.12510.9740.294.415.84.289.70.30.04.8338.5041.91.61.423.36.867.434.7793.1160.20.01.59106.36290.9643.24917.3
511.povray_rCPU1207.79715.600.001449.6863.68310.4021.813.7852.46.638.92.10.331.4244.8322.51.53.720.911.212.40.0663.1871.10.20.25157.689109.66510.90039.5
519.lbm_rCPU1416.59915.530.007.13256.96910.5880.264.424.72.193.20.00.04.551.3541.11.015.36.02.288.924.72172.6470.00.00.08138.203137.7280.0262.3
520.omnetpp_rCPU2024.58315.830.00870.95107.7209.9940.284.585.69.681.93.00.04.916.3754.43.92.328.44.866.744.6272.9902.50.13.02196.963143.76711.78512.8
521.wrf_rCPU1953.56115.790.001454.42198.47912.1690.434.357.87.983.80.50.07.4278.0726.21.23.127.315.463.526.1577.3260.40.00.90113.37977.67713.9455.9
523.xalancbmk_rCPU716.42215.720.008293.00930.34610.4890.774.3219.08.971.10.90.112.234.3413.62.24.020.73.042.915.9675.2290.50.10.34267.659234.5837.44935.4
525.x264_rCPU427.19515.000.0016419.201583.99612.8721.593.6938.021.038.62.50.127.2189.56011.23.928.642.411.416.36.1637.1801.80.02.7666.29648.4903.84528.0
526.blender_rCPU1003.54315.620.00655.151981.57611.5641.083.8824.96.662.36.20.017.9397.6083.71.11.028.313.631.215.5631.2254.50.02.09134.627120.4141.25628.1
527.cam4_rCPU1312.50815.817.50935.893656.41310.9190.844.0716.416.066.90.70.014.5189.5889.34.98.025.919.540.020.7862.6950.60.01.15124.38389.3218.77211.1
531.deepsjeng_rCPU811.22315.780.0037.06718.78210.1811.424.0830.229.434.95.50.023.621.27415.07.917.716.43.723.64.8523.5374.20.13.99123.84097.5200.90721.9
538.imagick_rCPU371.22314.330.001329.381026.20110.7572.193.6356.011.125.27.70.033.8149.8553.82.90.114.48.36.96.1311.9784.60.00.89182.276175.1610.18739.6
541.leela_rCPU1062.97415.760.00144.03156.44610.4671.054.0723.650.012.713.70.018.281.30227.511.22.68.33.06.84.1418.02110.50.112.17141.333118.8330.18722.7
544.nab_rCPU581.06615.530.0023.20301.41911.1741.273.7736.98.752.12.30.125.5318.5305.01.11.415.122.813.34.9452.5781.50.01.3083.28972.1751.77430.8
548.exchange2_rCPU557.81915.770.0023.7671.04710.6701.893.9946.436.914.22.60.032.3126.18212.013.61.617.74.65.30.760.8271.80.01.30165.361157.6891.02230.3
549.fotonik3d_rCPU4829.37115.910.0032.94144.71811.5100.114.542.01.996.10.10.01.9286.3861.30.56.812.82.791.844.60137.6450.00.00.4536.51833.9070.4711.7
554.roms_rCPU2913.60415.820.008.12600.29711.8520.164.452.82.394.90.10.02.8129.6701.80.44.421.77.485.636.36196.4550.00.00.4376.56657.3276.6642.0
557.xz_rCPU1413.26215.760.0011.481363.3769.4840.774.4119.49.564.86.30.012.621.3564.51.71.017.82.239.830.1123.2534.10.04.53115.607104.9401.34035.2
  • The on_cpu values are high: this is very much a CPU-dominated workload. There are not as many delays waiting for I/O, networking, graphics or other parts of the system, so there is an intensity to the mix that isn’t always present with a more generic set of applications. Correspondingly, the “GHz” values (clock cycles per second) are also high.
  • Most of the floating point benchmarks are dominated by backend stalls. On my 7840 processor, the memory subsystem more often becomes the limiter.

I have gone through fprate and am in the process of working through intrate. While I have run the intspeed and fpspeed benchmarks, those are lower on my list to characterize. This sets me up for two later exercises: (a) after Zen5 processors are available, I can use these benchmarks to see how the workloads compare on a Zen5 vs. Zen4 core, and (b) I am thinking of a “clustering” exercise to look for similarities between the Phoronix and SPEC CPU workloads.

Posted in experiment | Tagged benchmarks, compiler, cpu2017 | Leave a reply

graphics-magick sharpen, compiler improvements

Posted on March 19, 2024 by mev

The following Phoronix article – https://www.phoronix.com/review/nvidia-gh200-compilers – compares GCC 13.2 with Clang 17.0.2 on an ARM platform. In the attached discussion, the improvement for the graphics-magick sharpen benchmark particularly stands out. So I thought I would see if I could see a … Continue reading →

Posted in experiment | Tagged benchmarks, compiler, phoronix | Leave a reply

200 phoronix tests

Posted on March 4, 2024 by mev

I have passed 200 Phoronix tests added. There were a little over 10 benchmark articles in February. I seem to already have most of the benchmarks when an article comes out and only needed to add one or two for some … Continue reading →

Posted in experiment | Tagged benchmarks, phoronix | Leave a reply

Histograms

Posted on February 11, 2024 by mev

I now have the ability to create summary histograms characterizing the workloads. These are (re)generated as I update performance reports, but the following are values with ~170 workloads added. Walking through the histograms and what they describe… Most of the runs … Continue reading →

Posted in experiment, website | Tagged benchmarks, gnuplot, website | Leave a reply

Adding summary statistics for all benchmarks

Posted on February 10, 2024 by mev (updated February 11, 2024)

After adding general parsing of measurement statistics, I can now also create a statistical summary across all ~170 benchmarks as shown below. This lets me see for example the minimum IPC, maximum IPC, mean IPC and standard deviation. This will … Continue reading →

Posted in experiment, website | Tagged benchmarks, metrics | Leave a reply

phoronix – January 2024

Posted on February 2, 2024 by mev

Phoronix has published its roundup of benchmark/performance/review articles – https://www.phoronix.com/news/January-2024-Highlights – including 10 articles with reviews and benchmarks. I’ve been keeping up with the CPU workloads listed and am now at >130 workloads total. I haven’t added GPU/graphics tests because I haven’t developed … Continue reading →

Posted in experiment, website | Tagged benchmarks, phoronix | Leave a reply

50 phoronix workloads…

Posted on January 14, 2024 by mev

I am now up to 50 phoronix workloads as summarized on the workloads page. For each one I have a graph and some pages of information. My general idea is to take the benchmark-based Phoronix articles and see if I … Continue reading →

Posted in experiment | Tagged benchmarks, phoronix | Leave a reply


