Performance analysis, tools and experiments

An eclectic collection

  • Overview
  • Blog
  • Workloads
    • cpu2017
      • 500.perlbench_r
      • 502.gcc_r
      • 503.bwaves_r
      • 505.mcf_r
      • 507.cactuBSSN_r
      • 508.namd_r
      • 510.parest_r
      • 511.povray_r
      • 519.lbm_r
      • 520.omnetpp_r
      • 521.wrf_r
      • 523.xalancbmk_r
      • 525.x264_r
      • 526.blender_r
      • 527.cam4_r
      • 531.deepsjeng_r
      • 538.imagick_r
      • 541.leela_r
      • 544.nab_r
      • 548.exchange2_r
      • 549.fotonik3d_r
      • 554.roms_r
      • 557.xz_r
    • geekbench
    • lmbench
    • passmark
    • pbbs
    • phoronix
      • ai-benchmark
      • aircrack-ng
      • amg
      • aobench
      • aom-av1
      • apache
      • apache-iotdb
      • appleseed
      • arrayfire
      • askap
      • asmfish
      • astcenc
      • avifenc
      • basis
      • blake2
      • blogbench
      • blender
      • blosc
      • bork
      • botan
      • brl-cad
      • build-apache
      • build-clash
      • build-eigen
      • build-erlang
      • build-ffmpeg
      • build-gcc
      • build-gdb
      • build-gem5
      • build-godot
      • build-imagemagick
      • build-linux-kernel
      • build-llvm
      • build-mesa
      • build-mplayer
      • build-nodejs
      • build-php
      • build-python
      • build-wasmer
      • build2
      • bullet
      • byte
      • cachebench
      • cassandra
      • clickhouse
      • clomp
      • cloverleaf
      • cockroach
      • compilebench
      • compress-7zip
      • compress-gzip
      • compress-lz4
      • compress-pbzip2
      • compress-rar
      • compress-xz
      • compress-zstd
      • core-latency
      • coremark
      • cp2k
      • cpp-perf-bench
      • cpuminer-opt
      • crafty
      • c-ray
      • cryptopp
      • cryptsetup
      • ctx-clock
      • cython-bench
      • dacapobench
      • daphne
      • darktable
      • dav1d
      • dbench
      • deepsparse
      • deepspeech
      • dolfyn
      • draco
      • dragonflydb
      • duckdb
      • easywave
      • ebizzy
      • embree
      • encode-flac
      • encode-mp3
      • encode-opus
      • encode-wavpack
      • espeak
      • etcpak
      • faiss
      • fast-cli
      • ffmpeg
      • ffte
      • fftw
      • fhourstones
      • financebench
      • furmark
      • gcrypt
      • gegl
      • gimp
      • git
      • glibc-bench
      • gmpbench
      • gnupg
      • gnuradio
      • go-benchmark
      • gpaw
      • graph500
      • graphics-magick
      • gromacs
      • hackbench
      • hadoop
      • heffte
      • helsing
      • himeno
      • hmmer
      • hpcg
      • incompact3d
      • indigobench
      • inkscape
      • ipc-benchmark
      • java-jmh
      • java-scimark2
      • john-the-ripper
      • jpegxl
      • jpegxl-decode
      • kvazaar
      • kripke
      • lammps
      • lczero
      • libraw
      • libreoffice
      • libxsmm
      • liquid-dsp
      • llama.cpp
      • llamafile
      • lulesh
      • lzbench
      • mbw
      • memcached
      • minibude
      • minife
      • mnn
      • mpcbench
      • m-queens
      • mrbayes
      • mutex
      • namd
      • mt-dgemm
      • ncnn
      • neat
      • nettle
      • nginx
      • ngspice
      • node-octane
      • node-web-tooling
      • npb
      • n-queens
      • numpy
      • nwchem
      • oidn
      • onednn
      • octave-benchmark
      • onnx
      • opencv
      • openfoam
      • openjpeg
      • openssl
      • openradioss
      • openscad
      • openvino
      • openvkl
      • ospray
      • ospray-studio
      • palabos
      • parboil
      • pennant
      • perl-benchmark
      • pgbench
      • phpbench
      • pjsip
      • polybench-c
      • polyhedron
      • povray
      • primesieve
      • pybench
      • pyhpc
      • pyperformance
      • pytorch
      • quadray
      • qe
      • qmcpack
      • quantlib
      • quicksilver
      • ramspeed
      • rav1e
      • rawtherapee
      • rbenchmark
      • redis
      • renaissance
      • rnnoise
      • rocksdb
      • rodinia
      • rsvg
      • schbench
      • scikit-learn
      • scimark2
      • scylladb
      • securemark
      • selenium
      • simdjson
      • smallpt
      • smhasher
      • spark
      • spark-tpcds
      • speedb
      • specfem3d
      • sqlite
      • srsran
      • stargate
      • stockfish
      • stream
      • stress-ng
      • svt-av1
      • svt-hevc
      • svt-vp9
      • sudokut
      • synthmark
      • sysbench
      • tensorflow
      • tensorflow-lite
      • tesseract
      • tjbench
      • tnn
      • toybrot
      • tscp
      • ttsiod-renderer
      • tungsten
      • uvg266
      • vkpeak
      • vpxenc
      • v-ray
      • vvenc
      • webp
      • webp2
      • whisper.cpp
      • whisperfile
      • wireguard
      • x264
      • x265
      • xmrig
      • xnnpack
      • y-cruncher
      • z3
    • stream
  • Tools
    • Compilers
    • likwid
    • perf
    • trace-cmd and kernelshark
    • wspy
  • Experiments
    • Histograms
    • clustering
    • Adding summary statistics for all benchmarks

Tag Archives: cpu2017

SPEC CPU2017 Ryzen AI HX 370 vs. Ryzen 7840 HS

Posted on October 10, 2024 by mev (October 11, 2024)

As a follow-up to my previous post on the Ryzen AI HX 370, I have also run some SPEC CPU2017 experiments. The general idea is to compare the two processors, with a few caveats:

  • I used a configuration file loosely based on AMD’s server configuration files, together with the AMD AOCC compiler. However, because I am not trying to publish the absolute best results for the hardware (and haven’t tuned to do so), I will report relative comparisons rather than absolute numbers.
  • I expect AMD to release a new version of AOCC for the Zen5 core. I didn’t have it when I ran these comparisons, and I prefer using the same flags on both systems, so the Zen4 and Zen5 runs were built with identical flags.
  • SPEC CPU2017 guidelines call for 2 GB of memory per core. My Ryzen 370 system has 24 logical cores and only 32 GB of memory, so I expect some benchmarks might run out of memory. For this reason, and to still get an overall comparison, I did two runs:
    • A 16-copy run on both systems. This uses all (hyperthreaded) cores on the Ryzen 7840 HS and, on the Ryzen 370, a mix of hyperthreaded Zen5 cores and non-hyperthreaded Zen5C cores.
    • A 24-copy run on the Ryzen 370 system.
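
The memory constraint behind the two runs is simple arithmetic. A minimal sketch, taking the 32 GB and 2 GB figures from the list above; the function name is hypothetical, and memory used by the operating system is ignored:

```python
# Memory available per benchmark copy, using the figures quoted in the post.
TOTAL_MEMORY_GB = 32      # installed memory on the Ryzen 370 system
GUIDELINE_GB_PER_COPY = 2  # guideline quoted in the post


def memory_per_copy(total_gb: float, copies: int) -> float:
    """Return the GB of memory available to each benchmark copy."""
    return total_gb / copies


for copies in (16, 24):
    per_copy = memory_per_copy(TOTAL_MEMORY_GB, copies)
    verdict = "meets" if per_copy >= GUIDELINE_GB_PER_COPY else "falls short of"
    print(f"{copies} copies: {per_copy:.2f} GB per copy, "
          f"{verdict} the {GUIDELINE_GB_PER_COPY} GB guideline")
```

With 16 copies each copy gets exactly 2 GB; with 24 copies it drops to about 1.33 GB, which is why the 24-copy run is treated separately.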

Relative results are shown in the tables below, expressed as Ryzen AI HX 370 performance relative to the Ryzen 7840 HS (values above 1.00 mean the HX 370 is faster). They also give me an opportunity to drill a little deeper into why some benchmarks gain more than others.

The differences between 16 and 24 threads are interesting. Using 24 threads mostly helps the intrate benchmarks: the geomean goes from +12% to +21%, and every benchmark improves on the 7840. For fprate, using 24 threads is more mixed and on average slightly slower than 16 threads (geomean +26% vs. +30%). In both cases, the individual benchmarks also differ.

benchmark          16-thread  24-thread
500.perlbench_r    1.12       1.24
502.gcc_r          1.17       1.15
505.mcf_r          1.09       1.21
520.omnetpp_r      1.07       1.16
523.xalancbmk_r    1.35       1.23
525.x264_r         1.19       1.31
531.deepsjeng_r    1.11       1.18
541.leela_r        0.94       1.07
548.exchange2_r    1.24       1.38
557.xz_r           0.96       1.16
geomean            1.12       1.21

In the 16-thread run, my intrate comparisons range from -6% to +35%, with a geometric mean of +12%.
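
The range and geometric mean quoted above can be recomputed from the intrate table. A minimal sketch, with the ratios copied by hand from that table; the helper names `geomean` and `summarize` are illustrative only:

```python
import math

# Relative intrate scores (HX 370 vs. 7840 HS), copied from the table above:
# benchmark -> (16-thread, 24-thread)
INTRATE = {
    "500.perlbench_r": (1.12, 1.24),
    "502.gcc_r": (1.17, 1.15),
    "505.mcf_r": (1.09, 1.21),
    "520.omnetpp_r": (1.07, 1.16),
    "523.xalancbmk_r": (1.35, 1.23),
    "525.x264_r": (1.19, 1.31),
    "531.deepsjeng_r": (1.11, 1.18),
    "541.leela_r": (0.94, 1.07),
    "548.exchange2_r": (1.24, 1.38),
    "557.xz_r": (0.96, 1.16),
}


def geomean(values):
    """Geometric mean, computed as exp of the mean of the logarithms."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


def summarize(scores, column):
    """Return min/max gain in percent and the geomean for one table column."""
    ratios = [pair[column] for pair in scores.values()]
    return {
        "min_pct": round((min(ratios) - 1) * 100),
        "max_pct": round((max(ratios) - 1) * 100),
        "geomean": round(geomean(ratios), 2),
    }


for label, column in (("16-thread", 0), ("24-thread", 1)):
    print(label, summarize(INTRATE, column))
```

For the 16-thread column this gives a range of -6% to +35% and a geomean of 1.12, matching the summary; the 24-thread column gives +7% to +38% and a geomean of 1.21.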

benchmark          16-thread  24-thread
503.bwaves_r       1.11       1.09
507.cactuBSSN_r    1.30       1.25
508.namd_r         1.22       1.34
510.parest_r       1.53       1.10
511.povray_r       1.19       1.30
519.lbm_r          1.63       1.59
521.wrf_r          1.32       1.17
526.blender_r      1.24       1.27
527.cam4_r         1.61       1.45
538.imagick_r      1.19       1.32
544.nab_r          1.19       1.31
549.fotonik3d_r    1.11       1.09
554.roms_r         1.43       1.15
geomean            1.30       1.26

In the 16-thread run, my fprate comparisons range from +11% to +63%, with a geometric mean of +30%.
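
To see which fprate benchmarks lose ground when going from 16 to 24 copies, the two columns can be divided against each other. A minimal sketch with values copied from the fprate table; it assumes both columns are measured against the same 7840 HS baseline, so that the quotient compares the two HX 370 runs directly. The function name `scaling_change` is hypothetical:

```python
# Relative fprate scores (HX 370 vs. 7840 HS), copied from the table above:
# benchmark -> (16-thread, 24-thread)
FPRATE = {
    "503.bwaves_r": (1.11, 1.09),
    "507.cactuBSSN_r": (1.30, 1.25),
    "508.namd_r": (1.22, 1.34),
    "510.parest_r": (1.53, 1.10),
    "511.povray_r": (1.19, 1.30),
    "519.lbm_r": (1.63, 1.59),
    "521.wrf_r": (1.32, 1.17),
    "526.blender_r": (1.24, 1.27),
    "527.cam4_r": (1.61, 1.45),
    "538.imagick_r": (1.19, 1.32),
    "544.nab_r": (1.19, 1.31),
    "549.fotonik3d_r": (1.11, 1.09),
    "554.roms_r": (1.43, 1.15),
}


def scaling_change(scores):
    """Return (benchmark, 24-thread / 16-thread) pairs, worst first."""
    changes = [(name, t24 / t16) for name, (t16, t24) in scores.items()]
    return sorted(changes, key=lambda item: item[1])


for name, change in scaling_change(FPRATE):
    trend = "gains" if change > 1 else "loses"
    print(f"{name:16s} {change:5.2f}  {trend} with 24 copies")
```

In this data 510.parest_r and 554.roms_r drop the most, while five benchmarks (508.namd_r, 511.povray_r, 526.blender_r, 538.imagick_r and 544.nab_r) gain from the extra copies.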

Posted in experiment, hardware | Tagged 7840HS, cpu2017, Ryzen AI 9 HX 370, Zen5

cpu2017

Posted on June 6, 2024 by mev

I have reached the point of diminishing returns for Phoronix tests. I have now analyzed 240 workloads and skipped another 30+, mostly GPU-centric tests. These 270+ tests fully cover the 56 Phoronix benchmark articles published so far this year, and it has become increasingly rare for a new article to reference an uncharacterized Phoronix test. When that happens, I will add the test to my analysis, but I don’t see as much point in going through other, often obsolete, workloads. So I expect the count to creep up slowly.

SPEC CPU is interesting both as a workload and as a study of performance counters. Three general issues kept me from jumping full-bore into SPEC CPU until now:

  • The suite is expensive, ~$1000.
  • My goal is to characterize it as a workload, not to produce hardware measurements. SPEC has detailed reporting rules and an emphasis on publishing SPEC numbers to measure and compare hardware. While I would like the code to be reasonably optimized, I am not trying for the absolute highest scores. So I will refrain from publishing specific numbers as “estimates”, and I use compiler options that I come across, without searching for the optimal ones.
  • SPEC CPU is a good measure of processor, memory and compiler. For these measurements, I created config files using the AMD AOCC compiler suite.

SPEC CPU has both rate and speed benchmarks. The rate benchmarks maximize throughput by running multiple copies, typically one per logical core. The speed benchmarks minimize latency, sometimes with a single copy but now also using OpenMP where it makes sense. I have concentrated first on the rate benchmarks. Looking at their profiles, I see some commonality among them and occasional variation compared with many Phoronix benchmarks.

benchmark        status  elapsed   on_cpu  inblock  onblock   page-fault  context-switch  IPC   GHz   retire-rate  frontend-stall  backend-stall  spec-stall  retire-ucode  retire-fastpath  float-density  frontend-latency  frontend-bandwidth  opcache-miss  icache-miss  backend-cpu  backend-memory  amd-l2-miss  amd-l2-density  spec-branch  spec-pipeline  branch-miss  branch-density  branch-cond  branch-ind  smt-contention
500.perlbench_r  CPU     1272.531  15.75   0.02     542.18    389.437     10.721          1.58  4.10  33.3         19.7            44.7           2.3         0.1           25.2             16.803         8.8               6.2                 5.6           42.5         3.1          30.8            11.62        15.736          1.7          0.1            0.63         184.569         132.594      12.996      23.9
502.gcc_r        CPU     1280.597  15.70   0.00     4845.61   26600.971   10.948          0.60  4.39  12.7         25.6            59.8           1.9         0.0           9.6              7.943          12.6              6.9                 15.2          29.7         3.1          42.5            29.37        60.271          1.4          0.0            1.99         222.751         170.047      5.594       23.8
503.bwaves_r     CPU     4632.332  15.85   0.00     4.72      565.445     11.053          0.13  4.47  2.3          1.4             96.3           0.1         0.0           2.2              260.237        1.2               0.2                 9.7           6.5          6.8          86.5            49.20        132.988         0.0          0.0            1.38         19.172          14.975       1.398       3.0
505.mcf_r        CPU     706.488   15.73   0.00     15.39     1495.202    10.497          0.98  4.06  19.9         26.9            43.9           9.3         0.0           16.3             0.687          13.1              8.9                 0.4           16.7         4.3          31.6            16.33        53.401          7.6          0.0            4.98         169.000         147.312      0.017       18.2
507.cactuBSSN_r  CPU     609.922   15.69   0.00     495.30    1135.260    13.491          0.23  4.21  4.3          4.8             90.9           0.1         0.0           4.1              55.472         3.6               1.1                 47.7          41.8         9.1          78.5            10.76        476.467         0.0          0.0            0.64         48.787          33.403       3.734       3.6
508.namd_r       CPU     705.983   15.81   0.00     45.01     239.918     11.385          1.83  3.74  51.3         4.8             40.9           3.0         0.0           31.3             395.622        2.2               0.8                 0.2           18.8         17.2         7.7             1.35         63.010          1.8          0.0            4.40         26.587          24.258       0.021       39.1
510.parest_r     CPU     4550.517  15.80   0.00     16.88     213.125     10.974          0.29  4.41  5.8          4.2             89.7           0.3         0.0           4.8              338.504        1.9               1.6                 1.4           23.3         6.8          67.4            34.77        93.116          0.2          0.0            1.59         106.362         90.964       3.249       17.3
511.povray_r     CPU     1207.797  15.60   0.00     1449.68   63.683      10.402          1.81  3.78  52.4         6.6             38.9           2.1         0.3           31.4             244.832        2.5               1.5                 3.7           20.9         11.2         12.4            0.06         63.187          1.1          0.2            0.25         157.689         109.665      10.900      39.5
519.lbm_r        CPU     1416.599  15.53   0.00     7.13      256.969     10.588          0.26  4.42  4.7          2.1             93.2           0.0         0.0           4.5              51.354         1.1               1.0                 15.3          6.0          2.2          88.9            24.72        172.647         0.0          0.0            0.08         138.203         137.728      0.026       2.3
520.omnetpp_r    CPU     2024.583  15.83   0.00     870.95    107.720     9.994           0.28  4.58  5.6          9.6             81.9           3.0         0.0           4.9              16.375         4.4               3.9                 2.3           28.4         4.8          66.7            44.62        72.990          2.5          0.1            3.02         196.963         143.767      11.785      12.8
521.wrf_r        CPU     1953.561  15.79   0.00     1454.42   198.479     12.169          0.43  4.35  7.8          7.9             83.8           0.5         0.0           7.4              278.072        6.2               1.2                 3.1           27.3         15.4         63.5            26.15        77.326          0.4          0.0            0.90         113.379         77.677       13.945      5.9
523.xalancbmk_r  CPU     716.422   15.72   0.00     8293.00   930.346     10.489          0.77  4.32  19.0         8.9             71.1           0.9         0.1           12.2             34.341         3.6               2.2                 4.0           20.7         3.0          42.9            15.96        75.229          0.5          0.1            0.34         267.659         234.583      7.449       35.4
525.x264_r       CPU     427.195   15.00   0.00     16419.20  1583.996    12.872          1.59  3.69  38.0         21.0            38.6           2.5         0.1           27.2             189.560        11.2              3.9                 28.6          42.4         11.4         16.3            6.16         37.180          1.8          0.0            2.76         66.296          48.490       3.845       28.0
526.blender_r    CPU     1003.543  15.62   0.00     655.15    1981.576    11.564          1.08  3.88  24.9         6.6             62.3           6.2         0.0           17.9             397.608        3.7               1.1                 1.0           28.3         13.6         31.2            15.56        31.225          4.5          0.0            2.09         134.627         120.414      1.256       28.1
527.cam4_r       CPU     1312.508  15.81   7.50     935.89    3656.413    10.919          0.84  4.07  16.4         16.0            66.9           0.7         0.0           14.5             189.588        9.3               4.9                 8.0           25.9         19.5         40.0            20.78        62.695          0.6          0.0            1.15         124.383         89.321       8.772       11.1
531.deepsjeng_r  CPU     811.223   15.78   0.00     37.06     718.782     10.181          1.42  4.08  30.2         29.4            34.9           5.5         0.0           23.6             21.274         15.0              7.9                 17.7          16.4         3.7          23.6            4.85         23.537          4.2          0.1            3.99         123.840         97.520       0.907       21.9
538.imagick_r    CPU     371.223   14.33   0.00     1329.38   1026.201    10.757          2.19  3.63  56.0         11.1            25.2           7.7         0.0           33.8             149.855        3.8               2.9                 0.1           14.4         8.3          6.9             6.13         11.978          4.6          0.0            0.89         182.276         175.161      0.187       39.6
541.leela_r      CPU     1062.974  15.76   0.00     144.03    156.446     10.467          1.05  4.07  23.6         50.0            12.7           13.7        0.0           18.2             81.302         27.5              11.2                2.6           8.3          3.0          6.8             4.14         18.021          10.5         0.1            12.17        141.333         118.833      0.187       22.7
544.nab_r        CPU     581.066   15.53   0.00     23.20     301.419     11.174          1.27  3.77  36.9         8.7             52.1           2.3         0.1           25.5             318.530        5.0               1.1                 1.4           15.1         22.8         13.3            4.94         52.578          1.5          0.0            1.30         83.289          72.175       1.774       30.8
548.exchange2_r  CPU     557.819   15.77   0.00     23.76     71.047      10.670          1.89  3.99  46.4         36.9            14.2           2.6         0.0           32.3             126.182        12.0              13.6                1.6           17.7         4.6          5.3             0.76         0.827           1.8          0.0            1.30         165.361         157.689      1.022       30.3
549.fotonik3d_r  CPU     4829.371  15.91   0.00     32.94     144.718     11.510          0.11  4.54  2.0          1.9             96.1           0.1         0.0           1.9              286.386        1.3               0.5                 6.8           12.8         2.7          91.8            44.60        137.645         0.0          0.0            0.45         36.518          33.907       0.471       1.7
554.roms_r       CPU     2913.604  15.82   0.00     8.12      600.297     11.852          0.16  4.45  2.8          2.3             94.9           0.1         0.0           2.8              129.670        1.8               0.4                 4.4           21.7         7.4          85.6            36.36        196.455         0.0          0.0            0.43         76.566          57.327       6.664       2.0
557.xz_r         CPU     1413.262  15.76   0.00     11.48     1363.376    9.484           0.77  4.41  19.4         9.5             64.8           6.3         0.0           12.6             21.356         4.5               1.7                 1.0           17.8         2.2          39.8            30.11        23.253          4.1          0.0            4.53         115.607         104.940      1.340       35.2

  • The on_cpu values are high: this is very much a CPU-dominated workload. There are few delays waiting for I/O, networking, graphics or other parts of the system, so the mix has an intensity that isn’t always present in a more generic set of applications. Correspondingly, the “GHz” values (clock cycles per second) are also high.
  • Most of the floating point benchmarks are dominated by backend stalls. On my 7840 processor, the memory subsystem more often becomes the limiter.
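
The backend-stall observation can be checked by picking the largest of the four top-level categories for each fprate benchmark. A minimal sketch, assuming the retire-rate, frontend-stall, backend-stall and spec-stall columns are read from the table above as shown; the names `TOPDOWN_FPRATE` and `dominant_category` are hypothetical:

```python
# Top-level percentages for the fprate benchmarks, read from the table above:
# benchmark -> (retire-rate, frontend-stall, backend-stall, spec-stall)
TOPDOWN_FPRATE = {
    "503.bwaves_r": (2.3, 1.4, 96.3, 0.1),
    "507.cactuBSSN_r": (4.3, 4.8, 90.9, 0.1),
    "508.namd_r": (51.3, 4.8, 40.9, 3.0),
    "510.parest_r": (5.8, 4.2, 89.7, 0.3),
    "511.povray_r": (52.4, 6.6, 38.9, 2.1),
    "519.lbm_r": (4.7, 2.1, 93.2, 0.0),
    "521.wrf_r": (7.8, 7.9, 83.8, 0.5),
    "526.blender_r": (24.9, 6.6, 62.3, 6.2),
    "527.cam4_r": (16.4, 16.0, 66.9, 0.7),
    "538.imagick_r": (56.0, 11.1, 25.2, 7.7),
    "544.nab_r": (36.9, 8.7, 52.1, 2.3),
    "549.fotonik3d_r": (2.0, 1.9, 96.1, 0.1),
    "554.roms_r": (2.8, 2.3, 94.9, 0.1),
}
CATEGORIES = ("retire-rate", "frontend-stall", "backend-stall", "spec-stall")


def dominant_category(row):
    """Return the name of the category with the largest percentage."""
    index = max(range(len(row)), key=lambda i: row[i])
    return CATEGORIES[index]


dominant = {name: dominant_category(row) for name, row in TOPDOWN_FPRATE.items()}
backend_bound = [name for name, cat in dominant.items() if cat == "backend-stall"]

for name, cat in dominant.items():
    print(f"{name:16s} {cat}")
print(f"{len(backend_bound)} of {len(dominant)} fprate benchmarks are backend-stall dominated")
```

With these values, 10 of the 13 fprate benchmarks are backend-stall dominated; 508.namd_r, 511.povray_r and 538.imagick_r instead spend most of their slots retiring instructions.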

I have gone through fprate and am in the process of working through intrate. While I have run the intspeed and fpspeed benchmarks, those are lower on my list to characterize. This sets me up for two later exercises: (a) once Zen5 processors are available, I can use the benchmarks to see how the workloads compare on a Zen5 vs. Zen4 core, and (b) I am thinking of a “clustering” exercise to look for similarities between the Phoronix and SPEC CPU workloads.

Posted in experiment | Tagged benchmarks, compiler, cpu2017
