↓
 

Performance analysis, tools and experiments

An eclectic collection

  • Overview
  • Blog
  • Workloads
    • cpu2017
      • 500.perlbench_r
      • 502.gcc_r
      • 503.bwaves_r
      • 505.mcf_r
      • 507.cactuBSSN_r
      • 508.namd_r
      • 510.parest_r
      • 511.povray_r
      • 519.lbm_r
      • 520.omnetpp_r
      • 521.wrf_r
      • 523.xalancbmk_r
      • 525.x264_r
      • 526.blender_r
      • 527.cam4_r
      • 531.deepsjeng_r
      • 538.imagick_r
      • 541.leela_r
      • 544.nab_r
      • 548.exchange2_r
      • 549.fotonik3d_r
      • 554.roms_r
      • 557.xz_r
    • geekbench
    • lmbench
    • passmark
    • pbbs
    • phoronix
      • ai-benchmark
      • aircrack-ng
      • amg
      • aobench
      • aom-av1
      • apache
      • apache-iotdb
      • appleseed
      • arrayfire
      • askap
      • asmfish
      • astcenc
      • avifenc
      • basis
      • blake2
      • blogbench
      • blender
      • blosc
      • bork
      • botan
      • brl-cad
      • build-apache
      • build-clash
      • build-eigen
      • build-erlang
      • build-ffmpeg
      • build-gcc
      • build-gdb
      • build-gem5
      • build-godot
      • build-imagemagick
      • build-linux-kernel
      • build-llvm
      • build-mesa
      • build-mplayer
      • build-nodejs
      • build-php
      • build-python
      • build-wasmer
      • build2
      • bullet
      • byte
      • cachebench
      • cassandra
      • clickhouse
      • clomp
      • cloverleaf
      • cockroach
      • compilebench
      • compress-7zip
      • compress-gzip
      • compress-lz4
      • compress-pbzip2
      • compress-rar
      • compress-xz
      • compress-zstd
      • core-latency
      • coremark
      • cp2k
      • cpp-perf-bench
      • cpuminer-opt
      • crafty
      • c-ray
      • cryptopp
      • cryptsetup
      • ctx-clock
      • cython-bench
      • dacapobench
      • daphne
      • darktable
      • dav1d
      • dbench
      • deepsparse
      • deepspeech
      • dolfyn
      • draco
      • dragonflydb
      • duckdb
      • easywave
      • ebizzy
      • embree
      • encode-flac
      • encode-mp3
      • encode-opus
      • encode-wavpack
      • espeak
      • etcpak
      • faiss
      • fast-cli
      • ffmpeg
      • ffte
      • fftw
      • fhourstones
      • financebench
      • furmark
      • gcrypt
      • gegl
      • gimp
      • git
      • glibc-bench
      • gmpbench
      • gnupg
      • gnuradio
      • go-benchmark
      • gpaw
      • graph500
      • graphics-magick
      • gromacs
      • hackbench
      • hadoop
      • heffte
      • helsing
      • himeno
      • hmmer
      • hpcg
      • incompact3d
      • indigobench
      • inkscape
      • ipc-benchmark
      • java-jmh
      • java-scimark2
      • john-the-ripper
      • jpegxl
      • jpegxl-decode
      • kvazaar
      • kripke
      • lammps
      • lczero
      • libraw
      • libreoffice
      • libxsmm
      • liquid-dsp
      • llama.cpp
      • llamafile
      • lulesh
      • lzbench
      • mbw
      • memcached
      • minibude
      • minife
      • mnn
      • mpcbench
      • m-queens
      • mrbayes
      • mutex
      • namd
      • mt-dgemm
      • ncnn
      • neat
      • nettle
      • nginx
      • ngspice
      • node-octane
      • node-web-tooling
      • npb
      • n-queens
      • numpy
      • nwchem
      • oidn
      • onednn
      • octave-benchmark
      • onnx
      • opencv
      • openfoam
      • openjpeg
      • openssl
      • openradioss
      • openscad
      • openvino
      • openvkl
      • ospray
      • ospray-studio
      • palabos
      • parboil
      • pennant
      • perl-benchmark
      • pgbench
      • phpbench
      • pjsip
      • polybench-c
      • polyhedron
      • povray
      • primesieve
      • pybench
      • pyhpc
      • pyperformance
      • pytorch
      • quadray
      • qe
      • qmcpack
      • quantlib
      • quicksilver
      • ramspeed
      • rav1e
      • rawtherapee
      • rbenchmark
      • redis
      • renaissance
      • rnnoise
      • rocksdb
      • rodinia
      • rsvg
      • schbench
      • scikit-learn
      • scimark2
      • scylladb
      • securemark
      • selenium
      • simdjson
      • smallpt
      • smhasher
      • spark
      • spark-tpcds
      • speedb
      • specfem3d
      • sqlite
      • srsran
      • stargate
      • stockfish
      • stream
      • stress-ng
      • svt-av1
      • svt-hevc
      • svt-vp9
      • sudokut
      • synthmark
      • sysbench
      • tensorflow
      • tensorflow-lite
      • tesseract
      • tjbench
      • tnn
      • toybrot
      • tscp
      • ttsiod-renderer
      • tungsten
      • uvg266
      • vkpeak
      • vpxenc
      • v-ray
      • vvenc
      • webp
      • webp2
      • whisper.cpp
      • whisperfile
      • wireguard
      • x264
      • x265
      • xmrig
      • xnnpack
      • y-cruncher
      • z3
    • stream
  • Tools
    • Compilers
    • likwid
    • perf
    • trace-cmd and kernelshark
    • wspy
  • Experiments
    • Histograms
    • clustering
    • Adding summary statistics for all benchmarks
  • Home
  • Blog
  • Workloads
    • cpu2017
      • 500.perlbench_r
      • 502.gcc_r
      • 503.bwaves_r
      • 505.mcf_r
      • 507.cactuBSSN_r
      • 508.namd_r
      • 510.parest_r
      • 511.povray_r
      • 519.lbm_r
      • 520.omnetpp_r
      • 521.wrf_r
      • 523.xalancbmk_r
      • 525.x264_r
      • 526.blender_r
      • 527.cam4_r
      • 531.deepsjeng_r
      • 538.imagick_r
      • 541.leela_r
      • 544.nab_r
      • 548.exchange2_r
      • 549.fotonik3d_r
      • 554.roms_r
      • 557.xz_r
    • geekbench
    • lmbench
    • passmark
    • pbbs
    • phoronix
      • ai-benchmark
      • aircrack-ng
      • amg
      • aobench
      • aom-av1
      • apache
      • apache-iotdb
      • appleseed
      • arrayfire
      • askap
      • asmfish
      • astcenc
      • avifenc
      • b
      • basis
      • blake2
      • blender
      • blogbench
      • blosc
      • bork
      • botan
      • brl-cad
      • build-apache
      • build-clash
      • build-eigen
      • build-erlang
      • build-ffmpeg
      • build-gcc
      • build-gdb
      • build-gem5
      • build-godot
      • build-imagemagick
      • build-linux-kernel
      • build-llvm
      • build-mesa
      • build-mplayer
      • build-nodejs
      • build-php
      • build-python
      • build-wasmer
      • build2
      • bullet
      • byte
      • c-ray
      • cachebench
      • cassandra
      • clickhouse
      • clomp
      • cloverleaf
      • cockroach
      • compilebench
      • compress-7zip
      • compress-gzip
      • compress-lz4
      • compress-pbzip2
      • compress-rar
      • compress-xz
      • compress-zstd
      • core-latency
      • coremark
      • cp2k
      • cpp-perf-bench
      • cpuminer-opt
      • crafty
      • cryptopp
      • cryptsetup
      • ctx-clock
      • cython-bench
      • dacapobench
      • daphne
      • darktable
      • dav1d
      • dbench
      • deepsparse
      • deepspeech
      • dolfyn
      • draco
      • dragonflydb
      • duckdb
      • easywave
      • ebizzy
      • embree
      • encode-flac
      • encode-mp3
      • encode-opus
      • encode-wavpack
      • espeak
      • etcpak
      • faiss
      • fast-cli
      • ffmpeg
      • ffte
      • fftw
      • fhourstones
      • financebench
      • furmark
      • gcrypt
      • gegl
      • gimp
      • git
      • glibc-bench
      • gmpbench
      • gnupg
      • gnuradio
      • go-benchmark
      • gpaw
      • graph500
      • graphics-magick
      • gromacs
      • hackbench
      • hadoop
      • heffte
      • helsing
      • himeno
      • hmmer
      • hpcg
      • incompact3d
      • indigobench
      • inkscape
      • ipc-benchmark
      • java-jmh
      • java-scimark2
      • john-the-ripper
      • jpegxl
      • jpegxl-decode
      • kripke
      • kvazaar
      • lammps
      • lczero
      • libraw
      • libreoffice
      • libxsmm
      • liquid-dsp
      • llama.cpp
      • llamafile
      • lulesh
      • lzbench
      • m-queens
      • mbw
      • memcached
      • minibude
      • minife
      • mnn
      • mpcbench
      • mrbayes
      • mt-dgemm
      • mutex
      • n-queens
      • namd
      • ncnn
      • neat
      • nettle
      • nginx
      • ngspice
      • node-octane
      • node-web-tooling
      • npb
      • numpy
      • nwchem
      • octave-benchmark
      • oidn
      • onednn
      • onnx
      • opencv
      • openfoam
      • openjpeg
      • openradioss
      • openscad
      • openssl
      • openvino
      • openvkl
      • ospray
      • ospray-studio
      • palabos
      • parboil
      • pennant
      • perl-benchmark
      • pgbench
      • phpbench
      • pjsip
      • polybench-c
      • polyhedron
      • povray
      • primesieve
      • pybench
      • pyhpc
      • pyperformance
      • pytorch
      • qe
      • qmcpack
      • quadray
      • quantlib
      • quicksilver
      • ramspeed
      • rav1e
      • rawtherapee
      • rays1bench
      • rbenchmark
      • redis
      • renaissance
      • rnnoise
      • rocksdb
      • rodinia
      • rsvg
      • schbench
      • scikit-learn
      • scimark2
      • scylladb
      • securemark
      • selenium
      • simdjson
      • smallpt
      • smhasher
      • spark
      • spark-tpcds
      • specfem3d
      • speedb
      • sqlite
      • srsran
      • stargate
      • stockfish
      • stream
      • stress-ng
      • sudokut
      • svt-av1
      • svt-hevc
      • svt-vp9
      • synthmark
      • sysbench
      • tensorflow
      • tensorflow-lite
      • tesseract
      • tjbench
      • tnn
      • toybrot
      • tscp
      • ttsiod-renderer
      • tungsten
      • uvg266
      • v-ray
      • vkpeak
      • vpxenc
      • vvenc
      • webp
      • webp2
      • whisper.cpp
      • whisperfile
      • wireguard
      • x264
      • x265
      • xmrig
      • xnnpack
      • y-cruncher
      • z3
    • stream
  • Tools
    • Compilers
    • likwid
    • perf
    • trace-cmd and kernelshark
    • wspy
  • Experiments
Home→Tags l3

Tag Archives: l3

Turning on counters for l3 and data fabric measurements on AMD

Performance analysis, tools and experiments Posted on December 29, 2023 by mevDecember 29, 2023

By default, counters were not available to measure l3 and df counters on AMD. With some help from likwid documentation I figured out what is going on and how to get it enabled.

The first thing to do is see if the perf subsystem knows about l3 and df areas. This can be done by doing

prompt% ls /sys/devices/*/format
/sys/devices/amd_iommu_0/format:
csource  devid  devid_mask  domid  domid_mask  pasid  pasid_mask

/sys/devices/cpu/format:
cmask  edge  event  inv  umask

/sys/devices/ibs_fetch/format:
l3missonly  rand_en

/sys/devices/ibs_op/format:
cnt_ctl  l3missonly

/sys/devices/kprobe/format:
retprobe

/sys/devices/msr/format:
event

/sys/devices/power/format:
event

/sys/devices/uprobe/format:
ref_ctr_offset  retprobe

Only devices that are available will show up here. My example is missing, so next one needs to see what is compiled into the running kernel. This can be done by doing:

prompt% /home/mev/source/wspy# grep -i perf_events /boot/config-$(uname -r)
CONFIG_HAVE_PERF_EVENTS=y
CONFIG_GUEST_PERF_EVENTS=y
CONFIG_PERF_EVENTS=y
CONFIG_PERF_EVENTS_INTEL_UNCORE=y
CONFIG_PERF_EVENTS_INTEL_RAPL=m
CONFIG_PERF_EVENTS_INTEL_CSTATE=m
# CONFIG_PERF_EVENTS_AMD_POWER is not set
CONFIG_PERF_EVENTS_AMD_UNCORE=m
CONFIG_PERF_EVENTS_AMD_BRS=y
CONFIG_HAVE_PERF_EVENTS_NMI=y
CONFIG_SECURITY_PERF_EVENTS_RESTRICT=y

The l3 and df counters are uncore counters and can be loaded as a module. So we load this module using the following command

prompt% /home/mev/source/wspy# insmod /lib/modules/$(uname -r)/kernel/arch/x86/events/amd/amd-uncore.ko

This loads the module and the command above shows /sys/devices/amd_l3/format and /sys/devices/amd_df/format. Once this is enabled, the perf list command can give relevant counters. The command and useful parts of the output are included below:

prompt% perf list -v --detail
l3_cache:
  l3_cache_accesses
       [l3_lookup_state.all_coherent_accesses_to_l3]
  l3_misses
       [l3_lookup_state.l3_miss]
  l3_read_miss_latency
       [l3_xi_sampled_latency.all * 10 / l3_xi_sampled_latency_requests.all]

Now using “perf stat” we can try the l3 counters and make sure they work.

prompt% perf stat -e l3_lookup_state.all_coherent_accesses_to_l3,l3_lookup_state.l3_hit /bin/ls
cpumask  format  perf_event_mux_interval_ms  power  subsystem  type  uevent

 Performance counter stats for 'system wide':

            80,264      l3_lookup_state.all_coherent_accesses_to_l3                                      
            70,798      l3_lookup_state.l3_hit                                                

       0.001688959 seconds time elapsed

What remains is figuring out the right “config” flags to make the equivalent call to perf_event_open. We can look these up with strace. This tells me the type field for the struct perf_event_attr is 0xe. This also happens to be shown in /sys/devices/amd_l3/type file. I can figure this out for l3 access but not quite sure which event to use for the data fabric to measure memory.

Success!

Posted in experiment | Tagged data fabric, kernel, l3, perf, performance counters | Leave a reply

Meta

  • Log in
  • Entries feed
  • Comments feed
  • WordPress.org

Archives

  • November 2024
  • October 2024
  • September 2024
  • July 2024
  • June 2024
  • March 2024
  • February 2024
  • January 2024
  • December 2023
  • February 2023

Tags

7840HS bad data benchmarks cachyos cluster compiler coremark cpu2017 data fabric getrusage gnuplot i5-13500H icache ipc kernel l3 metrics namd opcache perf performance counters perf_event_open phoronix Ryzen AI 9 HX 370 Ryzen AI 365 scaling stream threshold topdown tree virtualization website wsl Zen5

Recent Posts

  • Virtualization comparisons
  • Updating to a new kernel and graphics driver
  • SPEC CPU2017 Ryzen AI HX 370 vs. Ryzen 7840 HS
  • phoronix – Ryzen AI HX 370 vs Ryzen 7840 HS
  • New Ryzen AI 9 HX 370 machine
©2026 - Performance analysis, tools and experiments - Weaver Xtreme Theme
↑