Performance analysis, tools and experiments

An eclectic collection


Tag Archives: wsl

wsl and performance counters?

Posted on July 30, 2024 by mev

I have seen some references suggesting that it might be possible to use performance counters in WSL, like this page.

If I type

perf stat ls

Then WSL tells me

Command 'perf' not found, but can be installed with:
sudo apt install linux-tools-common        # version 6.8.0-38.38, or
sudo apt install linux-laptop-tools-common # version 6.5.0-1004.7

This seems both encouraging and discouraging. Encouraging because it references a standard version of an Ubuntu package. Discouraging because the kernel versions listed don’t match my WSL 5.15.153 kernel. The second package is not available, but the first does install. However, now WSL tells me

WARNING: perf not found for kernel 5.15.153-1-microsoft

You may need to install the following packages for this specific kernel:
   linux-tools-5.15.153.1-microsoft-standard-WSL2
   linux-cloud-tools-5.15.153-1-microsoft-standard-WSL2

You may also want to install one of the following packages to keep up to date:
   linux-tools-standard-WSL2
   linux-cloud-tools-standard-WSL2

None of these packages exist. What I am able to do is install the following package

apt install linux-tools-generic

This gets me the following path

/usr/lib/linux-tools-6.8-39/perf
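
Since the `perf` wrapper on the PATH refuses to run when it finds no build for the running kernel, it can help to locate the versioned binaries directly. A minimal sketch, assuming the directory layout shown in the path above (`<root>/linux-tools-<version>/perf`); the function name `find_perf_binaries` and its `root` parameter are my own, not part of any package:

```python
from pathlib import Path


def find_perf_binaries(root="/usr/lib"):
    """Return perf binaries found under versioned linux-tools directories.

    Assumes the layout <root>/linux-tools-<version>/perf, as seen after
    installing linux-tools-generic. Other distributions may differ.
    """
    root_path = Path(root)
    # Plain string sort for stable output; this is not version-aware.
    return sorted(
        str(path)
        for path in root_path.glob("linux-tools-*/perf")
        if path.is_file()
    )


if __name__ == "__main__":
    for path in find_perf_binaries():
        print(path)
```

Any path it prints can then be invoked directly in place of the wrapper.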

With these tools, I am able to get some basic counters.

Performance counter stats for 'ls':

              0.66 msec task-clock:u                     #    0.397 CPUs utilized
                 0      context-switches:u               #    0.000 /sec
                 0      cpu-migrations:u                 #    0.000 /sec
                97      page-faults:u                    #  147.148 K/sec
           1113650      cycles:u                         #    1.689 GHz
             44775      stalled-cycles-frontend:u        #    4.02% frontend cycles idle
             85883      stalled-cycles-backend:u         #    7.71% backend cycles idle
            536486      instructions:u                   #    0.48  insn per cycle
                                                  #    0.16  stalled cycles per insn
            109474      branches:u                       #  166.071 M/sec
              6643      branch-misses:u                  #    6.07% of all branches

       0.001661844 seconds time elapsed

       0.000214000 seconds user
       0.000000000 seconds sys
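
The ratios in the right-hand column can be recomputed from the raw counts, which is a quick sanity check that the counters are self-consistent. A small sketch using the numbers from the run above; the helper name `derived_metrics` is hypothetical, and the "stalled cycles per insn" formula (larger of the two stall counts divided by instructions) is my understanding of how perf derives it rather than something stated in the output:

```python
def derived_metrics(counts):
    """Recompute the ratios perf stat prints next to the raw counts."""
    cycles = counts["cycles"]
    insns = counts["instructions"]
    frontend = counts["stalled-cycles-frontend"]
    backend = counts["stalled-cycles-backend"]
    return {
        "ipc": insns / cycles,
        "frontend_idle_pct": 100.0 * frontend / cycles,
        "backend_idle_pct": 100.0 * backend / cycles,
        # Assumption: perf uses the larger of the two stall counts here.
        "stalled_per_insn": max(frontend, backend) / insns,
        "branch_miss_pct": 100.0 * counts["branch-misses"] / counts["branches"],
    }


# Raw counts copied from the `perf stat ls` output above.
LS_COUNTS = {
    "cycles": 1113650,
    "instructions": 536486,
    "stalled-cycles-frontend": 44775,
    "stalled-cycles-backend": 85883,
    "branches": 109474,
    "branch-misses": 6643,
}

if __name__ == "__main__":
    for name, value in derived_metrics(LS_COUNTS).items():
        print(f"{name:>18}: {value:.2f}")
```

Rounded to two decimals, these come out to 0.48 IPC, 4.02% and 7.71% frontend and backend idle, 0.16 stalled cycles per instruction, and 6.07% branch misses, matching what perf printed.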

In particular, the output of perf list gives me the following generic events

  branch-instructions OR branches                    [Hardware event]
  branch-misses                                      [Hardware event]
  cache-misses                                       [Hardware event]
  cache-references                                   [Hardware event]
  cpu-cycles OR cycles                               [Hardware event]
  instructions                                       [Hardware event]
  stalled-cycles-backend OR idle-cycles-backend      [Hardware event]
  stalled-cycles-frontend OR idle-cycles-frontend    [Hardware event]
  alignment-faults                                   [Software event]
  bpf-output                                         [Software event]
  cgroup-switches                                    [Software event]
  context-switches OR cs                             [Software event]
  cpu-clock                                          [Software event]
  cpu-migrations OR migrations                       [Software event]
  dummy                                              [Software event]
  emulation-faults                                   [Software event]
  major-faults                                       [Software event]
  minor-faults                                       [Software event]
  page-faults OR faults                              [Software event]
  task-clock                                         [Software event]
  duration_time                                      [Tool event]
  user_time                                          [Tool event]
  system_time                                        [Tool event]

cpu:
  L1-dcache-loads OR cpu/L1-dcache-loads/
  L1-dcache-load-misses OR cpu/L1-dcache-load-misses/
  L1-dcache-prefetches OR cpu/L1-dcache-prefetches/
  L1-icache-loads OR cpu/L1-icache-loads/
  L1-icache-load-misses OR cpu/L1-icache-load-misses/
  dTLB-loads OR cpu/dTLB-loads/
  dTLB-load-misses OR cpu/dTLB-load-misses/
  iTLB-loads OR cpu/iTLB-loads/
  iTLB-load-misses OR cpu/iTLB-load-misses/
  branch-loads OR cpu/branch-loads/
  branch-load-misses OR cpu/branch-load-misses/
  branch-instructions OR cpu/branch-instructions/    [Kernel PMU event]
  branch-misses OR cpu/branch-misses/                [Kernel PMU event]
  cache-misses OR cpu/cache-misses/                  [Kernel PMU event]
  cache-references OR cpu/cache-references/          [Kernel PMU event]
  cpu-cycles OR cpu/cpu-cycles/                      [Kernel PMU event]
  instructions OR cpu/instructions/                  [Kernel PMU event]
  stalled-cycles-backend OR cpu/stalled-cycles-backend/[Kernel PMU event]
  stalled-cycles-frontend OR cpu/stalled-cycles-frontend/[Kernel PMU event]
  msr/tsc/                                           [Kernel PMU event]
  rNNN                                               [Raw hardware event descriptor]
  cpu/t1=v1[,t2=v2,t3 ...]/modifier                  [Raw hardware event descriptor]
       [(see 'man perf-list' on how to encode it)]
  mem:<addr>[/len][:access]                          [Hardware breakpoint]

This is encouraging since it at least shows the more generic hardware events like cycles, instructions and branches. What is missing from this list are counters specific to my Zen5 core such as the topdown performance counters used by wspy to look at microarchitecture differences.
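
To compare listings like this across kernels or machines, the text can be grouped by the bracketed kind at the end of each line. A rough sketch, assuming the `perf list` text format shown above; `events_by_kind` and `SAMPLE` are illustrative names, and lines without a trailing tag (such as the cache events under "cpu:") are simply skipped:

```python
import re
from collections import defaultdict

# Matches "<names>   [<kind>]" at the end of a stripped line.
_TAGGED = re.compile(r"^(\S.*?)\s*\[([^\]]+)\]$")


def events_by_kind(perf_list_text):
    """Group event names from `perf list` text by their bracketed kind."""
    groups = defaultdict(list)
    for raw in perf_list_text.splitlines():
        match = _TAGGED.match(raw.strip())
        if match is None:
            continue  # untagged or continuation line
        names, kind = match.groups()
        # "a OR b" lists aliases; keep the first one as the event name.
        groups[kind].append(names.split(" OR ")[0])
    return dict(groups)


# A few lines copied from the listing above.
SAMPLE = """\
  branch-instructions OR branches                    [Hardware event]
  cpu-cycles OR cycles                               [Hardware event]
  instructions                                       [Hardware event]
  page-faults OR faults                              [Software event]
  duration_time                                      [Tool event]
  stalled-cycles-backend OR cpu/stalled-cycles-backend/[Kernel PMU event]
  L1-dcache-loads OR cpu/L1-dcache-loads/
       [(see 'man perf-list' on how to encode it)]
"""

if __name__ == "__main__":
    for kind, names in events_by_kind(SAMPLE).items():
        print(f"{kind}: {', '.join(names)}")
```

Saving the output of `perf list` from two systems and diffing the grouped results would show which event kinds one of them lacks.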

There is one possible way I might get closer to having these counters. This would be to update my WSL kernel/distribution with the following

wsl --update --pre-release

I believe this updates WSL from the following repository (https://github.com/microsoft/WSL/releases). At present this is the 2.3.13 release, which has a 6.6.36.3 kernel. Unfortunately, according to this Phoronix article, Zen5 performance events were posted in March 2024, when work was underway for the Linux 6.9 kernel. Ubuntu 24.04 shipped with a Linux 6.8 kernel, so it is unclear to me whether stock Ubuntu 24.04 will support Zen5 topdown counters, and the WSL pre-release kernel is even older than that. So at this point, I think I want to try Ubuntu 24.04 first to see what Zen5 counters are available before updating my WSL to a release that might not be new enough.
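
The version reasoning here can be written down as a small comparison. This is only a sketch: the 6.9 threshold is an assumption taken from my reading of the article, not a confirmed minimum, and distributions can backport patches, so the kernel version is at best a rough proxy. The names `kernel_tuple`, `may_have_zen5_events` and `ASSUMED_MIN` are made up for illustration:

```python
import re


def kernel_tuple(release):
    """Turn a release string such as '6.6.36.3' into a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", release)
    if match is None:
        raise ValueError(f"no version number in {release!r}")
    return tuple(int(part) for part in match.group(0).split("."))


# Assumption: Zen5 events landed around the 6.9 development cycle.
ASSUMED_MIN = (6, 9)


def may_have_zen5_events(release, minimum=ASSUMED_MIN):
    """Compare only as many components as the minimum specifies."""
    return kernel_tuple(release)[: len(minimum)] >= minimum


if __name__ == "__main__":
    for release in (
        "5.15.153.1-microsoft-standard-WSL2",  # current WSL kernel
        "6.6.36.3",                            # WSL 2.3.13 pre-release
        "6.8.0-39-generic",                    # Ubuntu 24.04
    ):
        print(release, may_have_zen5_events(release))
```

All three kernels mentioned in this post fall below the assumed threshold, which is why trying them empirically is the next step.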

A step at a time, but this might be sufficient to get some basic Zen5 IPC comparisons with Zen4 even if not the more complete topdown performance counters.

Posted in experiment | Tagged performance counters, wsl, Zen5

Ryzen AI, Zen5, article and laptop

Posted on July 28, 2024 by mev

Zen5 mobile processors have been released.

I had ordered an ASUS Zenbook S16 laptop with a Ryzen AI 9 365 processor, and it arrived today. Full tech specifications are at the link but include:

  • Ryzen AI 9 365 processor with 10 cores (four Zen 5 and six Zen 5c), 20 threads, base clock 2.0 GHz, boost clock 5 GHz, 10 MB of L2 and 24 MB of L3.
  • Navi 3.5 integrated graphics
  • 24 GB of DDR5 memory
  • 1 TB NVMe

So far I have only run Windows 11 Home and have not tried to install Linux. Part of the reason is that I also want to try the “AI PC” features, so I am cautious: changing to Linux would be mostly a one-way proposition, and the 1 TB disk is not particularly large for a dual boot.

I did try three “lighter” variations to run some Linux workloads:

  • Oracle VirtualBox can install Ubuntu 24.04, but I saw strange hangs with what I tried.
  • VMware Player downloads seem to be down, pointing to a Cloudflare page. I can try this later.
  • WSL does install with Ubuntu 24.04. I was able to install the Phoronix Test Suite and got a Coremark score of 413542.

I haven’t yet done a lot of other testing. Some of this might involve running a more complete Linux install but I’ll wait a little to see my options (e.g. a mini-PC, this laptop, etc).

Around the time my laptop arrived, these laptops also appear to have reached Phoronix, who published an article with Linux-based testing of an ASUS laptop. Their tests used the Ryzen AI 9 HX 370 processor, so 12 cores instead of 10 cores. The article is here.

A few interesting things I will note from the Phoronix tests and my own testing

  • Coremark scores
    • Ryzen AI 365 (my laptop with WSL) – 413542
    • Ryzen 7840 HS (my mini-PC) – 464076
    • Ryzen AI 370 (Phoronix laptop) – 426538
    • Ryzen 7840 HS (Phoronix laptop) – 443276
  • Overall, the Phoronix article picks a set of benchmarks where their Ryzen AI 370 is ~10% faster than their Ryzen 7840HS
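
To put the Coremark numbers on a common footing, the relative differences can be computed directly. A small worked example using only the scores listed above; the helper `percent_diff` is a hypothetical name, and the comparison says nothing about the differing power limits or the WSL overhead behind each score:

```python
# Coremark scores quoted in the list above.
SCORES = {
    "Ryzen AI 365 (my laptop, WSL)": 413542,
    "Ryzen 7840 HS (my mini-PC)": 464076,
    "Ryzen AI 370 (Phoronix laptop)": 426538,
    "Ryzen 7840 HS (Phoronix laptop)": 443276,
}


def percent_diff(score, baseline):
    """Signed difference of score relative to baseline, in percent."""
    return 100.0 * (score - baseline) / baseline


# Each pair is (system under test, baseline it is compared against).
COMPARISONS = [
    ("Ryzen AI 365 (my laptop, WSL)", "Ryzen 7840 HS (my mini-PC)"),
    ("Ryzen AI 370 (Phoronix laptop)", "Ryzen 7840 HS (Phoronix laptop)"),
]

if __name__ == "__main__":
    for system, baseline in COMPARISONS:
        diff = percent_diff(SCORES[system], SCORES[baseline])
        print(f"{system} vs {baseline}: {diff:+.1f}%")
```

On these numbers my laptop under WSL scores about 10.9% below my mini-PC, and the Phoronix Ryzen AI 370 laptop about 3.8% below their 7840 HS laptop, so Coremark on its own does not show the ~10% advantage seen across the article's wider benchmark set.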

My own interest is more in exploring unique features of the Zen5 vs Zen5c cores and in seeing how the topdown core performance varies for different subsystems like branch prediction or execution units. I also want to understand how the extra two (365) or four (370) cores contribute to workloads. So I won’t necessarily run that many other workloads, since they appear at least as dependent on other factors (e.g. power, memory speed) and are more a system-level than core-level comparison. Nevertheless, it is still interesting to see this data start to come out.

There is an AnandTech review as well. The full review is at the link, but here are some things that caught my attention:

  • The Zenbook S16 is configured to run the Ryzen AI 9 HX 370 at just 17W, which explains in part the differences with the mini-PC running at higher power. “ASUS has taken what’s nominally a 28W chip and dialed it down to 17W for it’s out-of-the-box experience”. There are other modes that consume more power with a higher TDP. Testing was done at 28W.
  • The mobile chips have 256-bit SIMD, so expect AVX-512 codes to run faster on desktop than on equivalent mobile processors.
  • Core-to-core latencies are printed, which looks like a useful tool. They also show rather intriguing differences with the Ryzen 9 7940HS.
  • They run specrate with WSL but with only 1 core. This looks like a potentially interesting way to probe different core types.

Posted in hardware | Tagged phoronix, Ryzen AI 365, wsl, Zen5
