namd is a SPEC CPU(R) benchmark. This C++ workload runs consistently on all logical cores.

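The topdown profile shows a high retirement rate (31.3% of all slots, or 51.3% of the slots not lost to SMT contention), with backend stalls as the main loss (24.9%, or 40.9%).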

AMD metrics show that backend stalls are more CPU stalls (17.2% of slots) than memory stalls (7.7%). There are ~63 L2 accesses per 1000 instructions, but the L2 miss rate is low (1.35%). The opcache also has a very low miss rate (0.2%).

elapsed              705.983
on_cpu               0.988          # 15.81 / 16 cores
utime                11155.218
stime                8.698
nvcsw                16557          # 12.97%
nivcsw               111119         # 87.03%
inblock              0              # 0.00/sec
onblock              31776          # 45.01/sec
cpu-clock            11164522824524 # 11164.523 seconds
task-clock           11164605623683 # 11164.606 seconds
page faults          2678594        # 239.918/sec
context switches     127114         # 11.385/sec
cpu migrations       164            # 0.015/sec
major page faults    1011           # 0.091/sec
minor page faults    2677583        # 239.828/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             2064334682529  # 26.587 branches per 1000 inst
branch misses        90764635496    # 4.40% branch miss
conditional          1883524841799  # 24.258 conditional branches per 1000 inst
indirect             1615455461     # 0.021 indirect branches per 1000 inst
cpu-cycles           42377952252120 # 3.74 GHz
instructions         77659634049358 # 1.83 IPC
slots                84735322972980 #
retiring             26508259685731 # 31.3% (51.3%)
-- ucode             499617852      #     0.0%
-- fastpath          26507760067879 #    31.3%
frontend             2468780097885  #  2.9% ( 4.8%) low
-- latency           1830776219160  #     2.2%
-- bandwidth         638003878725   #     0.8%
backend              21111048379065 # 24.9% (40.9%)
-- cpu               14580511723887 #    17.2%
-- memory            6530536655178  #     7.7%
speculation          1549961155223  #  1.8% ( 3.0%)
-- branch mispredict 1535782887883  #     1.8%
-- pipeline restart  14178267340    #     0.0%
smt-contention       33097226855820 # 39.1% ( 0.0%)
cpu-cycles           42363963062501 # 3.74 GHz
instructions         77682975707638 # 1.83 IPC
instructions         25886326183295 # 63.010 l2 access per 1000 inst
l2 hit from l1       1142360918883  # 1.35% l2 miss
l2 miss from l1      4871222345     #
l2 hit from l2 pf    471669934470   #
l3 hit from l2 pf    3651612350     #
l3 miss from l2 pf   13423911862    #
instructions         25865520397898 # 395.622 float per 1000 inst
float 512            137            # 0.000 AVX-512 per 1000 inst
float 256            16868          # 0.000 AVX-256 per 1000 inst
float 128            10232976318229 # 395.622 AVX-128 per 1000 inst
float MMX            0              # 0.000 MMX per 1000 inst
float scalar         0              # 0.000 scalar per 1000 inst
instructions         77643808589000 #
opcache              7329400605488  # 94.398 opcache per 1000 inst
opcache miss         17462126951    #  0.2% opcache miss rate
l1 dTLB miss         10400596229    # 0.134 L1 dTLB per 1000 inst
l2 dTLB miss         757260290      # 0.010 L2 dTLB per 1000 inst
instructions         77643844021012 #
icache               28943916756    # 0.373 icache per 1000 inst
icache miss          5435095843     # 18.8% icache miss rate
l1 iTLB miss         238400540      # 0.003 L1 iTLB per 1000 inst
l2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst
tlb flush            78639          # 0.000 TLB flush per 1000 inst
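
The topdown percentages in the listing above can be reproduced from the raw slot counts. The following is a minimal sketch, not the profiler's own code: the variable names are illustrative, and reading the parenthesized column as "share of slots not lost to SMT contention" is an assumption that happens to match the printed values.

```python
# Raw slot counts copied from the counter listing above.
slots = 84735322972980
counts = {
    "retiring": 26508259685731,
    "frontend": 2468780097885,
    "backend": 21111048379065,
    "speculation": 1549961155223,
    "smt-contention": 33097226855820,
}


def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# First percentage column: share of all slots.
share_of_slots = {name: pct(n, slots) for name, n in counts.items()}

# Parenthesized column (assumption): share of slots excluding SMT contention.
non_smt_slots = slots - counts["smt-contention"]
share_of_non_smt = {
    name: pct(n, non_smt_slots)
    for name, n in counts.items()
    if name != "smt-contention"
}

# Split of backend stalls into CPU-bound and memory-bound slots.
backend_cpu = 14580511723887
backend_memory = 6530536655178
cpu_share_of_backend = pct(backend_cpu, counts["backend"])

# Instructions per cycle from the first instructions/cpu-cycles pair.
ipc = round(77659634049358 / 42377952252120, 2)

for name, value in share_of_slots.items():
    print(f"{name:15s} {value:5.1f}% of all slots")
for name, value in share_of_non_smt.items():
    print(f"{name:15s} {value:5.1f}% of non-SMT slots")
print(f"CPU-bound share of backend stalls: {cpu_share_of_backend}%")
print(f"IPC: {ipc}")
```

Under these assumptions, roughly two thirds of the backend stall slots are CPU-bound rather than memory-bound, which is the basis for the statement that backend stalls are more CPU stalls than memory stalls.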

The process overview shows almost all time spent in namd_r_base.mev.

579 processes
	 48 namd_r_base.mev      11114.35     4.70
	 69 specperl                12.33     1.47
	  1 clang++                  0.01     0.00
	  1 lsb_release              0.01     0.00
	 10 ps                       0.00     0.01
	172 sh                       0.00     0.00
	 54 specrxp                  0.00     0.00
	 48 bash                     0.00     0.00
	 41 specinvoke               0.00     0.00
	 21 grep                     0.00     0.00
	 20 cat                      0.00     0.00
	 12 uniq                     0.00     0.00
	 11 sort                     0.00     0.00
	 10 expand                   0.00     0.00
	  6 pwd                      0.00     0.00
	  5 basename                 0.00     0.00
	  5 specmake                 0.00     0.00
	  5 systemctl                0.00     0.00
	  4 specpp                   0.00     0.00
	  4 uname                    0.00     0.00
	  3 dirname                  0.00     0.00
	  3 dmidecode                0.00     0.00
	  3 lscpu                    0.00     0.00
	  2 df                       0.00     0.00
	  2 dpkg                     0.00     0.00
	  2 rm                       0.00     0.00
	  2 runcpu                   0.00     0.00
	  2 specsha512sum            0.00     0.00
	  2 specxz                   0.00     0.00
	  2 who                      0.00     0.00
	  1 cpupower                 0.00     0.00
	  1 head                     0.00     0.00
	  1 logname                  0.00     0.00
	  1 ls                       0.00     0.00
	  1 numactl                  0.00     0.00
	  1 sysctl                   0.00     0.00
	  1 w                        0.00     0.00
	  1 wc                       0.00     0.00
	  1 which                    0.00     0.00
0 processes running
53 maximum processes
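
To put a number on "almost all time", the rows with non-zero times can be summed. This is a rough sketch: the listing does not label its columns, so treating the third and fourth columns as user and system CPU seconds is an assumption, and the helper name `parse_rows` is made up for illustration.

```python
# Rows with non-zero times, copied from the process overview above.
# Columns (assumed): process count, command, user seconds, system seconds.
rows = """\
48 namd_r_base.mev 11114.35 4.70
69 specperl 12.33 1.47
1 clang++ 0.01 0.00
1 lsb_release 0.01 0.00
10 ps 0.00 0.01
"""


def parse_rows(text):
    """Split each row into (command, count, user, system)."""
    parsed = []
    for line in text.splitlines():
        count, command, user, system = line.split()
        parsed.append((command, int(count), float(user), float(system)))
    return parsed


table = parse_rows(rows)

# Total CPU time per command and over all listed commands.
cpu_by_command = {cmd: user + system for cmd, _, user, system in table}
total = sum(cpu_by_command.values())

namd_share = 100.0 * cpu_by_command["namd_r_base.mev"] / total
print(f"namd_r_base.mev: {namd_share:.2f}% of listed CPU time")
```

Dividing the 11114.35 seconds by the 48 namd_r_base.mev processes gives about 231.5 seconds per process, which lines up with the per-copy start and finish times in the process tree.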

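specinvoke starts a separate sh, bash and namd_r_base.mev process chain for each core. All 16 copies start at the same time and finish within about 1.3 seconds of each other.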

    377379) specinvoke       cpu=14 start=3.25  finish=235.52
      377381) sh               cpu=4 start=3.25  finish=235.10
        377387) bash             cpu=0 start=3.25  finish=235.10
          377412) namd_r_base.mev  cpu=0 start=3.26  finish=235.09
      377382) sh               cpu=1 start=3.25  finish=234.94
        377392) bash             cpu=1 start=3.25  finish=234.94
          377416) namd_r_base.mev  cpu=1 start=3.26  finish=234.92
      377383) sh               cpu=10 start=3.25  finish=234.73
        377389) bash             cpu=2 start=3.25  finish=234.73
          377411) namd_r_base.mev  cpu=2 start=3.26  finish=234.72
      377384) sh               cpu=9 start=3.25  finish=235.38
        377394) bash             cpu=3 start=3.25  finish=235.38
          377415) namd_r_base.mev  cpu=3 start=3.26  finish=235.37
      377385) sh               cpu=9 start=3.25  finish=234.94
        377395) bash             cpu=4 start=3.25  finish=234.94
          377418) namd_r_base.mev  cpu=4 start=3.26  finish=234.92
      377386) sh               cpu=9 start=3.25  finish=235.37
        377397) bash             cpu=5 start=3.25  finish=235.37
          377417) namd_r_base.mev  cpu=5 start=3.26  finish=235.35
      377388) sh               cpu=4 start=3.25  finish=235.25
        377396) bash             cpu=6 start=3.25  finish=235.25
          377420) namd_r_base.mev  cpu=6 start=3.26  finish=235.24
      377390) sh               cpu=12 start=3.25  finish=235.12
        377400) bash             cpu=7 start=3.25  finish=235.12
          377419) namd_r_base.mev  cpu=7 start=3.26  finish=235.11
      377391) sh               cpu=1 start=3.25  finish=235.02
        377402) bash             cpu=8 start=3.26  finish=235.01
          377421) namd_r_base.mev  cpu=8 start=3.26  finish=235.00
      377393) sh               cpu=9 start=3.25  finish=234.92
        377404) bash             cpu=9 start=3.26  finish=234.92
          377422) namd_r_base.mev  cpu=9 start=3.26  finish=234.90
      377398) sh               cpu=10 start=3.25  finish=234.24
        377407) bash             cpu=10 start=3.26  finish=234.24
          377424) namd_r_base.mev  cpu=10 start=3.26  finish=234.22
      377399) sh               cpu=8 start=3.25  finish=235.36
        377408) bash             cpu=11 start=3.26  finish=235.36
          377423) namd_r_base.mev  cpu=11 start=3.26  finish=235.34
      377401) sh               cpu=4 start=3.26  finish=235.10
        377409) bash             cpu=12 start=3.26  finish=235.10
          377425) namd_r_base.mev  cpu=12 start=3.26  finish=235.09
      377403) sh               cpu=11 start=3.26  finish=235.52
        377410) bash             cpu=13 start=3.26  finish=235.52
          377426) namd_r_base.mev  cpu=13 start=3.26  finish=235.51
      377405) sh               cpu=10 start=3.26  finish=234.90
        377413) bash             cpu=14 start=3.26  finish=234.90
          377427) namd_r_base.mev  cpu=14 start=3.26  finish=234.88
      377406) sh               cpu=15 start=3.26  finish=235.15
        377414) bash             cpu=15 start=3.26  finish=235.15
          377428) namd_r_base.mev  cpu=15 start=3.26  finish=235.14
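
The per-copy runtimes can be pulled out of the tree with a small parser. The sketch below uses only a few lines copied from the listing above (including the earliest and latest finishing copies); the regular expression and the names `LINE` and `parse` are illustrative assumptions about the line format, not part of any profiling tool.

```python
import re

# A subset of the process tree above: the specinvoke parent plus three copies.
listing = """\
    377379) specinvoke       cpu=14 start=3.25  finish=235.52
          377412) namd_r_base.mev  cpu=0 start=3.26  finish=235.09
          377424) namd_r_base.mev  cpu=10 start=3.26  finish=234.22
          377426) namd_r_base.mev  cpu=13 start=3.26  finish=235.51
"""

# Assumed line format: "<pid>) <command> cpu=<n> start=<sec> finish=<sec>".
LINE = re.compile(
    r"^\s*(?P<pid>\d+)\)\s+(?P<cmd>\S+)\s+cpu=(?P<cpu>\d+)"
    r"\s+start=(?P<start>[\d.]+)\s+finish=(?P<finish>[\d.]+)\s*$"
)


def parse(text):
    """Yield one dict per recognized process line."""
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            yield {
                "pid": int(m["pid"]),
                "cmd": m["cmd"],
                "cpu": int(m["cpu"]),
                "start": float(m["start"]),
                "finish": float(m["finish"]),
            }


# Keep only the benchmark copies, dropping the specinvoke/sh/bash wrappers.
copies = [p for p in parse(listing) if p["cmd"] == "namd_r_base.mev"]

# Wall-clock runtime per copy, keyed by the cpu it ran on.
runtimes = {p["cpu"]: p["finish"] - p["start"] for p in copies}

# Gap between the first and last copy to finish.
finishes = [p["finish"] for p in copies]
spread = max(finishes) - min(finishes)

for cpu, seconds in sorted(runtimes.items()):
    print(f"cpu {cpu:2d}: {seconds:.2f} s")
print(f"finish spread: {spread:.2f} s")
```

Feeding the full tree through the same parser would give one runtime per core; the subset here is only meant to show the approach.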