namd is a SPEC CPU(R) benchmark described here. This C++ workload keeps all 16 logical cores busy for the whole run (on_cpu 0.988).

The topdown profile shows a high retirement rate (31.3% of all slots, 51.3% once SMT-contended slots are excluded) with some backend stalls (24.9% of all slots).

AMD metrics show that the backend stalls are more CPU stalls (17.2%) than memory stalls (7.7%). While there are ~63 L2 accesses per 1000 instructions, the L2 miss rate is low (1.35%). The opcache also has a very low miss rate (0.2%).
elapsed 705.983
on_cpu 0.988 # 15.81 / 16 cores
utime 11155.218
stime 8.698
nvcsw 16557 # 12.97%
nivcsw 111119 # 87.03%
inblock 0 # 0.00/sec
onblock 31776 # 45.01/sec
cpu-clock 11164522824524 # 11164.523 seconds
task-clock 11164605623683 # 11164.606 seconds
page faults 2678594 # 239.918/sec
context switches 127114 # 11.385/sec
cpu migrations 164 # 0.015/sec
major page faults 1011 # 0.091/sec
minor page faults 2677583 # 239.828/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 2064334682529 # 26.587 branches per 1000 inst
branch misses 90764635496 # 4.40% branch miss
conditional 1883524841799 # 24.258 conditional branches per 1000 inst
indirect 1615455461 # 0.021 indirect branches per 1000 inst
cpu-cycles 42377952252120 # 3.74 GHz
instructions 77659634049358 # 1.83 IPC
slots 84735322972980 #
retiring 26508259685731 # 31.3% (51.3%)
-- ucode 499617852 # 0.0%
-- fastpath 26507760067879 # 31.3%
frontend 2468780097885 # 2.9% ( 4.8%) low
-- latency 1830776219160 # 2.2%
-- bandwidth 638003878725 # 0.8%
backend 21111048379065 # 24.9% (40.9%)
-- cpu 14580511723887 # 17.2%
-- memory 6530536655178 # 7.7%
speculation 1549961155223 # 1.8% ( 3.0%)
-- branch mispredict 1535782887883 # 1.8%
-- pipeline restart 14178267340 # 0.0%
smt-contention 33097226855820 # 39.1% ( 0.0%)
cpu-cycles 42363963062501 # 3.74 GHz
instructions 77682975707638 # 1.83 IPC
instructions 25886326183295 # 63.010 l2 access per 1000 inst
l2 hit from l1 1142360918883 # 1.35% l2 miss
l2 miss from l1 4871222345 #
l2 hit from l2 pf 471669934470 #
l3 hit from l2 pf 3651612350 #
l3 miss from l2 pf 13423911862 #
instructions 25865520397898 # 395.622 float per 1000 inst
float 512 137 # 0.000 AVX-512 per 1000 inst
float 256 16868 # 0.000 AVX-256 per 1000 inst
float 128 10232976318229 # 395.622 AVX-128 per 1000 inst
float MMX 0 # 0.000 MMX per 1000 inst
float scalar 0 # 0.000 scalar per 1000 inst
instructions 77643808589000 #
opcache 7329400605488 # 94.398 opcache per 1000 inst
opcache miss 17462126951 # 0.2% opcache miss rate
l1 dTLB miss 10400596229 # 0.134 L1 dTLB per 1000 inst
l2 dTLB miss 757260290 # 0.010 L2 dTLB per 1000 inst
instructions 77643844021012 #
icache 28943916756 # 0.373 icache per 1000 inst
icache miss 5435095843 # 18.8% icache miss rate
l1 iTLB miss 238400540 # 0.003 L1 iTLB per 1000 inst
l2 iTLB miss 0 # 0.000 L2 iTLB per 1000 inst
tlb flush 78639 # 0.000 TLB flush per 1000 inst
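The derived values in the comment column can be reproduced from the raw counters. The following is a minimal sketch, not the profiling tool's own code: the variable names are made up for illustration, and the reading of the parenthesized topdown column as "share of slots after removing smt-contention" is an assumption, although it does match the printed numbers.

```python
# Raw values copied from the counter dump above.
elapsed = 705.983        # wall-clock seconds
utime = 11155.218        # user CPU seconds
stime = 8.698            # system CPU seconds
ncores = 16              # from the "/ 16 cores" comment

cycles = 42377952252120
instructions = 77659634049358
branches = 2064334682529
branch_misses = 90764635496
opcache = 7329400605488
opcache_misses = 17462126951

slots = 84735322972980
smt_contention = 33097226855820
topdown = {
    "retiring": 26508259685731,
    "frontend": 2468780097885,
    "backend": 21111048379065,
    "speculation": 1549961155223,
}


def pct(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


busy_cores = (utime + stime) / elapsed   # average number of busy cores
on_cpu = busy_cores / ncores             # fraction of the machine in use
ipc = instructions / cycles
branch_miss_pct = pct(branch_misses, branches)
opcache_miss_pct = pct(opcache_misses, opcache)

# Assumption: the parenthesized column excludes SMT-contended slots.
usable_slots = slots - smt_contention
shares = {
    name: (pct(count, slots), pct(count, usable_slots))
    for name, count in topdown.items()
}

print(f"on_cpu {on_cpu:.3f} ({busy_cores:.2f} / {ncores} cores)")
print(f"IPC {ipc:.2f}, branch miss {branch_miss_pct:.2f}%, "
      f"opcache miss {opcache_miss_pct:.1f}%")
for name, (of_all, of_usable) in shares.items():
    print(f"{name:12s} {of_all:5.1f}% ({of_usable:5.1f}%)")
```

Under that assumption, the four topdown categories add up to roughly 100% of the non-contended slots, which is why retiring reads as 51.3% rather than 31.3% when SMT contention is set aside.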
The process overview shows almost all time spent in namd_r_base.mev.
579 processes
48 namd_r_base.mev 11114.35 4.70
69 specperl 12.33 1.47
1 clang++ 0.01 0.00
1 lsb_release 0.01 0.00
10 ps 0.00 0.01
172 sh 0.00 0.00
54 specrxp 0.00 0.00
48 bash 0.00 0.00
41 specinvoke 0.00 0.00
21 grep 0.00 0.00
20 cat 0.00 0.00
12 uniq 0.00 0.00
11 sort 0.00 0.00
10 expand 0.00 0.00
6 pwd 0.00 0.00
5 basename 0.00 0.00
5 specmake 0.00 0.00
5 systemctl 0.00 0.00
4 specpp 0.00 0.00
4 uname 0.00 0.00
3 dirname 0.00 0.00
3 dmidecode 0.00 0.00
3 lscpu 0.00 0.00
2 df 0.00 0.00
2 dpkg 0.00 0.00
2 rm 0.00 0.00
2 runcpu 0.00 0.00
2 specsha512sum 0.00 0.00
2 specxz 0.00 0.00
2 who 0.00 0.00
1 cpupower 0.00 0.00
1 head 0.00 0.00
1 logname 0.00 0.00
1 ls 0.00 0.00
1 numactl 0.00 0.00
1 sysctl 0.00 0.00
1 w 0.00 0.00
1 wc 0.00 0.00
1 which 0.00 0.00
0 processes running
53 maximum processes
Specinvoke fires up a separate process for each core.
377379) specinvoke cpu=14 start=3.25 finish=235.52
377381) sh cpu=4 start=3.25 finish=235.10
377387) bash cpu=0 start=3.25 finish=235.10
377412) namd_r_base.mev cpu=0 start=3.26 finish=235.09
377382) sh cpu=1 start=3.25 finish=234.94
377392) bash cpu=1 start=3.25 finish=234.94
377416) namd_r_base.mev cpu=1 start=3.26 finish=234.92
377383) sh cpu=10 start=3.25 finish=234.73
377389) bash cpu=2 start=3.25 finish=234.73
377411) namd_r_base.mev cpu=2 start=3.26 finish=234.72
377384) sh cpu=9 start=3.25 finish=235.38
377394) bash cpu=3 start=3.25 finish=235.38
377415) namd_r_base.mev cpu=3 start=3.26 finish=235.37
377385) sh cpu=9 start=3.25 finish=234.94
377395) bash cpu=4 start=3.25 finish=234.94
377418) namd_r_base.mev cpu=4 start=3.26 finish=234.92
377386) sh cpu=9 start=3.25 finish=235.37
377397) bash cpu=5 start=3.25 finish=235.37
377417) namd_r_base.mev cpu=5 start=3.26 finish=235.35
377388) sh cpu=4 start=3.25 finish=235.25
377396) bash cpu=6 start=3.25 finish=235.25
377420) namd_r_base.mev cpu=6 start=3.26 finish=235.24
377390) sh cpu=12 start=3.25 finish=235.12
377400) bash cpu=7 start=3.25 finish=235.12
377419) namd_r_base.mev cpu=7 start=3.26 finish=235.11
377391) sh cpu=1 start=3.25 finish=235.02
377402) bash cpu=8 start=3.26 finish=235.01
377421) namd_r_base.mev cpu=8 start=3.26 finish=235.00
377393) sh cpu=9 start=3.25 finish=234.92
377404) bash cpu=9 start=3.26 finish=234.92
377422) namd_r_base.mev cpu=9 start=3.26 finish=234.90
377398) sh cpu=10 start=3.25 finish=234.24
377407) bash cpu=10 start=3.26 finish=234.24
377424) namd_r_base.mev cpu=10 start=3.26 finish=234.22
377399) sh cpu=8 start=3.25 finish=235.36
377408) bash cpu=11 start=3.26 finish=235.36
377423) namd_r_base.mev cpu=11 start=3.26 finish=235.34
377401) sh cpu=4 start=3.26 finish=235.10
377409) bash cpu=12 start=3.26 finish=235.10
377425) namd_r_base.mev cpu=12 start=3.26 finish=235.09
377403) sh cpu=11 start=3.26 finish=235.52
377410) bash cpu=13 start=3.26 finish=235.52
377426) namd_r_base.mev cpu=13 start=3.26 finish=235.51
377405) sh cpu=10 start=3.26 finish=234.90
377413) bash cpu=14 start=3.26 finish=234.90
377427) namd_r_base.mev cpu=14 start=3.26 finish=234.88
377406) sh cpu=15 start=3.26 finish=235.15
377414) bash cpu=15 start=3.26 finish=235.15
377428) namd_r_base.mev cpu=15 start=3.26 finish=235.14
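To check how evenly the copies finish, the listing can be parsed and the per-process run times compared. This is a rough sketch under two assumptions: that start and finish are seconds since the start of the trace, and that every line follows the `pid) name cpu=N start=S finish=F` layout seen above. The function name `parse_processes` and the sample selection are illustrative only.

```python
import re

# Four lines copied from the listing above; they include the earliest
# (cpu=10) and latest (cpu=13) finishing namd_r_base.mev copies.
SAMPLE = """\
377412) namd_r_base.mev cpu=0 start=3.26 finish=235.09
377416) namd_r_base.mev cpu=1 start=3.26 finish=234.92
377424) namd_r_base.mev cpu=10 start=3.26 finish=234.22
377426) namd_r_base.mev cpu=13 start=3.26 finish=235.51
"""

LINE_RE = re.compile(
    r"^\s*(?P<pid>\d+)\)\s+(?P<name>\S+)\s+cpu=(?P<cpu>\d+)"
    r"\s+start=(?P<start>[\d.]+)\s+finish=(?P<finish>[\d.]+)\s*$"
)


def parse_processes(text):
    """Turn listing lines into dicts; lines that do not match are skipped."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m is None:
            continue
        start = float(m["start"])
        finish = float(m["finish"])
        rows.append({
            "pid": int(m["pid"]),
            "name": m["name"],
            "cpu": int(m["cpu"]),
            "start": start,
            "finish": finish,
            "duration": finish - start,  # assumed to be seconds
        })
    return rows


rows = parse_processes(SAMPLE)
durations = [r["duration"] for r in rows]
spread = max(durations) - min(durations)

for r in rows:
    print(f"cpu={r['cpu']:2d} pid={r['pid']} ran {r['duration']:.2f}")
print(f"spread between fastest and slowest copy: {spread:.2f}")
```

With the sampled lines, the gap between the fastest and slowest copy is a little over one second out of roughly 232, which is consistent with the copies running evenly across cores.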
