lbm is a SPEC CPU(R) benchmark written in C and described here. The workload runs on all 16 logical cores (on_cpu 0.970, or 15.53 of 16 cores), with a slight decrease in utilization at the end of the run.

The topdown profile shows this is a backend-bound workload: 91.1% of slots are backend stalls, and only 4.5% retire.

AMD metrics confirm this is backend-bound, with stalls predominantly due to memory (88.9% of slots). There are about 173 L2 accesses per 1000 instructions, with an L2 miss rate of about 25%.
elapsed 1416.599
on_cpu 0.970 # 15.53 / 16 cores
utime 21972.055
stime 23.825
nvcsw 30569 # 13.09%
nivcsw 202931 # 86.91%
inblock 0 # 0.00/sec
onblock 10096 # 7.13/sec
cpu-clock 22000514289041 # 22000.514 seconds
task-clock 22000903635495 # 22000.904 seconds
page faults 5653553 # 256.969/sec
context switches 232950 # 10.588/sec
cpu migrations 159 # 0.007/sec
major page faults 967 # 0.044/sec
minor page faults 5652586 # 256.925/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
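The on_cpu figure and the context-switch split above can be rechecked from the raw values. This is a minimal sketch, assuming on_cpu is (utime + stime) / elapsed divided by the core count, which is consistent with the reported numbers; the function names are illustrative and are not part of the profiling tool.

```python
def on_cpu(utime, stime, elapsed, cores):
    """Return (average busy cores, fraction of the machine in use)."""
    busy = (utime + stime) / elapsed
    return busy, busy / cores


def switch_split(nvcsw, nivcsw):
    """Return voluntary and involuntary context switches as percentages."""
    total = nvcsw + nivcsw
    return 100.0 * nvcsw / total, 100.0 * nivcsw / total


# Values copied from the listing above; 16 is the logical core count.
busy, frac = on_cpu(utime=21972.055, stime=23.825, elapsed=1416.599, cores=16)
vol, invol = switch_split(nvcsw=30569, nivcsw=202931)

print(f"on_cpu {frac:.3f} # {busy:.2f} / 16 cores")
print(f"voluntary {vol:.2f}%, involuntary {invol:.2f}%")
```

The high involuntary share fits a CPU-bound run in which copies are preempted rather than blocking on I/O.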
branches 3618987222588 # 138.203 branches per 1000 inst
branch misses 2898200956 # 0.08% branch miss
conditional 3606565747126 # 137.728 conditional branches per 1000 inst
indirect 676460163 # 0.026 indirect branches per 1000 inst
cpu-cycles 100871453520770 # 4.42 GHz
instructions 26185191570506 # 0.26 IPC low
slots 201718434237312 #
retiring 9174297118733 # 4.5% ( 4.7%) low
-- ucode 275434458 # 0.0%
-- fastpath 9174021684275 # 4.5%
frontend 4164703812385 # 2.1% ( 2.1%) low
-- latency 2149956707448 # 1.1%
-- bandwidth 2014747104937 # 1.0%
backend 183730128254661 # 91.1% (93.2%) high
-- cpu 4395824123476 # 2.2%
-- memory 179334304131185 # 88.9%
speculation 79406452189 # 0.0% ( 0.0%) low
-- branch mispredict 43379310136 # 0.0%
-- pipeline restart 36027142053 # 0.0%
smt-contention 4569836071252 # 2.3% ( 0.0%)
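The topdown percentages above are each category's slot count divided by total slots. The sketch below recomputes them from the listed counts. It also assumes the parenthesized figures are the same ratios with smt-contention slots removed from the denominator; that reading matches the numbers but is an inference, not documented tool behavior. All names are illustrative.

```python
SLOTS = 201718434237312
SMT_CONTENTION = 4569836071252

TOP_LEVEL = {
    "retiring": 9174297118733,
    "frontend": 4164703812385,
    "backend": 183730128254661,
    "speculation": 79406452189,
}
BACKEND = {"cpu": 4395824123476, "memory": 179334304131185}


def pct(count, total):
    """Share of `total` taken by `count`, in percent."""
    return 100.0 * count / total


raw = {name: pct(c, SLOTS) for name, c in TOP_LEVEL.items()}
# Assumption: parenthesized values exclude SMT-contended slots.
adjusted = {name: pct(c, SLOTS - SMT_CONTENTION) for name, c in TOP_LEVEL.items()}
dominant = max(raw, key=raw.get)
memory_share = pct(BACKEND["memory"], SLOTS)

for name in TOP_LEVEL:
    print(f"{name:12s} {raw[name]:5.1f}% ({adjusted[name]:5.1f}%)")
print(f"dominant category: {dominant}, memory-bound slots: {memory_share:.1f}%")
```

The two backend sub-counts sum exactly to the backend total, so memory stalls account for nearly all of the backend-bound slots.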
cpu-cycles 101787311448863 # 4.45 GHz
instructions 26186712477699 # 0.26 IPC low
instructions 8732031506039 # 172.647 l2 access per 1000 inst
l2 hit from l1 1295773889975 # 24.72% l2 miss
l2 miss from l1 213380182619 #
l2 hit from l2 pf 52568231880 #
l3 hit from l2 pf 107403392082 #
l3 miss from l2 pf 51815243240 #
instructions 8727315001190 # 51.354 float per 1000 inst
float 512 211 # 0.000 AVX-512 per 1000 inst
float 256 428894495414 # 49.144 AVX-256 per 1000 inst
float 128 19289027765 # 2.210 AVX-128 per 1000 inst
float MMX 0 # 0.000 MMX per 1000 inst
float scalar 1 # 0.000 scalar per 1000 inst
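The floating-point mix above can be summarized by normalizing each counter to 1000 instructions, using the instruction count from the same counter group. This is a small sketch with illustrative names; the width labels are taken from the listing as-is.

```python
INSTRUCTIONS = 8727315001190  # instruction count from the float counter group

FLOAT_OPS = {
    "AVX-512": 211,
    "AVX-256": 428894495414,
    "AVX-128": 19289027765,
    "MMX": 0,
    "scalar": 1,
}


def per_kilo_inst(count, instructions=INSTRUCTIONS):
    """Events per 1000 instructions."""
    return 1000.0 * count / instructions


total_float = sum(FLOAT_OPS.values())
total_rate = per_kilo_inst(total_float)
avx256_share = 100.0 * FLOAT_OPS["AVX-256"] / total_float

for name, count in FLOAT_OPS.items():
    print(f"{name:8s} {per_kilo_inst(count):7.3f} per 1000 inst")
print(f"total    {total_rate:7.3f} per 1000 inst, {avx256_share:.1f}% is AVX-256")
```

Nearly all counted floating-point work is 256-bit vector code, so the low IPC here reflects memory stalls rather than scalar code generation.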
instructions 26183673451995 #
opcache 1202720385896 # 45.934 opcache per 1000 inst
opcache miss 183734119286 # 15.3% opcache miss rate
l1 dTLB miss 344774133836 # 13.168 L1 dTLB per 1000 inst
l2 dTLB miss 14788093995 # 0.565 L2 dTLB per 1000 inst
instructions 26184096721565 #
icache 210993424923 # 8.058 icache per 1000 inst
icache miss 12611848176 # 6.0% icache miss rate
l1 iTLB miss 158643848 # 0.006 L1 iTLB per 1000 inst
l2 iTLB miss 0 # 0.000 L2 iTLB per 1000 inst
tlb flush 87594 # 0.000 TLB flush per 1000 inst
The process overview shows almost all time is spent in lbm_r_base.mev.
573 processes
48 lbm_r_base.mev- 22168.48 14.21
69 specperl 15.44 3.37
1 clang 0.01 0.00
7 ps 0.00 0.01
1 lsb_release 0.00 0.01
169 sh 0.00 0.00
54 specrxp 0.00 0.00
48 bash 0.00 0.00
41 specinvoke 0.00 0.00
21 grep 0.00 0.00
20 cat 0.00 0.00
12 uniq 0.00 0.00
11 sort 0.00 0.00
10 expand 0.00 0.00
6 pwd 0.00 0.00
5 basename 0.00 0.00
5 specmake 0.00 0.00
5 systemctl 0.00 0.00
4 specpp 0.00 0.00
4 uname 0.00 0.00
3 dirname 0.00 0.00
3 dmidecode 0.00 0.00
3 lscpu 0.00 0.00
2 df 0.00 0.00
2 dpkg 0.00 0.00
2 rm 0.00 0.00
2 runcpu 0.00 0.00
2 specsha512sum 0.00 0.00
2 specxz 0.00 0.00
2 who 0.00 0.00
1 cpupower 0.00 0.00
1 head 0.00 0.00
1 logname 0.00 0.00
1 ls 0.00 0.00
1 numactl 0.00 0.00
1 sysctl 0.00 0.00
1 w 0.00 0.00
1 wc 0.00 0.00
1 which 0.00 0.00
0 processes running
53 maximum processes
Computation blocks show specinvoke launching a separate copy on each logical core.
407711) specinvoke cpu=1 start=3.12 finish=474.61
407713) sh cpu=5 start=3.12 finish=469.35
407719) bash cpu=0 start=3.12 finish=469.35
407745) lbm_r_base.mev- cpu=0 start=3.12 finish=469.26
407714) sh cpu=15 start=3.12 finish=437.70
407724) bash cpu=1 start=3.12 finish=437.69
407747) lbm_r_base.mev- cpu=1 start=3.12 finish=437.56
407715) sh cpu=2 start=3.12 finish=453.48
407725) bash cpu=2 start=3.12 finish=453.48
407750) lbm_r_base.mev- cpu=2 start=3.12 finish=453.36
407716) sh cpu=1 start=3.12 finish=465.12
407726) bash cpu=3 start=3.12 finish=465.12
407746) lbm_r_base.mev- cpu=3 start=3.12 finish=465.02
407717) sh cpu=10 start=3.12 finish=471.82
407728) bash cpu=4 start=3.12 finish=471.82
407749) lbm_r_base.mev- cpu=4 start=3.12 finish=471.77
407718) sh cpu=10 start=3.12 finish=468.23
407731) bash cpu=5 start=3.12 finish=468.23
407748) lbm_r_base.mev- cpu=5 start=3.12 finish=468.16
407720) sh cpu=14 start=3.12 finish=474.61
407730) bash cpu=6 start=3.12 finish=474.61
407753) lbm_r_base.mev- cpu=6 start=3.12 finish=474.58
407721) sh cpu=7 start=3.12 finish=473.20
407733) bash cpu=7 start=3.12 finish=473.20
407751) lbm_r_base.mev- cpu=7 start=3.12 finish=473.16
407722) sh cpu=3 start=3.12 finish=473.33
407735) bash cpu=8 start=3.12 finish=473.33
407752) lbm_r_base.mev- cpu=8 start=3.12 finish=473.30
407723) sh cpu=9 start=3.12 finish=471.65
407737) bash cpu=9 start=3.12 finish=471.65
407754) lbm_r_base.mev- cpu=9 start=3.12 finish=471.60
407727) sh cpu=10 start=3.12 finish=461.01
407739) bash cpu=10 start=3.12 finish=461.01
407755) lbm_r_base.mev- cpu=10 start=3.12 finish=460.94
407729) sh cpu=1 start=3.12 finish=471.46
407740) bash cpu=11 start=3.12 finish=471.46
407756) lbm_r_base.mev- cpu=11 start=3.12 finish=471.40
407732) sh cpu=15 start=3.12 finish=445.70
407741) bash cpu=12 start=3.12 finish=445.70
407760) lbm_r_base.mev- cpu=12 start=3.12 finish=445.57
407734) sh cpu=15 start=3.12 finish=447.07
407742) bash cpu=13 start=3.12 finish=447.07
407757) lbm_r_base.mev- cpu=13 start=3.12 finish=446.95
407736) sh cpu=13 start=3.12 finish=473.05
407744) bash cpu=14 start=3.12 finish=473.05
407759) lbm_r_base.mev- cpu=14 start=3.12 finish=472.98
407738) sh cpu=15 start=3.12 finish=431.52
407743) bash cpu=15 start=3.12 finish=431.52
407758) lbm_r_base.mev- cpu=15 start=3.12 finish=431.38
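The finish times in the listing above show how unevenly the copies complete. The sketch below parses lines of that form and reports the gap between the first and last benchmark copy to finish. The regular expression and field names are assumptions based on the visible layout, and the sample holds only four of the lines.

```python
import re

# Matches lines such as: "407745) lbm_r_base.mev- cpu=0 start=3.12 finish=469.26"
LINE = re.compile(
    r"(\d+)\)\s+(\S+)\s+cpu=(\d+)\s+start=([\d.]+)\s+finish=([\d.]+)"
)


def parse_blocks(text):
    """Parse computation-block lines into a list of dicts."""
    rows = []
    for line in text.splitlines():
        m = LINE.search(line)
        if m:
            pid, comm, cpu, start, finish = m.groups()
            rows.append({
                "pid": int(pid), "comm": comm, "cpu": int(cpu),
                "start": float(start), "finish": float(finish),
            })
    return rows


sample = """\
407711) specinvoke cpu=1 start=3.12 finish=474.61
407745) lbm_r_base.mev- cpu=0 start=3.12 finish=469.26
407753) lbm_r_base.mev- cpu=6 start=3.12 finish=474.58
407758) lbm_r_base.mev- cpu=15 start=3.12 finish=431.38
"""

copies = [r for r in parse_blocks(sample) if r["comm"].startswith("lbm_r")]
first = min(copies, key=lambda r: r["finish"])
last = max(copies, key=lambda r: r["finish"])
spread = last["finish"] - first["finish"]

print(f"first done: cpu {first['cpu']} at {first['finish']:.2f}s")
print(f"last done:  cpu {last['cpu']} at {last['finish']:.2f}s")
print(f"spread:     {spread:.2f}s")
```

In the full listing the earliest copy also finishes at 431.38 (cpu 15) and the latest at 474.58 (cpu 6). For roughly the last 43 seconds of this block some cores sit idle, which is the utilization drop at the end of the run.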
