leela is a SPEC CPU(R) benchmark written in C++ and described here. The workload runs on all logical cores.

The topdown profile shows a high rate of frontend stalls (38.7% of slots) and comparatively few backend stalls (9.8%). Branch misprediction is also surprisingly high, at 10.5% of slots.

AMD metrics on the 7840 confirm a moderate branch count of about 141 branches per 1000 instructions. It is unclear exactly why the frontend stalls are as high as they are, other than that about 12% of the branches are mispredicted.

elapsed 1062.974
on_cpu 0.985 # 15.76 / 16 cores
utime 16746.664
stime 10.843
nvcsw 23905 # 13.58%
nivcsw 152068 # 86.42%
inblock 0 # 0.00/sec
onblock 153096 # 144.03/sec
cpu-clock 16757954170204 # 16757.954 seconds
task-clock 16758041250804 # 16758.041 seconds
page faults 2621721 # 156.446/sec
context switches 175408 # 10.467/sec
cpu migrations 136 # 0.008/sec
major page faults 1134 # 0.068/sec
minor page faults 2620587 # 156.378/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 10306429534367 # 141.333 branches per 1000 inst
branch misses 1254180631762 # 12.17% branch miss
conditional 8665643643389 # 118.833 conditional branches per 1000 inst
indirect 13613050061 # 0.187 indirect branches per 1000 inst
cpu-cycles 69276351926448 # 4.07 GHz
instructions 72927455351535 # 1.05 IPC
slots 138563812553184 #
retiring 25252882198147 # 18.2% (23.6%)
-- ucode 365836893 # 0.0%
-- fastpath 25252516361254 # 18.2%
frontend 53606931164145 # 38.7% (50.0%) high
-- latency 38042108363622 # 27.5%
-- bandwidth 15564822800523 # 11.2%
backend 13591797279860 # 9.8% (12.7%) low
-- cpu 4113205360771 # 3.0%
-- memory 9478591919089 # 6.8%
speculation 14685401691972 # 10.6% (13.7%) high
-- branch mispredict 14528567000506 # 10.5%
-- pipeline restart 156834691466 # 0.1%
smt-contention 31426664933396 # 22.7% ( 0.0%)
cpu-cycles 69260152059467 # 4.08 GHz
instructions 72944694306125 # 1.05 IPC
instructions 24309175729662 # 18.021 l2 access per 1000 inst
l2 hit from l1 319328782407 # 4.14% l2 miss
l2 miss from l1 9776623105 #
l2 hit from l2 pf 110379924818 #
l3 hit from l2 pf 6132748151 #
l3 miss from l2 pf 2234093578 #
instructions 24301522652778 # 81.302 float per 1000 inst
float 512 217 # 0.000 AVX-512 per 1000 inst
float 256 2388431966 # 0.098 AVX-256 per 1000 inst
float 128 1973374916489 # 81.204 AVX-128 per 1000 inst
float MMX 0 # 0.000 MMX per 1000 inst
float scalar 20 # 0.000 scalar per 1000 inst
instructions 72921672042387 #
opcache 19636741457450 # 269.285 opcache per 1000 inst
opcache miss 503412573061 # 2.6% opcache miss rate
l1 dTLB miss 71515941686 # 0.981 L1 dTLB per 1000 inst
l2 dTLB miss 3218929660 # 0.044 L2 dTLB per 1000 inst
instructions 72921543862704 #
icache 572134287027 # 7.846 icache per 1000 inst
icache miss 47743495211 # 8.3% icache miss rate
l1 iTLB miss 255447231 # 0.004 L1 iTLB per 1000 inst
l2 iTLB miss 0 # 0.000 L2 iTLB per 1000 inst
tlb flush 101123 # 0.000 TLB flush per 1000 inst
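
As a cross-check, several of the derived values in the counter output above can be recomputed from the raw counts. The sketch below is not the profiling tool's own code, and the variable names are illustrative. Reading the parenthesized topdown percentages as shares of the slots that remain after removing smt-contention is an assumption, although it does reproduce the printed values.

```python
# Raw values copied from the counter output above.
elapsed = 1062.974                      # wall-clock seconds
utime, stime = 16746.664, 10.843        # user and system CPU seconds
cores = 16                              # logical cores
nvcsw, nivcsw = 23905, 152068           # voluntary / involuntary switches
branches = 10306429534367
branch_misses = 1254180631762
cycles = 69276351926448
instructions = 72927455351535
slots = 138563812553184
smt_contention = 31426664933396
topdown = {
    "retiring": 25252882198147,
    "frontend": 53606931164145,
    "backend": 13591797279860,
    "speculation": 14685401691972,
}


def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal."""
    return round(100.0 * part / whole, 1)


# Fraction of the available core time that was spent on CPU.
on_cpu = (utime + stime) / (elapsed * cores)
ipc = instructions / cycles
miss_rate = 100.0 * branch_misses / branches
involuntary = 100.0 * nivcsw / (nvcsw + nivcsw)

# Share of all slots, and (assumption) share of slots excluding smt-contention.
raw = {name: pct(count, slots) for name, count in topdown.items()}
adjusted = {name: pct(count, slots - smt_contention)
            for name, count in topdown.items()}

print(f"on_cpu {on_cpu:.3f}  IPC {ipc:.2f}  branch miss {miss_rate:.2f}%")
print(f"involuntary context switches {involuntary:.2f}%")
for name in topdown:
    print(f"{name:12s} {raw[name]:5.1f}% ({adjusted[name]:.1f}%)")
```

Not every printed value can be rebuilt this way. For example, the branches-per-1000-instructions figure appears to use an instruction count from its own counter group rather than the one shown on the instructions line, so small differences in the last digits are expected.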

The process profile shows that essentially all of the CPU time is spent in leela_r_base.me:
581 processes
48 leela_r_base.me 16706.75 7.13
69 specperl 9.26 1.64
1 clang++ 0.01 0.00
1 lsb_release 0.01 0.00
11 ps 0.00 0.01
173 sh 0.00 0.00
54 specrxp 0.00 0.00
48 bash 0.00 0.00
41 specinvoke 0.00 0.00
21 grep 0.00 0.00
20 cat 0.00 0.00
12 uniq 0.00 0.00
11 sort 0.00 0.00
10 expand 0.00 0.00
6 pwd 0.00 0.00
5 basename 0.00 0.00
5 specmake 0.00 0.00
5 systemctl 0.00 0.00
4 specpp 0.00 0.00
4 uname 0.00 0.00
3 dirname 0.00 0.00
3 dmidecode 0.00 0.00
3 lscpu 0.00 0.00
2 df 0.00 0.00
2 dpkg 0.00 0.00
2 rm 0.00 0.00
2 runcpu 0.00 0.00
2 specsha512sum 0.00 0.00
2 specxz 0.00 0.00
2 who 0.00 0.00
1 cpupower 0.00 0.00
1 head 0.00 0.00
1 logname 0.00 0.00
1 ls 0.00 0.00
1 numactl 0.00 0.00
1 sysctl 0.00 0.00
1 w 0.00 0.00
1 wc 0.00 0.00
1 which 0.00 0.00
0 processes running
53 maximum processes
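
To put a number on how dominant leela_r_base.me is, the top rows of the process summary above can be aggregated. This is a minimal sketch. It assumes the four columns are process count, command name, user CPU seconds, and system CPU seconds; the source does not label them, so treat that reading as a guess.

```python
# Rows with non-zero time, copied from the process summary above.
# Assumed column meaning: (count, command, user seconds, system seconds).
rows = [
    (48, "leela_r_base.me", 16706.75, 7.13),
    (69, "specperl", 9.26, 1.64),
    (1, "clang++", 0.01, 0.00),
    (1, "lsb_release", 0.01, 0.00),
    (11, "ps", 0.00, 0.01),
]

# Total CPU time across the listed commands.
total_cpu = sum(user + system for _, _, user, system in rows)

count, name, user, system = rows[0]
share = 100.0 * (user + system) / total_cpu   # percent of total CPU time
per_process = (user + system) / count         # average seconds per process

print(f"{name}: {share:.2f}% of CPU time, "
      f"{per_process:.1f} s per process over {count} processes")
```

Under that reading, each leela_r_base.me process averages roughly 348 seconds of CPU time, which is close to the lifetime of each copy in the process listing that follows.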

specinvoke starts a separate copy of the benchmark on each logical processor:
58477) specinvoke cpu=5 start=3.33 finish=354.44
58479) sh cpu=9 start=3.33 finish=351.88
58488) bash cpu=0 start=3.33 finish=351.88
58513) leela_r_base.me cpu=0 start=3.33 finish=351.87
58480) sh cpu=4 start=3.33 finish=351.79
58490) bash cpu=1 start=3.33 finish=351.79
58511) leela_r_base.me cpu=1 start=3.33 finish=351.79
58481) sh cpu=5 start=3.33 finish=354.44
58491) bash cpu=2 start=3.33 finish=354.44
58517) leela_r_base.me cpu=2 start=3.33 finish=354.44
58482) sh cpu=3 start=3.33 finish=350.76
58494) bash cpu=3 start=3.33 finish=350.76
58515) leela_r_base.me cpu=3 start=3.33 finish=350.75
58483) sh cpu=3 start=3.33 finish=350.83
58505) bash cpu=4 start=3.33 finish=350.83
58518) leela_r_base.me cpu=4 start=3.33 finish=350.83
58484) sh cpu=5 start=3.33 finish=351.25
58492) bash cpu=5 start=3.33 finish=351.25
58516) leela_r_base.me cpu=5 start=3.33 finish=351.25
58485) sh cpu=14 start=3.33 finish=352.85
58496) bash cpu=6 start=3.33 finish=352.85
58512) leela_r_base.me cpu=6 start=3.33 finish=352.85
58486) sh cpu=9 start=3.33 finish=352.07
58503) bash cpu=7 start=3.33 finish=352.07
58520) leela_r_base.me cpu=7 start=3.33 finish=352.06
58487) sh cpu=12 start=3.33 finish=352.24
58499) bash cpu=8 start=3.33 finish=352.24
58514) leela_r_base.me cpu=8 start=3.33 finish=352.24
58489) sh cpu=13 start=3.33 finish=351.74
58502) bash cpu=9 start=3.33 finish=351.74
58519) leela_r_base.me cpu=9 start=3.33 finish=351.73
58493) sh cpu=4 start=3.33 finish=351.78
58501) bash cpu=10 start=3.33 finish=351.78
58521) leela_r_base.me cpu=10 start=3.33 finish=351.77
58495) sh cpu=4 start=3.33 finish=351.60
58506) bash cpu=11 start=3.33 finish=351.60
58522) leela_r_base.me cpu=11 start=3.33 finish=351.60
58497) sh cpu=4 start=3.33 finish=352.05
58507) bash cpu=12 start=3.33 finish=352.05
58523) leela_r_base.me cpu=12 start=3.33 finish=352.04
58498) sh cpu=3 start=3.33 finish=350.95
58508) bash cpu=13 start=3.33 finish=350.95
58525) leela_r_base.me cpu=13 start=3.33 finish=350.94
58500) sh cpu=0 start=3.33 finish=352.84
58509) bash cpu=14 start=3.33 finish=352.84
58524) leela_r_base.me cpu=14 start=3.33 finish=352.83
58504) sh cpu=15 start=3.33 finish=352.67
58510) bash cpu=15 start=3.33 finish=352.67
58526) leela_r_base.me cpu=15 start=3.33 finish=352.66
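
The listing above can be checked mechanically for one copy per logical processor and for how evenly the copies finish. The following is a sketch that parses only the leela_r_base.me lines. The regular expression and the field names are illustrative and assume the line layout shown above.

```python
import re

# leela_r_base.me lines copied from the listing above.
listing = """\
58513) leela_r_base.me cpu=0 start=3.33 finish=351.87
58511) leela_r_base.me cpu=1 start=3.33 finish=351.79
58517) leela_r_base.me cpu=2 start=3.33 finish=354.44
58515) leela_r_base.me cpu=3 start=3.33 finish=350.75
58518) leela_r_base.me cpu=4 start=3.33 finish=350.83
58516) leela_r_base.me cpu=5 start=3.33 finish=351.25
58512) leela_r_base.me cpu=6 start=3.33 finish=352.85
58520) leela_r_base.me cpu=7 start=3.33 finish=352.06
58514) leela_r_base.me cpu=8 start=3.33 finish=352.24
58519) leela_r_base.me cpu=9 start=3.33 finish=351.73
58521) leela_r_base.me cpu=10 start=3.33 finish=351.77
58522) leela_r_base.me cpu=11 start=3.33 finish=351.60
58523) leela_r_base.me cpu=12 start=3.33 finish=352.04
58525) leela_r_base.me cpu=13 start=3.33 finish=350.94
58524) leela_r_base.me cpu=14 start=3.33 finish=352.83
58526) leela_r_base.me cpu=15 start=3.33 finish=352.66
"""

LINE = re.compile(
    r"(?P<pid>\d+)\) (?P<cmd>\S+) cpu=(?P<cpu>\d+) "
    r"start=(?P<start>[\d.]+) finish=(?P<finish>[\d.]+)"
)

records = []
for line in listing.splitlines():
    match = LINE.match(line.strip())
    if match:
        records.append({
            "pid": int(match["pid"]),
            "cpu": int(match["cpu"]),
            "duration": float(match["finish"]) - float(match["start"]),
        })

cpus = sorted(r["cpu"] for r in records)
durations = [r["duration"] for r in records]
shortest, longest = min(durations), max(durations)

print(f"{len(records)} copies on {len(set(cpus))} distinct CPUs")
print(f"run time {shortest:.2f} s to {longest:.2f} s "
      f"(spread {longest - shortest:.2f} s)")
```

In this listing the 16 copies land on 16 distinct CPUs, and the slowest copy (on cpu 2) runs under four seconds longer than the fastest, so no single copy stands out as a straggler.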
