leela is a SPEC CPU(R) benchmark written in C++ and described here. The workload runs on all logical cores.

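The topdown profile shows a high rate of frontend stalls (38.7% of slots, mostly latency) and few backend stalls (9.8%). Branch misprediction is also surprisingly high: speculation accounts for 10.6% of slots, almost all of it from mispredicted branches.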

AMD metrics on the 7840 confirm a moderate branch count (about 141 branches per 1000 instructions). It is unclear exactly why the frontend stalls are as high as they are, other than that 12% of the branches are mispredicted.

elapsed              1062.974
on_cpu               0.985          # 15.76 / 16 cores
utime                16746.664
stime                10.843
nvcsw                23905          # 13.58%
nivcsw               152068         # 86.42%
inblock              0              # 0.00/sec
onblock              153096         # 144.03/sec
cpu-clock            16757954170204 # 16757.954 seconds
task-clock           16758041250804 # 16758.041 seconds
page faults          2621721        # 156.446/sec
context switches     175408         # 10.467/sec
cpu migrations       136            # 0.008/sec
major page faults    1134           # 0.068/sec
minor page faults    2620587        # 156.378/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             10306429534367 # 141.333 branches per 1000 inst
branch misses        1254180631762  # 12.17% branch miss
conditional          8665643643389  # 118.833 conditional branches per 1000 inst
indirect             13613050061    # 0.187 indirect branches per 1000 inst
cpu-cycles           69276351926448 # 4.07 GHz
instructions         72927455351535 # 1.05 IPC
slots                138563812553184 #
retiring             25252882198147 # 18.2% (23.6%)
-- ucode             365836893      #     0.0%
-- fastpath          25252516361254 #    18.2%
frontend             53606931164145 # 38.7% (50.0%) high
-- latency           38042108363622 #    27.5%
-- bandwidth         15564822800523 #    11.2%
backend              13591797279860 #  9.8% (12.7%) low
-- cpu               4113205360771  #     3.0%
-- memory            9478591919089  #     6.8%
speculation          14685401691972 # 10.6% (13.7%) high
-- branch mispredict 14528567000506 #    10.5%
-- pipeline restart  156834691466   #     0.1%
smt-contention       31426664933396 # 22.7% ( 0.0%)
cpu-cycles           69260152059467 # 4.08 GHz
instructions         72944694306125 # 1.05 IPC
instructions         24309175729662 # 18.021 l2 access per 1000 inst
l2 hit from l1       319328782407   # 4.14% l2 miss
l2 miss from l1      9776623105     #
l2 hit from l2 pf    110379924818   #
l3 hit from l2 pf    6132748151     #
l3 miss from l2 pf   2234093578     #
instructions         24301522652778 # 81.302 float per 1000 inst
float 512            217            # 0.000 AVX-512 per 1000 inst
float 256            2388431966     # 0.098 AVX-256 per 1000 inst
float 128            1973374916489  # 81.204 AVX-128 per 1000 inst
float MMX            0              # 0.000 MMX per 1000 inst
float scalar         20             # 0.000 scalar per 1000 inst
instructions         72921672042387 #
opcache              19636741457450 # 269.285 opcache per 1000 inst
opcache miss         503412573061   #  2.6% opcache miss rate
l1 dTLB miss         71515941686    # 0.981 L1 dTLB per 1000 inst
l2 dTLB miss         3218929660     # 0.044 L2 dTLB per 1000 inst
instructions         72921543862704 #
icache               572134287027   # 7.846 icache per 1000 inst
icache miss          47743495211    #  8.3% icache miss rate
l1 iTLB miss         255447231      # 0.004 L1 iTLB per 1000 inst
l2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst
tlb flush            101123         # 0.000 TLB flush per 1000 inst
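
The derived figures in the dump can be rechecked from the raw counts. The sketch below is a minimal illustration, not the profiling tool's own code: the dictionary keys and function names are hypothetical, and the reading of the parenthesized topdown figures as shares renormalized after excluding smt-contention is an inference from the arithmetic rather than something the tool output states. Small differences in the last digit are possible because the dump lists several slightly different instruction counts.

```python
# Raw values copied from the counter dump above.
counters = {
    "elapsed": 1062.974,
    "utime": 16746.664,
    "stime": 10.843,
    "branches": 10306429534367,
    "branch_misses": 1254180631762,
    "cpu_cycles": 69276351926448,
    "instructions": 72927455351535,
}
LOGICAL_CORES = 16  # from the "15.76 / 16 cores" annotation

topdown = {
    "retiring": 25252882198147,
    "frontend": 53606931164145,
    "backend": 13591797279860,
    "speculation": 14685401691972,
    "smt_contention": 31426664933396,
}
SLOTS = 138563812553184


def derived(c, cores=LOGICAL_CORES):
    """Recompute utilization, IPC, and branch ratios from raw counts."""
    busy = (c["utime"] + c["stime"]) / c["elapsed"]
    return {
        "cores_busy": busy,
        "on_cpu": busy / cores,
        "ipc": c["instructions"] / c["cpu_cycles"],
        "branches_per_kinst": 1000 * c["branches"] / c["instructions"],
        "branch_miss_pct": 100 * c["branch_misses"] / c["branches"],
    }


def topdown_shares(td, slots):
    """Return (share of all slots, share excluding SMT contention) in percent.

    Assumption: the second figure mirrors the parenthesized column in the dump.
    """
    non_smt = sum(v for k, v in td.items() if k != "smt_contention")
    shares = {}
    for name, value in td.items():
        raw = 100 * value / slots
        norm = 0.0 if name == "smt_contention" else 100 * value / non_smt
        shares[name] = (round(raw, 1), round(norm, 1))
    return shares


for key, value in derived(counters).items():
    print(f"{key:20s} {value:.3f}")
for name, (raw, norm) in topdown_shares(topdown, SLOTS).items():
    print(f"{name:20s} {raw:5.1f}% ({norm:5.1f}%)")
```

Under this reading, frontend stalls account for half of the slots that are not lost to SMT contention, which is the figure the dump flags as high.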

The process profile shows that nearly all of the time is spent in leela_r_base.me.

581 processes
	 48 leela_r_base.me      16706.75     7.13
	 69 specperl                 9.26     1.64
	  1 clang++                  0.01     0.00
	  1 lsb_release              0.01     0.00
	 11 ps                       0.00     0.01
	173 sh                       0.00     0.00
	 54 specrxp                  0.00     0.00
	 48 bash                     0.00     0.00
	 41 specinvoke               0.00     0.00
	 21 grep                     0.00     0.00
	 20 cat                      0.00     0.00
	 12 uniq                     0.00     0.00
	 11 sort                     0.00     0.00
	 10 expand                   0.00     0.00
	  6 pwd                      0.00     0.00
	  5 basename                 0.00     0.00
	  5 specmake                 0.00     0.00
	  5 systemctl                0.00     0.00
	  4 specpp                   0.00     0.00
	  4 uname                    0.00     0.00
	  3 dirname                  0.00     0.00
	  3 dmidecode                0.00     0.00
	  3 lscpu                    0.00     0.00
	  2 df                       0.00     0.00
	  2 dpkg                     0.00     0.00
	  2 rm                       0.00     0.00
	  2 runcpu                   0.00     0.00
	  2 specsha512sum            0.00     0.00
	  2 specxz                   0.00     0.00
	  2 who                      0.00     0.00
	  1 cpupower                 0.00     0.00
	  1 head                     0.00     0.00
	  1 logname                  0.00     0.00
	  1 ls                       0.00     0.00
	  1 numactl                  0.00     0.00
	  1 sysctl                   0.00     0.00
	  1 w                        0.00     0.00
	  1 wc                       0.00     0.00
	  1 which                    0.00     0.00
0 processes running
53 maximum processes
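
As a rough cross-check of the table above, the sketch below computes the benchmark's share of CPU time and the number of benchmark processes per logical core. It assumes that the two numeric columns are user and system CPU seconds (the table itself has no header), it lists only the rows with non-zero times, and the function names are hypothetical.

```python
# Rows copied from the process table above: (count, command, col3, col4).
# Assumption: col3 and col4 are user and system CPU seconds.
rows = [
    (48, "leela_r_base.me", 16706.75, 7.13),
    (69, "specperl", 9.26, 1.64),
    (1, "clang++", 0.01, 0.00),
    (1, "lsb_release", 0.01, 0.00),
    (11, "ps", 0.00, 0.01),
]


def cpu_share(rows, command):
    """Percentage of listed CPU time (user + system) used by one command."""
    total = sum(user + system for _, _, user, system in rows)
    mine = sum(user + system for _, name, user, system in rows if name == command)
    return 100 * mine / total


def processes_per_core(rows, command, cores=16):
    """Number of processes of one command divided by the logical core count."""
    count = sum(n for n, name, _, _ in rows if name == command)
    return count / cores


print(f"leela share of CPU time: {cpu_share(rows, 'leela_r_base.me'):.2f}%")
print(f"leela processes per core: {processes_per_core(rows, 'leela_r_base.me'):.1f}")
```

The 48 benchmark processes divide evenly into three per logical core, which would be consistent with three runs of 16 copies each, although the source does not say so.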

specinvoke starts a separate copy on each logical processor, each through its own sh and bash wrapper:

    58477) specinvoke       cpu=5 start=3.33  finish=354.44
      58479) sh               cpu=9 start=3.33  finish=351.88
        58488) bash             cpu=0 start=3.33  finish=351.88
          58513) leela_r_base.me  cpu=0 start=3.33  finish=351.87
      58480) sh               cpu=4 start=3.33  finish=351.79
        58490) bash             cpu=1 start=3.33  finish=351.79
          58511) leela_r_base.me  cpu=1 start=3.33  finish=351.79
      58481) sh               cpu=5 start=3.33  finish=354.44
        58491) bash             cpu=2 start=3.33  finish=354.44
          58517) leela_r_base.me  cpu=2 start=3.33  finish=354.44
      58482) sh               cpu=3 start=3.33  finish=350.76
        58494) bash             cpu=3 start=3.33  finish=350.76
          58515) leela_r_base.me  cpu=3 start=3.33  finish=350.75
      58483) sh               cpu=3 start=3.33  finish=350.83
        58505) bash             cpu=4 start=3.33  finish=350.83
          58518) leela_r_base.me  cpu=4 start=3.33  finish=350.83
      58484) sh               cpu=5 start=3.33  finish=351.25
        58492) bash             cpu=5 start=3.33  finish=351.25
          58516) leela_r_base.me  cpu=5 start=3.33  finish=351.25
      58485) sh               cpu=14 start=3.33  finish=352.85
        58496) bash             cpu=6 start=3.33  finish=352.85
          58512) leela_r_base.me  cpu=6 start=3.33  finish=352.85
      58486) sh               cpu=9 start=3.33  finish=352.07
        58503) bash             cpu=7 start=3.33  finish=352.07
          58520) leela_r_base.me  cpu=7 start=3.33  finish=352.06
      58487) sh               cpu=12 start=3.33  finish=352.24
        58499) bash             cpu=8 start=3.33  finish=352.24
          58514) leela_r_base.me  cpu=8 start=3.33  finish=352.24
      58489) sh               cpu=13 start=3.33  finish=351.74
        58502) bash             cpu=9 start=3.33  finish=351.74
          58519) leela_r_base.me  cpu=9 start=3.33  finish=351.73
      58493) sh               cpu=4 start=3.33  finish=351.78
        58501) bash             cpu=10 start=3.33  finish=351.78
          58521) leela_r_base.me  cpu=10 start=3.33  finish=351.77
      58495) sh               cpu=4 start=3.33  finish=351.60
        58506) bash             cpu=11 start=3.33  finish=351.60
          58522) leela_r_base.me  cpu=11 start=3.33  finish=351.60
      58497) sh               cpu=4 start=3.33  finish=352.05
        58507) bash             cpu=12 start=3.33  finish=352.05
          58523) leela_r_base.me  cpu=12 start=3.33  finish=352.04
      58498) sh               cpu=3 start=3.33  finish=350.95
        58508) bash             cpu=13 start=3.33  finish=350.95
          58525) leela_r_base.me  cpu=13 start=3.33  finish=350.94
      58500) sh               cpu=0 start=3.33  finish=352.84
        58509) bash             cpu=14 start=3.33  finish=352.84
          58524) leela_r_base.me  cpu=14 start=3.33  finish=352.83
      58504) sh               cpu=15 start=3.33  finish=352.67
        58510) bash             cpu=15 start=3.33  finish=352.67
          58526) leela_r_base.me  cpu=15 start=3.33  finish=352.66
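
The per-copy run times can be pulled out of a tree like the one above with a small parser. This is a minimal sketch under stated assumptions: the regular expression is written against the line shape shown here rather than any documented format, start and finish are assumed to be seconds from the beginning of the trace, and only four of the sixteen leela_r_base.me lines are included to keep the example short.

```python
import re

# A few lines copied from the process tree above.
TREE = """\
          58513) leela_r_base.me  cpu=0 start=3.33  finish=351.87
          58511) leela_r_base.me  cpu=1 start=3.33  finish=351.79
          58517) leela_r_base.me  cpu=2 start=3.33  finish=354.44
          58515) leela_r_base.me  cpu=3 start=3.33  finish=350.75
"""

# Hypothetical pattern matching "<pid>) <command> cpu=<n> start=<t> finish=<t>".
LINE = re.compile(
    r"^\s*(?P<pid>\d+)\)\s+(?P<cmd>\S+)\s+cpu=(?P<cpu>\d+)\s+"
    r"start=(?P<start>[\d.]+)\s+finish=(?P<finish>[\d.]+)\s*$"
)


def parse_tree(text, command="leela_r_base.me"):
    """Return pid, cpu, and run time for each line that matches the command."""
    rows = []
    for line in text.splitlines():
        m = LINE.match(line)
        if m and m["cmd"] == command:
            rows.append({
                "pid": int(m["pid"]),
                "cpu": int(m["cpu"]),
                "runtime": float(m["finish"]) - float(m["start"]),
            })
    return rows


rows = parse_tree(TREE)
runtimes = [r["runtime"] for r in rows]
print(f"copies={len(rows)} min={min(runtimes):.2f} max={max(runtimes):.2f}")
```

In the full tree every copy starts at 3.33 and finishes between roughly 350.7 and 354.4, so the copies run in lockstep with only a few seconds of spread.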