LeelaChessZero is a chess program driven by neural networks. There are three backends: BLAS, Eigen, and OpenCL. The OpenCL backend didn’t run because OpenCL wasn’t configured. The BLAS implementation is much faster on my AMD CPU than on my Intel CPU. The profile shows a lot of variation in the number of runnable processes, so I expect we will see many short-lived processes.

Frontend stalls spike periodically, and the retirement rate is in the mid-30s.

The AMD metrics show a lot of context switches and page faults. The backend stalls are split between waiting on the CPU and waiting on memory.

elapsed              2286.628
on_cpu               0.788          # 12.61 / 16 cores
utime                28433.206
stime                398.719
nvcsw                3234284        # 4.57%
nivcsw               67581710       # 95.43%
inblock              0              # 0.00/sec
onblock              28176          # 12.32/sec
cpu-clock            28817910710262 # 28817.911 seconds
task-clock           28822250831254 # 28822.251 seconds
page faults          13958000       # 484.279/sec
context switches     70827221       # 2457.380/sec
cpu migrations       638731         # 22.161/sec
major page faults    6              # 0.000/sec
minor page faults    13957994       # 484.278/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             6068063340836  # 48.790 branches per 1000 inst
branch misses        161506622845   # 2.66% branch miss
conditional          4022477007429  # 32.343 conditional branches per 1000 inst
indirect             635544091025   # 5.110 indirect branches per 1000 inst
cpu-cycles           127880209993342 # 3.00 GHz
instructions         136039223127335 # 1.06 IPC
slots                255788277627072 #
retiring             46024278712982 # 18.0% (23.6%)
-- ucode             148061056873   #     0.1%
-- fastpath          45876217656109 #    17.9%
frontend             16172927052638 #  6.3% ( 8.3%)
-- latency           8743803494448  #     3.4%
-- bandwidth         7429123558190  #     2.9%
backend              128450684661770 # 50.2% (66.0%)
-- cpu               72082918052603 #    28.2%
-- memory            56367766609167 #    22.0%
speculation          4004300785800  #  1.6% ( 2.1%)
-- branch mispredict 3950326280615  #     1.5%
-- pipeline restart  53974505185    #     0.0%
smt-contention       61135597919753 # 23.9% ( 0.0%)
cpu-cycles           109299872593817 # 2.99 GHz
instructions         123409306628789 # 1.13 IPC
instructions         41152895861252 # 95.467 l2 access per 1000 inst
l2 hit from l1       2977220324102  # 7.87% l2 miss
l2 miss from l1      82623493543    #
l2 hit from l2 pf    725059870544   #
l3 hit from l2 pf    106665422650   #
l3 miss from l2 pf   119803286779   #
instructions         41153957789364 # 89.041 float per 1000 inst
float 512            85             # 0.000 AVX-512 per 1000 inst
float 256            11641          # 0.000 AVX-256 per 1000 inst
float 128            3664377273313  # 89.041 AVX-128 per 1000 inst
float MMX            0              # 0.000 MMX per 1000 inst
float scalar         0              # 0.000 scalar per 1000 inst
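
The derived values in the comments above can be rechecked from the raw counters. Below is a minimal Python sketch, assuming that on_cpu is (utime + stime) / elapsed / cores and that the parenthesized topdown percentages exclude the smt-contention slots. Both formulas are inferred from the printed numbers, not taken from the profiling tool, and the helper names `on_cpu` and `share` are hypothetical.

```python
# Recompute a few derived AMD metrics from the raw counters listed above.
# The formulas are assumptions inferred from the printed values.

def on_cpu(utime, stime, elapsed, cores):
    """Return (average busy cores, fraction of all cores busy)."""
    busy = (utime + stime) / elapsed
    return busy, busy / cores

def share(part, *others):
    """Percentage of `part` within part + others."""
    return 100.0 * part / (part + sum(others))

busy, frac = on_cpu(28433.206, 398.719, 2286.628, 16)
print(f"on_cpu {frac:.3f} ({busy:.2f} / 16 cores)")

nvcsw, nivcsw = 3234284, 67581710
print(f"voluntary context switches {share(nvcsw, nivcsw):.2f}%")

slots = 255788277627072
topdown = {
    "retiring": 46024278712982,
    "frontend": 16172927052638,
    "backend": 128450684661770,
    "speculation": 4004300785800,
}
# Slots left after smt-contention is removed (assumed basis of the
# parenthesized column).
non_smt = sum(topdown.values())
for name, count in topdown.items():
    print(f"{name:12s} {100 * count / slots:5.1f}% ({100 * count / non_smt:5.1f}%)")
```

With these inputs the sketch gives 0.788 for on_cpu (12.61 of 16 cores), 4.57% voluntary context switches, and 18.0% (23.6%) retiring, matching the listing.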

Intel metrics.

elapsed              4689.232
on_cpu               0.669          # 10.71 / 16 cores
utime                49944.869
stime                265.709
nvcsw                10542389       # 27.33%
nivcsw               28035944       # 72.67%
inblock              88096          # 18.79/sec
onblock              4136           # 0.88/sec
cpu-clock            50157070996008 # 50157.071 seconds
task-clock           50166215178468 # 50166.215 seconds
page faults          27500955       # 548.197/sec
context switches     38601556       # 769.473/sec
cpu migrations       2041465        # 40.694/sec
major page faults    46             # 0.001/sec
minor page faults    27500909       # 548.196/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             6812057398539  # 21.566 branches per 1000 inst
branch misses        149765412660   # 2.20% branch miss
conditional          6812058478571  # 21.566 conditional branches per 1000 inst
indirect             2202417120955  # 6.973 indirect branches per 1000 inst
slots                306536509111490 #
retiring             140232905670943 # 45.7% (45.7%)
-- ucode             2752895743197  #     0.9%
-- fastpath          137480009927746 #    44.8%
frontend             34306137327121 # 11.2% (11.2%)
-- latency           26258728548842 #     8.6%
-- bandwidth         8047408778279  #     2.6%
backend              118976275623436 # 38.8% (38.8%)
-- cpu               58208559972379 #    19.0%
-- memory            60767715651057 #    19.8%
speculation          13128862043712 #  4.3% ( 4.3%)
-- branch mispredict 13020891939040 #     4.2%
-- pipeline restart  107970104672   #     0.0%
smt-contention       0              #  0.0% ( 0.0%)
cpu-cycles           136668286409257 # 1.86 GHz
instructions         242039315669235 # 1.77 IPC
l2 access            6583853889433  # 46.933 l2 access per 1000 inst
l2 miss              1277645992049  # 19.41% l2 miss
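
To put a number on "much faster", the two runs can be compared directly. This is a rough Python sketch using the elapsed times and the cycle and instruction counts reported above. The AMD listing prints cpu-cycles and instructions more than once with different values; this sketch uses the first pair, which is an arbitrary choice. The dictionary keys are hypothetical names.

```python
# Compare the AMD and Intel runs using the counters reported above.
amd = {
    "elapsed": 2286.628,
    "cycles": 127880209993342,        # first cpu-cycles line in the AMD listing
    "instructions": 136039223127335,  # first instructions line in the AMD listing
}
intel = {
    "elapsed": 4689.232,
    "cycles": 136668286409257,
    "instructions": 242039315669235,
}

speedup = intel["elapsed"] / amd["elapsed"]
inst_ratio = intel["instructions"] / amd["instructions"]
ipc_amd = amd["instructions"] / amd["cycles"]
ipc_intel = intel["instructions"] / intel["cycles"]

print(f"AMD finishes {speedup:.2f}x sooner")
print(f"Intel executes {inst_ratio:.2f}x the instructions")
print(f"IPC: AMD {ipc_amd:.2f}, Intel {ipc_intel:.2f}")
```

On these numbers the AMD run is about 2.05x faster in wall-clock time even though its IPC is lower (1.06 versus 1.77), because the Intel run executes about 1.78x as many instructions. The data here does not say why the instruction counts differ.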

The process overview shows, as expected, a large number of processes and, at times, many runnable processes, so this benchmark is a good test of process creation for short-lived processes.

17100 processes
	16675 lc0                  41020472.95 418368.36
	136 clinfo                  37.67    14.71
	 38 vulkaninfo               1.14     1.52
	  6 php                      0.21     0.38
	  6 glxinfo:gdrv0            0.16     0.10
	  4 vulkani:disk$0           0.12     0.16
	  2 glxinfo                  0.08     0.04
	  2 glxinfo:cs0              0.08     0.04
	  2 glxinfo:disk$0           0.08     0.04
	  2 glxinfo:sh0              0.08     0.04
	  2 glxinfo:shlo0            0.08     0.04
	  2 llvmpipe-0               0.06     0.08
	  2 llvmpipe-1               0.06     0.08
	  2 llvmpipe-10              0.06     0.08
	  2 llvmpipe-11              0.06     0.08
	  2 llvmpipe-12              0.06     0.08
	  2 llvmpipe-13              0.06     0.08
	  2 llvmpipe-14              0.06     0.08
	  2 llvmpipe-15              0.06     0.08
	  2 llvmpipe-2               0.06     0.08
	  2 llvmpipe-3               0.06     0.08
	  2 llvmpipe-4               0.06     0.08
	  2 llvmpipe-5               0.06     0.08
	  2 llvmpipe-6               0.06     0.08
	  2 llvmpipe-7               0.06     0.08
	  2 llvmpipe-8               0.06     0.08
	  2 llvmpipe-9               0.06     0.08
	  6 clang                    0.06     0.02
	  3 rocminfo                 0.03     0.00
	  1 lspci                    0.01     0.02
	 86 sh                       0.00     0.00
	 13 gcc                      0.00     0.00
	 12 gsettings                0.00     0.00
	  9 lczero                   0.00     0.00
	  8 stat                     0.00     0.00
	  8 systemd-detect-          0.00     0.00
	  6 llvm-link                0.00     0.00
	  5 phoronix-test-s          0.00     0.00
	  3 gmain                    0.00     0.00
	  2 cc                       0.00     0.00
	  2 lscpu                    0.00     0.00
	  2 uname                    0.00     0.00
	  2 which                    0.00     0.00
	  2 xset                     0.00     0.00
	  1 date                     0.00     0.00
	  1 dconf worker             0.00     0.00
	  1 dirname                  0.00     0.00
	  1 dmesg                    0.00     0.00
	  1 dmidecode                0.00     0.00
	  1 grep                     0.00     0.00
	  1 ifconfig                 0.00     0.00
	  1 ip                       0.00     0.00
	  1 lsmod                    0.00     0.00
	  1 mktemp                   0.00     0.00
	  1 ps                       0.00     0.00
	  1 qdbus                    0.00     0.00
	  1 readlink                 0.00     0.00
	  1 realpath                 0.00     0.00
	  1 sed                      0.00     0.00
	  1 sort                     0.00     0.00
	  1 stty                     0.00     0.00
	  1 systemctl                0.00     0.00
	  1 template.sh              0.00     0.00
	  1 wc                       0.00     0.00
	  1 xrandr                   0.00     0.00
0 processes running
108 maximum processes
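
The table above can be summarized with a few lines of Python. This is a sketch, not the tool's own parser: the meaning of the two numeric columns is not stated here, so they are kept as opaque values, and `parse_rows` is a hypothetical helper. Only three rows are copied in to keep the example short.

```python
import re

# A few rows copied from the process overview above. The two numeric
# columns are kept as-is because their meaning is not stated.
ROWS = """\
16675 lc0                  41020472.95 418368.36
136 clinfo                  37.67    14.71
1 dconf worker             0.00     0.00
"""

# count, process name (may contain spaces), then two decimal columns
ROW_RE = re.compile(r"^\s*(\d+)\s+(.+?)\s+(\d+\.\d+)\s+(\d+\.\d+)\s*$")

def parse_rows(text):
    """Return a list of (count, name, col1, col2) tuples."""
    rows = []
    for line in text.splitlines():
        m = ROW_RE.match(line)
        if m:
            rows.append((int(m.group(1)), m.group(2),
                         float(m.group(3)), float(m.group(4))))
    return rows

rows = parse_rows(ROWS)
total = 17100  # "17100 processes" header line
lc0 = next(count for count, name, _, _ in rows if name == "lc0")
print(f"lc0 share of processes: {100 * lc0 / total:.1f}%")
```

For the counts shown, lc0 accounts for about 97.5% of the 17100 processes.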