A Linux kernel scheduler benchmark. There are nine workloads that measure latency with an increasing number of threads. The plot below reflects both the increase in runnable threads and the increase in CPU usage.

The topdown profile shows the workload is mostly memory bound, with some frontend stalls.

AMD metrics show backend stalls, mostly from memory (24.9% of slots versus 6.9% for CPU). There is little floating-point code and little L2 access.

elapsed              1556.050
on_cpu               0.645          # 10.32 / 16 cores
utime                16042.943
stime                11.160
nvcsw                1661935        # 45.05%
nivcsw               2026894        # 54.95%
inblock              0              # 0.00/sec
onblock              14568          # 9.36/sec
cpu-clock            16053848559152 # 16053.849 seconds
task-clock           16054114142431 # 16054.114 seconds
page faults          242257         # 15.090/sec
context switches     3696293        # 230.240/sec
cpu migrations       790049         # 49.212/sec
major page faults    57             # 0.004/sec
minor page faults    242200         # 15.086/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             8363015807661  # 181.011 branches per 1000 inst
branch misses        1751506043     # 0.02% branch miss
conditional          6129106510907  # 132.659 conditional branches per 1000 inst
indirect             555221078517   # 12.017 indirect branches per 1000 inst
cpu-cycles           73140054874996 # 2.54 GHz
instructions         46237826856780 # 0.63 IPC low
slots                146432173835976 #
retiring             24129100695098 # 16.5% (24.2%)
-- ucode             370520955370   #     0.3%
-- fastpath          23758579739728 #    16.2%
frontend             29102512339367 # 19.9% (29.2%)
-- latency           24587855340780 #    16.8%
-- bandwidth         4514656998587  #     3.1%
backend              46483315674898 # 31.7% (46.6%)
-- cpu               10093079433097 #     6.9%
-- memory            36390236241801 #    24.9%
speculation          2955527432     #  0.0% ( 0.0%) low
-- branch mispredict 2946103259     #     0.0%
-- pipeline restart  9424173        #     0.0%
smt-contention       46714117571170 # 31.9% ( 0.0%)
cpu-cycles           67822818429201 # 2.35 GHz
instructions         43003680601254 # 0.63 IPC low
instructions         14340971853956 # 0.122 l2 access per 1000 inst
l2 hit from l1       1599372168     # 15.28% l2 miss
l2 miss from l1      197105566      #
l2 hit from l2 pf    77075632       #
l3 hit from l2 pf    55117623       #
l3 miss from l2 pf   14657946       #
instructions         14335290807670 # 4.475 float per 1000 inst
float 512            108            # 0.000 AVX-512 per 1000 inst
float 256            480            # 0.000 AVX-256 per 1000 inst
float 128            64152822075    # 4.475 AVX-128 per 1000 inst
float MMX            0              # 0.000 MMX per 1000 inst
float scalar         0              # 0.000 scalar per 1000 inst
instructions         26647384037721 #
opcache              3861767750251  # 144.921 opcache per 1000 inst
opcache miss         7442788516     #  0.2% opcache miss rate
l1 dTLB miss         139727119      # 0.005 L1 dTLB per 1000 inst
l2 dTLB miss         18498881       # 0.001 L2 dTLB per 1000 inst
instructions         52883862220061 #
icache               25470270754    # 0.482 icache per 1000 inst
icache miss          3143824182     # 12.3% icache miss rate
l1 iTLB miss         9913687        # 0.000 L1 iTLB per 1000 inst
l2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst
tlb flush            146204         # 0.000 TLB flush per 1000 inst
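
Several of the annotations in the AMD listing can be reproduced from the raw counters. The sketch below is a reading that matches the printed numbers, not the tool's confirmed implementation. In particular, it assumes the parenthesized topdown value is the share of slots left after removing SMT contention. The variable names are illustrative.

```python
# Counter values copied from the AMD run above.
elapsed, utime, stime = 1556.050, 16042.943, 11.160
cores = 16
nvcsw, nivcsw = 1661935, 2026894
cycles, instructions = 73140054874996, 46237826856780
slots = 146432173835976
smt_contention = 46714117571170
topdown = {
    "retiring": 24129100695098,
    "frontend": 29102512339367,
    "backend": 46483315674898,
}

# Average number of busy cores, then the fraction of all cores in use.
busy_cores = (utime + stime) / elapsed
on_cpu = busy_cores / cores

# Share of context switches that were voluntary.
voluntary_pct = 100 * nvcsw / (nvcsw + nivcsw)

# Instructions retired per cycle.
ipc = instructions / cycles

print(f"on_cpu {on_cpu:.3f}  # {busy_cores:.2f} / {cores} cores")
print(f"voluntary switches {voluntary_pct:.2f}%")
print(f"IPC {ipc:.2f}")

shares = {}
for name, count in topdown.items():
    raw = 100 * count / slots
    # Assumption: the parenthesized value excludes SMT-contended slots.
    adjusted = 100 * count / (slots - smt_contention)
    shares[name] = (raw, adjusted)
    print(f"{name:<9} {raw:4.1f}% ({adjusted:4.1f}%)")
```

Under that assumption, memory-bound backend stalls stay the largest category once SMT contention is removed, which fits the summary given before the listing.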

Intel metrics attribute the backend stalls mostly to the CPU rather than to memory (76.1% of slots versus 2.4%).

elapsed              2139.355
on_cpu               0.565          # 9.04 / 16 cores
utime                19329.698
stime                10.853
nvcsw                1642438        # 46.31%
nivcsw               1904138        # 53.69%
inblock              288            # 0.13/sec
onblock              3760           # 1.76/sec
cpu-clock            19342627391967 # 19342.627 seconds
task-clock           19343100659394 # 19343.101 seconds
page faults          180818         # 9.348/sec
context switches     3556927        # 183.886/sec
cpu migrations       785457         # 40.607/sec
major page faults    66             # 0.003/sec
minor page faults    180752         # 9.345/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             4562313428524  # 180.964 branches per 1000 inst
branch misses        259291716      # 0.01% branch miss
conditional          4562313732684  # 180.964 conditional branches per 1000 inst
indirect             1397283320799  # 55.423 indirect branches per 1000 inst
slots                89163987381800 #
retiring             12228322389427 # 13.7% (13.7%) low
-- ucode             4221981936017  #     4.7%
-- fastpath          8006340453410  #     9.0%
frontend             6991394912521  #  7.8% ( 7.8%)
-- latency           5126137258625  #     5.7%
-- bandwidth         1865257653896  #     2.1%
backend              69975878002186 # 78.5% (78.5%) high
-- cpu               67849553454305 #    76.1%
-- memory            2126324547881  #     2.4%
speculation          2900478872     #  0.0% ( 0.0%) low
-- branch mispredict 2772785413     #     0.0%
-- pipeline restart  127693459      #     0.0%
smt-contention       0              #  0.0% ( 0.0%)
cpu-cycles           61862121549034 # 1.75 GHz
instructions         20391443056364 # 0.33 IPC low
l2 access            2429707648     # 0.224 l2 access per 1000 inst
l2 miss              786985989      # 32.39% l2 miss
cpu-cycles           23582529091664 #  3.7% memory latency
load stalls          864295341452   #  3.6% l1 bound
l1 miss              9987194866     #  0.0% l2 bound
l2 miss              6326008394     #  0.0% l3 bound
l3 miss              926737004      #  0.0% dram bound
store_stalls         276316533      #  0.0% store bound
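
The memory-latency lines at the end of the Intel listing appear to follow the usual topdown pattern. Each cache level is charged the stall cycles that remain after subtracting the stalls that missed that level. The sketch below assumes that reading, and assumes the `l1 miss`, `l2 miss` and `l3 miss` rows are stall-cycle counts rather than miss counts. It reproduces the printed percentages, but the tool's formulas are not confirmed here.

```python
# Counter values copied from the last six lines of the Intel run above.
cycles = 23582529091664
load_stalls = 864295341452
l1_miss = 9987194866
l2_miss = 6326008394
l3_miss = 926737004
store_stalls = 276316533


def pct(count):
    """Express a stall-cycle count as a percentage of cpu-cycles."""
    return 100 * count / cycles


# Assumption: each level is charged the stalls that did not miss that level.
breakdown = {
    "memory latency": pct(load_stalls),
    "l1 bound": pct(load_stalls - l1_miss),
    "l2 bound": pct(l1_miss - l2_miss),
    "l3 bound": pct(l2_miss - l3_miss),
    "dram bound": pct(l3_miss),
    "store bound": pct(store_stalls),
}

for name, value in breakdown.items():
    print(f"{name:<15} {value:4.1f}%")
```

Almost all of the roughly 3.7% of cycles stalled on loads is charged to L1, which fits the small memory share in the topdown lines.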

Process summary

8083 processes
	7664 schbench             3752749.00  1085.68
	 68 clinfo                  16.21     5.99
	 38 vulkaninfo               1.52     0.96
	  4 vulkani:disk$0           0.16     0.10
	  6 php                      0.14     0.23
	  6 glxinfo:gdrv0            0.11     0.05
	  6 glxinfo:gl0              0.11     0.05
	  2 llvmpipe-0               0.08     0.05
	  2 llvmpipe-1               0.08     0.05
	  2 llvmpipe-10              0.08     0.05
	  2 llvmpipe-11              0.08     0.05
	  2 llvmpipe-12              0.08     0.05
	  2 llvmpipe-13              0.08     0.05
	  2 llvmpipe-14              0.08     0.05
	  2 llvmpipe-15              0.08     0.05
	  2 llvmpipe-2               0.08     0.05
	  2 llvmpipe-3               0.08     0.05
	  2 llvmpipe-4               0.08     0.05
	  2 llvmpipe-5               0.08     0.05
	  2 llvmpipe-6               0.08     0.05
	  2 llvmpipe-7               0.08     0.05
	  2 llvmpipe-8               0.08     0.05
	  2 llvmpipe-9               0.08     0.05
	  6 clang                    0.06     0.04
	  2 glxinfo                  0.05     0.03
	  2 glxinfo:cs0              0.05     0.03
	  2 glxinfo:disk$0           0.05     0.03
	  2 glxinfo:sh0              0.05     0.03
	  2 glxinfo:shlo0            0.05     0.03
	  3 rocminfo                 0.03     0.00
	  1 lspci                    0.00     0.02
	  1 ps                       0.00     0.01
	 98 sh                       0.00     0.00
	 13 gcc                      0.00     0.00
	 11 gsettings                0.00     0.00
	  8 stat                     0.00     0.00
	  8 systemd-detect-          0.00     0.00
	  6 llvm-link                0.00     0.00
	  5 phoronix-test-s          0.00     0.00
	  3 gmain                    0.00     0.00
	  2 cc                       0.00     0.00
	  2 dconf worker             0.00     0.00
	  2 lscpu                    0.00     0.00
	  2 uname                    0.00     0.00
	  2 which                    0.00     0.00
	  2 xset                     0.00     0.00
	  1 date                     0.00     0.00
	  1 dirname                  0.00     0.00
	  1 dmesg                    0.00     0.00
	  1 dmidecode                0.00     0.00
	  1 grep                     0.00     0.00
	  1 ifconfig                 0.00     0.00
	  1 ip                       0.00     0.00
	  1 lsmod                    0.00     0.00
	  1 mktemp                   0.00     0.00
	  1 qdbus                    0.00     0.00
	  1 readlink                 0.00     0.00
	  1 realpath                 0.00     0.00
	  1 sed                      0.00     0.00
	  1 sort                     0.00     0.00
	  1 stty                     0.00     0.00
	  1 systemctl                0.00     0.00
	  1 template.sh              0.00     0.00
	  1 wc                       0.00     0.00
	  1 xrandr                   0.00     0.00
0 processes running
1291 maximum processes

Computation structures

      87468) schbench         cpu=6 start=148.37 finish=178.43
        87469) schbench         cpu=7 start=148.37 finish=178.43
          87470) schbench         cpu=2 start=148.37 finish=178.43
            87472) schbench         cpu=10 start=148.37 finish=178.43
            87474) schbench         cpu=12 start=148.37 finish=178.43
            87476) schbench         cpu=15 start=148.37 finish=178.43
            87477) schbench         cpu=6 start=148.37 finish=178.43
          87471) schbench         cpu=1 start=148.37 finish=178.43
            87473) schbench         cpu=4 start=148.37 finish=178.43
            87475) schbench         cpu=5 start=148.37 finish=178.43
            87478) schbench         cpu=11 start=148.37 finish=178.43
            87479) schbench         cpu=9 start=148.37 finish=178.43
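
The nesting in the listing above can be recovered mechanically. The sketch below assumes that deeper indentation means "child of the nearest shallower line above", which is inferred from the layout rather than documented. The function name `parse_tree` and the regular expression are illustrative, and the sample is a subset of the lines above.

```python
import re

# One line of the listing: indent, pid, command name, cpu, start, finish.
LINE = re.compile(
    r"^(?P<indent> *)(?P<pid>\d+)\) (?P<comm>.+?)\s+cpu=(?P<cpu>\d+) "
    r"start=(?P<start>[\d.]+) finish=(?P<finish>[\d.]+)$"
)


def parse_tree(text):
    """Map each pid to its parent pid (None for a root), using indentation."""
    parents = {}
    stack = []  # (indent width, pid) for the ancestors of the current line
    for line in text.splitlines():
        match = LINE.match(line.rstrip())
        if match is None:
            continue  # skip anything that is not a tree entry
        indent = len(match["indent"])
        pid = int(match["pid"])
        # Drop entries at the same or deeper indentation: they are not ancestors.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        parents[pid] = stack[-1][1] if stack else None
        stack.append((indent, pid))
    return parents


SAMPLE = """\
      87468) schbench         cpu=6 start=148.37 finish=178.43
        87469) schbench         cpu=7 start=148.37 finish=178.43
          87470) schbench         cpu=2 start=148.37 finish=178.43
            87472) schbench         cpu=10 start=148.37 finish=178.43
          87471) schbench         cpu=1 start=148.37 finish=178.43
            87473) schbench         cpu=4 start=148.37 finish=178.43
"""

parents = parse_tree(SAMPLE)
for pid, parent in parents.items():
    print(pid, "<-", parent)
```

Read this way, 87469 has two children (87470 and 87471), and in the full listing each of those has four children of its own.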