A test of the kernel scheduler. The benchmark runs with varying numbers of threads and processes, as shown in the progression below.

The topdown profile is dominated by frontend stalls (57.8% of slots on AMD, 43.6% on Intel), with comparatively few backend stalls (17.0% and 10.5%). The retirement rate is consistent across the different thread and process counts.

AMD metrics show little floating-point or L2 activity: roughly 21 floating-point events and 35 L2 accesses per 1000 instructions.

elapsed              3103.839
on_cpu               0.881          # 14.10 / 16 cores
utime                2834.004
stime                40925.639
nvcsw                685247592      # 76.91%
nivcsw               205745161      # 23.09%
inblock              0              # 0.00/sec
onblock              15128          # 4.87/sec
cpu-clock            43755689957976 # 43755.690 seconds
task-clock           43759705281147 # 43759.705 seconds
page faults          689472         # 15.756/sec
context switches     890984992      # 20360.854/sec
cpu migrations       82144359       # 1877.169/sec
major page faults    46             # 0.001/sec
minor page faults    689426         # 15.755/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             23501436824532 # 203.654 branches per 1000 inst
branch misses        3032522200893  # 12.90% branch miss
conditional          10698516228519 # 92.709 conditional branches per 1000 inst
indirect             239505493842   # 2.075 indirect branches per 1000 inst
cpu-cycles           91347428654789 # 3.38 GHz
instructions         60895739342932 # 0.67 IPC low
slots                181699842904956 #
retiring             24675428706414 # 13.6% (15.3%)
-- ucode             217739037199   #     0.1%
-- fastpath          24457689669215 #    13.5%
frontend             105015576147955 # 57.8% (65.2%) high
-- latency           88990652940450 #    49.0%
-- bandwidth         16024923207505 #     8.8%
backend              30804556660061 # 17.0% (19.1%)
-- cpu               7074311060381  #     3.9%
-- memory            23730245599680 #    13.1%
speculation          515827690784   #  0.3% ( 0.3%) low
-- branch mispredict 515481404909   #     0.3%
-- pipeline restart  346285875      #     0.0%
smt-contention       20687259190202 # 11.4% ( 0.0%)
cpu-cycles           90504608662935 # 3.21 GHz
instructions         61341111308824 # 0.68 IPC low
instructions         20314482395291 # 35.427 l2 access per 1000 inst
l2 hit from l1       575062511723   # 16.81% l2 miss
l2 miss from l1      70863633140    #
l2 hit from l2 pf    94475719768    #
l3 hit from l2 pf    35652924877    #
l3 miss from l2 pf   14483398129    #
instructions         20307336146846 # 21.070 float per 1000 inst
float 512            103            # 0.000 AVX-512 per 1000 inst
float 256            498            # 0.000 AVX-256 per 1000 inst
float 128            427884882765   # 21.070 AVX-128 per 1000 inst
float MMX            0              # 0.000 MMX per 1000 inst
float scalar         0              # 0.000 scalar per 1000 inst
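
The two percentage columns on the AMD topdown lines can be reproduced from the raw slot counts. The sketch below assumes that the first column is each category's share of all slots and that the parenthesized column is the share of the slots that remain once smt-contention is subtracted. That reading is an inference from the numbers above, not something the tool output states.

```python
# Topdown slot counts copied from the AMD run above.
slots = 181699842904956
smt_contention = 20687259190202
categories = {
    "retiring": 24675428706414,
    "frontend": 105015576147955,
    "backend": 30804556660061,
    "speculation": 515827690784,
}


def percentages(counts, total):
    """Return each count as a percentage of total."""
    return {name: 100.0 * value / total for name, value in counts.items()}


# First column: share of all issue slots.
raw = percentages(categories, slots)
# Parenthesized column (assumed): share of slots not lost to SMT contention.
adjusted = percentages(categories, slots - smt_contention)

for name in categories:
    print(f"{name:12s} {raw[name]:5.1f}% ({adjusted[name]:5.1f}%)")
```

Under that assumption the frontend share rises from about 58% to about 65% of the slots this thread could actually use, which is why the report flags it as high.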

Intel metrics

elapsed              1286.246
on_cpu               0.786          # 12.58 / 16 cores
utime                1244.302
stime                14940.373
nvcsw                276770340      # 81.39%
nivcsw               63265868       # 18.61%
inblock              600            # 0.47/sec
onblock              3176           # 2.47/sec
cpu-clock            16176082544622 # 16176.083 seconds
task-clock           16179007940539 # 16179.008 seconds
page faults          431645         # 26.679/sec
context switches     340031467      # 21016.830/sec
cpu migrations       41546880       # 2567.950/sec
major page faults    48             # 0.003/sec
minor page faults    431597         # 26.676/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             9419060412929  # 164.638 branches per 1000 inst
branch misses        44100542658    # 0.47% branch miss
conditional          9419061039425  # 164.638 conditional branches per 1000 inst
indirect             2849081581423  # 49.800 indirect branches per 1000 inst
slots                93413724946706 #
retiring             39970625421293 # 42.8% (42.8%)
-- ucode             7326098617840  #     7.8%
-- fastpath          32644526803453 #    34.9%
frontend             40717041467788 # 43.6% (43.6%)
-- latency           18873031629963 #    20.2%
-- bandwidth         21844009837825 #    23.4%
backend              9785979087670  # 10.5% (10.5%) low
-- cpu               3709347493414  #     4.0%
-- memory            6076631594256  #     6.5%
speculation          2982448418427  #  3.2% ( 3.2%)
-- branch mispredict 2689553597192  #     2.9%
-- pipeline restart  292894821235   #     0.3%
smt-contention       0              #  0.0% ( 0.0%)
cpu-cycles           48091671451085 # 2.48 GHz
instructions         53251990904221 # 1.11 IPC
l2 access            909727906941   # 33.186 l2 access per 1000 inst
l2 miss              327415399029   # 35.99% l2 miss
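
The headline ratios in both dumps follow from a few of the raw counters. The sketch below recomputes them for the AMD and Intel runs, assuming on_cpu is cpu-clock divided by elapsed time and by the 16 cores shown in the comments. It also derives the kernel share of CPU time, which the dumps do not print directly. The dictionary layout and function name are illustrative only.

```python
CORES = 16  # taken from the "/ 16 cores" comments above

# Values copied from the two runs; for AMD the first cpu-cycles and
# instructions pair is used.
runs = {
    "amd": dict(elapsed=3103.839, utime=2834.004, stime=40925.639,
                cpu_clock_ns=43755689957976,
                cycles=91347428654789, instructions=60895739342932),
    "intel": dict(elapsed=1286.246, utime=1244.302, stime=14940.373,
                  cpu_clock_ns=16176082544622,
                  cycles=48091671451085, instructions=53251990904221),
}


def derive(run, cores=CORES):
    """Compute utilisation, kernel share and IPC for one run."""
    cpu_seconds = run["cpu_clock_ns"] / 1e9
    busy_cores = cpu_seconds / run["elapsed"]
    return {
        "busy_cores": busy_cores,
        "on_cpu": busy_cores / cores,
        "kernel_share": run["stime"] / (run["utime"] + run["stime"]),
        "ipc": run["instructions"] / run["cycles"],
    }


summary = {name: derive(run) for name, run in runs.items()}
for name, values in summary.items():
    print(name, {key: round(value, 3) for key, value in values.items()})
```

With these inputs the kernel share is above 90% on both machines, which fits a benchmark that mostly exercises the scheduler rather than user code.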

The process overview reports a maximum of 1291 simultaneously active processes; 27628 of the 28052 processes are hackbench_bin.

28052 processes
	27628 hackbench_bin        163474.49 2273324.74
	 68 clinfo                  17.20     5.65
	 38 vulkaninfo               0.76     1.52
	  6 php                      0.18     0.41
	  6 glxinfo:gdrv0            0.09     0.10
	  6 glxinfo:gl0              0.09     0.10
	  6 clang                    0.09     0.03
	  4 vulkani:disk$0           0.08     0.16
	  2 glxinfo                  0.05     0.04
	  2 glxinfo:cs0              0.05     0.04
	  2 glxinfo:disk$0           0.05     0.04
	  2 glxinfo:sh0              0.05     0.04
	  2 glxinfo:shlo0            0.05     0.04
	  2 llvmpipe-0               0.04     0.08
	  2 llvmpipe-1               0.04     0.08
	  2 llvmpipe-10              0.04     0.08
	  2 llvmpipe-11              0.04     0.08
	  2 llvmpipe-12              0.04     0.08
	  2 llvmpipe-13              0.04     0.08
	  2 llvmpipe-14              0.04     0.08
	  2 llvmpipe-15              0.04     0.08
	  2 llvmpipe-2               0.04     0.08
	  2 llvmpipe-3               0.04     0.08
	  2 llvmpipe-4               0.04     0.08
	  2 llvmpipe-5               0.04     0.08
	  2 llvmpipe-6               0.04     0.08
	  2 llvmpipe-7               0.04     0.08
	  2 llvmpipe-8               0.04     0.08
	  2 llvmpipe-9               0.04     0.08
	  3 rocminfo                 0.03     0.00
	  1 lspci                    0.00     0.03
	  1 ps                       0.00     0.01
	102 sh                       0.00     0.00
	 56 hackbench                0.00     0.00
	 13 gcc                      0.00     0.00
	  8 gsettings                0.00     0.00
	  8 stat                     0.00     0.00
	  8 systemd-detect-          0.00     0.00
	  6 llvm-link                0.00     0.00
	  5 gmain                    0.00     0.00
	  5 phoronix-test-s          0.00     0.00
	  3 dconf worker             0.00     0.00
	  2 cc                       0.00     0.00
	  2 lscpu                    0.00     0.00
	  2 uname                    0.00     0.00
	  2 which                    0.00     0.00
	  2 xset                     0.00     0.00
	  1 date                     0.00     0.00
	  1 dirname                  0.00     0.00
	  1 dmesg                    0.00     0.00
	  1 dmidecode                0.00     0.00
	  1 grep                     0.00     0.00
	  1 ifconfig                 0.00     0.00
	  1 ip                       0.00     0.00
	  1 lsmod                    0.00     0.00
	  1 mktemp                   0.00     0.00
	  1 qdbus                    0.00     0.00
	  1 readlink                 0.00     0.00
	  1 realpath                 0.00     0.00
	  1 sed                      0.00     0.00
	  1 sort                     0.00     0.00
	  1 stty                     0.00     0.00
	  1 systemctl                0.00     0.00
	  1 template.sh              0.00     0.00
	  1 wc                       0.00     0.00
	  1 xrandr                   0.00     0.00
0 processes running
1291 maximum processes

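The computation structure is straightforward: a hackbench process starts hackbench_bin, which spawns a flat set of workers spread across the CPUs. The workers all start together and finish within about 0.1 seconds of each other.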

      2544825) hackbench        cpu=7 start=6.59  finish=10.49
        2544826) hackbench_bin    cpu=8 start=6.60  finish=10.49
          2544827) hackbench_bin    cpu=3 start=6.60  finish=10.49
          2544828) hackbench_bin    cpu=6 start=6.60  finish=10.49
          2544829) hackbench_bin    cpu=0 start=6.60  finish=10.49
          2544830) hackbench_bin    cpu=12 start=6.60  finish=10.49
          2544831) hackbench_bin    cpu=13 start=6.60  finish=10.49
          2544832) hackbench_bin    cpu=4 start=6.60  finish=10.49
          2544833) hackbench_bin    cpu=9 start=6.60  finish=10.49
          2544834) hackbench_bin    cpu=5 start=6.60  finish=10.49
          2544835) hackbench_bin    cpu=8 start=6.60  finish=10.49
          2544836) hackbench_bin    cpu=14 start=6.60  finish=10.49
          2544837) hackbench_bin    cpu=15 start=6.60  finish=10.49
          2544838) hackbench_bin    cpu=12 start=6.60  finish=10.49
          2544839) hackbench_bin    cpu=0 start=6.60  finish=10.49
          2544840) hackbench_bin    cpu=4 start=6.60  finish=10.49
          2544841) hackbench_bin    cpu=6 start=6.60  finish=10.49
          2544842) hackbench_bin    cpu=1 start=6.60  finish=10.49
          2544843) hackbench_bin    cpu=11 start=6.60  finish=10.49
          2544844) hackbench_bin    cpu=5 start=6.60  finish=10.49
          2544845) hackbench_bin    cpu=12 start=6.60  finish=10.49
          2544846) hackbench_bin    cpu=8 start=6.60  finish=10.49
          2544847) hackbench_bin    cpu=7 start=6.60  finish=10.49
          2544848) hackbench_bin    cpu=13 start=6.60  finish=10.46
          2544849) hackbench_bin    cpu=2 start=6.60  finish=10.45
          2544850) hackbench_bin    cpu=5 start=6.60  finish=10.48
          2544851) hackbench_bin    cpu=2 start=6.60  finish=10.45
          2544852) hackbench_bin    cpu=10 start=6.60  finish=10.48
          2544853) hackbench_bin    cpu=8 start=6.60  finish=10.41
          2544854) hackbench_bin    cpu=14 start=6.60  finish=10.44
          2544855) hackbench_bin    cpu=11 start=6.60  finish=10.46
          2544856) hackbench_bin    cpu=8 start=6.60  finish=10.46
          2544857) hackbench_bin    cpu=7 start=6.60  finish=10.42
          2544858) hackbench_bin    cpu=11 start=6.60  finish=10.48
          2544859) hackbench_bin    cpu=0 start=6.60  finish=10.40
          2544860) hackbench_bin    cpu=8 start=6.60  finish=10.43
          2544861) hackbench_bin    cpu=14 start=6.60  finish=10.45
          2544862) hackbench_bin    cpu=0 start=6.60  finish=10.44
          2544863) hackbench_bin    cpu=12 start=6.60  finish=10.48
          2544864) hackbench_bin    cpu=7 start=6.60  finish=10.47
          2544865) hackbench_bin    cpu=9 start=6.60  finish=10.49
          2544866) hackbench_bin    cpu=1 start=6.60  finish=10.45
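
The listing shows 40 workers under one hackbench_bin, which is consistent with hackbench's usual group of 20 senders and 20 receivers exchanging small messages. The sketch below illustrates that communication pattern on a small scale, using Python threads and socket pairs. It is not hackbench's source: the real tool is written in C, and the function names, message size, and group size here are assumptions chosen for illustration.

```python
import socket
import threading

MSG_SIZE = 100  # bytes per message; illustrative value


def receiver(sock, expected_bytes, totals, index):
    """Drain one socket until the expected number of bytes has arrived."""
    received = 0
    while received < expected_bytes:
        chunk = sock.recv(4096)
        if not chunk:  # peer closed early
            break
        received += len(chunk)
    totals[index] = received


def sender(write_ends, loops):
    """Send `loops` messages to every receiver in the group."""
    payload = b"x" * MSG_SIZE
    for _ in range(loops):
        for sock in write_ends:
            sock.sendall(payload)


def run_group(pairs=4, loops=50):
    """Run one group of senders and receivers; return bytes seen per receiver."""
    socks = [socket.socketpair() for _ in range(pairs)]
    read_ends = [r for r, _ in socks]
    write_ends = [w for _, w in socks]
    totals = [0] * pairs
    expected = pairs * loops * MSG_SIZE  # every sender writes to every receiver

    workers = [
        threading.Thread(target=receiver, args=(read_ends[i], expected, totals, i))
        for i in range(pairs)
    ]
    workers += [
        threading.Thread(target=sender, args=(write_ends, loops))
        for _ in range(pairs)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    for r, w in socks:
        r.close()
        w.close()
    return totals


if __name__ == "__main__":
    print(run_group())
```

Each message wakes a blocked receiver and senders block when a socket buffer fills, so a workload of this shape generates many voluntary context switches, in line with the nvcsw counts reported above.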