This test runs the multi-threaded x264 video encoder. It encodes a 4K image followed by a 1080p image. The number of runnable processes appears to fluctuate during the run.

The topdown profile for the AMD run shows an almost even split between retirement (27.8% of slots) and backend stalls (27.4%); SMT contention accounts for a similar share (27.3%).

AMD metrics show an average of ~10 of the 16 cores busy (on_cpu 0.608). This is floating-point code with relatively few branches, about 75 per 1000 instructions.

elapsed              109.681
on_cpu               0.608          # 9.72 / 16 cores
utime                1053.842
stime                12.299
nvcsw                361218         # 81.01%
nivcsw               84696          # 18.99%
inblock              3256           # 29.69/sec
onblock              12888          # 117.50/sec
cpu-clock            1066288166468  # 1066.288 seconds
task-clock           1066416971546  # 1066.417 seconds
page faults          721586         # 676.645/sec
context switches     446284         # 418.489/sec
cpu migrations       157506         # 147.696/sec
major page faults    26             # 0.024/sec
minor page faults    721560         # 676.621/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             472780029627   # 74.946 branches per 1000 inst
branch misses        12203448367    # 2.58% branch miss
conditional          277428552179   # 43.978 conditional branches per 1000 inst
indirect             57936525347    # 9.184 indirect branches per 1000 inst
cpu-cycles           3999823451251  # 2.28 GHz
instructions         6305581355914  # 1.58 IPC
slots                8002833887928  #
retiring             2225805517930  # 27.8% (38.2%)
-- ucode             18730169192    #     0.2%
-- fastpath          2207075348738  #    27.6%
frontend             1162081889243  # 14.5% (20.0%)
-- latency           844391783640   #    10.6%
-- bandwidth         317690105603   #     4.0%
backend              2194344612641  # 27.4% (37.7%)
-- cpu               776316315076   #     9.7%
-- memory            1418028297565  #    17.7%
speculation          239678438231   #  3.0% ( 4.1%)
-- branch mispredict 216097064427   #     2.7%
-- pipeline restart  23581373804    #     0.3%
smt-contention       2180884225504  # 27.3% ( 0.0%)
cpu-cycles           4004643116766  # 2.28 GHz
instructions         6306441819098  # 1.57 IPC
instructions         2101933143976  # 53.068 l2 access per 1000 inst
l2 hit from l1       80012154529    # 6.03% l2 miss
l2 miss from l1      3004489202     #
l2 hit from l2 pf    27808913292    #
l3 hit from l2 pf    1203246849     #
l3 miss from l2 pf   2520651155     #
instructions         2103726837096  # 164.465 float per 1000 inst
float 512            60             # 0.000 AVX-512 per 1000 inst
float 256            3715879476     # 1.766 AVX-256 per 1000 inst
float 128            342273836220   # 162.699 AVX-128 per 1000 inst
float MMX            0              # 0.000 MMX per 1000 inst
float scalar         0              # 0.000 scalar per 1000 inst
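
The derived figures in the AMD listing above can be reproduced from the raw counters. The sketch below is a minimal illustration, not the profiling tool's own code: the formulas are inferred from the numbers (on_cpu as CPU time over elapsed time times core count, and the parenthesized topdown percentages as shares of the slots that remain after removing smt-contention), and the function names are hypothetical.

```python
# Counter values copied from the AMD run above.
ELAPSED = 109.681   # seconds of wall-clock time
UTIME = 1053.842    # user CPU seconds
STIME = 12.299      # system CPU seconds
NCORES = 16         # assumption: taken from the "/ 16 cores" annotation

SLOTS = 8002833887928
SMT_CONTENTION = 2180884225504
TOPDOWN = {
    "retiring": 2225805517930,
    "frontend": 1162081889243,
    "backend": 2194344612641,
    "speculation": 239678438231,
}


def on_cpu_fraction(utime, stime, elapsed, ncores):
    """Fraction of the available core-seconds spent running (inferred formula)."""
    return (utime + stime) / (elapsed * ncores)


def topdown_shares(counts, slots, smt_contention):
    """Return {name: (share of all slots, share of slots excluding SMT contention)}."""
    usable = slots - smt_contention
    return {name: (n / slots, n / usable) for name, n in counts.items()}


on_cpu = on_cpu_fraction(UTIME, STIME, ELAPSED, NCORES)
shares = topdown_shares(TOPDOWN, SLOTS, SMT_CONTENTION)

print(f"on_cpu {on_cpu:.3f} ({on_cpu * NCORES:.2f} / {NCORES} cores)")
for name, (of_all, of_usable) in shares.items():
    print(f"{name:12s} {of_all:6.1%} ({of_usable:.1%})")
```

Under these assumptions the two percentages per category line up with the pair printed in the listing, which suggests the value in parentheses is the share with SMT contention factored out.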

Intel metrics

elapsed              575.678
on_cpu               0.693          # 11.08 / 16 cores
utime                6331.656
stime                47.690
nvcsw                1493407        # 76.25%
nivcsw               465204         # 23.75%
inblock              4939192        # 8579.78/sec
onblock              2408           # 4.18/sec
cpu-clock            6378815768388  # 6378.816 seconds
task-clock           6379289177912  # 6379.289 seconds
page faults          2716174        # 425.780/sec
context switches     1961241        # 307.439/sec
cpu migrations       703420         # 110.266/sec
major page faults    15671          # 2.457/sec
minor page faults    2700503        # 423.323/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             2289382131781  # 69.437 branches per 1000 inst
branch misses        59249358432    # 2.59% branch miss
conditional          2289382171973  # 69.437 conditional branches per 1000 inst
indirect             902145820391   # 27.362 indirect branches per 1000 inst
slots                8778509383352  #
retiring             4708390332421  # 53.6% (53.6%)
-- ucode             289611573500   #     3.3%
-- fastpath          4418778758921  #    50.3%
frontend             2069165496342  # 23.6% (23.6%)
-- latency           824618892139   #     9.4%
-- bandwidth         1244546604203  #    14.2%
backend              1121944522091  # 12.8% (12.8%) low
-- cpu               574049221261   #     6.5%
-- memory            547895300830   #     6.2%
speculation          833840065989   #  9.5% ( 9.5%)
-- branch mispredict 804921122278   #     9.2%
-- pipeline restart  28918943711    #     0.3%
smt-contention       0              #  0.0% ( 0.0%)
cpu-cycles           6445844947005  # 2.15 GHz
instructions         11708266075332 # 1.82 IPC
l2 access            170225224653   # 29.698 l2 access per 1000 inst
l2 miss              16990061571    # 9.98% l2 miss
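
To compare runs it can help to load a listing like the one above into a dictionary and recompute the ratios. The following is a rough sketch that assumes each metric line has the shape `name value # note`, with the value being the last whitespace-separated token before the `#`; `parse_metric` is a hypothetical helper, not part of any profiling tool.

```python
def parse_metric(line):
    """Split one metric line into (name, value, note).

    Assumes the value is the last token before the '#' comment marker.
    """
    left, _, note = line.partition("#")
    name, value_text = left.strip().rsplit(None, 1)
    value = float(value_text) if "." in value_text else int(value_text)
    return name, value, note.strip()


# A few lines copied from the Intel run above.
INTEL_LINES = """\
branches             2289382131781  # 69.437 branches per 1000 inst
branch misses        59249358432    # 2.59% branch miss
cpu-cycles           6445844947005  # 2.15 GHz
instructions         11708266075332 # 1.82 IPC
l2 access            170225224653   # 29.698 l2 access per 1000 inst
l2 miss              16990061571    # 9.98% l2 miss
""".splitlines()

metrics = {name: value for name, value, _ in map(parse_metric, INTEL_LINES)}

branch_miss_pct = 100 * metrics["branch misses"] / metrics["branches"]
l2_miss_pct = 100 * metrics["l2 miss"] / metrics["l2 access"]
ipc = metrics["instructions"] / metrics["cpu-cycles"]

print(f"branch miss {branch_miss_pct:.2f}%  l2 miss {l2_miss_pct:.2f}%  IPC {ipc:.2f}")
```

Splitting on the last token rather than on a fixed column keeps names that contain spaces or digits, such as `float 512`, intact.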

Process overview

542 processes
	192 x264                 32497.05   319.50
	 68 clinfo                  16.52     6.47
	 38 vulkaninfo               0.94     1.33
	  6 glxinfo:gdrv0            0.16     0.05
	  6 glxinfo:gl0              0.16     0.05
	  4 vulkani:disk$0           0.10     0.14
	  6 php                      0.08     0.07
	  2 glxinfo                  0.06     0.03
	  2 glxinfo:cs0              0.06     0.03
	  2 glxinfo:disk$0           0.06     0.03
	  2 glxinfo:sh0              0.06     0.03
	  2 glxinfo:shlo0            0.06     0.03
	  2 llvmpipe-0               0.05     0.07
	  2 llvmpipe-1               0.05     0.07
	  2 llvmpipe-10              0.05     0.07
	  2 llvmpipe-11              0.05     0.07
	  2 llvmpipe-12              0.05     0.07
	  2 llvmpipe-13              0.05     0.07
	  2 llvmpipe-14              0.05     0.07
	  2 llvmpipe-15              0.05     0.07
	  2 llvmpipe-2               0.05     0.07
	  2 llvmpipe-3               0.05     0.07
	  2 llvmpipe-4               0.05     0.07
	  2 llvmpipe-5               0.05     0.07
	  2 llvmpipe-6               0.05     0.07
	  2 llvmpipe-7               0.05     0.07
	  2 llvmpipe-8               0.05     0.07
	  2 llvmpipe-9               0.05     0.07
	  6 clang                    0.04     0.08
	  3 rocminfo                 0.03     0.00
	  1 lspci                    0.00     0.02
	 84 sh                       0.00     0.00
	 13 gcc                      0.00     0.00
	 12 gsettings                0.00     0.00
	  8 stat                     0.00     0.00
	  8 systemd-detect-          0.00     0.00
	  6 llvm-link                0.00     0.00
	  5 phoronix-test-s          0.00     0.00
	  3 gmain                    0.00     0.00
	  2 cc                       0.00     0.00
	  2 lscpu                    0.00     0.00
	  2 uname                    0.00     0.00
	  2 which                    0.00     0.00
	  2 xset                     0.00     0.00
	  1 date                     0.00     0.00
	  1 dconf worker             0.00     0.00
	  1 dirname                  0.00     0.00
	  1 dmesg                    0.00     0.00
	  1 dmidecode                0.00     0.00
	  1 grep                     0.00     0.00
	  1 ifconfig                 0.00     0.00
	  1 ip                       0.00     0.00
	  1 lsmod                    0.00     0.00
	  1 mktemp                   0.00     0.00
	  1 ps                       0.00     0.00
	  1 qdbus                    0.00     0.00
	  1 readlink                 0.00     0.00
	  1 realpath                 0.00     0.00
	  1 sed                      0.00     0.00
	  1 sort                     0.00     0.00
	  1 stty                     0.00     0.00
	  1 systemctl                0.00     0.00
	  1 template.sh              0.00     0.00
	  1 wc                       0.00     0.00
	  1 xrandr                   0.00     0.00
0 processes running
47 maximum processes

The computation blocks are straightforward: the x264 threads in a block start together and finish at almost the same time.

      190664) x264             cpu=15 start=5.57  finish=25.65
        190665) x264             cpu=15 start=5.57  finish=25.65
          190666) x264             cpu=12 start=5.59  finish=25.65
          190667) x264             cpu=3 start=5.59  finish=25.63
          190668) x264             cpu=12 start=5.59  finish=25.63
          190669) x264             cpu=2 start=5.59  finish=25.63
          190670) x264             cpu=9 start=5.59  finish=25.63
          190671) x264             cpu=2 start=5.59  finish=25.63
          190672) x264             cpu=5 start=5.59  finish=25.63
          190673) x264             cpu=11 start=5.59  finish=25.63
          190674) x264             cpu=15 start=5.59  finish=25.63
          190675) x264             cpu=13 start=5.59  finish=25.63
          190676) x264             cpu=0 start=5.59  finish=25.63
          190677) x264             cpu=13 start=5.59  finish=25.63
          190678) x264             cpu=7 start=5.59  finish=25.63
          190679) x264             cpu=14 start=5.59  finish=25.63
          190680) x264             cpu=9 start=5.59  finish=25.63
          190681) x264             cpu=7 start=5.59  finish=25.63
          190682) x264             cpu=8 start=5.59  finish=25.63
          190683) x264             cpu=6 start=5.59  finish=25.63
          190684) x264             cpu=6 start=5.59  finish=25.63
          190685) x264             cpu=5 start=5.59  finish=25.63
          190686) x264             cpu=0 start=5.59  finish=25.63
          190687) x264             cpu=8 start=5.59  finish=25.63
          190688) x264             cpu=10 start=5.59  finish=25.63
          190689) x264             cpu=1 start=5.59  finish=25.63
          190690) x264             cpu=4 start=5.59  finish=25.63
          190691) x264             cpu=0 start=5.59  finish=25.63
          190692) x264             cpu=1 start=5.59  finish=25.63
          190693) x264             cpu=11 start=5.59  finish=25.63
          190694) x264             cpu=9 start=5.59  finish=25.63
          190695) x264             cpu=11 start=5.60  finish=24.33
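
The lifetimes in the listing above can be extracted mechanically. This is a small sketch under the assumption that every entry follows the `pid) name cpu=N start=S finish=F` shape shown; the `Block` type and the regular expression are illustrative and not taken from the tool that produced the listing.

```python
import re
from typing import NamedTuple


class Block(NamedTuple):
    pid: int
    name: str
    cpu: int
    start: float
    finish: float

    @property
    def duration(self):
        return self.finish - self.start


# Assumed line shape: "<pid>) <name> cpu=<n> start=<s> finish=<f>"
LINE_RE = re.compile(
    r"(\d+)\)\s+(\S+)\s+cpu=(\d+)\s+start=([\d.]+)\s+finish=([\d.]+)"
)


def parse_blocks(text):
    """Return one Block per line that matches the assumed shape."""
    blocks = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:
            pid, name, cpu, start, finish = m.groups()
            blocks.append(Block(int(pid), name, int(cpu), float(start), float(finish)))
    return blocks


# A few entries copied from the listing above.
SAMPLE = """\
      190664) x264             cpu=15 start=5.57  finish=25.65
          190666) x264             cpu=12 start=5.59  finish=25.65
          190667) x264             cpu=3 start=5.59  finish=25.63
          190695) x264             cpu=11 start=5.60  finish=24.33
"""

blocks = parse_blocks(SAMPLE)
for b in blocks:
    print(f"{b.pid} cpu={b.cpu:2d} ran {b.duration:.2f}s")

shortest = min(blocks, key=lambda b: b.duration)
print(f"shortest: {shortest.pid} ({shortest.duration:.2f}s)")
```

On these entries the threads run for about 20 seconds each, with 190695 finishing a little over a second earlier than the rest.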