This test runs the multi-threaded x264 video encoder, which encodes a 4K image followed by a 1080p image. The number of runnable processes seems to bounce between different values during the run.

The AMD topdown profile shows an almost even split between retiring (27.8% of slots) and backend stalls (27.4% of slots).

AMD metrics show an average of about 10 of 16 cores busy. This is floating-point code with relatively few branches (about 75 per 1000 instructions).
elapsed 109.681
on_cpu 0.608 # 9.72 / 16 cores
utime 1053.842
stime 12.299
nvcsw 361218 # 81.01%
nivcsw 84696 # 18.99%
inblock 3256 # 29.69/sec
onblock 12888 # 117.50/sec
cpu-clock 1066288166468 # 1066.288 seconds
task-clock 1066416971546 # 1066.417 seconds
page faults 721586 # 676.645/sec
context switches 446284 # 418.489/sec
cpu migrations 157506 # 147.696/sec
major page faults 26 # 0.024/sec
minor page faults 721560 # 676.621/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 472780029627 # 74.946 branches per 1000 inst
branch misses 12203448367 # 2.58% branch miss
conditional 277428552179 # 43.978 conditional branches per 1000 inst
indirect 57936525347 # 9.184 indirect branches per 1000 inst
cpu-cycles 3999823451251 # 2.28 GHz
instructions 6305581355914 # 1.58 IPC
slots 8002833887928 #
retiring 2225805517930 # 27.8% (38.2%)
-- ucode 18730169192 # 0.2%
-- fastpath 2207075348738 # 27.6%
frontend 1162081889243 # 14.5% (20.0%)
-- latency 844391783640 # 10.6%
-- bandwidth 317690105603 # 4.0%
backend 2194344612641 # 27.4% (37.7%)
-- cpu 776316315076 # 9.7%
-- memory 1418028297565 # 17.7%
speculation 239678438231 # 3.0% ( 4.1%)
-- branch mispredict 216097064427 # 2.7%
-- pipeline restart 23581373804 # 0.3%
smt-contention 2180884225504 # 27.3% ( 0.0%)
cpu-cycles 4004643116766 # 2.28 GHz
instructions 6306441819098 # 1.57 IPC
instructions 2101933143976 # 53.068 l2 access per 1000 inst
l2 hit from l1 80012154529 # 6.03% l2 miss
l2 miss from l1 3004489202 #
l2 hit from l2 pf 27808913292 #
l3 hit from l2 pf 1203246849 #
l3 miss from l2 pf 2520651155 #
instructions 2103726837096 # 164.465 float per 1000 inst
float 512 60 # 0.000 AVX-512 per 1000 inst
float 256 3715879476 # 1.766 AVX-256 per 1000 inst
float 128 342273836220 # 162.699 AVX-128 per 1000 inst
float MMX 0 # 0.000 MMX per 1000 inst
float scalar 0 # 0.000 scalar per 1000 inst
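The derived columns in the AMD listing can be reproduced from the raw counters. The sketch below is an illustration and not the profiling tool's own code: the function names are hypothetical, and reading the parenthesized topdown percentages as shares of the slots left after smt-contention is removed is an assumption that happens to match the printed values.

```python
# Values copied from the AMD metrics listing above.
ELAPSED = 109.681
UTIME = 1053.842
STIME = 12.299
CORES = 16

SLOTS = 8002833887928
TOPDOWN = {
    "retiring": 2225805517930,
    "frontend": 1162081889243,
    "backend": 2194344612641,
    "speculation": 239678438231,
}


def busy_cores(utime, stime, elapsed):
    """Average number of cores kept busy: CPU seconds per wall-clock second."""
    return (utime + stime) / elapsed


def topdown_shares(categories, slots):
    """Return {name: (pct_of_all_slots, pct_of_attributed_slots)}.

    The second value divides by the sum of the four categories only,
    which is assumed to be what the parenthesized column reports.
    """
    attributed = sum(categories.values())
    return {
        name: (100.0 * count / slots, 100.0 * count / attributed)
        for name, count in categories.items()
    }


if __name__ == "__main__":
    busy = busy_cores(UTIME, STIME, ELAPSED)
    print(f"on_cpu {busy / CORES:.3f} # {busy:.2f} / {CORES} cores")
    for name, (raw, norm) in topdown_shares(TOPDOWN, SLOTS).items():
        print(f"{name:12s} {raw:5.1f}% ({norm:5.1f}%)")
```

Under this reading, the near-even split between retiring and backend stalls holds in both columns, because both categories are scaled by the same denominator.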
Intel metrics
elapsed 575.678
on_cpu 0.693 # 11.08 / 16 cores
utime 6331.656
stime 47.690
nvcsw 1493407 # 76.25%
nivcsw 465204 # 23.75%
inblock 4939192 # 8579.78/sec
onblock 2408 # 4.18/sec
cpu-clock 6378815768388 # 6378.816 seconds
task-clock 6379289177912 # 6379.289 seconds
page faults 2716174 # 425.780/sec
context switches 1961241 # 307.439/sec
cpu migrations 703420 # 110.266/sec
major page faults 15671 # 2.457/sec
minor page faults 2700503 # 423.323/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 2289382131781 # 69.437 branches per 1000 inst
branch misses 59249358432 # 2.59% branch miss
conditional 2289382171973 # 69.437 conditional branches per 1000 inst
indirect 902145820391 # 27.362 indirect branches per 1000 inst
slots 8778509383352 #
retiring 4708390332421 # 53.6% (53.6%)
-- ucode 289611573500 # 3.3%
-- fastpath 4418778758921 # 50.3%
frontend 2069165496342 # 23.6% (23.6%)
-- latency 824618892139 # 9.4%
-- bandwidth 1244546604203 # 14.2%
backend 1121944522091 # 12.8% (12.8%) low
-- cpu 574049221261 # 6.5%
-- memory 547895300830 # 6.2%
speculation 833840065989 # 9.5% ( 9.5%)
-- branch mispredict 804921122278 # 9.2%
-- pipeline restart 28918943711 # 0.3%
smt-contention 0 # 0.0% ( 0.0%)
cpu-cycles 6445844947005 # 2.15 GHz
instructions 11708266075332 # 1.82 IPC
l2 access 170225224653 # 29.698 l2 access per 1000 inst
l2 miss 16990061571 # 9.98% l2 miss
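The ratio columns in the Intel listing follow directly from pairs of raw counters. The following is a minimal sketch, with hypothetical helper and key names, that recomputes a few of them as a sanity check. It skips the per-1000-instruction figures because those may be measured in separate counter groups with their own instruction counts.

```python
# Raw counters copied from the Intel metrics listing above.
INTEL = {
    "cycles": 6445844947005,
    "instructions": 11708266075332,
    "branches": 2289382131781,
    "branch_misses": 59249358432,
    "l2_access": 170225224653,
    "l2_miss": 16990061571,
    "nvcsw": 1493407,   # voluntary context switches
    "nivcsw": 465204,   # involuntary context switches
}


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


def derived_ratios(c):
    """Compute IPC, miss rates and the voluntary context-switch share."""
    return {
        "ipc": c["instructions"] / c["cycles"],
        "branch_miss_pct": percent(c["branch_misses"], c["branches"]),
        "l2_miss_pct": percent(c["l2_miss"], c["l2_access"]),
        "voluntary_csw_pct": percent(c["nvcsw"], c["nvcsw"] + c["nivcsw"]),
    }


if __name__ == "__main__":
    for name, value in derived_ratios(INTEL).items():
        print(f"{name:18s} {value:.2f}")
```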
Process overview
542 processes
192 x264 32497.05 319.50
68 clinfo 16.52 6.47
38 vulkaninfo 0.94 1.33
6 glxinfo:gdrv0 0.16 0.05
6 glxinfo:gl0 0.16 0.05
4 vulkani:disk$0 0.10 0.14
6 php 0.08 0.07
2 glxinfo 0.06 0.03
2 glxinfo:cs0 0.06 0.03
2 glxinfo:disk$0 0.06 0.03
2 glxinfo:sh0 0.06 0.03
2 glxinfo:shlo0 0.06 0.03
2 llvmpipe-0 0.05 0.07
2 llvmpipe-1 0.05 0.07
2 llvmpipe-10 0.05 0.07
2 llvmpipe-11 0.05 0.07
2 llvmpipe-12 0.05 0.07
2 llvmpipe-13 0.05 0.07
2 llvmpipe-14 0.05 0.07
2 llvmpipe-15 0.05 0.07
2 llvmpipe-2 0.05 0.07
2 llvmpipe-3 0.05 0.07
2 llvmpipe-4 0.05 0.07
2 llvmpipe-5 0.05 0.07
2 llvmpipe-6 0.05 0.07
2 llvmpipe-7 0.05 0.07
2 llvmpipe-8 0.05 0.07
2 llvmpipe-9 0.05 0.07
6 clang 0.04 0.08
3 rocminfo 0.03 0.00
1 lspci 0.00 0.02
84 sh 0.00 0.00
13 gcc 0.00 0.00
12 gsettings 0.00 0.00
8 stat 0.00 0.00
8 systemd-detect- 0.00 0.00
6 llvm-link 0.00 0.00
5 phoronix-test-s 0.00 0.00
3 gmain 0.00 0.00
2 cc 0.00 0.00
2 lscpu 0.00 0.00
2 uname 0.00 0.00
2 which 0.00 0.00
2 xset 0.00 0.00
1 date 0.00 0.00
1 dconf worker 0.00 0.00
1 dirname 0.00 0.00
1 dmesg 0.00 0.00
1 dmidecode 0.00 0.00
1 grep 0.00 0.00
1 ifconfig 0.00 0.00
1 ip 0.00 0.00
1 lsmod 0.00 0.00
1 mktemp 0.00 0.00
1 ps 0.00 0.00
1 qdbus 0.00 0.00
1 readlink 0.00 0.00
1 realpath 0.00 0.00
1 sed 0.00 0.00
1 sort 0.00 0.00
1 stty 0.00 0.00
1 systemctl 0.00 0.00
1 template.sh 0.00 0.00
1 wc 0.00 0.00
1 xrandr 0.00 0.00
0 processes running
47 maximum processes
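Computation blocks are straightforward: the x264 threads listed below start within about 0.03 s of each other and almost all finish at about 25.6 s.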
190664) x264 cpu=15 start=5.57 finish=25.65
190665) x264 cpu=15 start=5.57 finish=25.65
190666) x264 cpu=12 start=5.59 finish=25.65
190667) x264 cpu=3 start=5.59 finish=25.63
190668) x264 cpu=12 start=5.59 finish=25.63
190669) x264 cpu=2 start=5.59 finish=25.63
190670) x264 cpu=9 start=5.59 finish=25.63
190671) x264 cpu=2 start=5.59 finish=25.63
190672) x264 cpu=5 start=5.59 finish=25.63
190673) x264 cpu=11 start=5.59 finish=25.63
190674) x264 cpu=15 start=5.59 finish=25.63
190675) x264 cpu=13 start=5.59 finish=25.63
190676) x264 cpu=0 start=5.59 finish=25.63
190677) x264 cpu=13 start=5.59 finish=25.63
190678) x264 cpu=7 start=5.59 finish=25.63
190679) x264 cpu=14 start=5.59 finish=25.63
190680) x264 cpu=9 start=5.59 finish=25.63
190681) x264 cpu=7 start=5.59 finish=25.63
190682) x264 cpu=8 start=5.59 finish=25.63
190683) x264 cpu=6 start=5.59 finish=25.63
190684) x264 cpu=6 start=5.59 finish=25.63
190685) x264 cpu=5 start=5.59 finish=25.63
190686) x264 cpu=0 start=5.59 finish=25.63
190687) x264 cpu=8 start=5.59 finish=25.63
190688) x264 cpu=10 start=5.59 finish=25.63
190689) x264 cpu=1 start=5.59 finish=25.63
190690) x264 cpu=4 start=5.59 finish=25.63
190691) x264 cpu=0 start=5.59 finish=25.63
190692) x264 cpu=1 start=5.59 finish=25.63
190693) x264 cpu=11 start=5.59 finish=25.63
190694) x264 cpu=9 start=5.59 finish=25.63
190695) x264 cpu=11 start=5.60 finish=24.33
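A listing like the one above is easy to summarize with a small script. The sketch below is an illustration under assumptions: the regular expression is inferred from the visible line layout, the function names are hypothetical, only a handful of lines are embedded as sample input, and `cpu=` is taken at face value as a core number without knowing whether it is the first or last core the thread ran on.

```python
import re
from collections import Counter

# A few lines copied from the listing above, used as sample input.
SAMPLE = """\
190664) x264 cpu=15 start=5.57 finish=25.65
190666) x264 cpu=12 start=5.59 finish=25.65
190667) x264 cpu=3 start=5.59 finish=25.63
190674) x264 cpu=15 start=5.59 finish=25.63
190695) x264 cpu=11 start=5.60 finish=24.33
"""

LINE = re.compile(
    r"(\d+)\)\s+(\S+)\s+cpu=(\d+)\s+start=([\d.]+)\s+finish=([\d.]+)"
)


def parse_blocks(text):
    """Parse 'id) name cpu=N start=S finish=F' lines into dictionaries."""
    rows = []
    for match in LINE.finditer(text):
        tid, name, cpu, start, finish = match.groups()
        rows.append({
            "tid": int(tid),
            "name": name,
            "cpu": int(cpu),
            "start": float(start),
            "finish": float(finish),
            "duration": float(finish) - float(start),
        })
    return rows


def threads_per_cpu(rows):
    """Count how many listed threads report each cpu number."""
    return Counter(row["cpu"] for row in rows)


if __name__ == "__main__":
    rows = parse_blocks(SAMPLE)
    for row in rows:
        print(f"{row['tid']} cpu={row['cpu']:2d} ran {row['duration']:.2f} s")
    print("threads per cpu:", dict(threads_per_cpu(rows)))
```

Running the same parser over the full listing would show how the threads are spread across the 16 cores and which thread finishes early.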
