Compress a file using parallel bzip2 compression. There is one quick-running application.

The topdown profile is sparse, with some backend stalls.

AMD metrics show fewer backend stalls than I anticipated and, otherwise, a higher retirement rate.
elapsed 39.290
on_cpu 0.444 # 7.11 / 16 cores
utime 275.294
stime 4.122
nvcsw 2607 # 54.13%
nivcsw 2209 # 45.87%
inblock 384 # 9.77/sec
onblock 12592 # 320.49/sec
cpu-clock 279549404186 # 279.549 seconds
task-clock 279554227264 # 279.554 seconds
page faults 1526656 # 5461.037/sec
context switches 4845 # 17.331/sec
cpu migrations 402 # 1.438/sec
major page faults 3 # 0.011/sec
minor page faults 1526653 # 5461.026/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 405690431935 # 157.660 branches per 1000 inst
branch misses 7664670190 # 1.89% branch miss
conditional 388947972615 # 151.153 conditional branches per 1000 inst
indirect 48854277 # 0.019 indirect branches per 1000 inst
cpu-cycles 1104017155061 # 1.78 GHz
instructions 2570541869626 # 2.33 IPC
slots 2208850134342 #
retiring 778802931145 # 35.3% (47.0%)
-- ucode 32914703 # 0.0%
-- fastpath 778770016442 # 35.3%
frontend 388951396379 # 17.6% (23.5%)
-- latency 239192378484 # 10.8%
-- bandwidth 149759017895 # 6.8%
backend 351953216116 # 15.9% (21.3%)
-- cpu 76005859665 # 3.4%
-- memory 275947356451 # 12.5%
speculation 135869065026 # 6.2% ( 8.2%)
-- branch mispredict 135099694249 # 6.1%
-- pipeline restart 769370777 # 0.0%
smt-contention 553271289481 # 25.0% ( 0.0%)
cpu-cycles 1108008877477 # 1.79 GHz
instructions 2567804578461 # 2.32 IPC
instructions 857325636839 # 12.626 l2 access per 1000 inst
l2 hit from l1 7810054367 # 29.43% l2 miss
l2 miss from l1 1876069203 #
l2 hit from l2 pf 1705332446 #
l3 hit from l2 pf 987529541 #
l3 miss from l2 pf 321847175 #
instructions 858401905672 # 0.556 float per 1000 inst
float 512 41 # 0.000 AVX-512 per 1000 inst
float 256 2 # 0.000 AVX-256 per 1000 inst
float 128 476845848 # 0.556 AVX-128 per 1000 inst
float MMX 0 # 0.000 MMX per 1000 inst
float scalar 0 # 0.000 scalar per 1000 inst
instructions 2572295572386 #
opcache 334110438004 # 129.888 opcache per 1000 inst
opcache miss 3345500499 # 1.0% opcache miss rate
l1 dTLB miss 10297934436 # 4.003 L1 dTLB per 1000 inst
l2 dTLB miss 12199879 # 0.005 L2 dTLB per 1000 inst
instructions 2572580274663 #
icache 5932167387 # 2.306 icache per 1000 inst
icache miss 339422680 # 5.7% icache miss rate
l1 iTLB miss 8111752 # 0.003 L1 iTLB per 1000 inst
l2 iTLB miss 0 # 0.000 L2 iTLB per 1000 inst
tlb flush 19252 # 0.000 TLB flush per 1000 inst
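The topdown rows in the AMD block give each category as a share of the slots count. The parenthesized figures appear to rescale the non-SMT categories so that SMT contention is excluded, for example 35.3% / (1 - 0.250) is about 47.0%. The Python sketch below reproduces both columns from the raw counts under that assumption; the function name topdown_shares is hypothetical and not part of the profiling tool.

```python
# Topdown level-1 slot counts copied from the AMD metrics block.
SLOTS = 2208850134342
CATEGORIES = {
    "retiring": 778802931145,
    "frontend": 388951396379,
    "backend": 351953216116,
    "speculation": 135869065026,
    "smt-contention": 553271289481,
}


def topdown_shares(categories, slots):
    """Map each category to (share of all slots, share with SMT contention removed).

    Assumption: the parenthesized column divides each share by
    (1 - smt-contention share); the SMT row itself is reported as 0.0%.
    """
    smt = categories.get("smt-contention", 0) / slots
    shares = {}
    for name, count in categories.items():
        share = count / slots
        normalized = 0.0 if name == "smt-contention" else share / (1.0 - smt)
        shares[name] = (share, normalized)
    return shares


if __name__ == "__main__":
    for name, (share, norm) in topdown_shares(CATEGORIES, SLOTS).items():
        print(f"{name:15s} {share:6.1%} ({norm:5.1%})")
```

The recomputed values match the tool's columns (retiring 35.3% and 47.0%, backend 15.9% and 21.3%), which supports, but does not prove, this reading of the parentheses.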
Intel metrics
elapsed 217.029
on_cpu 0.454 # 7.27 / 16 cores
utime 1564.660
stime 13.047
nvcsw 8655 # 48.53%
nivcsw 9181 # 51.47%
inblock 2888928 # 13311.24/sec
onblock 1624 # 7.48/sec
cpu-clock 1578186070527 # 1578.186 seconds
task-clock 1578198436979 # 1578.198 seconds
page faults 7063707 # 4475.804/sec
context switches 18700 # 11.849/sec
cpu migrations 1288 # 0.816/sec
major page faults 4189 # 2.654/sec
minor page faults 7059518 # 4473.150/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 2019204514918 # 157.422 branches per 1000 inst
branch misses 36464472234 # 1.81% branch miss
conditional 2019204538822 # 157.422 conditional branches per 1000 inst
indirect 670211240004 # 52.251 indirect branches per 1000 inst
slots 3486631743476 #
retiring 1834568352046 # 52.6% (52.6%)
-- ucode 50661114144 # 1.5%
-- fastpath 1783907237902 # 51.2%
frontend 540684834399 # 15.5% (15.5%)
-- latency 217279560079 # 6.2%
-- bandwidth 323405274320 # 9.3%
backend 403493301076 # 11.6% (11.6%) low
-- cpu 171285762594 # 4.9%
-- memory 232207538482 # 6.7%
speculation 712333729923 # 20.4% (20.4%) high
-- branch mispredict 706110802471 # 20.3%
-- pipeline restart 6222927452 # 0.2%
smt-contention 0 # 0.0% ( 0.0%)
cpu-cycles 1087484725530 # 1.48 GHz
instructions 2985719942975 # 2.75 IPC
l2 access 19524502346 # 12.381 l2 access per 1000 inst
l2 miss 5732882298 # 29.36% l2 miss
cpu-cycles 567498203439 # 14.2% memory latency
load stalls 57898117945 # 0.8% l1 bound
l1 miss 53391267178 # 7.6% l2 bound
l2 miss 10124825349 # 1.2% l3 bound
l3 miss 3578476694 # 0.6% dram bound
store_stalls 22732454902 # 4.0% store bound
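The on_cpu and IPC lines in both blocks can be reproduced from the raw values. Total CPU time (utime + stime) divided by elapsed gives the average number of busy cores, dividing that by the 16 cores gives on_cpu, and instructions / cpu-cycles gives IPC. This is a minimal sketch assuming that is how the tool derives them (the results match the printed figures); the Run class and helper names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Run:
    name: str
    elapsed: float      # wall-clock seconds
    utime: float        # user CPU seconds
    stime: float        # system CPU seconds
    cycles: int         # cpu-cycles counter
    instructions: int   # instructions counter from the same group as cycles
    cores: int = 16     # core count reported by the tool


def busy_cores(run: Run) -> float:
    """Average number of cores kept busy: total CPU time / wall time."""
    return (run.utime + run.stime) / run.elapsed


def on_cpu(run: Run) -> float:
    """Fraction of the machine's cores busy on average."""
    return busy_cores(run) / run.cores


def ipc(run: Run) -> float:
    """Instructions retired per cycle."""
    return run.instructions / run.cycles


AMD = Run("AMD", 39.290, 275.294, 4.122, 1104017155061, 2570541869626)
INTEL = Run("Intel", 217.029, 1564.660, 13.047, 1087484725530, 2985719942975)

if __name__ == "__main__":
    for run in (AMD, INTEL):
        print(f"{run.name:6s} on_cpu {on_cpu(run):.3f} "
              f"({busy_cores(run):.2f} / {run.cores} cores) IPC {ipc(run):.2f}")
```

Both runs keep roughly 7.1 to 7.3 of the 16 cores busy on average, even though the Intel run takes far longer in wall-clock time.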
The process overview shows pbzip2 as the primary process.
411 processes
60 pbzip2 4740.02 56.19
68 clinfo 18.37 6.99
38 vulkaninfo 1.31 1.31
6 glxinfo:gdrv0 0.16 0.06
6 glxinfo:gl0 0.15 0.06
4 vulkani:disk$0 0.13 0.13
2 glxinfo 0.08 0.02
2 glxinfo:cs0 0.08 0.02
2 glxinfo:disk$0 0.08 0.02
2 glxinfo:sh0 0.08 0.02
2 glxinfo:shlo0 0.08 0.02
2 llvmpipe-0 0.07 0.07
2 llvmpipe-1 0.07 0.07
2 llvmpipe-10 0.07 0.07
2 llvmpipe-11 0.07 0.07
2 llvmpipe-12 0.07 0.07
2 llvmpipe-13 0.07 0.07
2 llvmpipe-14 0.07 0.07
2 llvmpipe-15 0.07 0.07
2 llvmpipe-2 0.07 0.07
2 llvmpipe-3 0.07 0.07
2 llvmpipe-4 0.07 0.07
2 llvmpipe-5 0.07 0.07
2 llvmpipe-6 0.07 0.07
2 llvmpipe-7 0.07 0.07
2 llvmpipe-8 0.07 0.07
2 llvmpipe-9 0.07 0.07
6 clang 0.07 0.04
6 php 0.05 0.09
3 rocminfo 0.03 0.00
1 lspci 0.00 0.03
82 sh 0.00 0.00
13 gcc 0.00 0.00
11 gsettings 0.00 0.00
8 stat 0.00 0.00
8 systemd-detect- 0.00 0.00
6 llvm-link 0.00 0.00
5 phoronix-test-s 0.00 0.00
3 compress-pbzip2 0.00 0.00
3 gmain 0.00 0.00
2 cc 0.00 0.00
2 dconf worker 0.00 0.00
2 lscpu 0.00 0.00
2 uname 0.00 0.00
2 which 0.00 0.00
2 xset 0.00 0.00
1 date 0.00 0.00
1 dirname 0.00 0.00
1 dmesg 0.00 0.00
1 dmidecode 0.00 0.00
1 grep 0.00 0.00
1 ifconfig 0.00 0.00
1 ip 0.00 0.00
1 lsmod 0.00 0.00
1 mktemp 0.00 0.00
1 ps 0.00 0.00
1 qdbus 0.00 0.00
1 readlink 0.00 0.00
1 realpath 0.00 0.00
1 sed 0.00 0.00
1 sort 0.00 0.00
1 stty 0.00 0.00
1 systemctl 0.00 0.00
1 template.sh 0.00 0.00
1 wc 0.00 0.00
1 xrandr 0.00 0.00
0 processes running
47 maximum processes
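The overview does not label its columns. Assuming each row is a count, a process name and two numeric totals, the sketch below parses an excerpt of the rows and ranks processes by the first numeric column to show how dominant pbzip2 is. Names can contain spaces (dconf worker), so the name is taken as everything between the count and the last two fields. The helper names parse_overview and top_share are hypothetical.

```python
# Excerpt of the process overview rows; column meanings are not labeled in
# the source, so only "count name value_a value_b" is assumed here.
OVERVIEW = """\
60 pbzip2 4740.02 56.19
68 clinfo 18.37 6.99
38 vulkaninfo 1.31 1.31
6 glxinfo:gdrv0 0.16 0.06
2 dconf worker 0.00 0.00
82 sh 0.00 0.00
"""


def parse_overview(text):
    """Return (name, count, value_a, value_b) tuples.

    Names may contain spaces, so the name is everything between the leading
    count and the two trailing numeric fields.
    """
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 4:
            continue  # skip blank lines and summary lines such as "411 processes"
        count = int(parts[0])
        value_a, value_b = float(parts[-2]), float(parts[-1])
        rows.append((" ".join(parts[1:-2]), count, value_a, value_b))
    return rows


def top_share(rows):
    """Return the name of the row with the largest value_a and its share of the total."""
    total = sum(row[2] for row in rows)
    top = max(rows, key=lambda row: row[2])
    return top[0], top[2] / total


if __name__ == "__main__":
    name, share = top_share(parse_overview(OVERVIEW))
    print(f"{name}: {share:.1%} of value_a in this excerpt")
```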
Computation blocks.
68277) compress-pbzip2 cpu=15 start=5.62 finish=13.27
68278) pbzip2 cpu=5 start=5.62 finish=13.27
68279) pbzip2 cpu=12 start=5.62 finish=13.27
68280) pbzip2 cpu=3 start=5.62 finish=13.27
68281) pbzip2 cpu=1 start=5.62 finish=8.53
68282) pbzip2 cpu=13 start=5.62 finish=11.10
68283) pbzip2 cpu=0 start=5.62 finish=10.09
68284) pbzip2 cpu=14 start=5.62 finish=12.34
68285) pbzip2 cpu=5 start=5.62 finish=12.97
68286) pbzip2 cpu=7 start=5.62 finish=13.00
68287) pbzip2 cpu=12 start=5.62 finish=13.13
68288) pbzip2 cpu=10 start=5.62 finish=13.12
68289) pbzip2 cpu=9 start=5.62 finish=11.99
68290) pbzip2 cpu=0 start=5.62 finish=12.53
68291) pbzip2 cpu=11 start=5.62 finish=13.18
68292) pbzip2 cpu=5 start=5.62 finish=10.89
68293) pbzip2 cpu=1 start=5.62 finish=13.26
68294) pbzip2 cpu=10 start=5.62 finish=11.78
68295) pbzip2 cpu=5 start=5.62 finish=11.42
68296) pbzip2 cpu=5 start=5.62 finish=6.90
68297) pbzip2 cpu=9 start=5.62 finish=13.27
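All pbzip2 threads start at 5.62, but they finish anywhere between 6.90 and 13.27. One rough way to quantify that spread is the ratio of mean thread lifetime to the longest lifetime, where 1.0 would mean every thread finished together. The sketch below parses the listing and skips the compress-pbzip2 wrapper. It measures start-to-finish spans, not on-CPU time, and it treats cpu= as a single CPU number reported by the tool; the listing does not say whether that is the first or last CPU a thread ran on. The function names are hypothetical.

```python
import re
from collections import Counter

# The computation-block listing, copied verbatim.
BLOCKS = """\
68277) compress-pbzip2 cpu=15 start=5.62 finish=13.27
68278) pbzip2 cpu=5 start=5.62 finish=13.27
68279) pbzip2 cpu=12 start=5.62 finish=13.27
68280) pbzip2 cpu=3 start=5.62 finish=13.27
68281) pbzip2 cpu=1 start=5.62 finish=8.53
68282) pbzip2 cpu=13 start=5.62 finish=11.10
68283) pbzip2 cpu=0 start=5.62 finish=10.09
68284) pbzip2 cpu=14 start=5.62 finish=12.34
68285) pbzip2 cpu=5 start=5.62 finish=12.97
68286) pbzip2 cpu=7 start=5.62 finish=13.00
68287) pbzip2 cpu=12 start=5.62 finish=13.13
68288) pbzip2 cpu=10 start=5.62 finish=13.12
68289) pbzip2 cpu=9 start=5.62 finish=11.99
68290) pbzip2 cpu=0 start=5.62 finish=12.53
68291) pbzip2 cpu=11 start=5.62 finish=13.18
68292) pbzip2 cpu=5 start=5.62 finish=10.89
68293) pbzip2 cpu=1 start=5.62 finish=13.26
68294) pbzip2 cpu=10 start=5.62 finish=11.78
68295) pbzip2 cpu=5 start=5.62 finish=11.42
68296) pbzip2 cpu=5 start=5.62 finish=6.90
68297) pbzip2 cpu=9 start=5.62 finish=13.27
"""

LINE_RE = re.compile(
    r"(?P<tid>\d+)\)\s+(?P<name>\S+)\s+cpu=(?P<cpu>\d+)\s+"
    r"start=(?P<start>[\d.]+)\s+finish=(?P<finish>[\d.]+)"
)


def parse_blocks(text, name="pbzip2"):
    """Return (tid, cpu, start, finish) for every line whose command equals name."""
    blocks = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m and m["name"] == name:
            blocks.append((int(m["tid"]), int(m["cpu"]),
                           float(m["start"]), float(m["finish"])))
    return blocks


def balance(blocks):
    """Mean thread lifetime divided by the longest lifetime (1.0 = all equal)."""
    lifetimes = [finish - start for _, _, start, finish in blocks]
    return sum(lifetimes) / len(lifetimes) / max(lifetimes)


if __name__ == "__main__":
    threads = parse_blocks(BLOCKS)
    cpus = Counter(cpu for _, cpu, _, _ in threads)
    print(f"{len(threads)} threads, balance {balance(threads):.2f}")
    print("most reported CPU:", cpus.most_common(1))
```

For this listing the ratio comes out at roughly 0.83 across 20 pbzip2 threads, and five of those threads report cpu=5.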
