Compresses a file using parallel bzip2 (pbzip2) compression. The benchmark consists of a single, quick-running application.

The topdown profile is sparse, with some backend stalls.

AMD metrics show fewer backend stalls than I anticipated (15.9% of slots) and a correspondingly higher retirement rate (35.3%).

elapsed              39.290
on_cpu               0.444          # 7.11 / 16 cores
utime                275.294
stime                4.122
nvcsw                2607           # 54.13%
nivcsw               2209           # 45.87%
inblock              384            # 9.77/sec
onblock              12592          # 320.49/sec
cpu-clock            279549404186   # 279.549 seconds
task-clock           279554227264   # 279.554 seconds
page faults          1526656        # 5461.037/sec
context switches     4845           # 17.331/sec
cpu migrations       402            # 1.438/sec
major page faults    3              # 0.011/sec
minor page faults    1526653        # 5461.026/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             405690431935   # 157.660 branches per 1000 inst
branch misses        7664670190     # 1.89% branch miss
conditional          388947972615   # 151.153 conditional branches per 1000 inst
indirect             48854277       # 0.019 indirect branches per 1000 inst
cpu-cycles           1104017155061  # 1.78 GHz
instructions         2570541869626  # 2.33 IPC
slots                2208850134342  #
retiring             778802931145   # 35.3% (47.0%)
-- ucode             32914703       #     0.0%
-- fastpath          778770016442   #    35.3%
frontend             388951396379   # 17.6% (23.5%)
-- latency           239192378484   #    10.8%
-- bandwidth         149759017895   #     6.8%
backend              351953216116   # 15.9% (21.3%)
-- cpu               76005859665    #     3.4%
-- memory            275947356451   #    12.5%
speculation          135869065026   #  6.2% ( 8.2%)
-- branch mispredict 135099694249   #     6.1%
-- pipeline restart  769370777      #     0.0%
smt-contention       553271289481   # 25.0% ( 0.0%)
cpu-cycles           1108008877477  # 1.79 GHz
instructions         2567804578461  # 2.32 IPC
instructions         857325636839   # 12.626 l2 access per 1000 inst
l2 hit from l1       7810054367     # 29.43% l2 miss
l2 miss from l1      1876069203     #
l2 hit from l2 pf    1705332446     #
l3 hit from l2 pf    987529541      #
l3 miss from l2 pf   321847175      #
instructions         858401905672   # 0.556 float per 1000 inst
float 512            41             # 0.000 AVX-512 per 1000 inst
float 256            2              # 0.000 AVX-256 per 1000 inst
float 128            476845848      # 0.556 AVX-128 per 1000 inst
float MMX            0              # 0.000 MMX per 1000 inst
float scalar         0              # 0.000 scalar per 1000 inst
instructions         2572295572386  #
opcache              334110438004   # 129.888 opcache per 1000 inst
opcache miss         3345500499     #  1.0% opcache miss rate
l1 dTLB miss         10297934436    # 4.003 L1 dTLB per 1000 inst
l2 dTLB miss         12199879       # 0.005 L2 dTLB per 1000 inst
instructions         2572580274663  #
icache               5932167387     # 2.306 icache per 1000 inst
icache miss          339422680      #  5.7% icache miss rate
l1 iTLB miss         8111752        # 0.003 L1 iTLB per 1000 inst
l2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst
tlb flush            19252          # 0.000 TLB flush per 1000 inst
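
The derived values in the comment column of the AMD listing can be reproduced from the raw counters. The sketch below recomputes a few of them. The variable names are mine, not the tool's. Reading the parenthesized topdown percentage as "share of slots excluding smt-contention" is an inference from the numbers rather than something the output states.

```python
# Raw values copied from the AMD listing above.
elapsed = 39.290          # wall-clock seconds
utime = 275.294           # user CPU seconds
stime = 4.122             # system CPU seconds
ncores = 16               # from the "/ 16 cores" annotation

cpu_cycles = 1104017155061
instructions = 2570541869626
branches = 405690431935
branch_misses = 7664670190
slots = 2208850134342
retiring = 778802931145
smt_contention = 553271289481

# Average number of cores kept busy, and the same as a fraction of the machine.
busy_cores = (utime + stime) / elapsed
on_cpu = busy_cores / ncores

# Instructions per cycle and branch misprediction rate.
ipc = instructions / cpu_cycles
branch_miss_pct = 100.0 * branch_misses / branches

# Topdown: share of all slots, and (assumed) share of slots
# once smt-contention slots are excluded.
retiring_pct = 100.0 * retiring / slots
retiring_excl_smt_pct = 100.0 * retiring / (slots - smt_contention)

print(f"on_cpu      {on_cpu:.3f}  # {busy_cores:.2f} / {ncores} cores")
print(f"IPC         {ipc:.2f}")
print(f"branch miss {branch_miss_pct:.2f}%")
print(f"retiring    {retiring_pct:.1f}% ({retiring_excl_smt_pct:.1f}%)")
```

Only ratios whose numerator and denominator sit in the same counter group are recomputed here. The per-1000-instruction figures use the instruction count from their own group, which differs slightly between groups.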

Intel metrics

elapsed              217.029
on_cpu               0.454          # 7.27 / 16 cores
utime                1564.660
stime                13.047
nvcsw                8655           # 48.53%
nivcsw               9181           # 51.47%
inblock              2888928        # 13311.24/sec
onblock              1624           # 7.48/sec
cpu-clock            1578186070527  # 1578.186 seconds
task-clock           1578198436979  # 1578.198 seconds
page faults          7063707        # 4475.804/sec
context switches     18700          # 11.849/sec
cpu migrations       1288           # 0.816/sec
major page faults    4189           # 2.654/sec
minor page faults    7059518        # 4473.150/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             2019204514918  # 157.422 branches per 1000 inst
branch misses        36464472234    # 1.81% branch miss
conditional          2019204538822  # 157.422 conditional branches per 1000 inst
indirect             670211240004   # 52.251 indirect branches per 1000 inst
slots                3486631743476  #
retiring             1834568352046  # 52.6% (52.6%)
-- ucode             50661114144    #     1.5%
-- fastpath          1783907237902  #    51.2%
frontend             540684834399   # 15.5% (15.5%)
-- latency           217279560079   #     6.2%
-- bandwidth         323405274320   #     9.3%
backend              403493301076   # 11.6% (11.6%) low
-- cpu               171285762594   #     4.9%
-- memory            232207538482   #     6.7%
speculation          712333729923   # 20.4% (20.4%) high
-- branch mispredict 706110802471   #    20.3%
-- pipeline restart  6222927452     #     0.2%
smt-contention       0              #  0.0% ( 0.0%)
cpu-cycles           1087484725530  # 1.48 GHz
instructions         2985719942975  # 2.75 IPC
l2 access            19524502346    # 12.381 l2 access per 1000 inst
l2 miss              5732882298     # 29.36% l2 miss
cpu-cycles           567498203439   # 14.2% memory latency
load stalls          57898117945    #  0.8% l1 bound
l1 miss              53391267178    #  7.6% l2 bound
l2 miss              10124825349    #  1.2% l3 bound
l3 miss              3578476694     #  0.6% dram bound
store_stalls         22732454902    #  4.0% store bound

The process overview shows pbzip2 as the primary process.

411 processes
	 60 pbzip2                4740.02    56.19
	 68 clinfo                  18.37     6.99
	 38 vulkaninfo               1.31     1.31
	  6 glxinfo:gdrv0            0.16     0.06
	  6 glxinfo:gl0              0.15     0.06
	  4 vulkani:disk$0           0.13     0.13
	  2 glxinfo                  0.08     0.02
	  2 glxinfo:cs0              0.08     0.02
	  2 glxinfo:disk$0           0.08     0.02
	  2 glxinfo:sh0              0.08     0.02
	  2 glxinfo:shlo0            0.08     0.02
	  2 llvmpipe-0               0.07     0.07
	  2 llvmpipe-1               0.07     0.07
	  2 llvmpipe-10              0.07     0.07
	  2 llvmpipe-11              0.07     0.07
	  2 llvmpipe-12              0.07     0.07
	  2 llvmpipe-13              0.07     0.07
	  2 llvmpipe-14              0.07     0.07
	  2 llvmpipe-15              0.07     0.07
	  2 llvmpipe-2               0.07     0.07
	  2 llvmpipe-3               0.07     0.07
	  2 llvmpipe-4               0.07     0.07
	  2 llvmpipe-5               0.07     0.07
	  2 llvmpipe-6               0.07     0.07
	  2 llvmpipe-7               0.07     0.07
	  2 llvmpipe-8               0.07     0.07
	  2 llvmpipe-9               0.07     0.07
	  6 clang                    0.07     0.04
	  6 php                      0.05     0.09
	  3 rocminfo                 0.03     0.00
	  1 lspci                    0.00     0.03
	 82 sh                       0.00     0.00
	 13 gcc                      0.00     0.00
	 11 gsettings                0.00     0.00
	  8 stat                     0.00     0.00
	  8 systemd-detect-          0.00     0.00
	  6 llvm-link                0.00     0.00
	  5 phoronix-test-s          0.00     0.00
	  3 compress-pbzip2          0.00     0.00
	  3 gmain                    0.00     0.00
	  2 cc                       0.00     0.00
	  2 dconf worker             0.00     0.00
	  2 lscpu                    0.00     0.00
	  2 uname                    0.00     0.00
	  2 which                    0.00     0.00
	  2 xset                     0.00     0.00
	  1 date                     0.00     0.00
	  1 dirname                  0.00     0.00
	  1 dmesg                    0.00     0.00
	  1 dmidecode                0.00     0.00
	  1 grep                     0.00     0.00
	  1 ifconfig                 0.00     0.00
	  1 ip                       0.00     0.00
	  1 lsmod                    0.00     0.00
	  1 mktemp                   0.00     0.00
	  1 ps                       0.00     0.00
	  1 qdbus                    0.00     0.00
	  1 readlink                 0.00     0.00
	  1 realpath                 0.00     0.00
	  1 sed                      0.00     0.00
	  1 sort                     0.00     0.00
	  1 stty                     0.00     0.00
	  1 systemctl                0.00     0.00
	  1 template.sh              0.00     0.00
	  1 wc                       0.00     0.00
	  1 xrandr                   0.00     0.00
0 processes running
47 maximum processes

Computation blocks.

      68277) compress-pbzip2  cpu=15 start=5.62  finish=13.27
        68278) pbzip2           cpu=5 start=5.62  finish=13.27
          68279) pbzip2           cpu=12 start=5.62  finish=13.27
          68280) pbzip2           cpu=3 start=5.62  finish=13.27
          68281) pbzip2           cpu=1 start=5.62  finish=8.53 
          68282) pbzip2           cpu=13 start=5.62  finish=11.10
          68283) pbzip2           cpu=0 start=5.62  finish=10.09
          68284) pbzip2           cpu=14 start=5.62  finish=12.34
          68285) pbzip2           cpu=5 start=5.62  finish=12.97
          68286) pbzip2           cpu=7 start=5.62  finish=13.00
          68287) pbzip2           cpu=12 start=5.62  finish=13.13
          68288) pbzip2           cpu=10 start=5.62  finish=13.12
          68289) pbzip2           cpu=9 start=5.62  finish=11.99
          68290) pbzip2           cpu=0 start=5.62  finish=12.53
          68291) pbzip2           cpu=11 start=5.62  finish=13.18
          68292) pbzip2           cpu=5 start=5.62  finish=10.89
          68293) pbzip2           cpu=1 start=5.62  finish=13.26
          68294) pbzip2           cpu=10 start=5.62  finish=11.78
          68295) pbzip2           cpu=5 start=5.62  finish=11.42
          68296) pbzip2           cpu=5 start=5.62  finish=6.90 
          68297) pbzip2           cpu=9 start=5.62  finish=13.27
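
To compare how long each pbzip2 worker stayed alive, the start and finish columns can be turned into lifetimes. The following is a minimal sketch that parses three lines copied from the listing above. The regular expression, the `parse_blocks` helper and the field name `task_id` are assumptions for illustration (the listing does not say whether the leading number is a process or thread id).

```python
import re

# Three lines copied from the computation-block listing above.
SAMPLE = """\
          68281) pbzip2           cpu=1 start=5.62  finish=8.53
          68296) pbzip2           cpu=5 start=5.62  finish=6.90
          68297) pbzip2           cpu=9 start=5.62  finish=13.27
"""

# id) name cpu=N start=S finish=F, with arbitrary spacing and indentation.
LINE_RE = re.compile(
    r"^\s*(\d+)\)\s+(\S+)\s+cpu=(\d+)\s+start=([\d.]+)\s+finish=([\d.]+)\s*$"
)

def parse_blocks(text):
    """Return one dict per matching line, with a computed lifetime in seconds."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # skip anything that is not a block line
        task_id, name, cpu, start, finish = m.groups()
        rows.append({
            "task_id": int(task_id),
            "name": name,
            "cpu": int(cpu),
            "start": float(start),
            "finish": float(finish),
            "lifetime": float(finish) - float(start),
        })
    return rows

rows = parse_blocks(SAMPLE)
# List the shortest-lived tasks first.
for r in sorted(rows, key=lambda r: r["lifetime"]):
    print(f'{r["task_id"]} {r["name"]} cpu={r["cpu"]} lifetime={r["lifetime"]:.2f}s')
```

Run over the full listing, the same parse would show which workers exit early (68296 and 68281 here) while the rest run until the parent finishes at 13.27.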