lzbench is a compression benchmark with seven workloads. Each workload reports both compression and decompression metrics, for about 14 results in total. The workload is single-threaded.

The topdown profile shows frontend and backend stalls roughly even overall (24.3% and 21.5% of slots in the AMD run), though the balance varies by test case. Branch misprediction is surprisingly high, at 18.0% of slots.

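The AMD metrics show the high rate of speculation misses: 18.1% of slots are lost to bad speculation, almost all of it from branch mispredicts, and 6.02% of branches are mispredicted. The code has little floating point (about 20 operations per 1000 instructions, all 128-bit) and a moderate level of branches (about 130 per 1000 instructions).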

elapsed              755.928
on_cpu               0.051          # 0.81 / 16 cores
utime                583.276
stime                32.447
nvcsw                2596           # 49.00%
nivcsw               2702           # 51.00%
inblock              0              # 0.00/sec
onblock              15424          # 20.40/sec
cpu-clock            615848223310   # 615.848 seconds
task-clock           615860546931   # 615.861 seconds
page faults          18659342       # 30297.999/sec
context switches     8845           # 14.362/sec
cpu migrations       326            # 0.529/sec
major page faults    2              # 0.003/sec
minor page faults    18659340       # 30297.995/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             783659267177   # 130.034 branches per 1000 inst
branch misses        47146996285    # 6.02% branch miss
conditional          685766650450   # 113.791 conditional branches per 1000 inst
indirect             6891243211     # 1.143 indirect branches per 1000 inst
cpu-cycles           2372502199230  # 0.23 GHz
instructions         5279491010779  # 2.23 IPC
slots                4748483394420  #
retiring             1718435481181  # 36.2% (36.2%)
-- ucode             497136752      #     0.0%
-- fastpath          1717938344429  #    36.2%
frontend             1151657223287  # 24.3% (24.3%)
-- latency           735670326708   #    15.5%
-- bandwidth         415986896579   #     8.8%
backend              1018700227325  # 21.5% (21.5%)
-- cpu               179530650466   #     3.8%
-- memory            839169576859   #    17.7%
speculation          859479089638   # 18.1% (18.1%) high
-- branch mispredict 855319116325   #    18.0%
-- pipeline restart  4159973313     #     0.1%
smt-contention       211006212      #  0.0% ( 0.0%)
cpu-cycles           2366480123372  # 0.23 GHz
instructions         5261054743756  # 2.22 IPC
instructions         1754626398723  # 24.734 l2 access per 1000 inst
l2 hit from l1       30101932020    # 19.34% l2 miss
l2 miss from l1      2239669010     #
l2 hit from l2 pf    7142938168     #
l3 hit from l2 pf    1738642924     #
l3 miss from l2 pf   4416281676     #
instructions         1754894600217  # 20.259 float per 1000 inst
float 512            67             # 0.000 AVX-512 per 1000 inst
float 256            496            # 0.000 AVX-256 per 1000 inst
float 128            35553242796    # 20.259 AVX-128 per 1000 inst
float MMX            0              # 0.000 MMX per 1000 inst
float scalar         0              # 0.000 scalar per 1000 inst
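
The derived figures in the AMD listing can be checked by hand. The sketch below recomputes a few of them from the raw counters. The formulas are assumptions about how the profiling tool derives its annotations (for example, that on_cpu is user plus system time divided by elapsed time and core count), chosen because they reproduce the printed values. The names `pct`, `busy_cores`, and `topdown_pct` are illustrative only.

```python
# Raw counter values copied from the AMD listing above.
elapsed, utime, stime = 755.928, 583.276, 32.447
cores = 16
branches, branch_misses = 783_659_267_177, 47_146_996_285
cycles, instructions = 2_372_502_199_230, 5_279_491_010_779
slots = 4_748_483_394_420
topdown = {
    "retiring": 1_718_435_481_181,
    "frontend": 1_151_657_223_287,
    "backend": 1_018_700_227_325,
    "speculation": 859_479_089_638,
    "smt-contention": 211_006_212,
}


def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Assumed derivation: CPU time consumed per second of wall-clock time.
busy_cores = (utime + stime) / elapsed
on_cpu = busy_cores / cores

branch_miss_pct = 100.0 * branch_misses / branches
ipc = instructions / cycles

# Each topdown category as a share of all pipeline slots.
topdown_pct = {name: pct(count, slots) for name, count in topdown.items()}

print(f"busy cores  {busy_cores:.2f} (on_cpu {on_cpu:.3f})")
print(f"branch miss {branch_miss_pct:.2f}%")
print(f"IPC         {ipc:.2f}")
for name, value in topdown_pct.items():
    print(f"{name:15s} {value:5.1f}%")
```

With under one core busy on average, the numbers are consistent with a single-threaded workload, and the four main topdown categories account for essentially all slots.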

Intel metrics

elapsed              6333.440
on_cpu               0.742          # 11.88 / 16 cores
utime                74750.330
stime                473.297
nvcsw                1313743        # 89.74%
nivcsw               150279         # 10.26%
inblock              30890000       # 4877.29/sec
onblock              694880         # 109.72/sec
cpu-clock            75222931189368 # 75222.931 seconds
task-clock           75223322994820 # 75223.323 seconds
page faults          85985637       # 1143.072/sec
context switches     1495425        # 19.880/sec
cpu migrations       54829          # 0.729/sec
major page faults    1233961        # 16.404/sec
minor page faults    84751671       # 1126.667/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             45503197495161 # 71.793 branches per 1000 inst
branch misses        90982505997    # 0.20% branch miss
conditional          45503197511129 # 71.793 conditional branches per 1000 inst
indirect             11993405533405 # 18.923 indirect branches per 1000 inst
slots                433838595012596 #
retiring             283286737116435 # 65.3% (65.3%) high
-- ucode             13222986405835 #     3.0%
-- fastpath          270063750710600 #    62.2%
frontend             19439778785847 #  4.5% ( 4.5%) low
-- latency           9361508360355  #     2.2%
-- bandwidth         10078270425492 #     2.3%
backend              112743375337721 # 26.0% (26.0%)
-- cpu               63018362871765 #    14.5%
-- memory            49725012465956 #    11.5%
speculation          12881927224817 #  3.0% ( 3.0%)
-- branch mispredict 12368006069230 #     2.9%
-- pipeline restart  513921155587   #     0.1%
smt-contention       0              #  0.0% ( 0.0%)
cpu-cycles           216187026514697 # 2.16 GHz
instructions         883987677201512 # 4.09 IPC high
l2 access            3937606297777  # 13.285 l2 access per 1000 inst
l2 miss              358227811293   # 9.10% l2 miss

The process overview is straightforward, dominated by invocations of lzbench.

402 processes
	 42 lzbench                486.92    26.60
	 68 clinfo                  19.50     7.66
	 38 vulkaninfo               1.10     1.15
	  6 glxinfo:gdrv0            0.15     0.06
	  6 glxinfo:gl0              0.15     0.06
	  4 vulkani:disk$0           0.11     0.13
	  6 php                      0.08     0.25
	  2 glxinfo                  0.07     0.02
	  2 glxinfo:cs0              0.07     0.02
	  2 glxinfo:disk$0           0.07     0.02
	  2 glxinfo:sh0              0.07     0.02
	  2 glxinfo:shlo0            0.07     0.02
	  2 llvmpipe-0               0.06     0.07
	  2 llvmpipe-1               0.06     0.07
	  2 llvmpipe-2               0.06     0.07
	  2 llvmpipe-3               0.06     0.07
	  2 llvmpipe-4               0.06     0.07
	  6 clang                    0.06     0.06
	  2 llvmpipe-10              0.06     0.06
	  2 llvmpipe-11              0.06     0.06
	  2 llvmpipe-12              0.06     0.06
	  2 llvmpipe-13              0.06     0.06
	  2 llvmpipe-14              0.06     0.06
	  2 llvmpipe-15              0.06     0.06
	  2 llvmpipe-5               0.06     0.06
	  2 llvmpipe-6               0.06     0.06
	  2 llvmpipe-7               0.06     0.06
	  2 llvmpipe-8               0.06     0.06
	  2 llvmpipe-9               0.06     0.06
	  3 rocminfo                 0.03     0.00
	  1 lspci                    0.00     0.03
	  1 ps                       0.00     0.01
	 94 sh                       0.00     0.00
	 13 gcc                      0.00     0.00
	 13 gsettings                0.00     0.00
	  8 stat                     0.00     0.00
	  8 systemd-detect-          0.00     0.00
	  6 llvm-link                0.00     0.00
	  5 phoronix-test-s          0.00     0.00
	  2 cc                       0.00     0.00
	  2 gmain                    0.00     0.00
	  2 lscpu                    0.00     0.00
	  2 uname                    0.00     0.00
	  2 which                    0.00     0.00
	  2 xset                     0.00     0.00
	  1 date                     0.00     0.00
	  1 dconf worker             0.00     0.00
	  1 dirname                  0.00     0.00
	  1 dmesg                    0.00     0.00
	  1 dmidecode                0.00     0.00
	  1 grep                     0.00     0.00
	  1 ifconfig                 0.00     0.00
	  1 ip                       0.00     0.00
	  1 lsmod                    0.00     0.00
	  1 mktemp                   0.00     0.00
	  1 qdbus                    0.00     0.00
	  1 readlink                 0.00     0.00
	  1 realpath                 0.00     0.00
	  1 sed                      0.00     0.00
	  1 sort                     0.00     0.00
	  1 stty                     0.00     0.00
	  1 systemctl                0.00     0.00
	  1 template.sh              0.00     0.00
	  1 wc                       0.00     0.00
	  1 xrandr                   0.00     0.00
0 processes running
47 maximum processes

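Example of computation blocks. Each lzbench run appears as a pair of process entries lasting roughly 22 to 28 seconds. The runs fall into seven groups of three, consistent with the seven workloads, with about 4 seconds between runs in a group and about 10 seconds between groups.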

      39349) lzbench          cpu=15 start=5.62  finish=33.77
        39350) lzbench          cpu=9 start=5.62  finish=33.77
      39353) lzbench          cpu=13 start=37.78 finish=66.05
        39354) lzbench          cpu=6 start=37.78 finish=66.04
      39355) lzbench          cpu=13 start=70.05 finish=98.24
        39356) lzbench          cpu=15 start=70.05 finish=98.24
      39357) sh               cpu=6 start=98.24 finish=98.24
        39358) sh               cpu=7 start=98.24 finish=98.24
      39359) lzbench          cpu=5 start=108.42 finish=131.47
        39360) lzbench          cpu=6 start=108.42 finish=131.46
      39361) lzbench          cpu=5 start=135.47 finish=158.54
        39362) lzbench          cpu=6 start=135.47 finish=158.54
      39363) lzbench          cpu=13 start=162.54 finish=185.54
        39364) lzbench          cpu=6 start=162.54 finish=185.54
      39366) sh               cpu=13 start=185.54 finish=185.54
        39367) sh               cpu=7 start=185.54 finish=185.54
      39368) lzbench          cpu=5 start=195.90 finish=222.22
        39369) lzbench          cpu=6 start=195.90 finish=222.22
      39370) lzbench          cpu=14 start=226.23 finish=252.51
        39371) lzbench          cpu=7 start=226.23 finish=252.51
      39374) lzbench          cpu=13 start=256.51 finish=282.70
        39375) lzbench          cpu=6 start=256.51 finish=282.70
      39416) sh               cpu=13 start=282.70 finish=282.70
        39417) sh               cpu=7 start=282.70 finish=282.70
      39418) lzbench          cpu=5 start=293.20 finish=318.00
        39419) lzbench          cpu=14 start=293.20 finish=318.00
      39420) lzbench          cpu=5 start=322.01 finish=347.21
        39421) lzbench          cpu=6 start=322.01 finish=347.20
      39422) lzbench          cpu=6 start=351.21 finish=376.21
        39423) lzbench          cpu=7 start=351.21 finish=376.21
      39424) sh               cpu=13 start=376.22 finish=376.22
        39425) sh               cpu=7 start=376.22 finish=376.22
      39427) lzbench          cpu=6 start=386.40 finish=410.11
        39428) lzbench          cpu=7 start=386.40 finish=410.10
      39429) lzbench          cpu=5 start=414.11 finish=436.48
        39430) lzbench          cpu=6 start=414.11 finish=436.48
      39431) lzbench          cpu=13 start=440.48 finish=462.81
        39432) lzbench          cpu=14 start=440.49 finish=462.81
      39433) sh               cpu=6 start=462.81 finish=462.81
        39434) sh               cpu=15 start=462.81 finish=462.81
      39435) lzbench          cpu=6 start=472.99 finish=495.27
        39436) lzbench          cpu=15 start=472.99 finish=495.27
      39437) lzbench          cpu=13 start=499.27 finish=522.41
        39438) lzbench          cpu=14 start=499.28 finish=522.40
      39439) lzbench          cpu=13 start=526.41 finish=548.59
        39440) lzbench          cpu=14 start=526.41 finish=548.59
      39441) sh               cpu=15 start=548.59 finish=548.59
        39442) sh               cpu=9 start=548.59 finish=548.59
      39443) lzbench          cpu=2 start=559.46 finish=583.09
        39444) lzbench          cpu=5 start=559.46 finish=583.08
      39447) lzbench          cpu=9 start=587.09 finish=610.72
        39448) lzbench          cpu=2 start=587.09 finish=610.71
      39449) lzbench          cpu=9 start=614.72 finish=638.31
        39450) lzbench          cpu=10 start=614.72 finish=638.31
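
Run durations can be extracted from a listing like the one above with a short script. This is a minimal sketch that assumes every line follows the `pid) name cpu=N start=S finish=F` layout shown here. The names `SAMPLE`, `LINE_RE`, `parse_blocks`, and `durations` are hypothetical helpers, not part of any profiling tool, and the sample holds only a few lines copied from the listing.

```python
import re

# A few lines copied from the listing above.
SAMPLE = """\
      39349) lzbench          cpu=15 start=5.62  finish=33.77
        39350) lzbench          cpu=9 start=5.62  finish=33.77
      39353) lzbench          cpu=13 start=37.78 finish=66.05
      39357) sh               cpu=6 start=98.24 finish=98.24
      39359) lzbench          cpu=5 start=108.42 finish=131.47
"""

# pid) name cpu=N start=S finish=F, with arbitrary spacing between fields.
LINE_RE = re.compile(
    r"^\s*(\d+)\)\s+(\S+)\s+cpu=(\d+)\s+start=([\d.]+)\s+finish=([\d.]+)\s*$"
)


def parse_blocks(text):
    """Parse timeline lines into dicts, skipping lines that do not match."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue
        pid, name, cpu, start, finish = match.groups()
        rows.append({
            "pid": int(pid),
            "name": name,
            "cpu": int(cpu),
            "start": float(start),
            "finish": float(finish),
        })
    return rows


def durations(rows, name="lzbench"):
    """Return (pid, seconds) for every process with the given name."""
    return [
        (row["pid"], round(row["finish"] - row["start"], 2))
        for row in rows
        if row["name"] == name
    ]


for pid, seconds in durations(parse_blocks(SAMPLE)):
    print(f"{pid}: {seconds:.2f} s")
```

Filtering by process name drops the short-lived `sh` entries, so only the actual benchmark runs contribute to the duration list.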