This test runs lz4 to compress and decompress an Ubuntu ISO file. A very large share of pipeline slots is lost to bad speculation, and it looks like the first workload (compression level 1) has different characteristics than the second (compression level 3) and third (compression level 9). The reported compression speeds are also much lower for levels 3 and 9. It is also worth noting that no two of the compression-tool tests use the same metrics and workload, so comparing tools is not easy.

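The AMD metrics show a single-threaded workload (on_cpu 0.055, i.e. 0.88 of one core out of 16) with high branch misprediction: 20.7% of pipeline slots are lost to mispredicted branches.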

elapsed              431.352
on_cpu               0.055          # 0.88 / 16 cores
utime                359.005
stime                20.407
nvcsw                2827           # 61.94%
nivcsw               1737           # 38.06%
inblock              4068624        # 9432.26/sec
onblock              1712           # 3.97/sec
cpu-clock            379468394089   # 379.468 seconds
task-clock           379477419868   # 379.477 seconds
page faults          13892968       # 36610.790/sec
context switches     6516           # 17.171/sec
cpu migrations       404            # 1.065/sec
major page faults    6              # 0.016/sec
minor page faults    13892962       # 36610.774/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             342994981953   # 124.827 branches per 1000 inst
branch misses        16699643661    # 4.87% branch miss
conditional          328758224448   # 119.646 conditional branches per 1000 inst
indirect             41443550       # 0.015 indirect branches per 1000 inst
cpu-cycles           1889122196227  # 0.27 GHz
instructions         2737446068872  # 1.45 IPC
slots                3782952328728  #
retiring             846187305931   # 22.4% (22.4%)
-- ucode             198281044      #     0.0%
-- fastpath          845989024887   #    22.4%
frontend             528484472316   # 14.0% (14.0%)
-- latency           355532417646   #     9.4%
-- bandwidth         172952054670   #     4.6%
backend              1621417936769  # 42.9% (42.9%)
-- cpu               298110098455   #     7.9%
-- memory            1323307838314  #    35.0%
speculation          786707510749   # 20.8% (20.8%)
-- branch mispredict 784728830964   #    20.7%
-- pipeline restart  1978679785     #     0.1%
smt-contention       154715438      #  0.0% ( 0.0%)
cpu-cycles           2595318207377  # 0.28 GHz
instructions         3853567369924  # 1.48 IPC
instructions         1284825674105  # 91.779 l2 access per 1000 inst
l2 hit from l1       63536670254    # 19.27% l2 miss
l2 miss from l1      493759169      #
l2 hit from l2 pf    32151337359    #
l3 hit from l2 pf    153791651      #
l3 miss from l2 pf   22078005914    #
instructions         1282592117042  # 86.140 float per 1000 inst
float 512            62             # 0.000 AVX-512 per 1000 inst
float 256            4              # 0.000 AVX-256 per 1000 inst
float 128            110482972318   # 86.140 AVX-128 per 1000 inst
float MMX            0              # 0.000 MMX per 1000 inst
float scalar         0              # 0.000 scalar per 1000 inst
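
The percentages in the right-hand comments can be re-derived from the raw counts. The following is a minimal Python sketch, not the tool's own code: the function name parse_counters is hypothetical, and it assumes every line has the form "name value # note". The AMD dump repeats cpu-cycles and instructions for several counter groups, so the sketch keeps the first occurrence of each name.

```python
def parse_counters(text):
    """Parse 'name  value  # note' lines into {name: number}.

    Hypothetical helper: the first occurrence of a name wins, because the
    dump repeats some counters (cpu-cycles, instructions) per counter group.
    """
    counters = {}
    for line in text.splitlines():
        left = line.split("#", 1)[0].strip()   # drop the trailing comment
        parts = left.rsplit(None, 1)           # name may contain spaces
        if len(parts) != 2:
            continue
        name, value = parts
        try:
            counters.setdefault(name, float(value))
        except ValueError:
            continue                           # not a counter line
    return counters


# A few lines copied from the AMD dump above.
sample = """\
branches             342994981953   # 124.827 branches per 1000 inst
branch misses        16699643661    # 4.87% branch miss
cpu-cycles           1889122196227  # 0.27 GHz
instructions         2737446068872  # 1.45 IPC
slots                3782952328728  #
speculation          786707510749   # 20.8% (20.8%)
-- branch mispredict 784728830964   #    20.7%
"""

c = parse_counters(sample)
branch_miss_pct = 100 * c["branch misses"] / c["branches"]
ipc = c["instructions"] / c["cpu-cycles"]
mispredict_pct = 100 * c["-- branch mispredict"] / c["slots"]

print(f"branch miss {branch_miss_pct:.2f}%  IPC {ipc:.2f}  "
      f"mispredict slots {mispredict_pct:.1f}%")
```

Note the two different denominators: the 4.87% figure is mispredicted branches over all branches, while the 20.7% figure is slots lost to mispredicts over all pipeline slots.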

The Intel metrics show an even higher level of branch misprediction (34.2% of slots versus 20.7% on AMD) and a relatively higher L2 miss rate (52.17% versus 19.27%).

elapsed              491.513
on_cpu               0.055          # 0.88 / 16 cores
utime                415.116
stime                19.254
nvcsw                4910           # 54.74%
nivcsw               4060           # 45.26%
inblock              4687864        # 9537.61/sec
onblock              1704           # 3.47/sec
cpu-clock            434381427808   # 434.381 seconds
task-clock           434392506214   # 434.393 seconds
page faults          13889059       # 31973.523/sec
context switches     11240          # 25.875/sec
cpu migrations       485            # 1.117/sec
major page faults    709            # 1.632/sec
minor page faults    13888350       # 31971.891/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             292862196279   # 119.149 branches per 1000 inst
branch misses        13029570308    # 4.45% branch miss
conditional          292862208759   # 119.149 conditional branches per 1000 inst
indirect             87813308       # 0.036 indirect branches per 1000 inst
slots                9596426393090  #
retiring             2335824751615  # 24.3% (24.3%)
-- ucode             196248424731   #     2.0%
-- fastpath          2139576326884  #    22.3%
frontend             936041875040   #  9.8% ( 9.8%)
-- latency           451564041635   #     4.7%
-- bandwidth         484477833405   #     5.0%
backend              3559540182779  # 37.1% (37.1%)
-- cpu               1019988499996  #    10.6%
-- memory            2539551682783  #    26.5%
speculation          3299242768642  # 34.4% (34.4%)
-- branch mispredict 3283919991047  #    34.2%
-- pipeline restart  15322777595    #     0.2%
smt-contention       0              #  0.0% ( 0.0%)
cpu-cycles           1605722641299  # 0.21 GHz
instructions         2460908984915  # 1.53 IPC
l2 access            160521933389   # 65.231 l2 access per 1000 inst
l2 miss              83738528448    # 52.17% l2 miss
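
To put the two top-down breakdowns side by side, the slot counts from both dumps can be normalized by each machine's total slots. This is a small Python sketch with the values copied from the AMD and Intel output above; the helper name topdown_percent is hypothetical. The two machines report very different absolute slot totals, so only the percentages are compared.

```python
# Top-level slot counts copied from the two dumps above.
amd = {
    "slots": 3782952328728,
    "retiring": 846187305931,
    "frontend": 528484472316,
    "backend": 1621417936769,
    "speculation": 786707510749,
}
intel = {
    "slots": 9596426393090,
    "retiring": 2335824751615,
    "frontend": 936041875040,
    "backend": 3559540182779,
    "speculation": 3299242768642,
}


def topdown_percent(counts):
    """Return each top-level category as a percentage of total slots."""
    total = counts["slots"]
    return {k: round(100 * v / total, 1) for k, v in counts.items() if k != "slots"}


amd_pct = topdown_percent(amd)
intel_pct = topdown_percent(intel)

for category in amd_pct:
    delta = intel_pct[category] - amd_pct[category]
    print(f"{category:12s} AMD {amd_pct[category]:5.1f}%  "
          f"Intel {intel_pct[category]:5.1f}%  delta {delta:+5.1f}")

# Intel L2 miss rate from the last two lines of the Intel dump.
intel_l2_miss_pct = round(100 * 83738528448 / 160521933389, 2)
```

Read this way, the Intel run loses more slots to speculation and fewer to the backend than the AMD run, which matches the percentages printed in the dumps.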

Process-level metrics show only nine invocations of lz4; everything else is test infrastructure, of which clinfo is the only process that takes noticeable time.

372 processes
	  9 lz4                    357.43    19.55
	 64 clinfo                  11.52     3.52
	 38 vulkaninfo               0.93     0.95
	  6 php                      0.10     0.12
	  4 vulkani:disk$0           0.09     0.10
	  6 glxinfo:gdrv0            0.08     0.08
	  2 llvmpipe-0               0.05     0.05
	  2 llvmpipe-1               0.05     0.05
	  2 llvmpipe-10              0.05     0.05
	  2 llvmpipe-11              0.05     0.05
	  2 llvmpipe-12              0.05     0.05
	  2 llvmpipe-13              0.05     0.05
	  2 llvmpipe-14              0.05     0.05
	  2 llvmpipe-15              0.05     0.05
	  2 llvmpipe-2               0.05     0.05
	  2 llvmpipe-3               0.05     0.05
	  2 llvmpipe-4               0.05     0.05
	  2 llvmpipe-5               0.05     0.05
	  2 llvmpipe-6               0.05     0.05
	  2 llvmpipe-7               0.05     0.05
	  2 llvmpipe-8               0.05     0.05
	  2 llvmpipe-9               0.05     0.05
	  2 glxinfo                  0.04     0.04
	  2 glxinfo:cs0              0.04     0.04
	  2 glxinfo:disk$0           0.04     0.04
	  2 glxinfo:sh0              0.04     0.04
	  2 glxinfo:shlo0            0.04     0.04
	  6 clang                    0.03     0.04
	  1 lspci                    0.00     0.02
	 93 sh                       0.00     0.00
	 13 gcc                      0.00     0.00
	 10 gsettings                0.00     0.00
	  9 compress-lz4             0.00     0.00
	  9 stty                     0.00     0.00
	  8 stat                     0.00     0.00
	  8 systemd-detect-          0.00     0.00
	  6 llvm-link                0.00     0.00
	  5 phoronix-test-s          0.00     0.00
	  4 gmain                    0.00     0.00
	  2 cc                       0.00     0.00
	  2 dconf worker             0.00     0.00
	  2 lscpu                    0.00     0.00
	  2 uname                    0.00     0.00
	  2 which                    0.00     0.00
	  2 xset                     0.00     0.00
	  1 date                     0.00     0.00
	  1 dirname                  0.00     0.00
	  1 dmesg                    0.00     0.00
	  1 dmidecode                0.00     0.00
	  1 grep                     0.00     0.00
	  1 ifconfig                 0.00     0.00
	  1 ip                       0.00     0.00
	  1 lsmod                    0.00     0.00
	  1 mktemp                   0.00     0.00
	  1 ps                       0.00     0.00
	  1 qdbus                    0.00     0.00
	  1 readlink                 0.00     0.00
	  1 realpath                 0.00     0.00
	  1 sed                      0.00     0.00
	  1 sort                     0.00     0.00
	  1 systemctl                0.00     0.00
	  1 template.sh              0.00     0.00
	  1 wc                       0.00     0.00
	  1 xrandr                   0.00     0.00
0 processes running
47 maximum processes
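
A table like this is easy to post-process. The sketch below is a hedged Python example, not part of the tool: parse_process_rows is a hypothetical helper, and because the two numeric columns are not labelled in the output, the code treats them as opaque values and only uses the first one to rank processes. The sample is a subset of the rows above.

```python
def parse_process_rows(text):
    """Parse 'count name value value' rows; names may contain spaces."""
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        if len(tokens) < 4:
            continue  # summary lines such as '372 processes'
        try:
            count = int(tokens[0])
            first, second = float(tokens[-2]), float(tokens[-1])
        except ValueError:
            continue
        rows.append((" ".join(tokens[1:-2]), count, first, second))
    return rows


# Subset of the process table above, including the summary lines.
sample = """\
372 processes
	  9 lz4                    357.43    19.55
	 64 clinfo                  11.52     3.52
	 38 vulkaninfo               0.93     0.95
	  2 dconf worker             0.00     0.00
0 processes running
47 maximum processes
"""

rows = parse_process_rows(sample)
total_first = sum(r[2] for r in rows)
lz4_share = next(r[2] for r in rows if r[0] == "lz4") / total_first

print(f"lz4 share of first column (sample rows only): {lz4_share:.1%}")
```

Taking the name as everything between the count and the last two numbers keeps multi-word names such as "dconf worker" intact.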

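The core benchmark block shows nine lz4 runs in three groups of three. Each run in the first group takes about 30 to 31 seconds, while each run in the second and third groups takes about 47 to 48 seconds.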

      32236) compress-lz4 start=5.34  finish=35.63
        32237) lz4 start=5.35  finish=35.63
      32240) compress-lz4 start=39.64 finish=70.64
        32241) lz4 start=39.64 finish=70.64
      32242) compress-lz4 start=74.65 finish=105.87
        32243) lz4 start=74.65 finish=105.87
      32244) sh start=105.88 finish=105.88
        32245) sh start=105.88 finish=105.88
      32246) compress-lz4 start=116.07 finish=163.21
        32247) lz4 start=116.07 finish=163.21
      32249) compress-lz4 start=167.22 finish=214.69
        32250) lz4 start=167.22 finish=214.69
      32251) compress-lz4 start=218.70 finish=266.72
        32252) lz4 start=218.70 finish=266.72
      32284) sh start=266.72 finish=266.73
        32285) sh start=266.72 finish=266.72
      32287) compress-lz4 start=276.92 finish=324.67
        32288) lz4 start=276.92 finish=324.67
      32289) compress-lz4 start=328.68 finish=376.55
        32290) lz4 start=328.68 finish=376.55
      32291) compress-lz4 start=380.56 finish=427.71
        32292) lz4 start=380.56 finish=427.71
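
The per-run durations can be computed directly from the start and finish times. This is a minimal Python sketch that assumes each line has the form "pid) name start=S finish=F" as in the listing above; the function name run_durations and the grouping into threes are illustrative choices, not something the tool provides.

```python
import re

# Matches lines such as '32237) lz4 start=5.35  finish=35.63'.
LINE = re.compile(r"(\d+)\)\s+(\S+)\s+start=([\d.]+)\s+finish=([\d.]+)")


def run_durations(text, name="lz4"):
    """Return finish - start (seconds) for every process called `name`."""
    return [
        round(float(m.group(4)) - float(m.group(3)), 2)
        for m in LINE.finditer(text)
        if m.group(2) == name
    ]


# The nine lz4 lines from the listing above.
timeline = """\
32237) lz4 start=5.35  finish=35.63
32241) lz4 start=39.64 finish=70.64
32243) lz4 start=74.65 finish=105.87
32247) lz4 start=116.07 finish=163.21
32250) lz4 start=167.22 finish=214.69
32252) lz4 start=218.70 finish=266.72
32288) lz4 start=276.92 finish=324.67
32290) lz4 start=328.68 finish=376.55
32292) lz4 start=380.56 finish=427.71
"""

durations = run_durations(timeline)
# Assumption: runs come in consecutive groups of three.
group_means = [sum(durations[i:i + 3]) / 3 for i in range(0, len(durations), 3)]

for i, mean in enumerate(group_means, start=1):
    print(f"group {i}: mean {mean:.2f} s")
```

The first group averages roughly 31 seconds per run and the other two roughly 47.5 seconds, which is consistent with the first workload behaving differently from the second and third.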