compress-gzip – Performance analysis, tools and experiments

Testing gzip with an archive of the Linux source tree. The profile shows relatively high frontend time and higher-than-average speculation. It is also interesting that none of the tests for different compression tools use the same metrics and workload, so it is not easy to compare between tools.

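AMD metrics show this is a single-threaded test: on_cpu is 0.97, i.e. roughly one of the 16 cores is busy. Branch misprediction accounts for ~10% of pipeline slots, with 4.81% of branches mispredicted.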

elapsed              114.506
on_cpu               0.061          # 0.97 / 16 cores
utime                91.441
stime                19.796
nvcsw                177298         # 98.96%
nivcsw               1867           # 1.04%
inblock              40             # 0.35/sec
onblock              7227360        # 63117.86/sec
cpu-clock            111006148726   # 111.006 seconds
task-clock           111095628310   # 111.096 seconds
page faults          153331         # 1380.171/sec
context switches     179526         # 1615.959/sec
cpu migrations       440            # 3.961/sec
major page faults    0              # 0.000/sec
minor page faults    153331         # 1380.171/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             137063645615   # 194.883 branches per 1000 inst
branch misses        6598040560     # 4.81% branch miss
conditional          118795025229   # 168.908 conditional branches per 1000 inst
indirect             130980107      # 0.186 indirect branches per 1000 inst
cpu-cycles           523296982865   # 0.29 GHz
instructions         704108543986   # 1.35 IPC
slots                1046083290948  #
retiring             222296651323   # 21.3% (21.3%)
-- ucode             133972432      #     0.0%
-- fastpath          222162678891   #    21.2%
frontend             334207248998   # 31.9% (32.0%)
-- latency           185991431652   #    17.8%
-- bandwidth         148215817346   #    14.2%
backend              384933199811   # 36.8% (36.8%)
-- cpu               79870268435    #     7.6%
-- memory            305062931376   #    29.2%
speculation          104151488257   # 10.0% (10.0%)
-- branch mispredict 104110788174   #    10.0%
-- pipeline restart  40700083       #     0.0%
smt-contention       494316794      #  0.0% ( 0.0%)
cpu-cycles           520290545671   # 0.27 GHz
instructions         700630953700   # 1.35 IPC
instructions         233734844896   # 65.940 l2 access per 1000 inst
l2 hit from l1       10143531840    # 1.32% l2 miss
l2 miss from l1      101054348      #
l2 hit from l2 pf    5165956737     #
l3 hit from l2 pf    52180456       #
l3 miss from l2 pf   50850081       #
instructions         234427107073   # 1.198 float per 1000 inst
float 512            69             # 0.000 AVX-512 per 1000 inst
float 256            62             # 0.000 AVX-256 per 1000 inst
float 128            280776469      # 1.198 AVX-128 per 1000 inst
float MMX            0              # 0.000 MMX per 1000 inst
float scalar         0              # 0.000 scalar per 1000 inst
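The annotations on the first few lines can be reproduced from the raw counters. The sketch below is a minimal Python check. It assumes on_cpu is total CPU time (utime + stime) divided by elapsed wall-clock time, then divided by the 16 cores. That assumption reproduces both the 0.97 and the 0.061 shown above, but it is not taken from the tool's documentation.

```python
# Raw values copied from the AMD run above.
ELAPSED_S = 114.506
UTIME_S = 91.441
STIME_S = 19.796
CORES = 16  # assumption: taken from the "/ 16 cores" annotation

CPU_CYCLES = 523_296_982_865
INSTRUCTIONS = 704_108_543_986
BRANCHES = 137_063_645_615
BRANCH_MISSES = 6_598_040_560


def busy_cpus(utime_s: float, stime_s: float, elapsed_s: float) -> float:
    """Average number of CPUs kept busy: (user + system CPU time) / wall time."""
    return (utime_s + stime_s) / elapsed_s


busy = busy_cpus(UTIME_S, STIME_S, ELAPSED_S)
ipc = INSTRUCTIONS / CPU_CYCLES
miss_pct = 100 * BRANCH_MISSES / BRANCHES

print(f"on_cpu       {busy / CORES:.3f}  # {busy:.2f} / {CORES} cores")
print(f"IPC          {ipc:.2f}")
print(f"branch miss  {miss_pct:.2f}%")
```

Calling `busy_cpus` with the Intel utime, stime and elapsed values gives about 0.95 busy CPUs, which matches that run's annotation as well.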

Intel metrics

elapsed              135.102
on_cpu               0.060          # 0.95 / 16 cores
utime                115.671
stime                13.159
nvcsw                180106         # 99.24%
nivcsw               1374           # 0.76%
inblock              424            # 3.14/sec
onblock              7226968        # 53492.83/sec
cpu-clock            127195184462   # 127.195 seconds
task-clock           127350251510   # 127.350 seconds
page faults          148395         # 1165.251/sec
context switches     181959         # 1428.808/sec
cpu migrations       819            # 6.431/sec
major page faults    0              # 0.000/sec
minor page faults    148395         # 1165.251/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             133636627064   # 191.427 branches per 1000 inst
branch misses        4859420388     # 3.64% branch miss
conditional          133636639640   # 191.427 conditional branches per 1000 inst
indirect             299381650      # 0.429 indirect branches per 1000 inst
slots                2816999297276  #
retiring             640641585470   # 22.7% (22.7%)
-- ucode             29076199603    #     1.0%
-- fastpath          611565385867   #    21.7%
frontend             534785879010   # 19.0% (19.0%)
-- latency           175651233196   #     6.2%
-- bandwidth         359134645814   #    12.7%
backend              1033672078961  # 36.7% (36.7%)
-- cpu               541860874031   #    19.2%
-- memory            491811204930   #    17.5%
speculation          602880895949   # 21.4% (21.4%)
-- branch mispredict 602043399503   #    21.4%
-- pipeline restart  837496446      #     0.0%
smt-contention       0              #  0.0% ( 0.0%)
cpu-cycles           470155455895   # 0.22 GHz
instructions         698463029008   # 1.49 IPC
l2 access            18736530988    # 26.854 l2 access per 1000 inst
l2 miss              995215530      # 5.31% l2 miss
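Putting the level-1 top-down numbers side by side makes the differences easier to see. Intel loses roughly twice the share of slots to speculation, while AMD spends more in the frontend. The sketch below is a minimal Python recomputation of each category as a share of total slots, using the counters above. It compares only percentages, because the two runs count slots with different vendor events.

```python
# Level-1 top-down slot counts copied from the AMD and Intel runs above.
TOPDOWN = {
    "AMD": {
        "slots": 1_046_083_290_948,
        "retiring": 222_296_651_323,
        "frontend": 334_207_248_998,
        "backend": 384_933_199_811,
        "speculation": 104_151_488_257,
    },
    "Intel": {
        "slots": 2_816_999_297_276,
        "retiring": 640_641_585_470,
        "frontend": 534_785_879_010,
        "backend": 1_033_672_078_961,
        "speculation": 602_880_895_949,
    },
}

CATEGORIES = ("retiring", "frontend", "backend", "speculation")


def topdown_shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each level-1 category as a percentage of total issue slots."""
    return {cat: 100 * counts[cat] / counts["slots"] for cat in CATEGORIES}


shares = {vendor: topdown_shares(counts) for vendor, counts in TOPDOWN.items()}

print(f"{'category':<12}{'AMD':>8}{'Intel':>8}")
for cat in CATEGORIES:
    print(f"{cat:<12}{shares['AMD'][cat]:>7.1f}%{shares['Intel'][cat]:>7.1f}%")
```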

Process tree information shows only 4 instances of gzip and a short overall runtime; the rest is mostly test suite overhead:

378 processes
	  4 gzip                    88.30     1.40
	 64 clinfo                  10.88     3.20
	  4 tar                      1.40     9.06
	 38 vulkaninfo               0.57     1.48
	  2 cp                       0.23     4.22
	  6 glxinfo:gdrv0            0.14     0.07
	  6 php                      0.08     0.17
	  4 vulkani:disk$0           0.06     0.15
	  2 glxinfo                  0.06     0.03
	  2 glxinfo:cs0              0.06     0.03
	  2 glxinfo:disk$0           0.06     0.03
	  2 glxinfo:sh0              0.06     0.03
	  2 glxinfo:shlo0            0.06     0.03
	  7 rm                       0.05     2.99
	  2 llvmpipe-0               0.03     0.08
	  2 llvmpipe-1               0.03     0.08
	  2 llvmpipe-10              0.03     0.08
	  2 llvmpipe-11              0.03     0.08
	  2 llvmpipe-12              0.03     0.08
	  2 llvmpipe-13              0.03     0.08
	  2 llvmpipe-14              0.03     0.08
	  2 llvmpipe-15              0.03     0.08
	  2 llvmpipe-2               0.03     0.08
	  2 llvmpipe-3               0.03     0.08
	  2 llvmpipe-4               0.03     0.08
	  2 llvmpipe-5               0.03     0.08
	  2 llvmpipe-6               0.03     0.08
	  2 llvmpipe-7               0.03     0.08
	  2 llvmpipe-8               0.03     0.08
	  2 llvmpipe-9               0.03     0.08
	  6 clang                    0.03     0.04
	  1 lspci                    0.00     0.03
	 95 sh                       0.00     0.00
	 12 gcc                      0.00     0.00
	  9 stty                     0.00     0.00
	  8 gsettings                0.00     0.00
	  8 stat                     0.00     0.00
	  8 systemd-detect-          0.00     0.00
	  6 llvm-link                0.00     0.00
	  5 gmain                    0.00     0.00
	  5 phoronix-test-s          0.00     0.00
	  4 bash                     0.00     0.00
	  3 compress-gzip            0.00     0.00
	  3 dconf worker             0.00     0.00
	  2 lscpu                    0.00     0.00
	  2 uname                    0.00     0.00
	  2 which                    0.00     0.00
	  2 xset                     0.00     0.00
	  1 cc                       0.00     0.00
	  1 date                     0.00     0.00
	  1 dirname                  0.00     0.00
	  1 dmesg                    0.00     0.00
	  1 dmidecode                0.00     0.00
	  1 grep                     0.00     0.00
	  1 ifconfig                 0.00     0.00
	  1 ip                       0.00     0.00
	  1 lsmod                    0.00     0.00
	  1 mktemp                   0.00     0.00
	  1 ps                       0.00     0.00
	  1 qdbus                    0.00     0.00
	  1 readlink                 0.00     0.00
	  1 realpath                 0.00     0.00
	  1 sed                      0.00     0.00
	  1 sort                     0.00     0.00
	  1 systemctl                0.00     0.00
	  1 template.sh              0.00     0.00
	  1 wc                       0.00     0.00
	  1 xrandr                   0.00     0.00
0 processes running
47 maximum processes
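The summary can be totalled per process name to put a number on how much of the recorded time belongs to the benchmark itself. The two numeric columns are not labelled in the output. This minimal Python sketch therefore treats them only as two per-process totals (plausibly CPU seconds, but that is an assumption). It reports what fraction of each column's total a given set of names accounts for. Rows where both columns are 0.00 are left out, since they do not change any total.

```python
from typing import NamedTuple


class Row(NamedTuple):
    instances: int
    name: str
    col1: float  # first numeric column (unlabelled in the original output)
    col2: float  # second numeric column (unlabelled in the original output)


# Rows from the summary above with a non-zero value in either column.
ROWS = [
    Row(4, "gzip", 88.30, 1.40),
    Row(64, "clinfo", 10.88, 3.20),
    Row(4, "tar", 1.40, 9.06),
    Row(38, "vulkaninfo", 0.57, 1.48),
    Row(2, "cp", 0.23, 4.22),
    Row(6, "glxinfo:gdrv0", 0.14, 0.07),
    Row(6, "php", 0.08, 0.17),
    Row(4, "vulkani:disk$0", 0.06, 0.15),
    Row(2, "glxinfo", 0.06, 0.03),
    Row(2, "glxinfo:cs0", 0.06, 0.03),
    Row(2, "glxinfo:disk$0", 0.06, 0.03),
    Row(2, "glxinfo:sh0", 0.06, 0.03),
    Row(2, "glxinfo:shlo0", 0.06, 0.03),
    Row(7, "rm", 0.05, 2.99),
    # llvmpipe-0 .. llvmpipe-15 all report the same values
    *(Row(2, f"llvmpipe-{i}", 0.03, 0.08) for i in range(16)),
    Row(6, "clang", 0.03, 0.04),
    Row(1, "lspci", 0.00, 0.03),
]


def share(rows: list[Row], names: set[str], column: str) -> float:
    """Fraction of `column`'s total contributed by processes whose name is in `names`."""
    total = sum(getattr(row, column) for row in rows)
    part = sum(getattr(row, column) for row in rows if row.name in names)
    return part / total


for column in ("col1", "col2"):
    print(
        f"{column}: gzip {share(ROWS, {'gzip'}, column):.1%}, "
        f"gzip+tar {share(ROWS, {'gzip', 'tar'}, column):.1%}"
    )
```

On these rows gzip accounts for roughly 86% of the first column but only a small part of the second, where tar is the largest contributor.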

The core part of the benchmark is as follows; in each pass tar runs gzip via sh:

      30297) compress-gzip start=13.71 finish=43.18
        30298) tar start=13.71 finish=43.18
          30299) sh start=13.71 finish=43.18
            30300) gzip start=13.71 finish=43.18
      30301) sh start=43.18 finish=43.24
        30302) bash start=43.18 finish=43.24
          30303) rm start=43.18 finish=43.23
      30304) compress-gzip start=47.24 finish=76.22
        30305) tar start=47.24 finish=76.22
          30306) sh start=47.25 finish=76.22
            30307) gzip start=47.25 finish=76.21
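The nested timeline lists each process with its start and finish time in seconds, presumably relative to the start of the run. The format is inferred from the lines above. This minimal Python sketch parses the timeline and reports the wall-clock duration of each gzip pass and the gap between passes.

```python
import re

# The timeline excerpt above, verbatim.
TIMELINE = """\
      30297) compress-gzip start=13.71 finish=43.18
        30298) tar start=13.71 finish=43.18
          30299) sh start=13.71 finish=43.18
            30300) gzip start=13.71 finish=43.18
      30301) sh start=43.18 finish=43.24
        30302) bash start=43.18 finish=43.24
          30303) rm start=43.18 finish=43.23
      30304) compress-gzip start=47.24 finish=76.22
        30305) tar start=47.24 finish=76.22
          30306) sh start=47.25 finish=76.22
            30307) gzip start=47.25 finish=76.21
"""

# "<pid>) <name> start=<seconds> finish=<seconds>"; indentation only shows nesting.
LINE_RE = re.compile(r"^\s*(\d+)\)\s+(.+?)\s+start=([\d.]+)\s+finish=([\d.]+)\s*$")


def spans(text: str, name: str) -> list[tuple[float, float]]:
    """(start, finish) of every process in `text` whose name equals `name`."""
    result = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match and match.group(2) == name:
            result.append((float(match.group(3)), float(match.group(4))))
    return result


gzip_spans = spans(TIMELINE, "gzip")
durations = [finish - start for start, finish in gzip_spans]
gaps = [nxt[0] - prev[1] for prev, nxt in zip(gzip_spans, gzip_spans[1:])]

print("gzip pass durations (s):", [round(d, 2) for d in durations])
print("gap between passes (s):", [round(g, 2) for g in gaps])
```

Each gzip pass in this excerpt takes roughly 29 seconds, with about 4 seconds between the two passes.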