{"id":306,"date":"2024-01-07T01:57:40","date_gmt":"2024-01-07T01:57:40","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=306"},"modified":"2024-01-07T14:29:11","modified_gmt":"2024-01-07T14:29:11","slug":"compress-gzip","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/compress-gzip\/","title":{"rendered":"compress-gzip"},"content":{"rendered":"\n<p>Testing gzip with an archive of the Linux source tree.  Relatively high frontend time and also higher than average speculation. It is also notable that the tests of different compression tools do not use the same metrics and workload, so it is not easy to compare between tools.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-23.png\" alt=\"\" class=\"wp-image-319\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-23.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-23-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-23-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics show this is a single-threaded test with on_cpu = 0.97.  
Branch misprediction accounts for ~10% of pipeline slots (4.81% of branches miss).<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              114.506\non_cpu               0.061          # 0.97 \/ 16 cores\nutime                91.441\nstime                19.796\nnvcsw                177298         # 98.96%\nnivcsw               1867           # 1.04%\ninblock              40             # 0.35\/sec\nonblock              7227360        # 63117.86\/sec\ncpu-clock            111006148726   # 111.006 seconds\ntask-clock           111095628310   # 111.096 seconds\npage faults          153331         # 1380.171\/sec\ncontext switches     179526         # 1615.959\/sec\ncpu migrations       440            # 3.961\/sec\nmajor page faults    0              # 0.000\/sec\nminor page faults    153331         # 1380.171\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             137063645615   # 194.883 branches per 1000 inst\nbranch misses        6598040560     # 4.81% branch miss\nconditional          118795025229   # 168.908 conditional branches per 1000 inst\nindirect             130980107      # 0.186 indirect branches per 1000 inst\ncpu-cycles           523296982865   # 0.29 GHz\ninstructions         704108543986   # 1.35 IPC\nslots                1046083290948  #\nretiring             222296651323   # 21.3% (21.3%)\n-- ucode             133972432      #     0.0%\n-- fastpath          222162678891   #    21.2%\nfrontend             334207248998   # 31.9% (32.0%)\n-- latency           185991431652   #    17.8%\n-- bandwidth         148215817346   #    14.2%\nbackend              384933199811   # 36.8% (36.8%)\n-- cpu               79870268435    #     7.6%\n-- memory            305062931376   #    29.2%\nspeculation          104151488257   # 10.0% (10.0%)\n-- branch mispredict 104110788174   #    10.0%\n-- pipeline restart  40700083       #     0.0%\nsmt-contention       494316794      #  0.0% ( 0.0%)\ncpu-cycles           520290545671   # 0.27 
GHz\ninstructions         700630953700   # 1.35 IPC\ninstructions         233734844896   # 65.940 l2 access per 1000 inst\nl2 hit from l1       10143531840    # 1.32% l2 miss\nl2 miss from l1      101054348      #\nl2 hit from l2 pf    5165956737     #\nl3 hit from l2 pf    52180456       #\nl3 miss from l2 pf   50850081       #\ninstructions         234427107073   # 1.198 float per 1000 inst\nfloat 512            69             # 0.000 AVX-512 per 1000 inst\nfloat 256            62             # 0.000 AVX-256 per 1000 inst\nfloat 128            280776469      # 1.198 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst<\/code><\/pre>\n\n\n\n<p>Intel metrics<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              135.102\non_cpu               0.060          # 0.95 \/ 16 cores\nutime                115.671\nstime                13.159\nnvcsw                180106         # 99.24%\nnivcsw               1374           # 0.76%\ninblock              424            # 3.14\/sec\nonblock              7226968        # 53492.83\/sec\ncpu-clock            127195184462   # 127.195 seconds\ntask-clock           127350251510   # 127.350 seconds\npage faults          148395         # 1165.251\/sec\ncontext switches     181959         # 1428.808\/sec\ncpu migrations       819            # 6.431\/sec\nmajor page faults    0              # 0.000\/sec\nminor page faults    148395         # 1165.251\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             133636627064   # 191.427 branches per 1000 inst\nbranch misses        4859420388     # 3.64% branch miss\nconditional          133636639640   # 191.427 conditional branches per 1000 inst\nindirect             299381650      # 0.429 indirect branches per 1000 inst\nslots                2816999297276  #\nretiring             640641585470   # 22.7% (22.7%)\n-- 
ucode             29076199603    #     1.0%\n-- fastpath          611565385867   #    21.7%\nfrontend             534785879010   # 19.0% (19.0%)\n-- latency           175651233196   #     6.2%\n-- bandwidth         359134645814   #    12.7%\nbackend              1033672078961  # 36.7% (36.7%)\n-- cpu               541860874031   #    19.2%\n-- memory            491811204930   #    17.5%\nspeculation          602880895949   # 21.4% (21.4%)\n-- branch mispredict 602043399503   #    21.4%\n-- pipeline restart  837496446      #     0.0%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           470155455895   # 0.22 GHz\ninstructions         698463029008   # 1.49 IPC\nl2 access            18736530988    # 26.854 l2 access per 1000 inst\nl2 miss              995215530      # 5.31% l2 miss<\/code><\/pre>\n\n\n\n<p>Process tree information shows only 4 instances of gzip and an overall very short runtime; the rest is mostly test suite overhead.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>378 processes\n\t  4 gzip                    88.30     1.40\n\t 64 clinfo                  10.88     3.20\n\t  4 tar                      1.40     9.06\n\t 38 vulkaninfo               0.57     1.48\n\t  2 cp                       0.23     4.22\n\t  6 glxinfo:gdrv0            0.14     0.07\n\t  6 php                      0.08     0.17\n\t  4 vulkani:disk$0           0.06     0.15\n\t  2 glxinfo                  0.06     0.03\n\t  2 glxinfo:cs0              0.06     0.03\n\t  2 glxinfo:disk$0           0.06     0.03\n\t  2 glxinfo:sh0              0.06     0.03\n\t  2 glxinfo:shlo0            0.06     0.03\n\t  7 rm                       0.05     2.99\n\t  2 llvmpipe-0               0.03     0.08\n\t  2 llvmpipe-1               0.03     0.08\n\t  2 llvmpipe-10              0.03     0.08\n\t  2 llvmpipe-11              0.03     0.08\n\t  2 llvmpipe-12              0.03     0.08\n\t  2 llvmpipe-13              0.03     0.08\n\t  2 llvmpipe-14              0.03     0.08\n\t 
 2 llvmpipe-15              0.03     0.08\n\t  2 llvmpipe-2               0.03     0.08\n\t  2 llvmpipe-3               0.03     0.08\n\t  2 llvmpipe-4               0.03     0.08\n\t  2 llvmpipe-5               0.03     0.08\n\t  2 llvmpipe-6               0.03     0.08\n\t  2 llvmpipe-7               0.03     0.08\n\t  2 llvmpipe-8               0.03     0.08\n\t  2 llvmpipe-9               0.03     0.08\n\t  6 clang                    0.03     0.04\n\t  1 lspci                    0.00     0.03\n\t 95 sh                       0.00     0.00\n\t 12 gcc                      0.00     0.00\n\t  9 stty                     0.00     0.00\n\t  8 gsettings                0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 gmain                    0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  4 bash                     0.00     0.00\n\t  3 compress-gzip            0.00     0.00\n\t  3 dconf worker             0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 uname                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 cc                       0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 ps                       0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 systemctl          
      0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n0 processes running\n47 maximum processes<\/code><\/pre>\n\n\n\n<p>The core part of the benchmark is as follows<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      30297) compress-gzip start=13.71 finish=43.18\n        30298) tar start=13.71 finish=43.18\n          30299) sh start=13.71 finish=43.18\n            30300) gzip start=13.71 finish=43.18\n      30301) sh start=43.18 finish=43.24\n        30302) bash start=43.18 finish=43.24\n          30303) rm start=43.18 finish=43.23\n      30304) compress-gzip start=47.24 finish=76.22\n        30305) tar start=47.24 finish=76.22\n          30306) sh start=47.25 finish=76.22\n            30307) gzip start=47.25 finish=76.21<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Testing gzip with an archive of the Linux source tree. Relatively high frontend time and also higher than average speculation. 
It is also notable that the tests of different compression tools do not use the same metrics and workload, so it is not easy to compare <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/compress-gzip\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-306","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/306","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=306"}],"version-history":[{"count":2,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/306\/revisions"}],"predecessor-version":[{"id":335,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/306\/revisions\/335"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=306"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}