{"id":292,"date":"2024-01-06T13:20:40","date_gmt":"2024-01-06T13:20:40","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=292"},"modified":"2024-01-07T14:36:03","modified_gmt":"2024-01-07T14:36:03","slug":"compress-lz4","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/compress-lz4\/","title":{"rendered":"compress-lz4"},"content":{"rendered":"\n<p>Testing lz4 by compressing and decompressing an Ubuntu ISO file. Bad speculation is very high, and the first workload (compression level 1) appears to have different characteristics than the second (compression level 3) and third (compression level 9). Compression is also much slower at levels 3 and 9: each run takes roughly 47 to 48 seconds versus roughly 30 to 31 seconds at level 1. Interestingly, none of the tests for the different compression tools use the same metrics and workload, so it is not easy to compare between tools.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-24.png\" alt=\"\" class=\"wp-image-322\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-24.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-24-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-24-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics show a single-threaded workload (on_cpu of 0.055, or about 0.88 of 16 cores) with high branch misprediction: 20.7% of pipeline slots are lost to branch mispredicts.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              431.352\non_cpu               0.055          # 0.88 \/ 16 cores\nutime                359.005\nstime                20.407\nnvcsw                2827           # 61.94%\nnivcsw               1737           # 38.06%\ninblock              4068624        # 9432.26\/sec\nonblock              1712           # 3.97\/sec\ncpu-clock            
379468394089   # 379.468 seconds\ntask-clock           379477419868   # 379.477 seconds\npage faults          13892968       # 36610.790\/sec\ncontext switches     6516           # 17.171\/sec\ncpu migrations       404            # 1.065\/sec\nmajor page faults    6              # 0.016\/sec\nminor page faults    13892962       # 36610.774\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             342994981953   # 124.827 branches per 1000 inst\nbranch misses        16699643661    # 4.87% branch miss\nconditional          328758224448   # 119.646 conditional branches per 1000 inst\nindirect             41443550       # 0.015 indirect branches per 1000 inst\ncpu-cycles           1889122196227  # 0.27 GHz\ninstructions         2737446068872  # 1.45 IPC\nslots                3782952328728  #\nretiring             846187305931   # 22.4% (22.4%)\n-- ucode             198281044      #     0.0%\n-- fastpath          845989024887   #    22.4%\nfrontend             528484472316   # 14.0% (14.0%)\n-- latency           355532417646   #     9.4%\n-- bandwidth         172952054670   #     4.6%\nbackend              1621417936769  # 42.9% (42.9%)\n-- cpu               298110098455   #     7.9%\n-- memory            1323307838314  #    35.0%\nspeculation          786707510749   # 20.8% (20.8%)\n-- branch mispredict 784728830964   #    20.7%\n-- pipeline restart  1978679785     #     0.1%\nsmt-contention       154715438      #  0.0% ( 0.0%)\ncpu-cycles           2595318207377  # 0.28 GHz\ninstructions         3853567369924  # 1.48 IPC\ninstructions         1284825674105  # 91.779 l2 access per 1000 inst\nl2 hit from l1       63536670254    # 19.27% l2 miss\nl2 miss from l1      493759169      #\nl2 hit from l2 pf    32151337359    #\nl3 hit from l2 pf    153791651      #\nl3 miss from l2 pf   22078005914    #\ninstructions         1282592117042  # 86.140 float per 1000 inst\nfloat 512            62             # 
0.000 AVX-512 per 1000 inst\nfloat 256            4              # 0.000 AVX-256 per 1000 inst\nfloat 128            110482972318   # 86.140 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst<\/code><\/pre>\n\n\n\n<p>Intel metrics show an even larger share of pipeline slots lost to branch misprediction (34.2% versus 20.7% on AMD), even though the raw branch miss rate is slightly lower (4.45% versus 4.87%). They also show a relatively higher L2 miss rate (52.17% versus 19.27%).<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              491.513\non_cpu               0.055          # 0.88 \/ 16 cores\nutime                415.116\nstime                19.254\nnvcsw                4910           # 54.74%\nnivcsw               4060           # 45.26%\ninblock              4687864        # 9537.61\/sec\nonblock              1704           # 3.47\/sec\ncpu-clock            434381427808   # 434.381 seconds\ntask-clock           434392506214   # 434.393 seconds\npage faults          13889059       # 31973.523\/sec\ncontext switches     11240          # 25.875\/sec\ncpu migrations       485            # 1.117\/sec\nmajor page faults    709            # 1.632\/sec\nminor page faults    13888350       # 31971.891\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             292862196279   # 119.149 branches per 1000 inst\nbranch misses        13029570308    # 4.45% branch miss\nconditional          292862208759   # 119.149 conditional branches per 1000 inst\nindirect             87813308       # 0.036 indirect branches per 1000 inst\nslots                9596426393090  #\nretiring             2335824751615  # 24.3% (24.3%)\n-- ucode             196248424731   #     2.0%\n-- fastpath          2139576326884  #    22.3%\nfrontend             936041875040   #  9.8% ( 9.8%)\n-- latency           451564041635   #     4.7%\n-- bandwidth         484477833405   #     5.0%\nbackend              3559540182779  # 37.1% (37.1%)\n-- cpu               1019988499996  #    10.6%\n-- 
memory            2539551682783  #    26.5%\nspeculation          3299242768642  # 34.4% (34.4%)\n-- branch mispredict 3283919991047  #    34.2%\n-- pipeline restart  15322777595    #     0.2%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           1605722641299  # 0.21 GHz\ninstructions         2460908984915  # 1.53 IPC\nl2 access            160521933389   # 65.231 l2 access per 1000 inst\nl2 miss              83738528448    # 52.17% l2 miss<\/code><\/pre>\n\n\n\n<p>Process-level metrics show only nine invocations of lz4, one per benchmark run. Most of the remaining time is spent in test infrastructure such as clinfo.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>372 processes\n\t  9 lz4                    357.43    19.55\n\t 64 clinfo                  11.52     3.52\n\t 38 vulkaninfo               0.93     0.95\n\t  6 php                      0.10     0.12\n\t  4 vulkani:disk$0           0.09     0.10\n\t  6 glxinfo:gdrv0            0.08     0.08\n\t  2 llvmpipe-0               0.05     0.05\n\t  2 llvmpipe-1               0.05     0.05\n\t  2 llvmpipe-10              0.05     0.05\n\t  2 llvmpipe-11              0.05     0.05\n\t  2 llvmpipe-12              0.05     0.05\n\t  2 llvmpipe-13              0.05     0.05\n\t  2 llvmpipe-14              0.05     0.05\n\t  2 llvmpipe-15              0.05     0.05\n\t  2 llvmpipe-2               0.05     0.05\n\t  2 llvmpipe-3               0.05     0.05\n\t  2 llvmpipe-4               0.05     0.05\n\t  2 llvmpipe-5               0.05     0.05\n\t  2 llvmpipe-6               0.05     0.05\n\t  2 llvmpipe-7               0.05     0.05\n\t  2 llvmpipe-8               0.05     0.05\n\t  2 llvmpipe-9               0.05     0.05\n\t  2 glxinfo                  0.04     0.04\n\t  2 glxinfo:cs0              0.04     0.04\n\t  2 glxinfo:disk$0           0.04     0.04\n\t  2 glxinfo:sh0              0.04     0.04\n\t  2 glxinfo:shlo0            0.04     0.04\n\t  6 clang                    0.03     0.04\n\t  1 lspci                    0.00   
  0.02\n\t 93 sh                       0.00     0.00\n\t 13 gcc                      0.00     0.00\n\t 10 gsettings                0.00     0.00\n\t  9 compress-lz4             0.00     0.00\n\t  9 stty                     0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  4 gmain                    0.00     0.00\n\t  2 cc                       0.00     0.00\n\t  2 dconf worker             0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 uname                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 ps                       0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n0 processes running\n47 maximum processes\n<\/code><\/pre>\n\n\n\n<p>The core benchmark block shows the nine lz4 runs in three groups of three, separated by short sh invocations. Runs in the first group take about 30 to 31 seconds each, and runs in the second and third groups take about 47 to 48 seconds each.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      32236) compress-lz4 start=5.34  finish=35.63\n        32237) lz4 start=5.35  finish=35.63\n      32240) compress-lz4 start=39.64 finish=70.64\n        32241) lz4 start=39.64 finish=70.64\n      32242) compress-lz4 start=74.65 
finish=105.87\n        32243) lz4 start=74.65 finish=105.87\n      32244) sh start=105.88 finish=105.88\n        32245) sh start=105.88 finish=105.88\n      32246) compress-lz4 start=116.07 finish=163.21\n        32247) lz4 start=116.07 finish=163.21\n      32249) compress-lz4 start=167.22 finish=214.69\n        32250) lz4 start=167.22 finish=214.69\n      32251) compress-lz4 start=218.70 finish=266.72\n        32252) lz4 start=218.70 finish=266.72\n      32284) sh start=266.72 finish=266.73\n        32285) sh start=266.72 finish=266.72\n      32287) compress-lz4 start=276.92 finish=324.67\n        32288) lz4 start=276.92 finish=324.67\n      32289) compress-lz4 start=328.68 finish=376.55\n        32290) lz4 start=328.68 finish=376.55\n      32291) compress-lz4 start=380.56 finish=427.71\n        32292) lz4 start=380.56 finish=427.71<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Testing lz4 with compressing and decompressing an Ubuntu ISO file. Very high speculation amounts and looks like the first workload (compression level 1) has different characteristics than the second (compression level 3) and third (compression level 9). 
Also the metrics <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/compress-lz4\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-292","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/292","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=292"}],"version-history":[{"count":2,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/292\/revisions"}],"predecessor-version":[{"id":336,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/292\/revisions\/336"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=292"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}