{"id":294,"date":"2024-01-06T13:21:13","date_gmt":"2024-01-06T13:21:13","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=294"},"modified":"2024-01-07T14:52:06","modified_gmt":"2024-01-07T14:52:06","slug":"compress-zstd","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/compress-zstd\/","title":{"rendered":"compress-zstd"},"content":{"rendered":"\n<p>Compressing and decompressing a tar file.  The run below combines seven different tests with slightly different profiles; it would be useful to separate these out later. Interestingly, none of the tests of different compression tools use the same metrics and workload, so it is not easy to compare between tools.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-27.png\" alt=\"\" class=\"wp-image-328\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-27.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-27-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-27-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics show multi-threaded code that keeps only 4.17 of 16 cores busy on average, with not much I\/O, a small amount of floating point code, and an otherwise memory-bound program.  
The profile above shows several workloads are more memory-bound than others.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              2901.236\non_cpu               0.261          # 4.17 \/ 16 cores\nutime                12051.991\nstime                42.412\nnvcsw                761019         # 85.50%\nnivcsw               129106         # 14.50%\ninblock              416416         # 143.53\/sec\nonblock              8296           # 2.86\/sec\ncpu-clock            12088006176826 # 12088.006 seconds\ntask-clock           12089423964530 # 12089.424 seconds\npage faults          10563887       # 873.812\/sec\ncontext switches     904309         # 74.802\/sec\ncpu migrations       73731          # 6.099\/sec\nmajor page faults    19             # 0.002\/sec\nminor page faults    10563868       # 873.811\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             8500567397945  # 123.737 branches per 1000 inst\nbranch misses        235070446072   # 2.77% branch miss\nconditional          7691845876170  # 111.965 conditional branches per 1000 inst\nindirect             2728510194     # 0.040 indirect branches per 1000 inst\ncpu-cycles           64399572067291 # 1.03 GHz\ninstructions         88389737628447 # 1.37 IPC\nslots                128795715843420 #\nretiring             28442626656254 # 22.1% (24.1%)\n-- ucode             320626280      #     0.0%\n-- fastpath          28442306029974 #    22.1%\nfrontend             11861452414971 #  9.2% (10.1%)\n-- latency           7369823434362  #     5.7%\n-- bandwidth         4491628980609  #     3.5%\nbackend              68935929188030 # 53.5% (58.5%)\n-- cpu               4938287659500  #     3.8%\n-- memory            63997641528530 #    49.7%\nspeculation          8580707704693  #  6.7% ( 7.3%)\n-- branch mispredict 8456809584640  #     6.6%\n-- pipeline restart  123898120053   #     0.1%\nsmt-contention       10974862393946 #  8.5% ( 
0.0%)\ncpu-cycles           55104213476379 # 1.18 GHz\ninstructions         71191798572066 # 1.29 IPC\ninstructions         23728504766934 # 34.177 l2 access per 1000 inst\nl2 hit from l1       639125597952   # 39.52% l2 miss\nl2 miss from l1      221137265226   #\nl2 hit from l2 pf    72461497838    #\nl3 hit from l2 pf    56428551426    #\nl3 miss from l2 pf   42946931535    #\ninstructions         23721123454337 # 22.889 float per 1000 inst\nfloat 512            135            # 0.000 AVX-512 per 1000 inst\nfloat 256            274            # 0.000 AVX-256 per 1000 inst\nfloat 128            542946205318   # 22.889 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              1535.496\non_cpu               0.260          # 4.16 \/ 16 cores\nutime                6370.957\nstime                15.872\nnvcsw                238930         # 79.50%\nnivcsw               61627          # 20.50%\ninblock              417560         # 271.94\/sec\nonblock              3752           # 2.44\/sec\ncpu-clock            6383952839045  # 6383.953 seconds\ntask-clock           6384328182880  # 6384.328 seconds\npage faults          6088624        # 953.683\/sec\ncontext switches     307983         # 48.240\/sec\ncpu migrations       80353          # 12.586\/sec\nmajor page faults    22             # 0.003\/sec\nminor page faults    6088602        # 953.679\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             3627813375966  # 127.466 branches per 1000 inst\nbranch misses        100424353464   # 2.77% branch miss\nconditional          3627813400830  # 127.466 conditional branches per 1000 inst\nindirect             460528146902   # 16.181 indirect branches per 1000 inst\nslots                69134049635162 
#\nretiring             21718349215816 # 31.4% (31.4%)\n-- ucode             670291785478   #     1.0%\n-- fastpath          21048057430338 #    30.4%\nfrontend             4655453627152  #  6.7% ( 6.7%)\n-- latency           2225063048502  #     3.2%\n-- bandwidth         2430390578650  #     3.5%\nbackend              31176250844073 # 45.1% (45.1%)\n-- cpu               9246534763123  #    13.4%\n-- memory            21929716080950 #    31.7%\nspeculation          11778658018445 # 17.0% (17.0%)\n-- branch mispredict 11694875066244 #    16.9%\n-- pipeline restart  83782952201    #     0.1%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           22497766397352 # 0.92 GHz\ninstructions         37035269756825 # 1.65 IPC\nl2 access            554900697586   # 26.405 l2 access per 1000 inst\nl2 miss              245652286024   # 44.27% l2 miss\n<\/code><\/pre>\n\n\n\n<p>Process tree structure.  Overall, the runtime is longer than for the other compression tests.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>989 processes\n\t561 zstd                 145543.77   430.84\n\t 64 clinfo                  10.88     5.44\n\t 38 vulkaninfo               1.14     0.76\n\t  6 php                      0.15     0.56\n\t  4 vulkani:disk$0           0.12     0.08\n\t  6 glxinfo:gdrv0            0.09     0.09\n\t  2 llvmpipe-0               0.06     0.04\n\t  2 llvmpipe-1               0.06     0.04\n\t  2 llvmpipe-10              0.06     0.04\n\t  2 llvmpipe-11              0.06     0.04\n\t  2 llvmpipe-12              0.06     0.04\n\t  2 llvmpipe-13              0.06     0.04\n\t  2 llvmpipe-14              0.06     0.04\n\t  2 llvmpipe-15              0.06     0.04\n\t  2 llvmpipe-2               0.06     0.04\n\t  2 llvmpipe-3               0.06     0.04\n\t  2 llvmpipe-4               0.06     0.04\n\t  2 llvmpipe-5               0.06     0.04\n\t  2 llvmpipe-6               0.06     0.04\n\t  2 llvmpipe-7               0.06     0.04\n\t  2 llvmpipe-8            
   0.06     0.04\n\t  2 llvmpipe-9               0.06     0.04\n\t  2 glxinfo                  0.05     0.04\n\t  2 glxinfo:cs0              0.05     0.03\n\t  2 glxinfo:disk$0           0.05     0.03\n\t  2 glxinfo:sh0              0.05     0.03\n\t  2 glxinfo:shlo0            0.05     0.03\n\t  6 clang                    0.04     0.03\n\t  1 lspci                    0.01     0.03\n\t101 sh                       0.00     0.00\n\t 34 sed                      0.00     0.00\n\t 33 compress-zstd            0.00     0.00\n\t 13 gcc                      0.00     0.00\n\t 10 gsettings                0.00     0.00\n\t  9 stty                     0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  4 gmain                    0.00     0.00\n\t  2 cc                       0.00     0.00\n\t  2 dconf worker             0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 uname                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 ps                       0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n0 
processes running\n47 maximum processes\n<\/code><\/pre>\n\n\n\n<p>The core computation blocks look as follows and show that we start one process per core.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      43700) compress-zstd start=71.84 finish=134.40\n        43701) zstd start=71.84 finish=134.40\n          43702) zstd start=72.47 finish=134.35\n          43703) zstd start=72.47 finish=134.35\n          43704) zstd start=72.47 finish=134.35\n          43705) zstd start=72.47 finish=134.35\n          43706) zstd start=72.47 finish=134.35\n          43707) zstd start=72.47 finish=134.35\n          43708) zstd start=72.47 finish=134.35\n          43709) zstd start=72.47 finish=134.34\n          43710) zstd start=72.47 finish=134.35\n          43711) zstd start=72.47 finish=134.35\n          43712) zstd start=72.47 finish=134.35\n          43713) zstd start=72.47 finish=134.35\n          43714) zstd start=72.47 finish=134.35\n          43715) zstd start=72.47 finish=134.34\n          43716) zstd start=72.47 finish=134.34\n          43717) zstd start=72.47 finish=134.34\n        43718) sed start=134.40 finish=134.40<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Compressing and decompressing a tar file. The run below combines seven different tests with slightly different profiles; it would be useful to separate these out later. 
Interestingly, none of the tests of different compression tools use the same metrics and workload, so not <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/compress-zstd\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-294","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/294","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=294"}],"version-history":[{"count":2,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/294\/revisions"}],"predecessor-version":[{"id":339,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/294\/revisions\/339"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=294"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}