{"id":2084,"date":"2024-03-16T09:45:06","date_gmt":"2024-03-16T09:45:06","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=2084"},"modified":"2024-03-17T19:42:40","modified_gmt":"2024-03-17T19:42:40","slug":"compress-pbzip2","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/compress-pbzip2\/","title":{"rendered":"compress-pbzip2"},"content":{"rendered":"\n<p>Compress a file using parallel bzip2 compression. There is one quick-running application, pbzip2.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-32.png\" alt=\"\" class=\"wp-image-2093\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-32.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-32-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-32-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>The topdown profile is sparse, with some backend stalls.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-32.png\" alt=\"\" class=\"wp-image-2095\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-32.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-32-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-32-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>The AMD metrics show fewer backend stalls than I anticipated and a higher retirement rate instead.<\/p>\n\n\n\n<pre 
class=\"wp-block-code\"><code>elapsed              39.290\non_cpu               0.444          # 7.11 \/ 16 cores\nutime                275.294\nstime                4.122\nnvcsw                2607           # 54.13%\nnivcsw               2209           # 45.87%\ninblock              384            # 9.77\/sec\nonblock              12592          # 320.49\/sec\ncpu-clock            279549404186   # 279.549 seconds\ntask-clock           279554227264   # 279.554 seconds\npage faults          1526656        # 5461.037\/sec\ncontext switches     4845           # 17.331\/sec\ncpu migrations       402            # 1.438\/sec\nmajor page faults    3              # 0.011\/sec\nminor page faults    1526653        # 5461.026\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             405690431935   # 157.660 branches per 1000 inst\nbranch misses        7664670190     # 1.89% branch miss\nconditional          388947972615   # 151.153 conditional branches per 1000 inst\nindirect             48854277       # 0.019 indirect branches per 1000 inst\ncpu-cycles           1104017155061  # 1.78 GHz\ninstructions         2570541869626  # 2.33 IPC\nslots                2208850134342  #\nretiring             778802931145   # 35.3% (47.0%)\n-- ucode             32914703       #     0.0%\n-- fastpath          778770016442   #    35.3%\nfrontend             388951396379   # 17.6% (23.5%)\n-- latency           239192378484   #    10.8%\n-- bandwidth         149759017895   #     6.8%\nbackend              351953216116   # 15.9% (21.3%)\n-- cpu               76005859665    #     3.4%\n-- memory            275947356451   #    12.5%\nspeculation          135869065026   #  6.2% ( 8.2%)\n-- branch mispredict 135099694249   #     6.1%\n-- pipeline restart  769370777      #     0.0%\nsmt-contention       553271289481   # 25.0% ( 0.0%)\ncpu-cycles           1108008877477  # 1.79 GHz\ninstructions         2567804578461  # 2.32 
IPC\ninstructions         857325636839   # 12.626 l2 access per 1000 inst\nl2 hit from l1       7810054367     # 29.43% l2 miss\nl2 miss from l1      1876069203     #\nl2 hit from l2 pf    1705332446     #\nl3 hit from l2 pf    987529541      #\nl3 miss from l2 pf   321847175      #\ninstructions         858401905672   # 0.556 float per 1000 inst\nfloat 512            41             # 0.000 AVX-512 per 1000 inst\nfloat 256            2              # 0.000 AVX-256 per 1000 inst\nfloat 128            476845848      # 0.556 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\ninstructions         2572295572386  #\nopcache              334110438004   # 129.888 opcache per 1000 inst\nopcache miss         3345500499     #  1.0% opcache miss rate\nl1 dTLB miss         10297934436    # 4.003 L1 dTLB per 1000 inst\nl2 dTLB miss         12199879       # 0.005 L2 dTLB per 1000 inst\ninstructions         2572580274663  #\nicache               5932167387     # 2.306 icache per 1000 inst\nicache miss          339422680      #  5.7% icache miss rate\nl1 iTLB miss         8111752        # 0.003 L1 iTLB per 1000 inst\nl2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst\ntlb flush            19252          # 0.000 TLB flush per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              217.029\non_cpu               0.454          # 7.27 \/ 16 cores\nutime                1564.660\nstime                13.047\nnvcsw                8655           # 48.53%\nnivcsw               9181           # 51.47%\ninblock              2888928        # 13311.24\/sec\nonblock              1624           # 7.48\/sec\ncpu-clock            1578186070527  # 1578.186 seconds\ntask-clock           1578198436979  # 1578.198 seconds\npage faults          7063707        # 4475.804\/sec\ncontext switches     18700          # 
11.849\/sec\ncpu migrations       1288           # 0.816\/sec\nmajor page faults    4189           # 2.654\/sec\nminor page faults    7059518        # 4473.150\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             2019204514918  # 157.422 branches per 1000 inst\nbranch misses        36464472234    # 1.81% branch miss\nconditional          2019204538822  # 157.422 conditional branches per 1000 inst\nindirect             670211240004   # 52.251 indirect branches per 1000 inst\nslots                3486631743476  #\nretiring             1834568352046  # 52.6% (52.6%)\n-- ucode             50661114144    #     1.5%\n-- fastpath          1783907237902  #    51.2%\nfrontend             540684834399   # 15.5% (15.5%)\n-- latency           217279560079   #     6.2%\n-- bandwidth         323405274320   #     9.3%\nbackend              403493301076   # 11.6% (11.6%) low\n-- cpu               171285762594   #     4.9%\n-- memory            232207538482   #     6.7%\nspeculation          712333729923   # 20.4% (20.4%) high\n-- branch mispredict 706110802471   #    20.3%\n-- pipeline restart  6222927452     #     0.2%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           1087484725530  # 1.48 GHz\ninstructions         2985719942975  # 2.75 IPC\nl2 access            19524502346    # 12.381 l2 access per 1000 inst\nl2 miss              5732882298     # 29.36% l2 miss\ncpu-cycles           567498203439   # 14.2% memory latency\nload stalls          57898117945    #  0.8% l1 bound\nl1 miss              53391267178    #  7.6% l2 bound\nl2 miss              10124825349    #  1.2% l3 bound\nl3 miss              3578476694     #  0.6% dram bound\nstore_stalls         22732454902    #  4.0% store bound\n<\/code><\/pre>\n\n\n\n<p>The process overview shows pbzip2 as the primary process.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>411 processes\n\t 60 pbzip2                4740.02    56.19\n\t 68 
clinfo                  18.37     6.99\n\t 38 vulkaninfo               1.31     1.31\n\t  6 glxinfo:gdrv0            0.16     0.06\n\t  6 glxinfo:gl0              0.15     0.06\n\t  4 vulkani:disk$0           0.13     0.13\n\t  2 glxinfo                  0.08     0.02\n\t  2 glxinfo:cs0              0.08     0.02\n\t  2 glxinfo:disk$0           0.08     0.02\n\t  2 glxinfo:sh0              0.08     0.02\n\t  2 glxinfo:shlo0            0.08     0.02\n\t  2 llvmpipe-0               0.07     0.07\n\t  2 llvmpipe-1               0.07     0.07\n\t  2 llvmpipe-10              0.07     0.07\n\t  2 llvmpipe-11              0.07     0.07\n\t  2 llvmpipe-12              0.07     0.07\n\t  2 llvmpipe-13              0.07     0.07\n\t  2 llvmpipe-14              0.07     0.07\n\t  2 llvmpipe-15              0.07     0.07\n\t  2 llvmpipe-2               0.07     0.07\n\t  2 llvmpipe-3               0.07     0.07\n\t  2 llvmpipe-4               0.07     0.07\n\t  2 llvmpipe-5               0.07     0.07\n\t  2 llvmpipe-6               0.07     0.07\n\t  2 llvmpipe-7               0.07     0.07\n\t  2 llvmpipe-8               0.07     0.07\n\t  2 llvmpipe-9               0.07     0.07\n\t  6 clang                    0.07     0.04\n\t  6 php                      0.05     0.09\n\t  3 rocminfo                 0.03     0.00\n\t  1 lspci                    0.00     0.03\n\t 82 sh                       0.00     0.00\n\t 13 gcc                      0.00     0.00\n\t 11 gsettings                0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  3 compress-pbzip2          0.00     0.00\n\t  3 gmain                    0.00     0.00\n\t  2 cc                       0.00     0.00\n\t  2 dconf worker             0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 uname                    0.00     0.00\n\t  2 which                 
   0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 ps                       0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n0 processes running\n47 maximum processes\n<\/code><\/pre>\n\n\n\n<p>Computation blocks.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      68277) compress-pbzip2  cpu=15 start=5.62  finish=13.27\n        68278) pbzip2           cpu=5 start=5.62  finish=13.27\n          68279) pbzip2           cpu=12 start=5.62  finish=13.27\n          68280) pbzip2           cpu=3 start=5.62  finish=13.27\n          68281) pbzip2           cpu=1 start=5.62  finish=8.53 \n          68282) pbzip2           cpu=13 start=5.62  finish=11.10\n          68283) pbzip2           cpu=0 start=5.62  finish=10.09\n          68284) pbzip2           cpu=14 start=5.62  finish=12.34\n          68285) pbzip2           cpu=5 start=5.62  finish=12.97\n          68286) pbzip2           cpu=7 start=5.62  finish=13.00\n          68287) pbzip2           cpu=12 start=5.62  finish=13.13\n          68288) pbzip2           cpu=10 start=5.62  finish=13.12\n          68289) pbzip2           cpu=9 start=5.62  finish=11.99\n          68290) 
pbzip2           cpu=0 start=5.62  finish=12.53\n          68291) pbzip2           cpu=11 start=5.62  finish=13.18\n          68292) pbzip2           cpu=5 start=5.62  finish=10.89\n          68293) pbzip2           cpu=1 start=5.62  finish=13.26\n          68294) pbzip2           cpu=10 start=5.62  finish=11.78\n          68295) pbzip2           cpu=5 start=5.62  finish=11.42\n          68296) pbzip2           cpu=5 start=5.62  finish=6.90 \n          68297) pbzip2           cpu=9 start=5.62  finish=13.27\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Compress a file using parallel bzip2 compression. There is one quick-running application, pbzip2. The topdown profile is sparse, with some backend stalls. The AMD metrics show fewer backend stalls than I anticipated and a higher retirement rate instead. Intel metrics Process <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/compress-pbzip2\/\"><span class=\"more-msg\">Continue reading 
&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-2084","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2084","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=2084"}],"version-history":[{"count":3,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2084\/revisions"}],"predecessor-version":[{"id":2105,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2084\/revisions\/2105"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=2084"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}