{"id":1450,"date":"2024-02-03T23:21:36","date_gmt":"2024-02-03T23:21:36","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=1450"},"modified":"2024-02-07T03:36:36","modified_gmt":"2024-02-07T03:36:36","slug":"blosc","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/blosc\/","title":{"rendered":"blosc"},"content":{"rendered":"\n<p>blosc is a data store library for C that compresses binary data. This benchmark runs 18 different workloads with a variety of buffer sizes and algorithms. These appear to run moderately quickly with a variable number of threads, though single-threaded is most common.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-30.png\" alt=\"\" class=\"wp-image-1558\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-30.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-30-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-30-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>The topdown profile also shows metrics smeared across the run, with some segments of almost 90% frontend or backend stalls and a retirement rate that also varies. There also seem to be occasional stripes of lower retirement rates. 
Some of this is likely easier to see with fewer than the 18 test cases.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-32.png\" alt=\"\" class=\"wp-image-1561\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-32.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-32-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-32-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics provide a composite of the numbers above. On average about 2.6 of the 16 cores are kept busy. There is a high rate of page faults (about 47,000 per second). The average retirement rate is low (about 15% of slots), and backend memory stalls (about 48%) are the largest culprit. There is some floating-point code and some L2 access.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              987.760\non_cpu               0.162          # 2.58 \/ 16 cores\nutime                2314.586\nstime                238.219\nnvcsw                4504420        # 99.74%\nnivcsw               11524          # 0.26%\ninblock              0              # 0.00\/sec\nonblock              25688          # 26.01\/sec\ncpu-clock            2556766690317  # 2556.767 seconds\ntask-clock           2558049210877  # 2558.049 seconds\npage faults          120445947      # 47085.078\/sec\ncontext switches     4520560        # 1767.190\/sec\ncpu migrations       79117          # 30.929\/sec\nmajor page faults    2              # 0.001\/sec\nminor page faults    120445945      # 47085.077\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             1709968636406  # 153.759 branches per 1000 inst\nbranch misses        22626348307    # 1.32% branch miss\nconditional          
1548644588767  # 139.253 conditional branches per 1000 inst\nindirect             1238978398     # 0.111 indirect branches per 1000 inst\ncpu-cycles           10248138945269 # 0.70 GHz\ninstructions         9924240533970  # 0.97 IPC\nslots                20670495263052 #\nretiring             3187699511326  # 15.4% (16.1%)\n-- ucode             1905874547     #     0.0%\n-- fastpath          3185793636779  #    15.4%\nfrontend             4160988424228  # 20.1% (21.0%)\n-- latency           1279637073474  #     6.2%\n-- bandwidth         2881351350754  #    13.9%\nbackend              12378049471540 # 59.9% (62.5%)\n-- cpu               2368053514869  #    11.5%\n-- memory            10009995956671 #    48.4%\nspeculation          82492660206    #  0.4% ( 0.4%) low\n-- branch mispredict 81067482527    #     0.4%\n-- pipeline restart  1425177679     #     0.0%\nsmt-contention       861159583090   #  4.2% ( 0.0%)\ncpu-cycles           10255907734990 # 0.71 GHz\ninstructions         9936437247179  # 0.97 IPC\ninstructions         3320711390252  # 85.963 l2 access per 1000 inst\nl2 hit from l1       180588408594   # 17.83% l2 miss\nl2 miss from l1      9329067623     #\nl2 hit from l2 pf    63289259950    #\nl3 hit from l2 pf    14279437545    #\nl3 miss from l2 pf   27299697284    #\ninstructions         3308296933757  # 89.615 float per 1000 inst\nfloat 512            128            # 0.000 AVX-512 per 1000 inst\nfloat 256            420            # 0.000 AVX-256 per 1000 inst\nfloat 128            296472821770   # 89.615 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\ninstructions         2690265        #\nopcache              1005701        # 373.830 opcache per 1000 inst\nopcache miss         537919         # 53.5% opcache miss rate\nl1 dTLB miss         5199           # 1.933 L1 dTLB per 1000 inst\nl2 dTLB miss         1138           # 0.423 L2 dTLB per 1000 
inst\ninstructions         2718101        #\nicache               1334928        # 491.125 icache per 1000 inst\nicache miss          112953         #  8.5% icache miss rate\nl1 iTLB miss         9              # 0.003 L1 iTLB per 1000 inst\nl2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst\ntlb flush            19             # 0.007 TLB flush per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics give more clues, with L3 and DRAM the largest contributors to the memory stalls.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              1185.087\non_cpu               0.224          # 3.59 \/ 16 cores\nutime                4036.438\nstime                217.599\nnvcsw                3545389        # 97.41%\nnivcsw               94223          # 2.59%\ninblock              3232           # 2.73\/sec\nonblock              13408          # 11.31\/sec\ncpu-clock            4235117712182  # 4235.118 seconds\ntask-clock           4238670444667  # 4238.670 seconds\npage faults          113600453      # 26800.964\/sec\ncontext switches     3645262        # 860.001\/sec\ncpu migrations       236530         # 55.803\/sec\nmajor page faults    9              # 0.002\/sec\nminor page faults    113600444      # 26800.962\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             1565019616139  # 149.293 branches per 1000 inst\nbranch misses        2447114186     # 0.16% branch miss\nconditional          1565019865131  # 149.293 conditional branches per 1000 inst\nindirect             205635023039   # 19.616 indirect branches per 1000 inst\nslots                21419426156342 #\nretiring             7365886196434  # 34.4% (34.4%)\n-- ucode             335345158142   #     1.6%\n-- fastpath          7030541038292  #    32.8%\nfrontend             1845138431656  #  8.6% ( 8.6%)\n-- latency           807731404302   #     3.8%\n-- bandwidth         1037407027354  #     4.8%\nbackend             
 11990871392273 # 56.0% (56.0%)\n-- cpu               1992105550930  #     9.3%\n-- memory            9998765841343  #    46.7%\nspeculation          328808935666   #  1.5% ( 1.5%)\n-- branch mispredict 240387547131   #     1.1%\n-- pipeline restart  88421388535    #     0.4%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           7975989603086  # 0.42 GHz\ninstructions         15222097361603 # 1.91 IPC\nl2 access            466905689925   # 60.108 l2 access per 1000 inst\nl2 miss              209008506469   # 44.76% l2 miss\ncpu-cycles           4128004411313  # 51.2% memory latency\nload stalls          1623882878339  #  0.0% l1 bound\nl1 miss              1652161569527  # 16.6% l2 bound\nl2 miss              966736683768   #  7.7% l3 bound\nl3 miss              649440877367   # 15.7% dram bound\nstore_stalls         488044403173   # 11.8% store bound\n<\/code><\/pre>\n\n\n\n<p>Process overview says b2bench is the primary driver.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>7780 processes\n\t7344 b2bench              170576.84 19068.19\n\t 68 clinfo                  17.20     5.32\n\t 38 vulkaninfo               0.95     1.33\n\t  6 php                      0.16     0.42\n\t  6 glxinfo:gdrv0            0.12     0.06\n\t  6 glxinfo:gl0              0.12     0.06\n\t  4 vulkani:disk$0           0.10     0.14\n\t  2 glxinfo                  0.06     0.02\n\t  2 glxinfo:cs0              0.06     0.02\n\t  2 glxinfo:disk$0           0.06     0.02\n\t  2 glxinfo:sh0              0.06     0.02\n\t  2 glxinfo:shlo0            0.06     0.02\n\t  6 clang                    0.05     0.07\n\t  2 llvmpipe-0               0.05     0.07\n\t  2 llvmpipe-1               0.05     0.07\n\t  2 llvmpipe-10              0.05     0.07\n\t  2 llvmpipe-11              0.05     0.07\n\t  2 llvmpipe-12              0.05     0.07\n\t  2 llvmpipe-13              0.05     0.07\n\t  2 llvmpipe-14              0.05     0.07\n\t  2 llvmpipe-15              0.05     0.07\n\t  
2 llvmpipe-2               0.05     0.07\n\t  2 llvmpipe-3               0.05     0.07\n\t  2 llvmpipe-4               0.05     0.07\n\t  2 llvmpipe-5               0.05     0.07\n\t  2 llvmpipe-6               0.05     0.07\n\t  2 llvmpipe-7               0.05     0.07\n\t  2 llvmpipe-8               0.05     0.07\n\t  2 llvmpipe-9               0.05     0.07\n\t  3 rocminfo                 0.03     0.00\n\t  1 lspci                    0.00     0.02\n\t  1 ps                       0.00     0.01\n\t116 sh                       0.00     0.00\n\t 54 blosc                    0.00     0.00\n\t 13 gcc                      0.00     0.00\n\t  9 gsettings                0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 gmain                    0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  2 cc                       0.00     0.00\n\t  2 dconf worker             0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 uname                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                  
     0.00     0.00\n\t  1 xrandr                   0.00     0.00\n0 processes running\n47 maximum processes\n<\/code><\/pre>\n\n\n\n<p>Computation structures have many threads started<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      710701) blosc            cpu=8 start=5.71  finish=12.38\n        710702) b2bench          cpu=0 start=5.71  finish=12.37\n          710703) b2bench          cpu=3 start=6.80  finish=7.34 \n          710704) b2bench          cpu=4 start=6.80  finish=7.34 \n          710705) b2bench          cpu=13 start=7.34  finish=7.79 \n          710706) b2bench          cpu=7 start=7.35  finish=7.79 \n          710707) b2bench          cpu=14 start=7.35  finish=7.79 \n          710708) b2bench          cpu=11 start=7.79  finish=8.17 \n          710709) b2bench          cpu=4 start=7.79  finish=8.17 \n          710710) b2bench          cpu=8 start=7.79  finish=8.16 \n          710711) b2bench          cpu=1 start=7.79  finish=8.17 \n          710714) b2bench          cpu=5 start=8.17  finish=8.53 \n          710715) b2bench          cpu=6 start=8.17  finish=8.53 \n          710716) b2bench          cpu=15 start=8.17  finish=8.53 \n          710717) b2bench          cpu=3 start=8.17  finish=8.53 \n          710718) b2bench          cpu=4 start=8.17  finish=8.53 \n          710719) b2bench          cpu=9 start=8.53  finish=8.89 \n          710720) b2bench          cpu=7 start=8.53  finish=8.89 \n          710721) b2bench          cpu=4 start=8.53  finish=8.89 \n          710722) b2bench          cpu=5 start=8.53  finish=8.89 \n          710723) b2bench          cpu=8 start=8.53  finish=8.89 \n          710724) b2bench          cpu=6 start=8.53  finish=8.89 \n          710725) b2bench          cpu=11 start=8.89  finish=9.24 \n          710726) b2bench          cpu=4 start=8.89  finish=9.24 \n          710727) b2bench          cpu=8 start=8.89  finish=9.24 \n          710728) b2bench          cpu=12 start=8.89  finish=9.24 \n          710729) b2bench  
        cpu=6 start=8.89  finish=9.24 \n          710730) b2bench          cpu=1 start=8.89  finish=9.24 \n          710731) b2bench          cpu=5 start=8.89  finish=9.24 \n          710732) b2bench          cpu=11 start=9.24  finish=9.56 \n          710733) b2bench          cpu=10 start=9.24  finish=9.56 \n          710734) b2bench          cpu=13 start=9.24  finish=9.56 \n          710735) b2bench          cpu=6 start=9.24  finish=9.56 \n          710736) b2bench          cpu=7 start=9.24  finish=9.56 \n          710737) b2bench          cpu=1 start=9.24  finish=9.56 \n          710738) b2bench          cpu=8 start=9.24  finish=9.56 \n          710739) b2bench          cpu=12 start=9.24  finish=9.56 \n          710740) b2bench          cpu=0 start=9.56  finish=9.92 \n          710741) b2bench          cpu=5 start=9.56  finish=9.92 \n          710742) b2bench          cpu=12 start=9.56  finish=9.92 \n          710743) b2bench          cpu=9 start=9.56  finish=9.92 \n          710744) b2bench          cpu=10 start=9.56  finish=9.92 \n          710745) b2bench          cpu=6 start=9.56  finish=9.91 \n          710746) b2bench          cpu=10 start=9.56  finish=9.91 \n          710747) b2bench          cpu=15 start=9.56  finish=9.91 \n          710748) b2bench          cpu=2 start=9.56  finish=9.92 \n          710749) b2bench          cpu=15 start=9.92  finish=10.27\n          710750) b2bench          cpu=6 start=9.92  finish=10.27\n          710751) b2bench          cpu=12 start=9.92  finish=10.27\n          710752) b2bench          cpu=5 start=9.92  finish=10.27\n          710753) b2bench          cpu=8 start=9.92  finish=10.27\n          710754) b2bench          cpu=1 start=9.92  finish=10.27\n          710755) b2bench          cpu=0 start=9.92  finish=10.27\n          710756) b2bench          cpu=10 start=9.92  finish=10.27\n          710757) b2bench          cpu=3 start=9.92  finish=10.27\n          710758) b2bench          cpu=4 start=9.92  finish=10.27\n      
    710759) b2bench          cpu=15 start=10.27 finish=10.62\n          710760) b2bench          cpu=3 start=10.27 finish=10.61\n          710761) b2bench          cpu=4 start=10.27 finish=10.61\n          710762) b2bench          cpu=12 start=10.27 finish=10.61\n          710763) b2bench          cpu=10 start=10.27 finish=10.62\n          710764) b2bench          cpu=9 start=10.27 finish=10.62\n          710765) b2bench          cpu=0 start=10.27 finish=10.61\n          710766) b2bench          cpu=11 start=10.28 finish=10.61\n          710767) b2bench          cpu=14 start=10.28 finish=10.62\n          710768) b2bench          cpu=5 start=10.28 finish=10.61\n          710769) b2bench          cpu=1 start=10.28 finish=10.62\n          710770) b2bench          cpu=1 start=10.62 finish=10.96\n          710771) b2bench          cpu=0 start=10.62 finish=10.96\n          710772) b2bench          cpu=1 start=10.62 finish=10.96\n          710773) b2bench          cpu=10 start=10.62 finish=10.96\n          710774) b2bench          cpu=15 start=10.62 finish=10.96\n          710775) b2bench          cpu=3 start=10.62 finish=10.96\n          710776) b2bench          cpu=4 start=10.62 finish=10.96\n          710777) b2bench          cpu=7 start=10.62 finish=10.96\n          710778) b2bench          cpu=11 start=10.62 finish=10.96\n          710779) b2bench          cpu=6 start=10.62 finish=10.96\n          710780) b2bench          cpu=8 start=10.62 finish=10.96\n          710781) b2bench          cpu=12 start=10.62 finish=10.96\n          710782) b2bench          cpu=1 start=10.96 finish=11.32\n          710783) b2bench          cpu=9 start=10.96 finish=11.32\n          710784) b2bench          cpu=10 start=10.96 finish=11.32\n          710785) b2bench          cpu=13 start=10.96 finish=11.32\n          710786) b2bench          cpu=3 start=10.96 finish=11.32\n          710787) b2bench          cpu=12 start=10.96 finish=11.32\n          710788) b2bench          cpu=11 
start=10.96 finish=11.32\n          710789) b2bench          cpu=5 start=10.96 finish=11.32\n          710790) b2bench          cpu=8 start=10.96 finish=11.32\n          710791) b2bench          cpu=15 start=10.96 finish=11.32\n          710792) b2bench          cpu=0 start=10.96 finish=11.32\n          710793) b2bench          cpu=4 start=10.96 finish=11.32\n          710794) b2bench          cpu=14 start=10.96 finish=11.32\n          710795) b2bench          cpu=11 start=11.32 finish=11.67\n          710796) b2bench          cpu=8 start=11.32 finish=11.66\n          710797) b2bench          cpu=6 start=11.32 finish=11.67\n          710798) b2bench          cpu=2 start=11.32 finish=11.67\n          710799) b2bench          cpu=5 start=11.32 finish=11.67\n          710800) b2bench          cpu=1 start=11.32 finish=11.67\n          710801) b2bench          cpu=15 start=11.32 finish=11.67\n          710802) b2bench          cpu=12 start=11.32 finish=11.67\n          710803) b2bench          cpu=9 start=11.32 finish=11.67\n          710804) b2bench          cpu=3 start=11.32 finish=11.67\n          710805) b2bench          cpu=14 start=11.32 finish=11.67\n          710806) b2bench          cpu=4 start=11.32 finish=11.67\n          710807) b2bench          cpu=10 start=11.32 finish=11.67\n          710808) b2bench          cpu=7 start=11.32 finish=11.67\n          710809) b2bench          cpu=13 start=11.67 finish=12.02\n          710810) b2bench          cpu=3 start=11.67 finish=12.02\n          710811) b2bench          cpu=15 start=11.67 finish=12.02\n          710812) b2bench          cpu=0 start=11.67 finish=12.02\n          710813) b2bench          cpu=7 start=11.67 finish=12.02\n          710814) b2bench          cpu=14 start=11.67 finish=12.02\n          710815) b2bench          cpu=5 start=11.67 finish=12.02\n          710816) b2bench          cpu=12 start=11.67 finish=12.02\n          710817) b2bench          cpu=9 start=11.67 finish=12.02\n          710818) 
b2bench          cpu=11 start=11.67 finish=12.01\n          710819) b2bench          cpu=1 start=11.67 finish=12.02\n          710820) b2bench          cpu=2 start=11.67 finish=12.01\n          710821) b2bench          cpu=6 start=11.67 finish=12.02\n          710822) b2bench          cpu=8 start=11.67 finish=12.02\n          710823) b2bench          cpu=4 start=11.67 finish=12.02\n          710824) b2bench          cpu=1 start=12.02 finish=12.37\n          710825) b2bench          cpu=11 start=12.02 finish=12.37\n          710826) b2bench          cpu=0 start=12.02 finish=12.37\n          710827) b2bench          cpu=6 start=12.02 finish=12.37\n          710828) b2bench          cpu=5 start=12.02 finish=12.37\n          710829) b2bench          cpu=15 start=12.02 finish=12.37\n          710830) b2bench          cpu=14 start=12.02 finish=12.37\n          710831) b2bench          cpu=8 start=12.02 finish=12.37\n          710832) b2bench          cpu=12 start=12.02 finish=12.37\n          710833) b2bench          cpu=10 start=12.02 finish=12.37\n          710834) b2bench          cpu=13 start=12.02 finish=12.37\n          710835) b2bench          cpu=7 start=12.02 finish=12.37\n          710836) b2bench          cpu=3 start=12.02 finish=12.37\n          710837) b2bench          cpu=1 start=12.02 finish=12.37\n          710838) b2bench          cpu=2 start=12.02 finish=12.37\n          710839) b2bench          cpu=4 start=12.02 finish=12.37\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>blosc is a data store library for C that compresses binary data. This runs 18 different workloads with a variety of buffer sizes and algorithms. 
Looks like these run moderately quickly with a variable number of threads but single-threaded is <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/blosc\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-1450","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1450","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=1450"}],"version-history":[{"count":3,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1450\/revisions"}],"predecessor-version":[{"id":1562,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1450\/revisions\/1562"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=1450"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}