{"id":1237,"date":"2024-02-01T23:14:39","date_gmt":"2024-02-01T23:14:39","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=1237"},"modified":"2024-02-01T23:15:29","modified_gmt":"2024-02-01T23:15:29","slug":"lzbench","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/lzbench\/","title":{"rendered":"lzbench"},"content":{"rendered":"\n<p>lzbench is a compression benchmark with seven workloads. Each workload reports both compress and decompress metrics, so there are about 14 in total. This is a single-threaded workload.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-4.png\" alt=\"\" class=\"wp-image-1238\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-4.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-4-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-4-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>The topdown profile shows frontend and backend stalls about even, though the split varies by test case. 
Branch misprediction is surprisingly high.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-4.png\" alt=\"\" class=\"wp-image-1239\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-4.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-4-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-4-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics show the high rate of speculation misses. This code has little floating point and a moderate level of branches.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              755.928\non_cpu               0.051          # 0.81 \/ 16 cores\nutime                583.276\nstime                32.447\nnvcsw                2596           # 49.00%\nnivcsw               2702           # 51.00%\ninblock              0              # 0.00\/sec\nonblock              15424          # 20.40\/sec\ncpu-clock            615848223310   # 615.848 seconds\ntask-clock           615860546931   # 615.861 seconds\npage faults          18659342       # 30297.999\/sec\ncontext switches     8845           # 14.362\/sec\ncpu migrations       326            # 0.529\/sec\nmajor page faults    2              # 0.003\/sec\nminor page faults    18659340       # 30297.995\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             783659267177   # 130.034 branches per 1000 inst\nbranch misses        47146996285    # 6.02% branch miss\nconditional          685766650450   # 113.791 conditional branches per 1000 inst\nindirect             6891243211     # 1.143 indirect branches per 1000 inst\ncpu-cycles           2372502199230  # 0.23 
GHz\ninstructions         5279491010779  # 2.23 IPC\nslots                4748483394420  #\nretiring             1718435481181  # 36.2% (36.2%)\n-- ucode             497136752      #     0.0%\n-- fastpath          1717938344429  #    36.2%\nfrontend             1151657223287  # 24.3% (24.3%)\n-- latency           735670326708   #    15.5%\n-- bandwidth         415986896579   #     8.8%\nbackend              1018700227325  # 21.5% (21.5%)\n-- cpu               179530650466   #     3.8%\n-- memory            839169576859   #    17.7%\nspeculation          859479089638   # 18.1% (18.1%) high\n-- branch mispredict 855319116325   #    18.0%\n-- pipeline restart  4159973313     #     0.1%\nsmt-contention       211006212      #  0.0% ( 0.0%)\ncpu-cycles           2366480123372  # 0.23 GHz\ninstructions         5261054743756  # 2.22 IPC\ninstructions         1754626398723  # 24.734 l2 access per 1000 inst\nl2 hit from l1       30101932020    # 19.34% l2 miss\nl2 miss from l1      2239669010     #\nl2 hit from l2 pf    7142938168     #\nl3 hit from l2 pf    1738642924     #\nl3 miss from l2 pf   4416281676     #\ninstructions         1754894600217  # 20.259 float per 1000 inst\nfloat 512            67             # 0.000 AVX-512 per 1000 inst\nfloat 256            496            # 0.000 AVX-256 per 1000 inst\nfloat 128            35553242796    # 20.259 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              6333.440\non_cpu               0.742          # 11.88 \/ 16 cores\nutime                74750.330\nstime                473.297\nnvcsw                1313743        # 89.74%\nnivcsw               150279         # 10.26%\ninblock              30890000       # 4877.29\/sec\nonblock              694880         # 109.72\/sec\ncpu-clock            75222931189368 # 75222.931 
seconds\ntask-clock           75223322994820 # 75223.323 seconds\npage faults          85985637       # 1143.072\/sec\ncontext switches     1495425        # 19.880\/sec\ncpu migrations       54829          # 0.729\/sec\nmajor page faults    1233961        # 16.404\/sec\nminor page faults    84751671       # 1126.667\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             45503197495161 # 71.793 branches per 1000 inst\nbranch misses        90982505997    # 0.20% branch miss\nconditional          45503197511129 # 71.793 conditional branches per 1000 inst\nindirect             11993405533405 # 18.923 indirect branches per 1000 inst\nslots                433838595012596 #\nretiring             283286737116435 # 65.3% (65.3%) high\n-- ucode             13222986405835 #     3.0%\n-- fastpath          270063750710600 #    62.2%\nfrontend             19439778785847 #  4.5% ( 4.5%) low\n-- latency           9361508360355  #     2.2%\n-- bandwidth         10078270425492 #     2.3%\nbackend              112743375337721 # 26.0% (26.0%)\n-- cpu               63018362871765 #    14.5%\n-- memory            49725012465956 #    11.5%\nspeculation          12881927224817 #  3.0% ( 3.0%)\n-- branch mispredict 12368006069230 #     2.9%\n-- pipeline restart  513921155587   #     0.1%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           216187026514697 # 2.16 GHz\ninstructions         883987677201512 # 4.09 IPC high\nl2 access            3937606297777  # 13.285 l2 access per 1000 inst\nl2 miss              358227811293   # 9.10% l2 miss\n<\/code><\/pre>\n\n\n\n<p>Process overview is straightforward with invocations of lzbench<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>402 processes\n\t 42 lzbench                486.92    26.60\n\t 68 clinfo                  19.50     7.66\n\t 38 vulkaninfo               1.10     1.15\n\t  6 glxinfo:gdrv0            0.15     0.06\n\t  6 glxinfo:gl0           
   0.15     0.06\n\t  4 vulkani:disk$0           0.11     0.13\n\t  6 php                      0.08     0.25\n\t  2 glxinfo                  0.07     0.02\n\t  2 glxinfo:cs0              0.07     0.02\n\t  2 glxinfo:disk$0           0.07     0.02\n\t  2 glxinfo:sh0              0.07     0.02\n\t  2 glxinfo:shlo0            0.07     0.02\n\t  2 llvmpipe-0               0.06     0.07\n\t  2 llvmpipe-1               0.06     0.07\n\t  2 llvmpipe-2               0.06     0.07\n\t  2 llvmpipe-3               0.06     0.07\n\t  2 llvmpipe-4               0.06     0.07\n\t  6 clang                    0.06     0.06\n\t  2 llvmpipe-10              0.06     0.06\n\t  2 llvmpipe-11              0.06     0.06\n\t  2 llvmpipe-12              0.06     0.06\n\t  2 llvmpipe-13              0.06     0.06\n\t  2 llvmpipe-14              0.06     0.06\n\t  2 llvmpipe-15              0.06     0.06\n\t  2 llvmpipe-5               0.06     0.06\n\t  2 llvmpipe-6               0.06     0.06\n\t  2 llvmpipe-7               0.06     0.06\n\t  2 llvmpipe-8               0.06     0.06\n\t  2 llvmpipe-9               0.06     0.06\n\t  3 rocminfo                 0.03     0.00\n\t  1 lspci                    0.00     0.03\n\t  1 ps                       0.00     0.01\n\t 94 sh                       0.00     0.00\n\t 13 gcc                      0.00     0.00\n\t 13 gsettings                0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  2 cc                       0.00     0.00\n\t  2 gmain                    0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 uname                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dconf worker             0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  
1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n0 processes running\n47 maximum processes\n<\/code><\/pre>\n\n\n\n<p>Example of computation blocks<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      39349) lzbench          cpu=15 start=5.62  finish=33.77\n        39350) lzbench          cpu=9 start=5.62  finish=33.77\n      39353) lzbench          cpu=13 start=37.78 finish=66.05\n        39354) lzbench          cpu=6 start=37.78 finish=66.04\n      39355) lzbench          cpu=13 start=70.05 finish=98.24\n        39356) lzbench          cpu=15 start=70.05 finish=98.24\n      39357) sh               cpu=6 start=98.24 finish=98.24\n        39358) sh               cpu=7 start=98.24 finish=98.24\n      39359) lzbench          cpu=5 start=108.42 finish=131.47\n        39360) lzbench          cpu=6 start=108.42 finish=131.46\n      39361) lzbench          cpu=5 start=135.47 finish=158.54\n        39362) lzbench          cpu=6 start=135.47 finish=158.54\n      39363) lzbench          cpu=13 start=162.54 finish=185.54\n        39364) lzbench          cpu=6 start=162.54 finish=185.54\n      39366) sh               cpu=13 start=185.54 finish=185.54\n        39367) sh               cpu=7 start=185.54 finish=185.54\n      39368) lzbench          cpu=5 
start=195.90 finish=222.22\n        39369) lzbench          cpu=6 start=195.90 finish=222.22\n      39370) lzbench          cpu=14 start=226.23 finish=252.51\n        39371) lzbench          cpu=7 start=226.23 finish=252.51\n      39374) lzbench          cpu=13 start=256.51 finish=282.70\n        39375) lzbench          cpu=6 start=256.51 finish=282.70\n      39416) sh               cpu=13 start=282.70 finish=282.70\n        39417) sh               cpu=7 start=282.70 finish=282.70\n      39418) lzbench          cpu=5 start=293.20 finish=318.00\n        39419) lzbench          cpu=14 start=293.20 finish=318.00\n      39420) lzbench          cpu=5 start=322.01 finish=347.21\n        39421) lzbench          cpu=6 start=322.01 finish=347.20\n      39422) lzbench          cpu=6 start=351.21 finish=376.21\n        39423) lzbench          cpu=7 start=351.21 finish=376.21\n      39424) sh               cpu=13 start=376.22 finish=376.22\n        39425) sh               cpu=7 start=376.22 finish=376.22\n      39427) lzbench          cpu=6 start=386.40 finish=410.11\n        39428) lzbench          cpu=7 start=386.40 finish=410.10\n      39429) lzbench          cpu=5 start=414.11 finish=436.48\n        39430) lzbench          cpu=6 start=414.11 finish=436.48\n      39431) lzbench          cpu=13 start=440.48 finish=462.81\n        39432) lzbench          cpu=14 start=440.49 finish=462.81\n      39433) sh               cpu=6 start=462.81 finish=462.81\n        39434) sh               cpu=15 start=462.81 finish=462.81\n      39435) lzbench          cpu=6 start=472.99 finish=495.27\n        39436) lzbench          cpu=15 start=472.99 finish=495.27\n      39437) lzbench          cpu=13 start=499.27 finish=522.41\n        39438) lzbench          cpu=14 start=499.28 finish=522.40\n      39439) lzbench          cpu=13 start=526.41 finish=548.59\n        39440) lzbench          cpu=14 start=526.41 finish=548.59\n      39441) sh               cpu=15 start=548.59 finish=548.59\n        
39442) sh               cpu=9 start=548.59 finish=548.59\n      39443) lzbench          cpu=2 start=559.46 finish=583.09\n        39444) lzbench          cpu=5 start=559.46 finish=583.08\n      39447) lzbench          cpu=9 start=587.09 finish=610.72\n        39448) lzbench          cpu=2 start=587.09 finish=610.71\n      39449) lzbench          cpu=9 start=614.72 finish=638.31\n        39450) lzbench          cpu=10 start=614.72 finish=638.31\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>lzbench is a benchmark for compression with seven workloads. These have both compress and decompress metrics so ~14 total. This is a single-thread workload. Topdown profile shows frontend and backend stalls about even but varying by the test case. Branch <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/lzbench\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-1237","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1237","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=1237"}],"version-history":[{"count":1,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1237\/revisions"}],"predecessor-version":[{"id":1241,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1237\/revisions\/1241"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org
\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=1237"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}