{"id":2124,"date":"2024-03-20T12:19:39","date_gmt":"2024-03-20T12:19:39","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=2124"},"modified":"2024-03-22T00:39:03","modified_gmt":"2024-03-22T00:39:03","slug":"webp2","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/webp2\/","title":{"rendered":"webp2"},"content":{"rendered":"\n<p>Google libwebp2 library benchmark for image encoding. There are five workloads that run for differing amounts of time; the last one appears to take the majority of the time.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-38.png\" alt=\"\" class=\"wp-image-2147\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-38.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-38-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-38-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>The topdown profile shows a high retirement rate overall.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-40.png\" alt=\"\" class=\"wp-image-2149\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-40.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-40-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-40-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>The AMD metrics confirm the high retirement rate. 
About 12% of instructions are floating point, essentially all 128-bit vector operations, and the L2 access rate is low at about 7.5 per 1000 instructions.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              6454.668\non_cpu               0.938          # 15.01 \/ 16 cores\nutime                96826.664\nstime                45.413\nnvcsw                494978         # 29.55%\nnivcsw               1179978        # 70.45%\ninblock              8              # 0.00\/sec\nonblock              190864         # 29.57\/sec\ncpu-clock            96875999087966 # 96875.999 seconds\ntask-clock           96876761793744 # 96876.762 seconds\npage faults          10494175       # 108.325\/sec\ncontext switches     1706972        # 17.620\/sec\ncpu migrations       160719         # 1.659\/sec\nmajor page faults    3              # 0.000\/sec\nminor page faults    10494172       # 108.325\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             126898490547422 # 158.007 branches per 1000 inst\nbranch misses        1318886240479  # 1.04% branch miss\nconditional          77448549388599 # 96.435 conditional branches per 1000 inst\nindirect             13673282628499 # 17.025 indirect branches per 1000 inst\ncpu-cycles           384962570882363 # 3.72 GHz\ninstructions         803239029833114 # 2.09 IPC\nslots                769854586083126 #\nretiring             284609100009382 # 37.0% (62.1%) high\n-- ucode             4043501393113  #     0.5%\n-- fastpath          280565598616269 #    36.4%\nfrontend             78261693919570 # 10.2% (17.1%)\n-- latency           42905861803014 #     5.6%\n-- bandwidth         35355832116556 #     4.6%\nbackend              80176935537309 # 10.4% (17.5%) low\n-- cpu               49494451462269 #     6.4%\n-- memory            30682484075040 #     4.0%\nspeculation          15095623143789 #  2.0% ( 3.3%)\n-- branch mispredict 15003422604144 #     1.9%\n-- pipeline restart  92200539645    #     0.0%\nsmt-contention       311710573425872 # 40.5% ( 
0.0%)\ncpu-cycles           384784177103671 # 3.71 GHz\ninstructions         803248721506170 # 2.09 IPC\ninstructions         267743308628030 # 7.474 l2 access per 1000 inst\nl2 hit from l1       1602663711417  # 7.55% l2 miss\nl2 miss from l1      35570585376    #\nl2 hit from l2 pf    282998514358   #\nl3 hit from l2 pf    43222627984    #\nl3 miss from l2 pf   72300954073    #\ninstructions         267647173786960 # 121.102 float per 1000 inst\nfloat 512            79             # 0.000 AVX-512 per 1000 inst\nfloat 256            616            # 0.000 AVX-256 per 1000 inst\nfloat 128            32412525008255 # 121.102 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\ninstructions         803109168877818 #\nopcache              127066718164852 # 158.218 opcache per 1000 inst\nopcache miss         3493204387362  #  2.7% opcache miss rate\nl1 dTLB miss         665818661116   # 0.829 L1 dTLB per 1000 inst\nl2 dTLB miss         6124862374     # 0.008 L2 dTLB per 1000 inst\ninstructions         803111520402727 #\nicache               4365248868945  # 5.435 icache per 1000 inst\nicache miss          710717117556   # 16.3% icache miss rate\nl1 iTLB miss         95935516176    # 0.119 L1 iTLB per 1000 inst\nl2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst\ntlb flush            142328         # 0.000 TLB flush per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              8161.727\non_cpu               0.937          # 15.00 \/ 16 cores\nutime                122359.757\nstime                47.232\nnvcsw                1448562        # 50.39%\nnivcsw               1426173        # 49.61%\ninblock              22528          # 2.76\/sec\nonblock              179512         # 21.99\/sec\ncpu-clock            122409623127842 # 122409.623 seconds\ntask-clock           122410292988041 # 122410.293 
seconds\npage faults          10143595       # 82.866\/sec\ncontext switches     2915091        # 23.814\/sec\ncpu migrations       124435         # 1.017\/sec\nmajor page faults    82             # 0.001\/sec\nminor page faults    10143513       # 82.865\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             127006812912740 # 158.044 branches per 1000 inst\nbranch misses        1410065467341  # 1.11% branch miss\nconditional          127006812933188 # 158.044 conditional branches per 1000 inst\nindirect             50858947018231 # 63.288 indirect branches per 1000 inst\nslots                557076544137674 #\nretiring             400021032719744 # 71.8% (71.8%) high\n-- ucode             43305567539593 #     7.8%\n-- fastpath          356715465180151 #    64.0%\nfrontend             80086580625828 # 14.4% (14.4%)\n-- latency           33826533132000 #     6.1%\n-- bandwidth         46260047493828 #     8.3%\nbackend              27264603206660 #  4.9% ( 4.9%) low\n-- cpu               19162921555765 #     3.4%\n-- memory            8101681650895  #     1.5%\nspeculation          59508409749058 # 10.7% (10.7%) high\n-- branch mispredict 59089662975152 #    10.6%\n-- pipeline restart  418746773906   #     0.1%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           346572446103804 # 2.67 GHz\ninstructions         760195516210084 # 2.19 IPC\nl2 access            2119361444064  # 5.371 l2 access per 1000 inst\nl2 miss              440106375672   # 20.77% l2 miss\ncpu-cycles           179557431925404 # 11.4% memory latency\nload stalls          20166657919693 #  8.7% l1 bound\nl1 miss              4621141485201  #  1.9% l2 bound\nl2 miss              1141520305635  #  0.2% l3 bound\nl3 miss              849908241729   #  0.5% dram bound\nstore_stalls         259645621514   #  0.1% store bound\n<\/code><\/pre>\n\n\n\n<p>Process overview shows time spent in cwp2<\/p>\n\n\n\n<pre 
class=\"wp-block-code\"><code>512 processes\n\t234 cwp2                 896093.32   511.22\n\t 34 clinfo                  10.01     3.00\n\t 19 vulkaninfo               0.74     0.76\n\t  2 vulkani:disk$0           0.07     0.08\n\t  6 clang                    0.05     0.07\n\t  1 llvmpipe-0               0.04     0.04\n\t  1 llvmpipe-1               0.04     0.04\n\t  1 llvmpipe-10              0.04     0.04\n\t  1 llvmpipe-11              0.04     0.04\n\t  1 llvmpipe-12              0.04     0.04\n\t  1 llvmpipe-13              0.04     0.04\n\t  1 llvmpipe-14              0.04     0.04\n\t  1 llvmpipe-15              0.04     0.04\n\t  1 llvmpipe-2               0.04     0.04\n\t  1 llvmpipe-3               0.04     0.04\n\t  1 llvmpipe-4               0.04     0.04\n\t  1 llvmpipe-5               0.04     0.04\n\t  1 llvmpipe-6               0.04     0.04\n\t  1 llvmpipe-7               0.04     0.04\n\t  1 llvmpipe-8               0.04     0.04\n\t  1 llvmpipe-9               0.04     0.04\n\t  1 ps                       0.00     0.01\n\t 68 sh                       0.00     0.00\n\t 13 gcc                      0.00     0.00\n\t 13 rm                       0.00     0.00\n\t 13 webp2                    0.00     0.00\n\t 11 gsettings                0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  7 stat                     0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  4 glxinfo                  0.00     0.00\n\t  4 phoronix-test-s          0.00     0.00\n\t  3 gmain                    0.00     0.00\n\t  2 grep                     0.00     0.00\n\t  2 which                    0.00     0.00\n\t  1 cc                       0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dconf worker             0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lscpu                    0.00     0.00\n\t  1 mktemp          
         0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 setterm                  0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n28 processes running\n47 maximum processes\n<\/code><\/pre>\n\n\n\n<p>Computation blocks<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      13497) webp2            cpu=0 start=13.20 finish=16.73\n        13498) cwp2             cpu=9 start=13.20 finish=16.71\n          13499) cwp2             cpu=8 start=13.70 finish=16.67\n          13500) cwp2             cpu=15 start=13.70 finish=16.67\n          13501) cwp2             cpu=0 start=13.70 finish=16.67\n          13502) cwp2             cpu=7 start=13.70 finish=16.67\n          13503) cwp2             cpu=11 start=13.70 finish=16.67\n          13504) cwp2             cpu=15 start=13.70 finish=16.67\n          13505) cwp2             cpu=7 start=13.70 finish=16.67\n          13506) cwp2             cpu=13 start=13.70 finish=16.67\n          13507) cwp2             cpu=10 start=13.70 finish=16.67\n          13508) cwp2             cpu=4 start=13.70 finish=16.67\n          13509) cwp2             cpu=12 start=13.70 finish=16.67\n          13510) cwp2             cpu=6 start=13.70 finish=16.67\n          13511) cwp2             cpu=1 start=13.70 finish=16.67\n          13512) cwp2             cpu=3 start=13.70 finish=16.69\n          13513) cwp2             cpu=2 start=13.70 finish=16.69\n          13514) cwp2             cpu=11 start=13.70 finish=16.69\n          13515) cwp2             cpu=14 start=13.70 finish=16.70\n        13517) rm               cpu=2 start=16.73 
finish=16.73\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Google libwebp2 library with image encoding. There are five workloads that run differing amounts of time. Looks like the last one takes a majority of the time. Topdown profile shows overall a high retirement rate AMD metrics confirm the higher <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/webp2\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-2124","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2124","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=2124"}],"version-history":[{"count":2,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2124\/revisions"}],"predecessor-version":[{"id":2150,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2124\/revisions\/2150"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=2124"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}