{"id":1173,"date":"2024-01-31T13:18:46","date_gmt":"2024-01-31T13:18:46","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=1173"},"modified":"2024-02-01T01:22:19","modified_gmt":"2024-02-01T01:22:19","slug":"jpegxl","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/jpegxl\/","title":{"rendered":"jpegxl"},"content":{"rendered":"\n<p>Multithreaded library for image encoding. There are six workloads for PNG and JPG files at quality levels of 80, 90 and 100.  In the chart below, the second half appears to be the quality level 100 runs and the first half the 80\/90 runs.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-1.png\" alt=\"\" class=\"wp-image-1213\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-1.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-1-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-1-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>The topdown profile has an extended duration, perhaps because of extra time to settle down.  
Again the first part looks like the quality level 80 and 90 runs, and the last part has a different profile.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-1.png\" alt=\"\" class=\"wp-image-1215\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-1.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-1-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-1-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics show the workload uses approximately two of the 16 cores.  It has some floating point and some L2 misses, but less than other codes.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              5675.342\non_cpu               0.122          # 1.96 \/ 16 cores\nutime                9924.448\nstime                1177.538\nnvcsw                261835         # 73.66%\nnivcsw               93642          # 26.34%\ninblock              0              # 0.00\/sec\nonblock              329168         # 58.00\/sec\ncpu-clock            11102930950490 # 11102.931 seconds\ntask-clock           11103320353169 # 11103.320 seconds\npage faults          665296711      # 59918.717\/sec\ncontext switches     383635         # 34.551\/sec\ncpu migrations       41777          # 3.763\/sec\nmajor page faults    3              # 0.000\/sec\nminor page faults    665296708      # 59918.717\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             10784996197656 # 127.911 branches per 1000 inst\nbranch misses        177067769575   # 1.64% branch miss\nconditional          9151340256953  # 108.536 conditional branches per 1000 inst\nindirect             122058699223   # 1.448 
indirect branches per 1000 inst\ncpu-cycles           37254707984946 # 0.60 GHz\ninstructions         64568477307349 # 1.73 IPC\nslots                74516063637558 #\nretiring             21085576318904 # 28.3% (33.8%)\n-- ucode             10854167509    #     0.0%\n-- fastpath          21074722151395 #    28.3%\nfrontend             11291436271569 # 15.2% (18.1%)\n-- latency           7305850960068  #     9.8%\n-- bandwidth         3985585311501  #     5.3%\nbackend              28357036767501 # 38.1% (45.5%)\n-- cpu               8580672843794  #    11.5%\n-- memory            19776363923707 #    26.5%\nspeculation          1582289637387  #  2.1% ( 2.5%)\n-- branch mispredict 1491776078800  #     2.0%\n-- pipeline restart  90513558587    #     0.1%\nsmt-contention       12199680822816 # 16.4% ( 0.0%)\ncpu-cycles           47521318199361 # 0.52 GHz\ninstructions         84159334995043 # 1.77 IPC\ninstructions         28045156973055 # 22.907 l2 access per 1000 inst\nl2 hit from l1       446431052504   # 17.42% l2 miss\nl2 miss from l1      40163624729    #\nl2 hit from l2 pf    124272894941   #\nl3 hit from l2 pf    17708445666    #\nl3 miss from l2 pf   54009010862    #\ninstructions         28026770371181 # 47.993 float per 1000 inst\nfloat 512            79             # 0.000 AVX-512 per 1000 inst\nfloat 256            17101222963    # 0.610 AVX-256 per 1000 inst\nfloat 128            1327999054364  # 47.383 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              3879.335\non_cpu               0.150          # 2.40 \/ 16 cores\nutime                8628.362\nstime                693.654\nnvcsw                228233         # 73.43%\nnivcsw               82573          # 26.57%\ninblock              16             # 0.00\/sec\nonblock              180592       
  # 46.55\/sec\ncpu-clock            9320946410414  # 9320.946 seconds\ntask-clock           9321173373333  # 9321.173 seconds\npage faults          547123213      # 58696.818\/sec\ncontext switches     330002         # 35.403\/sec\ncpu migrations       55623          # 5.967\/sec\nmajor page faults    3              # 0.000\/sec\nminor page faults    547123210      # 58696.817\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             8263097839432  # 124.372 branches per 1000 inst\nbranch misses        53469754306    # 0.65% branch miss\nconditional          8263097861864  # 124.372 conditional branches per 1000 inst\nindirect             1472508835553  # 22.164 indirect branches per 1000 inst\nslots                109777534393358 #\nretiring             49185428195933 # 44.8% (44.8%)\n-- ucode             3954220797973  #     3.6%\n-- fastpath          45231207397960 #    41.2%\nfrontend             10520885318395 #  9.6% ( 9.6%)\n-- latency           4636226245813  #     4.2%\n-- bandwidth         5884659072582  #     5.4%\nbackend              45213599965195 # 41.2% (41.2%)\n-- cpu               21043679830200 #    19.2%\n-- memory            24169920134995 #    22.0%\nspeculation          6286501719911  #  5.7% ( 5.7%)\n-- branch mispredict 5571414597598  #     5.1%\n-- pipeline restart  715087122313   #     0.7%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           34959735470966 # 0.57 GHz\ninstructions         74247673496426 # 2.12 IPC\nl2 access            792486728041   # 15.946 l2 access per 1000 inst\nl2 miss              388343780546   # 49.00% l2 miss\n<\/code><\/pre>\n\n\n\n<p>Process overview has time spent in cjxl<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>564 processes\n\t289 cjxl                 113792.21 16239.88\n\t 34 clinfo                  10.57     3.04\n\t 19 vulkaninfo               0.38     0.95\n\t  3 glxinfo:gdrv0            0.08     0.03\n\t  
3 glxinfo:gl0              0.08     0.03\n\t  6 clang                    0.07     0.05\n\t  2 vulkani:disk$0           0.04     0.10\n\t  1 glxinfo                  0.04     0.01\n\t  1 glxinfo:cs0              0.04     0.01\n\t  1 glxinfo:disk$0           0.04     0.01\n\t  1 glxinfo:sh0              0.04     0.01\n\t  1 glxinfo:shlo0            0.04     0.01\n\t  1 llvmpipe-0               0.02     0.05\n\t  1 llvmpipe-1               0.02     0.05\n\t  1 llvmpipe-10              0.02     0.05\n\t  1 llvmpipe-11              0.02     0.05\n\t  1 llvmpipe-12              0.02     0.05\n\t  1 llvmpipe-13              0.02     0.05\n\t  1 llvmpipe-14              0.02     0.05\n\t  1 llvmpipe-15              0.02     0.05\n\t  1 llvmpipe-2               0.02     0.05\n\t  1 llvmpipe-3               0.02     0.05\n\t  1 llvmpipe-4               0.02     0.05\n\t  1 llvmpipe-5               0.02     0.05\n\t  1 llvmpipe-6               0.02     0.05\n\t  1 llvmpipe-7               0.02     0.05\n\t  1 llvmpipe-8               0.02     0.05\n\t  1 llvmpipe-9               0.02     0.05\n\t  1 ps                       0.00     0.01\n\t 68 sh                       0.00     0.00\n\t 17 jpegxl                   0.00     0.00\n\t 13 gcc                      0.00     0.00\n\t  9 gsettings                0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  7 stat                     0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  4 gmain                    0.00     0.00\n\t  4 phoronix-test-s          0.00     0.00\n\t  2 dconf worker             0.00     0.00\n\t  2 which                    0.00     0.00\n\t  1 cc                       0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lscpu                    0.00     0.00\n\t  1 mktemp              
     0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n\t  1 xset                     0.00     0.00\n27 processes running\n47 maximum processes\n<\/code><\/pre>\n\n\n\n<p>Computation blocks<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      2644162) jpegxl           cpu=6 start=5.83  finish=149.67\n        2644163) cjxl             cpu=8 start=5.83  finish=149.65\n          2644290) cjxl             cpu=5 start=6.37  finish=149.63\n          2644291) cjxl             cpu=1 start=6.37  finish=149.64\n          2644292) cjxl             cpu=10 start=6.37  finish=149.64\n          2644293) cjxl             cpu=8 start=6.37  finish=149.63\n          2644294) cjxl             cpu=14 start=6.37  finish=149.63\n          2644295) cjxl             cpu=2 start=6.37  finish=149.63\n          2644296) cjxl             cpu=6 start=6.37  finish=149.63\n          2644297) cjxl             cpu=13 start=6.37  finish=149.63\n          2644298) cjxl             cpu=15 start=6.37  finish=149.64\n          2644299) cjxl             cpu=4 start=6.37  finish=149.64\n          2644300) cjxl             cpu=3 start=6.37  finish=149.64\n          2644301) cjxl             cpu=11 start=6.37  finish=149.64\n          2644302) cjxl             cpu=12 start=6.37  finish=149.64\n          2644303) cjxl             cpu=5 start=6.37  finish=149.63\n          2644304) cjxl             cpu=0 start=6.37  finish=149.63\n          2644305) cjxl             cpu=7 start=6.37  finish=149.63\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Multithreaded library for image 
encoding. There are six workloads for PNG and JPG files at quality levels of 80, 90 and 100. It looks like in the chart below that the second half is the quality level 100 runs and <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/jpegxl\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-1173","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1173","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=1173"}],"version-history":[{"count":3,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1173\/revisions"}],"predecessor-version":[{"id":1218,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1173\/revisions\/1218"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=1173"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}