{"id":1444,"date":"2024-02-03T23:18:17","date_gmt":"2024-02-03T23:18:17","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=1444"},"modified":"2024-02-09T13:02:39","modified_gmt":"2024-02-09T13:02:39","slug":"astcenc","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/astcenc\/","title":{"rendered":"astcenc"},"content":{"rendered":"\n<p>ASTC encoder doing both encoding and decoding. There are four workloads with different encode settings.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-41.png\" alt=\"\" class=\"wp-image-1627\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-41.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-41-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-41-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>The topdown profile shows a high retirement rate, with backend and frontend stalls varying somewhat by workload.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-43.png\" alt=\"\" class=\"wp-image-1629\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-43.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-43-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-43-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics show a lot of floating point and relatively low L2 access.<\/p>\n\n\n\n<pre 
class=\"wp-block-code\"><code>elapsed              516.788\non_cpu               0.795          # 12.72 \/ 16 cores\nutime                6565.977\nstime                6.691\nnvcsw                11724          # 11.71%\nnivcsw               88402          # 88.29%\ninblock              0              # 0.00\/sec\nonblock              733904         # 1420.13\/sec\ncpu-clock            6587278481382  # 6587.278 seconds\ntask-clock           6587335905427  # 6587.336 seconds\npage faults          1895236        # 287.709\/sec\ncontext switches     102432         # 15.550\/sec\ncpu migrations       1655           # 0.251\/sec\nmajor page faults    2              # 0.000\/sec\nminor page faults    1895234        # 287.709\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             1984788977011  # 55.516 branches per 1000 inst\nbranch misses        35322383228    # 1.78% branch miss\nconditional          1786077491888  # 49.958 conditional branches per 1000 inst\nindirect             4240505356     # 0.119 indirect branches per 1000 inst\ncpu-cycles           25693830621900 # 3.11 GHz\ninstructions         35705494385797 # 1.39 IPC\nslots                51621364363374 #\nretiring             19006066855327 # 36.8% (54.0%)\n-- ucode             176927047233   #     0.3%\n-- fastpath          18829139808094 #    36.5%\nfrontend             6995059072211  # 13.6% (19.9%)\n-- latency           6786744700704  #    13.1%\n-- bandwidth         208314371507   #     0.4%\nbackend              8827395238913  # 17.1% (25.1%)\n-- cpu               7624454215155  #    14.8%\n-- memory            1202941023758  #     2.3%\nspeculation          372063736871   #  0.7% ( 1.1%)\n-- branch mispredict 368100409350   #     0.7%\n-- pipeline restart  3963327521     #     0.0%\nsmt-contention       16420749718088 # 31.8% ( 0.0%)\ncpu-cycles           25718683077443 # 3.10 GHz\ninstructions         35671503486601 # 1.39 
IPC\ninstructions         11924446164926 # 14.974 l2 access per 1000 inst\nl2 hit from l1       156742650154   # 6.54% l2 miss\nl2 miss from l1      6711981682     #\nl2 hit from l2 pf    16851909912    #\nl3 hit from l2 pf    4467774002     #\nl3 miss from l2 pf   493179304      #\ninstructions         11911659524416 # 450.784 float per 1000 inst\nfloat 512            45             # 0.000 AVX-512 per 1000 inst\nfloat 256            410            # 0.000 AVX-256 per 1000 inst\nfloat 128            5369581741491  # 450.784 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\ninstructions         2652042        #\nopcache              975668         # 367.893 opcache per 1000 inst\nopcache miss         524462         # 53.8% opcache miss rate\nl1 dTLB miss         5625           # 2.121 L1 dTLB per 1000 inst\nl2 dTLB miss         1138           # 0.429 L2 dTLB per 1000 inst\ninstructions         2682386        #\nicache               1314716        # 490.129 icache per 1000 inst\nicache miss          112304         #  8.5% icache miss rate\nl1 iTLB miss         12             # 0.004 L1 iTLB per 1000 inst\nl2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst\ntlb flush            19             # 0.007 TLB flush per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              728.305\non_cpu               0.838          # 13.41 \/ 16 cores\nutime                9762.031\nstime                5.245\nnvcsw                12570          # 10.96%\nnivcsw               102172         # 89.04%\ninblock              41560          # 57.06\/sec\nonblock              722624         # 992.20\/sec\ncpu-clock            9780538310498  # 9780.538 seconds\ntask-clock           9780576896522  # 9780.577 seconds\npage faults          1881464        # 192.367\/sec\ncontext switches     117838         # 
12.048\/sec\ncpu migrations       3274           # 0.335\/sec\nmajor page faults    10             # 0.001\/sec\nminor page faults    1881454        # 192.366\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             1983363120229  # 55.477 branches per 1000 inst\nbranch misses        45953695711    # 2.32% branch miss\nconditional          1983363599909  # 55.477 conditional branches per 1000 inst\nindirect             451720544818   # 12.635 indirect branches per 1000 inst\nslots                47669514571232 #\nretiring             28083200512637 # 58.9% (58.9%) high\n-- ucode             5470434164301  #    11.5%\n-- fastpath          22612766348336 #    47.4%\nfrontend             12066471484638 # 25.3% (25.3%)\n-- latency           9275333680799  #    19.5%\n-- bandwidth         2791137803839  #     5.9%\nbackend              5201850758995  # 10.9% (10.9%) low\n-- cpu               4149275756514  #     8.7%\n-- memory            1052575002481  #     2.2%\nspeculation          2342704926128  #  4.9% ( 4.9%)\n-- branch mispredict 2324136467288  #     4.9%\n-- pipeline restart  18568458840    #     0.0%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           17808128614553 # 1.53 GHz\ninstructions         26843281855229 # 1.51 IPC\nl2 access            439669313013   # 18.599 l2 access per 1000 inst\nl2 miss              80659863636    # 18.35% l2 miss\ncpu-cycles           15678325923933 # 13.0% memory latency\nload stalls          2028612957217  #  9.1% l1 bound\nl1 miss              606377070378   #  3.4% l2 bound\nl2 miss              66744825283    #  0.4% l3 bound\nl3 miss              6195120582     #  0.0% dram bound\nstore_stalls         4857894264     #  0.0% store bound\n<\/code><\/pre>\n\n\n\n<p>Process overview shows many invocations of astcenc-avx2.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>14970 processes\n\t14604 astcenc-avx2         3637052.49  5886.00\n\t 
68 clinfo                  16.21     6.32\n\t 38 vulkaninfo               0.94     1.35\n\t  6 php                      0.14     0.08\n\t  6 glxinfo:gdrv0            0.11     0.07\n\t  6 glxinfo:gl0              0.11     0.07\n\t  4 vulkani:disk$0           0.10     0.14\n\t  6 clang                    0.05     0.07\n\t  2 llvmpipe-0               0.05     0.07\n\t  2 llvmpipe-1               0.05     0.07\n\t  2 llvmpipe-10              0.05     0.07\n\t  2 llvmpipe-11              0.05     0.07\n\t  2 llvmpipe-12              0.05     0.07\n\t  2 llvmpipe-13              0.05     0.07\n\t  2 llvmpipe-14              0.05     0.07\n\t  2 llvmpipe-15              0.05     0.07\n\t  2 llvmpipe-2               0.05     0.07\n\t  2 llvmpipe-3               0.05     0.07\n\t  2 llvmpipe-4               0.05     0.07\n\t  2 llvmpipe-5               0.05     0.07\n\t  2 llvmpipe-6               0.05     0.07\n\t  2 llvmpipe-7               0.05     0.07\n\t  2 llvmpipe-8               0.05     0.07\n\t  2 llvmpipe-9               0.05     0.07\n\t  2 glxinfo                  0.05     0.03\n\t  2 glxinfo:cs0              0.05     0.03\n\t  2 glxinfo:disk$0           0.05     0.03\n\t  2 glxinfo:sh0              0.05     0.03\n\t  2 glxinfo:shlo0            0.05     0.03\n\t  1 lspci                    0.01     0.01\n\t 88 sh                       0.00     0.00\n\t 13 gcc                      0.00     0.00\n\t 12 astcenc                  0.00     0.00\n\t 11 gsettings                0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  3 gmain                    0.00     0.00\n\t  3 rocminfo                 0.00     0.00\n\t  2 cc                       0.00     0.00\n\t  2 dconf worker             0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 uname                    0.00     0.00\n\t  2 which              
      0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 ps                       0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n0 processes running\n47 maximum processes\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>ASTC encoder doing both encoding and decoding. There are four workloads with different encode settings. Topdown profile shows a high retirement rate with backend stalls and frontend stalls changing some by workload. 
AMD metrics show a lot of floating point and <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/astcenc\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-1444","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1444","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=1444"}],"version-history":[{"count":3,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1444\/revisions"}],"predecessor-version":[{"id":1630,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1444\/revisions\/1630"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=1444"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}