{"id":2026,"date":"2024-03-06T12:05:09","date_gmt":"2024-03-06T12:05:09","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=2026"},"modified":"2024-03-07T03:18:53","modified_gmt":"2024-03-07T03:18:53","slug":"etcpak","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/etcpak\/","title":{"rendered":"etcpak"},"content":{"rendered":"\n<p>A fast implementation of <a href=\"https:\/\/en.wikipedia.org\/wiki\/Ericsson_Texture_Compression\">Ericsson Texture Compression<\/a>. This has four workloads. The first two look multi-threaded and the last two single-threaded.<\/p>\n\n\n\n<p> <\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-22.png\" alt=\"\" class=\"wp-image-2029\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-22.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-22-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-22-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>The topdown profile shows a higher overall retirement rate with some backend stalls. 
Branch misses look higher than average.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-23.png\" alt=\"\" class=\"wp-image-2031\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-23.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-23-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-23-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics show floating point code, low levels of L2 access. Backend stalls are more CPU than memory.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              379.185\non_cpu               0.127          # 2.03 \/ 16 cores\nutime                695.160\nstime                73.650\nnvcsw                33939          # 79.16%\nnivcsw               8933           # 20.84%\ninblock              0              # 0.00\/sec\nonblock              13072          # 34.47\/sec\ncpu-clock            768605623724   # 768.606 seconds\ntask-clock           768683156709   # 768.683 seconds\npage faults          40265127       # 52381.956\/sec\ncontext switches     44571          # 57.984\/sec\ncpu migrations       728            # 0.947\/sec\nmajor page faults    2              # 0.003\/sec\nminor page faults    40265125       # 52381.953\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             322067504448   # 53.327 branches per 1000 inst\nbranch misses        19026737996    # 5.91% branch miss\nconditional          261678731073   # 43.328 conditional branches per 1000 inst\nindirect             2411440948     # 0.399 indirect branches per 1000 inst\ncpu-cycles           3143167306570  # 0.52 GHz\ninstructions        
 6014440828700  # 1.91 IPC\nslots                6299593694976  #\nretiring             2064474000665  # 32.8% (41.4%)\n-- ucode             17942446201    #     0.3%\n-- fastpath          2046531554464  #    32.5%\nfrontend             809225698829   # 12.8% (16.2%)\n-- latency           547271642676   #     8.7%\n-- bandwidth         261954056153   #     4.2%\nbackend              1774056180250  # 28.2% (35.6%)\n-- cpu               1117262901849  #    17.7%\n-- memory            656793278401   #    10.4%\nspeculation          339046574585   #  5.4% ( 6.8%)\n-- branch mispredict 337884265343   #     5.4%\n-- pipeline restart  1162309242     #     0.0%\nsmt-contention       1312785174486  # 20.8% ( 0.0%)\ncpu-cycles           3136549829895  # 0.52 GHz\ninstructions         6011087700746  # 1.92 IPC\ninstructions         2007447939680  # 8.024 l2 access per 1000 inst\nl2 hit from l1       11062739284    # 26.83% l2 miss\nl2 miss from l1      486228979      #\nl2 hit from l2 pf    1209907993     #\nl3 hit from l2 pf    708610652      #\nl3 miss from l2 pf   3126294311     #\ninstructions         2010826750680  # 267.182 float per 1000 inst\nfloat 512            49             # 0.000 AVX-512 per 1000 inst\nfloat 256            380            # 0.000 AVX-256 per 1000 inst\nfloat 128            537257423693   # 267.182 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\ninstructions         6022694670998  #\nopcache              927164558278   # 153.945 opcache per 1000 inst\nopcache miss         56305763687    #  6.1% opcache miss rate\nl1 dTLB miss         954245753      # 0.158 L1 dTLB per 1000 inst\nl2 dTLB miss         206814759      # 0.034 L2 dTLB per 1000 inst\ninstructions         6022972249053  #\nicache               110398259315   # 18.330 icache per 1000 inst\nicache miss          9028214708     #  8.2% icache miss rate\nl1 iTLB miss         9538498        
# 0.002 L1 iTLB per 1000 inst\nl2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst\ntlb flush            23646          # 0.000 TLB flush per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              475.332\non_cpu               0.130          # 2.09 \/ 16 cores\nutime                939.204\nstime                52.098\nnvcsw                32300          # 71.25%\nnivcsw               13036          # 28.75%\ninblock              1136           # 2.39\/sec\nonblock              1736           # 3.65\/sec\ncpu-clock            990936944257   # 990.937 seconds\ntask-clock           990980198648   # 990.980 seconds\npage faults          40260308       # 40626.753\/sec\ncontext switches     47515          # 47.947\/sec\ncpu migrations       5556           # 5.607\/sec\nmajor page faults    0              # 0.000\/sec\nminor page faults    40260308       # 40626.753\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             310052567082   # 49.041 branches per 1000 inst\nbranch misses        12993611831    # 4.19% branch miss\nconditional          310052583722   # 49.041 conditional branches per 1000 inst\nindirect             40474888197    # 6.402 indirect branches per 1000 inst\nslots                11846758328900 #\nretiring             5119661056273  # 43.2% (43.2%)\n-- ucode             304518481434   #     2.6%\n-- fastpath          4815142574839  #    40.6%\nfrontend             1266742842987  # 10.7% (10.7%)\n-- latency           964654096534   #     8.1%\n-- bandwidth         302088746453   #     2.5%\nbackend              3986359025539  # 33.6% (33.6%)\n-- cpu               3603962998717  #    30.4%\n-- memory            382396026822   #     3.2%\nspeculation          1471295695168  # 12.4% (12.4%) high\n-- branch mispredict 1430113408581  #    12.1%\n-- pipeline restart  41182286587    #     0.3%\nsmt-contention    
   0              #  0.0% ( 0.0%)\ncpu-cycles           3696784401081  # 0.49 GHz\ninstructions         7221215707281  # 1.95 IPC\nl2 access            34425025841    # 6.898 l2 access per 1000 inst\nl2 miss              17807291225    # 51.73% l2 miss\ncpu-cycles           2550674883434  #  8.5% memory latency\nload stalls          203370913214   #  6.1% l1 bound\nl1 miss              46953318482    #  1.3% l2 bound\nl2 miss              12616747498    #  0.2% l3 bound\nl3 miss              8507747116     #  0.3% dram bound\nstore_stalls         14693094374    #  0.6% store bound\n<\/code><\/pre>\n\n\n\n<p>Process overview<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>480 processes\n\t 36 etcpak                 706.06    73.79\n\t 68 clinfo                  16.86     5.66\n\t 38 vulkaninfo               1.14     1.33\n\t  4 vulkani:disk$0           0.12     0.14\n\t  6 php                      0.09     0.13\n\t  6 glxinfo:gdrv0            0.08     0.09\n\t  6 glxinfo:gl0              0.07     0.09\n\t  2 llvmpipe-0               0.06     0.07\n\t  2 llvmpipe-1               0.06     0.07\n\t  2 llvmpipe-10              0.06     0.07\n\t  2 llvmpipe-11              0.06     0.07\n\t  2 llvmpipe-12              0.06     0.07\n\t  2 llvmpipe-13              0.06     0.07\n\t  2 llvmpipe-14              0.06     0.07\n\t  2 llvmpipe-15              0.06     0.07\n\t  2 llvmpipe-2               0.06     0.07\n\t  2 llvmpipe-3               0.06     0.07\n\t  2 llvmpipe-4               0.06     0.07\n\t  2 llvmpipe-5               0.06     0.07\n\t  2 llvmpipe-6               0.06     0.07\n\t  2 llvmpipe-7               0.06     0.07\n\t  2 llvmpipe-8               0.06     0.07\n\t  2 llvmpipe-9               0.06     0.07\n\t  6 clang                    0.06     0.06\n\t  2 glxinfo                  0.06     0.03\n\t  2 glxinfo:cs0              0.06     0.03\n\t  2 glxinfo:disk$0           0.06     0.03\n\t  2 glxinfo:sh0              0.06     0.03\n\t  2 
glxinfo:shlo0            0.06     0.03\n\t  3 rocminfo                 0.03     0.00\n\t  6 Worker 0                 0.00   459.87\n\t  6 Worker 1                 0.00   459.87\n\t  6 Worker 13                0.00   459.87\n\t  6 Worker 2                 0.00   459.87\n\t  6 Worker 3                 0.00   459.87\n\t  6 Worker 6                 0.00   459.87\n\t  6 Worker 7                 0.00   459.87\n\t  6 Worker 8                 0.00   459.87\n\t  6 Worker 9                 0.00   459.87\n\t  6 Worker 10                0.00   459.86\n\t  6 Worker 11                0.00   459.86\n\t  6 Worker 12                0.00   459.86\n\t  6 Worker 14                0.00   459.86\n\t  6 Worker 4                 0.00   459.86\n\t  6 Worker 5                 0.00   459.86\n\t  1 lspci                    0.00     0.03\n\t  1 ps                       0.00     0.01\n\t 88 sh                       0.00     0.00\n\t 13 gcc                      0.00     0.00\n\t  9 gsettings                0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  4 gmain                    0.00     0.00\n\t  3 dconf worker             0.00     0.00\n\t  2 cc                       0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 uname                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink              
   0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n0 processes running\n47 maximum processes\n<\/code><\/pre>\n\n\n\n<p>Computation blocks<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      998072) etcpak           cpu=2 start=5.66  finish=13.05\n        998073) etcpak           cpu=2 start=5.66  finish=13.05\n          998074) etcpak           cpu=1 start=5.66  finish=6.72 \n          998075) Worker 0         cpu=-1 start=6.72  finish=13.04\n          998076) Worker 1         cpu=-1 start=6.73  finish=13.04\n          998077) Worker 2         cpu=-1 start=6.73  finish=13.04\n          998078) Worker 3         cpu=-1 start=6.73  finish=13.04\n          998079) Worker 4         cpu=-1 start=6.73  finish=13.04\n          998080) Worker 5         cpu=-1 start=6.73  finish=13.04\n          998081) Worker 6         cpu=-1 start=6.73  finish=13.04\n          998082) Worker 7         cpu=-1 start=6.73  finish=13.04\n          998083) Worker 8         cpu=-1 start=6.73  finish=13.04\n          998084) Worker 9         cpu=-1 start=6.73  finish=13.04\n          998085) Worker 10        cpu=-1 start=6.73  finish=13.04\n          998086) Worker 11        cpu=-1 start=6.73  finish=13.04\n          998087) Worker 12        cpu=-1 start=6.73  finish=13.04\n          998088) Worker 13        cpu=-1 start=6.73  finish=13.04\n          998089) Worker 14        cpu=-1 start=6.73  finish=13.04\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>A fast implementation of Ericsson Texture Compression. This has four workloads. The first two look multi-threaded and the last two single-threaded. 
Topdown profile shows an overall higher retirement rate with some backend stalls. Branch misses look higher than average. AMD <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/etcpak\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-2026","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2026","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=2026"}],"version-history":[{"count":3,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2026\/revisions"}],"predecessor-version":[{"id":2036,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2026\/revisions\/2036"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=2026"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}