{"id":1143,"date":"2024-01-31T01:15:28","date_gmt":"2024-01-31T01:15:28","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=1143"},"modified":"2024-01-31T13:59:28","modified_gmt":"2024-01-31T13:59:28","slug":"vkpeak","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/vkpeak\/","title":{"rendered":"vkpeak"},"content":{"rendered":"\n<p>A Vulkan compute benchmark. It probably exercises the GPU more than the CPU, but it is still interesting to look at as a workload. The AMD scores are about 4x those of Intel, and the &#8220;on cpu&#8221; figure for AMD is extremely low (about 2.7 seconds of user plus system time over a 554-second run), so on AMD this is most likely a GPU benchmark rather than a CPU benchmark.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-90.png\" alt=\"\" class=\"wp-image-1178\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-90.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-90-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-90-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>The topdown profile is less interesting for a GPU workload. 
Most of the limited time is in frontend stalls.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-127.png\" alt=\"\" class=\"wp-image-1179\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-127.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-127-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-127-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              553.514\non_cpu               0.000          # 0.00 \/ 16 cores\nutime                1.526\nstime                1.212\nnvcsw                2733           # 86.98%\nnivcsw               409            # 13.02%\ninblock              0              # 0.00\/sec\nonblock              12856          # 23.23\/sec\ncpu-clock            2855338425     # 2.855 seconds\ntask-clock           2868514388     # 2.869 seconds\npage faults          188333         # 65655.240\/sec\ncontext switches     5740           # 2001.036\/sec\ncpu migrations       365            # 127.244\/sec\nmajor page faults    2              # 0.697\/sec\nminor page faults    188331         # 65654.543\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             2368093368     # 193.615 branches per 1000 inst\nbranch misses        95611140       # 4.04% branch miss\nconditional          1562371350     # 127.739 conditional branches per 1000 inst\nindirect             71829798       # 5.873 indirect branches per 1000 inst\ncpu-cycles           6729727183     # 0.00 GHz\ninstructions         11277840703    # 1.68 IPC\nslots                14128227432    #\nretiring      
       4129006916     # 29.2% (29.2%)\n-- ucode             14670688       #     0.1%\n-- fastpath          4114336228     #    29.1%\nfrontend             6880000619     # 48.7% (48.7%) high\n-- latency           5825721654     #    41.2%\n-- bandwidth         1054278965     #     7.5%\nbackend              2397483487     # 17.0% (17.0%) low\n-- cpu               367027234      #     2.6%\n-- memory            2030456253     #    14.4%\nspeculation          712684044      #  5.0% ( 5.0%)\n-- branch mispredict 705290499      #     5.0%\n-- pipeline restart  7393545        #     0.1%\nsmt-contention       8850370        #  0.1% ( 0.0%)\ncpu-cycles           6758209267     # 0.00 GHz\ninstructions         11770449191    # 1.74 IPC\ninstructions         4184068846     # 37.128 l2 access per 1000 inst\nl2 hit from l1       133916262      # 20.49% l2 miss\nl2 miss from l1      20575948       #\nl2 hit from l2 pf    10174800       #\nl3 hit from l2 pf    5159661        #\nl3 miss from l2 pf   6096840        #\ninstructions         4024389819     # 16.687 float per 1000 inst\nfloat 512            60             # 0.000 AVX-512 per 1000 inst\nfloat 256            620            # 0.000 AVX-256 per 1000 inst\nfloat 128            67152429       # 16.686 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              487.511\non_cpu               0.000          # 0.00 \/ 16 cores\nutime                1.679\nstime                0.655\nnvcsw                2699           # 93.98%\nnivcsw               173            # 6.02%\ninblock              15608          # 32.02\/sec\nonblock              1944           # 3.99\/sec\ncpu-clock            2416426268     # 2.416 seconds\ntask-clock           2426124568     # 2.426 seconds\npage faults          173841         # 
71653.782\/sec\ncontext switches     5143           # 2119.842\/sec\ncpu migrations       261            # 107.579\/sec\nmajor page faults    77             # 31.738\/sec\nminor page faults    173764         # 71622.044\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             2151066026     # 188.267 branches per 1000 inst\nbranch misses        28106902       # 1.31% branch miss\nconditional          2151078154     # 188.268 conditional branches per 1000 inst\nindirect             74254188       # 6.499 indirect branches per 1000 inst\nslots                31725273092    #\nretiring             10600531446    # 33.4% (33.4%)\n-- ucode             1170997398     #     3.7%\n-- fastpath          9429534048     #    29.7%\nfrontend             8715553434     # 27.5% (27.5%)\n-- latency           4390001681     #    13.8%\n-- bandwidth         4325551753     #    13.6%\nbackend              8702700014     # 27.4% (27.4%)\n-- cpu               2757085515     #     8.7%\n-- memory            5945614499     #    18.7%\nspeculation          3449049077     # 10.9% (10.9%) high\n-- branch mispredict 3212998733     #    10.1%\n-- pipeline restart  236050344      #     0.7%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           5320185366     # 0.00 GHz\ninstructions         10362909825    # 1.95 IPC\nl2 access            370759599      # 36.047 l2 access per 1000 inst\nl2 miss              143647561      # 38.74% l2 miss\n<\/code><\/pre>\n\n\n\n<p>The process overview shows that most of the time is spent in the test scaffolding (clinfo and vulkaninfo) rather than in vkpeak itself.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>356 processes\n\t 68 clinfo                  19.85     6.32\n\t 38 vulkaninfo               1.14     1.50\n\t  6 vkpeak                   0.43     0.36\n\t  3 vkpeak:disk$0            0.43     0.36\n\t  6 glxinfo:gdrv0            0.13     0.04\n\t  6 glxinfo:gl0              0.13     0.04\n\t  4 vulkani:disk$0           0.12     0.16\n\t  2 
glxinfo                  0.08     0.02\n\t  2 glxinfo:cs0              0.08     0.02\n\t  2 glxinfo:disk$0           0.08     0.02\n\t  2 glxinfo:sh0              0.08     0.02\n\t  2 glxinfo:shlo0            0.08     0.02\n\t  6 php                      0.06     0.23\n\t  2 llvmpipe-0               0.06     0.08\n\t  2 llvmpipe-1               0.06     0.08\n\t  2 llvmpipe-10              0.06     0.08\n\t  2 llvmpipe-11              0.06     0.08\n\t  2 llvmpipe-12              0.06     0.08\n\t  2 llvmpipe-13              0.06     0.08\n\t  2 llvmpipe-14              0.06     0.08\n\t  2 llvmpipe-15              0.06     0.08\n\t  2 llvmpipe-2               0.06     0.08\n\t  2 llvmpipe-3               0.06     0.08\n\t  2 llvmpipe-4               0.06     0.08\n\t  2 llvmpipe-5               0.06     0.08\n\t  2 llvmpipe-6               0.06     0.08\n\t  2 llvmpipe-7               0.06     0.08\n\t  2 llvmpipe-8               0.06     0.08\n\t  2 llvmpipe-9               0.06     0.08\n\t  6 clang                    0.05     0.07\n\t  3 rocminfo                 0.03     0.00\n\t  1 lspci                    0.01     0.02\n\t  1 ps                       0.00     0.01\n\t 82 sh                       0.00     0.00\n\t 15 gsettings                0.00     0.00\n\t 12 gcc                      0.00     0.00\n\t  9 systemd-detect-          0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 uname                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 cc                       0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 gmain                    0.00     0.00\n\t  1 grep                  
   0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n0 processes running\n47 maximum processes\n<\/code><\/pre>\n\n\n\n<p>Computation blocks: three vkpeak runs of roughly 179 seconds each<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      2603431) vkpeak           cpu=3 start=6.66  finish=185.81\n        2603432) vkpeak           cpu=15 start=6.67  finish=185.81\n          2603433) vkpeak:disk$0    cpu=13 start=6.70  finish=185.81\n      2603439) vkpeak           cpu=4 start=189.82 finish=369.02\n        2603440) vkpeak           cpu=15 start=189.82 finish=369.01\n          2603441) vkpeak:disk$0    cpu=9 start=189.85 finish=369.01\n      2603443) vkpeak           cpu=8 start=373.02 finish=552.26\n        2603444) vkpeak           cpu=11 start=373.03 finish=552.25\n          2603445) vkpeak:disk$0    cpu=15 start=373.06 finish=552.25\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>A Vulkan compute benchmark. It probably exercises the GPU more than the CPU, but it is still interesting to look at as a workload. 
The scores for AMD are ~4x that of Intel and the &#8220;on cpu&#8221; for AMD is extremely low, so likely this is a <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/vkpeak\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-1143","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1143","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=1143"}],"version-history":[{"count":3,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1143\/revisions"}],"predecessor-version":[{"id":1180,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1143\/revisions\/1180"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=1143"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}