{"id":413,"date":"2024-01-11T12:25:02","date_gmt":"2024-01-11T12:25:02","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=413"},"modified":"2024-01-12T00:41:04","modified_gmt":"2024-01-12T00:41:04","slug":"openvkl","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/openvkl\/","title":{"rendered":"openvkl"},"content":{"rendered":"\n<p>OpenVKL provides &#8220;volume computational kernels&#8221; as part of Intel&#8217;s rendering toolkit. There are two workloads (a third, which uses SYCL, did not run). Overall metrics are middle of the road: the backend shows some memory stalls, but the retirement rate is reasonable. The pattern below suggests the code runs through multiple phases.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-37.png\" alt=\"\" class=\"wp-image-418\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-37.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-37-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-37-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics show floating-point-intensive code with a relatively small number of branches.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              3971.149\non_cpu               0.896          # 14.34 \/ 16 cores\nutime                56359.739\nstime                584.664\nnvcsw                33937687       # 98.60%\nnivcsw               480386         # 1.40%\ninblock              1002920        # 252.55\/sec\nonblock              8052352        # 2027.71\/sec\ncpu-clock            56931734359547 # 56931.734 seconds\ntask-clock           56941310564491 # 56941.311 seconds\npage faults          47461092      
 # 833.509\/sec\ncontext switches     34437705       # 604.793\/sec\ncpu migrations       20220          # 0.355\/sec\nmajor page faults    5466           # 0.096\/sec\nminor page faults    47455626       # 833.413\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             19400954559475 # 97.035 branches per 1000 inst\nbranch misses        201690377818   # 1.04% branch miss\nconditional          12895106123407 # 64.496 conditional branches per 1000 inst\nindirect             1374941034553  # 6.877 indirect branches per 1000 inst\ncpu-cycles           235999440720760 # 3.72 GHz\ninstructions         199532654165200 # 0.85 IPC\nslots                471738528929274 #\nretiring             103684313307317 # 22.0% (27.1%)\n-- ucode             587326337589   #     0.1%\n-- fastpath          103096986969728 #    21.9%\nfrontend             46597727613920 #  9.9% (12.2%)\n-- latency           41563283533494 #     8.8%\n-- bandwidth         5034444080426  #     1.1%\nbackend              226997644094787 # 48.1% (59.2%)\n-- cpu               88712172916793 #    18.8%\n-- memory            138285471177994 #    29.3%\nspeculation          5850505760480  #  1.2% ( 1.5%)\n-- branch mispredict 4911986409490  #     1.0%\n-- pipeline restart  938519350990   #     0.2%\nsmt-contention       88602923627941 # 18.8% ( 0.0%)\ncpu-cycles           234813534923853 # 3.71 GHz\ninstructions         199261913372212 # 0.85 IPC\ninstructions         66407166959113 # 17.629 l2 access per 1000 inst\nl2 hit from l1       905877907882   # 24.18% l2 miss\nl2 miss from l1      164919857111   #\nl2 hit from l2 pf    146645173962   #\nl3 hit from l2 pf    48609945732    #\nl3 miss from l2 pf   69549635740    #\ninstructions         66387630998801 # 375.847 float per 1000 inst\nfloat 512            93             # 0.000 AVX-512 per 1000 inst\nfloat 256            27363772230    # 0.412 AVX-256 per 1000 inst\nfloat 128            
24924227430959 # 375.435 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         3              # 0.000 scalar per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              4136.591\non_cpu               0.896          # 14.33 \/ 16 cores\nutime                57965.710\nstime                1331.660\nnvcsw                76665884       # 99.28%\nnivcsw               555305         # 0.72%\ninblock              816520         # 197.39\/sec\nonblock              7819888        # 1890.42\/sec\ncpu-clock            59247135542528 # 59247.136 seconds\ntask-clock           59259522665649 # 59259.523 seconds\npage faults          43320780       # 731.035\/sec\ncontext switches     77241651       # 1303.447\/sec\ncpu migrations       46304          # 0.781\/sec\nmajor page faults    4528           # 0.076\/sec\nminor page faults    43316252       # 730.959\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             19225950316371 # 91.942 branches per 1000 inst\nbranch misses        198690011214   # 1.03% branch miss\nconditional          19225950333715 # 91.942 conditional branches per 1000 inst\nindirect             4849736895446  # 23.192 indirect branches per 1000 inst\nslots                282541754736980 #\nretiring             146147493214299 # 51.7% (51.7%)\n-- ucode             14751838264754 #     5.2%\n-- fastpath          131395654949545 #    46.5%\nfrontend             55659811470922 # 19.7% (19.7%)\n-- latency           42073001368270 #    14.9%\n-- bandwidth         13586810102652 #     4.8%\nbackend              69719964764269 # 24.7% (24.7%)\n-- cpu               36329134258130 #    12.9%\n-- memory            33390830506139 #    11.8%\nspeculation          14637555457175 #  5.2% ( 5.2%)\n-- branch mispredict 14202906582505 #     5.0%\n-- pipeline restart  434648874670   #     
0.2%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           182201387562091 # 2.77 GHz\ninstructions         253179753855479 # 1.39 IPC\nl2 access            1615030149531  # 12.736 l2 access per 1000 inst\nl2 miss              684417918973   # 42.38% l2 miss\n<\/code><\/pre>\n\n\n\n<p>Process information collection crashed partway through, so the listing is incomplete; the pieces that were captured are shown below.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>328 processes\n\t 80 vklBenchmarkCPU      743574.56  8707.04\n\t 34 clinfo                   9.58     3.34\n\t 19 vulkaninfo               0.19     0.96\n\t  3 glxinfo:gdrv0            0.11     0.01\n\t  6 clang                    0.06     0.06\n\t  1 glxinfo                  0.05     0.01\n\t  1 glxinfo:cs0              0.05     0.01\n\t  1 glxinfo:disk$0           0.05     0.01\n\t  1 glxinfo:sh0              0.05     0.01\n\t  1 glxinfo:shlo0            0.05     0.01\n\t  2 vulkani:disk$0           0.02     0.11\n\t  1 llvmpipe-1               0.01     0.06\n\t  1 llvmpipe-10              0.01     0.06\n\t  1 llvmpipe-11              0.01     0.06\n\t  1 llvmpipe-12              0.01     0.06\n\t  1 llvmpipe-13              0.01     0.06\n\t  1 llvmpipe-14              0.01     0.06\n\t  1 llvmpipe-15              0.01     0.06\n\t  1 llvmpipe-2               0.01     0.06\n\t  1 llvmpipe-3               0.01     0.06\n\t  1 llvmpipe-4               0.01     0.06\n\t  1 llvmpipe-5               0.01     0.06\n\t  1 llvmpipe-6               0.01     0.06\n\t  1 llvmpipe-7               0.01     0.06\n\t  1 llvmpipe-8               0.01     0.06\n\t  1 llvmpipe-9               0.01     0.06\n\t  1 llvmpipe-0               0.01     0.05\n\t  1 ps                       0.00     0.01\n\t 59 sh                       0.00     0.00\n\t 13 gsettings                0.00     0.00\n\t 12 gcc                      0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  7 stat                     0.00     0.00\n\t  6 llvm-link          
      0.00     0.00\n\t  5 openvkl                  0.00     0.00\n\t  4 phoronix-test-s          0.00     0.00\n\t  2 gmain                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lscpu                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n\t  1 xset                     0.00     0.00\n26 processes running\n47 maximum processes\n<\/code><\/pre>\n\n\n\n<p>Core computation blocks look as follows<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      6550) openvkl          cpu=0 start=648.38 finish=1286.06\n        6551) vklBenchmarkCPU  cpu=12 start=648.38 finish=1286.01\n          6552) vklBenchmarkCPU  cpu=7 start=648.42 finish=1286.01\n            6554) vklBenchmarkCPU  cpu=6 start=648.42 finish=1286.01\n              6557) vklBenchmarkCPU  cpu=3 start=648.42 finish=1286.01\n                6565) vklBenchmarkCPU  cpu=9 start=648.42 finish=1286.01\n                6566) vklBenchmarkCPU  cpu=15 start=648.42 finish=1286.01\n              6558) vklBenchmarkCPU  cpu=14 start=648.42 finish=1286.01\n            6556) vklBenchmarkCPU  cpu=11 start=648.42 finish=1286.01\n              6561) vklBenchmarkCPU  cpu=2 start=648.42 finish=1286.01\n              6563) vklBenchmarkCPU  cpu=5 start=648.42 finish=1286.01\n          6553) vklBenchmarkCPU 
 cpu=4 start=648.42 finish=1286.01\n            6555) vklBenchmarkCPU  cpu=1 start=648.42 finish=1286.01\n              6560) vklBenchmarkCPU  cpu=8 start=648.42 finish=1286.01\n                6562) vklBenchmarkCPU  cpu=0 start=648.42 finish=1286.01\n              6564) vklBenchmarkCPU  cpu=13 start=648.42 finish=1286.01\n            6559) vklBenchmarkCPU  cpu=10 start=648.42 finish=1286.01<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>OpenVKL does &#8220;volume computational kernels&#8221; as part of Intel&#8217;s rendering toolkit. There are two workloads (and a third that does SYCL that didn&#8217;t run). Overall metrics are more middle of the road with some backend memory stalls but also ok <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/openvkl\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-413","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/413","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=413"}],"version-history":[{"count":2,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/413\/revisions"}],"predecessor-version":[{"id":419,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/413\/revisions\/419"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment"
:[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=413"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}