{"id":340,"date":"2024-01-07T15:12:32","date_gmt":"2024-01-07T15:12:32","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=340"},"modified":"2024-01-07T21:48:47","modified_gmt":"2024-01-07T21:48:47","slug":"openvino","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/openvino\/","title":{"rendered":"openvino"},"content":{"rendered":"\n<p>Test of OpenVINO with Intel internal tests.  There is a sequence of 18 different tests with different profiles, as shown below, but overall there is a high amount of backend memory waiting.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-28.png\" alt=\"\" class=\"wp-image-345\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-28.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-28-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-28-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics.  One thing that surprises me is that there is not as much floating-point code as I might expect. 
Also not many branches.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              3666.089\non_cpu               0.887          # 14.20 \/ 16 cores\nutime                50536.802\nstime                1508.811\nnvcsw                25564303       # 63.91%\nnivcsw               14438697       # 36.09%\ninblock              647168         # 176.53\/sec\nonblock              450208         # 122.80\/sec\ncpu-clock            52059503721702 # 52059.504 seconds\ntask-clock           52065965279085 # 52065.965 seconds\npage faults          4182456        # 80.330\/sec\ncontext switches     40020993       # 768.659\/sec\ncpu migrations       3345690        # 64.259\/sec\nmajor page faults    3308           # 0.064\/sec\nminor page faults    4179147        # 80.266\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             6104425916302  # 35.074 branches per 1000 inst\nbranch misses        146139238875   # 2.39% branch miss\nconditional          4259728336297  # 24.475 conditional branches per 1000 inst\nindirect             335057737990   # 1.925 indirect branches per 1000 inst\ncpu-cycles           210012994377891 # 3.64 GHz\ninstructions         170500080670606 # 0.81 IPC\nslots                419981201012916 #\nretiring             59508125166069 # 14.2% (18.2%)\n-- ucode             162907627794   #     0.0%\n-- fastpath          59345217538275 #    14.1%\nfrontend             20743298000928 #  4.9% ( 6.3%)\n-- latency           16183935494766 #     3.9%\n-- bandwidth         4559362506162  #     1.1%\nbackend              246776556780422 # 58.8% (75.3%)\n-- cpu               167073230539072 #    39.8%\n-- memory            79703326241350 #    19.0%\nspeculation          774774235160   #  0.2% ( 0.2%)\n-- branch mispredict 732884814485   #     0.2%\n-- pipeline restart  41889420675    #     0.0%\nsmt-contention       92175643332376 # 21.9% ( 0.0%)\ncpu-cycles           209906778717120 # 
3.64 GHz\ninstructions         170496426937503 # 0.81 IPC\ninstructions         56819543515048 # 105.906 l2 access per 1000 inst\nl2 hit from l1       4611962526922  # 11.12% l2 miss\nl2 miss from l1      247542205697   #\nl2 hit from l2 pf    984103822417   #\nl3 hit from l2 pf    329461554612   #\nl3 miss from l2 pf   91984854375    #\ninstructions         56801900395044 # 36.633 float per 1000 inst\nfloat 512            139            # 0.000 AVX-512 per 1000 inst\nfloat 256            388543342299   # 6.840 AVX-256 per 1000 inst\nfloat 128            1692275011173  # 29.793 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Corresponding Intel metrics<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              4364.999\non_cpu               0.888          # 14.20 \/ 16 cores\nutime                61143.155\nstime                844.689\nnvcsw                15956416       # 71.35%\nnivcsw               6406167        # 28.65%\ninblock              1066104        # 244.24\/sec\nonblock              450688         # 103.25\/sec\ncpu-clock            61996971778983 # 61996.972 seconds\ntask-clock           62000399512424 # 62000.400 seconds\npage faults          7556280        # 121.875\/sec\ncontext switches     22384091       # 361.031\/sec\ncpu migrations       5449898        # 87.901\/sec\nmajor page faults    5233           # 0.084\/sec\nminor page faults    7551047        # 121.790\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             5124546089326  # 22.836 branches per 1000 inst\nbranch misses        19662667907    # 0.38% branch miss\nconditional          5124546172878  # 22.836 conditional branches per 1000 inst\nindirect             1443260482309  # 6.432 indirect branches per 1000 inst\nslots                238516915623770 #\nretiring             
117670618634802 # 49.3% (49.3%)\n-- ucode             5510128010357  #     2.3%\n-- fastpath          112160490624445 #    47.0%\nfrontend             53552026560725 # 22.5% (22.5%)\n-- latency           46173376494573 #    19.4%\n-- bandwidth         7378650066152  #     3.1%\nbackend              65661215027393 # 27.5% (27.5%)\n-- cpu               31552095091589 #    13.2%\n-- memory            34109119935804 #    14.3%\nspeculation          3334009366321  #  1.4% ( 1.4%)\n-- branch mispredict 2300226126067  #     1.0%\n-- pipeline restart  1033783240254  #     0.4%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           154893255896222 # 2.54 GHz\ninstructions         231596341850754 # 1.50 IPC\nl2 access            6182143166384  # 51.532 l2 access per 1000 inst\nl2 miss              1297190616085  # 20.98% l2 miss\n<\/code><\/pre>\n\n\n\n<p>Almost all of the time is spent in benchmark_app. There was still a mix of processes hanging, so I don&#8217;t have all the processes here, though it is close in elapsed time.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>2111 processes\n\t1757 benchmark_app        1624694.47 39658.01\n\t 32 clinfo                   5.12     1.92\n\t 19 vulkaninfo               0.38     0.39\n\t  3 glxinfo:gdrv0            0.05     0.03\n\t  2 vulkani:disk$0           0.04     0.05\n\t  6 clang                    0.03     0.03\n\t  1 glxinfo                  0.03     0.01\n\t  1 glxinfo:cs0              0.03     0.01\n\t  1 glxinfo:disk$0           0.03     0.01\n\t  1 glxinfo:sh0              0.03     0.01\n\t  1 glxinfo:shlo0            0.03     0.01\n\t  1 llvmpipe-0               0.02     0.02\n\t  1 llvmpipe-1               0.02     0.02\n\t  1 llvmpipe-10              0.02     0.02\n\t  1 llvmpipe-11              0.02     0.02\n\t  1 llvmpipe-12              0.02     0.02\n\t  1 llvmpipe-13              0.02     0.02\n\t  1 llvmpipe-14              0.02     0.02\n\t  1 llvmpipe-15              0.02     0.02\n\t  1 
llvmpipe-2               0.02     0.02\n\t  1 llvmpipe-3               0.02     0.02\n\t  1 llvmpipe-4               0.02     0.02\n\t  1 llvmpipe-5               0.02     0.02\n\t  1 llvmpipe-6               0.02     0.02\n\t  1 llvmpipe-7               0.02     0.02\n\t  1 llvmpipe-8               0.02     0.02\n\t  1 llvmpipe-9               0.02     0.02\n\t 99 sh                       0.00     0.00\n\t 54 openvino                 0.00     0.00\n\t 13 gcc                      0.00     0.00\n\t  9 stty                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  7 gsettings                0.00     0.00\n\t  7 stat                     0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 gmain                    0.00     0.00\n\t  4 phoronix-test-s          0.00     0.00\n\t  3 dconf worker             0.00     0.00\n\t  2 which                    0.00     0.00\n\t  1 cc                       0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lscpu                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 ps                       0.00     0.00\n\t  1 python3                  0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n\t  1 xset                     0.00     0.00\n34 processes running\n68 maximum processes\n<\/code><\/pre>\n\n\n\n<p>Relatively straightforward block when the benchmark runs.<\/p>\n\n\n\n<pre 
class=\"wp-block-code\"><code>      59339) openvino         cpu=3 start=5.96  finish=68.21\n        59340) benchmark_app    cpu=2 start=5.96  finish=68.17\n          59341) benchmark_app    cpu=12 start=5.99  finish=68.17\n          59342) benchmark_app    cpu=13 start=5.99  finish=68.17\n          59343) benchmark_app    cpu=14 start=5.99  finish=68.17\n          59344) benchmark_app    cpu=10 start=5.99  finish=68.17\n          59345) benchmark_app    cpu=9 start=5.99  finish=68.17\n          59346) benchmark_app    cpu=8 start=5.99  finish=68.17\n          59347) benchmark_app    cpu=11 start=5.99  finish=68.17\n          59348) benchmark_app    cpu=7 start=5.99  finish=68.17\n          59349) benchmark_app    cpu=0 start=5.99  finish=68.17\n          59350) benchmark_app    cpu=4 start=5.99  finish=68.17\n          59351) benchmark_app    cpu=5 start=5.99  finish=68.17\n          59352) benchmark_app    cpu=6 start=5.99  finish=68.17\n          59353) benchmark_app    cpu=2 start=5.99  finish=68.17\n          59354) benchmark_app    cpu=3 start=5.99  finish=68.17\n          59355) benchmark_app    cpu=15 start=5.99  finish=68.17\n          59356) benchmark_app    cpu=12 start=6.38  finish=68.16\n            59369) benchmark_app    cpu=6 start=7.21  finish=68.17\n            59372) benchmark_app    cpu=10 start=7.21  finish=68.17\n          59357) benchmark_app    cpu=8 start=6.38  finish=68.16\n            59364) benchmark_app    cpu=7 start=6.79  finish=68.17\n            59365) benchmark_app    cpu=9 start=6.79  finish=68.17\n              59368) benchmark_app    cpu=15 start=7.21  finish=68.17\n          59358) benchmark_app    cpu=0 start=6.38  finish=68.16\n            59361) benchmark_app    cpu=5 start=6.78  finish=68.17\n              59363) benchmark_app    cpu=0 start=6.78  finish=68.17\n              59366) benchmark_app    cpu=11 start=6.79  finish=68.17\n              59367) benchmark_app    cpu=3 start=7.21  finish=68.17\n              59371) 
benchmark_app    cpu=14 start=7.21  finish=68.17\n            59362) benchmark_app    cpu=13 start=6.78  finish=68.17\n          59359) benchmark_app    cpu=4 start=6.38  finish=68.16\n            59370) benchmark_app    cpu=4 start=7.21  finish=68.17\n          59360) benchmark_app    cpu=0 start=6.39  finish=68.16<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Test of OpenVino with Intel internal tests. There is a sequence of 18 different tests with different profiles as show below but overall high amounts of backend memory waiting. AMD metrics. One thing that surprises me is not as much <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/openvino\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-340","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/340","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=340"}],"version-history":[{"count":2,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/340\/revisions"}],"predecessor-version":[{"id":346,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/340\/revisions\/346"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=340"}],"curies":[{"name":"wp","href
":"https:\/\/api.w.org\/{rel}","templated":true}]}}