{"id":781,"date":"2024-01-20T22:54:07","date_gmt":"2024-01-20T22:54:07","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=781"},"modified":"2024-01-20T22:54:42","modified_gmt":"2024-01-20T22:54:42","slug":"mnn","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/mnn\/","title":{"rendered":"mnn"},"content":{"rendered":"\n<p>mnn is a neural network framework. This test runs eight different models and reports a geometric mean. The process overview suggests threads are consistently spawned on all cores.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-48.png\" alt=\"\" class=\"wp-image-782\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-48.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-48-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-48-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>Topdown overview suggests certain phases of backend activity or frontend activity and a consistent retirement rate.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-86.png\" alt=\"\" class=\"wp-image-783\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-86.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-86-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-86-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD 
metrics<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              350.487\non_cpu               0.942          # 15.07 \/ 16 cores\nutime                3132.874\nstime                2147.667\nnvcsw                2652           # 5.72%\nnivcsw               43705          # 94.28%\ninblock              8              # 0.02\/sec\nonblock              12752          # 36.38\/sec\ncpu-clock            5280985842856  # 5280.986 seconds\ntask-clock           5281014226031  # 5281.014 seconds\npage faults          1210146        # 229.150\/sec\ncontext switches     47938          # 9.077\/sec\ncpu migrations       286            # 0.054\/sec\nmajor page faults    2              # 0.000\/sec\nminor page faults    1210144        # 229.150\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             2031821929040  # 120.505 branches per 1000 inst\nbranch misses        253888931138   # 12.50% branch miss\nconditional          970120823455   # 57.537 conditional branches per 1000 inst\nindirect             20899090234    # 1.239 indirect branches per 1000 inst\ncpu-cycles           110973875070917 # 3.88 GHz\ninstructions         86300390156541 # 0.78 IPC\nslots                221938347605922 #\nretiring             32263730366925 # 14.5% (17.3%)\n-- ucode             168671411473   #     0.1%\n-- fastpath          32095058955452 #    14.5%\nfrontend             71461075367125 # 32.2% (38.2%)\n-- latency           59166100322928 #    26.7%\n-- bandwidth         12294975044197 #     5.5%\nbackend              82836224092829 # 37.3% (44.3%)\n-- cpu               36711151611261 #    16.5%\n-- memory            46125072481568 #    20.8%\nspeculation          366713033149   #  0.2% ( 0.2%)\n-- branch mispredict 362656308286   #     0.2%\n-- pipeline restart  4056724863     #     0.0%\nsmt-contention       35010489172413 # 15.8% ( 0.0%)\ncpu-cycles           111531376124456 # 3.86 GHz\ninstructions       
  86619805970825 # 0.78 IPC\ninstructions         28881985494088 # 73.009 l2 access per 1000 inst\nl2 hit from l1       1518505686348  # 21.90% l2 miss\nl2 miss from l1      198514190460   #\nl2 hit from l2 pf    326870900178   #\nl3 hit from l2 pf    224827579693   #\nl3 miss from l2 pf   38451066229    #\ninstructions         28860405033995 # 19.572 float per 1000 inst\nfloat 512            54             # 0.000 AVX-512 per 1000 inst\nfloat 256            658            # 0.000 AVX-256 per 1000 inst\nfloat 128            564863466718   # 19.572 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         1              # 0.000 scalar per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              2678.159\non_cpu               0.975          # 15.59 \/ 16 cores\nutime                25134.666\nstime                16629.070\nnvcsw                4543           # 0.15%\nnivcsw               3013644        # 99.85%\ninblock              1368           # 0.51\/sec\nonblock              1880           # 0.70\/sec\ncpu-clock            41764931327447 # 41764.931 seconds\ntask-clock           41765074209486 # 41765.074 seconds\npage faults          4760326        # 113.979\/sec\ncontext switches     3031380        # 72.582\/sec\ncpu migrations       516            # 0.012\/sec\nmajor page faults    1              # 0.000\/sec\nminor page faults    4760325        # 113.979\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             19697851320338 # 108.948 branches per 1000 inst\nbranch misses        11220044229    # 0.06% branch miss\nconditional          19697851339346 # 108.948 conditional branches per 1000 inst\nindirect             5502668904687  # 30.435 indirect branches per 1000 inst\nslots                46191558811640 #\nretiring             26812914980896 # 58.0% (58.0%)\n-- ucode          
   3186273037326  #     6.9%\n-- fastpath          23626641943570 #    51.1%\nfrontend             12488108841259 # 27.0% (27.0%)\n-- latency           7466005891595  #    16.2%\n-- bandwidth         5022102949664  #    10.9%\nbackend              6407985278786  # 13.9% (13.9%)\n-- cpu               2050555342763  #     4.4%\n-- memory            4357429936023  #     9.4%\nspeculation          385626559145   #  0.8% ( 0.8%)\n-- branch mispredict 263567987217   #     0.6%\n-- pipeline restart  122058571928   #     0.3%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           70900703783603 # 1.63 GHz\ninstructions         116275737814803 # 1.64 IPC\nl2 access            2953834438475  # 28.835 l2 access per 1000 inst\nl2 miss              650131037736   # 22.01% l2 miss\n<\/code><\/pre>\n\n\n\n<p>Process overview shows the application is named benchmark.out<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>597 processes\n\t240 benchmark.out        255571.35 185286.88\n\t 68 clinfo                  18.83     7.01\n\t 38 vulkaninfo               1.32     1.33\n\t  6 glxinfo:gdrv0            0.22     0.04\n\t  6 php                      0.15     0.25\n\t  4 vulkani:disk$0           0.14     0.14\n\t  2 glxinfo                  0.10     0.02\n\t  2 glxinfo:cs0              0.10     0.02\n\t  2 glxinfo:disk$0           0.10     0.02\n\t  2 glxinfo:sh0              0.10     0.02\n\t  2 glxinfo:shlo0            0.10     0.02\n\t  2 llvmpipe-0               0.07     0.07\n\t  2 llvmpipe-1               0.07     0.07\n\t  2 llvmpipe-10              0.07     0.07\n\t  2 llvmpipe-11              0.07     0.07\n\t  2 llvmpipe-12              0.07     0.07\n\t  2 llvmpipe-13              0.07     0.07\n\t  2 llvmpipe-14              0.07     0.07\n\t  2 llvmpipe-15              0.07     0.07\n\t  2 llvmpipe-2               0.07     0.07\n\t  2 llvmpipe-3               0.07     0.07\n\t  2 llvmpipe-4               0.07     0.07\n\t  2 llvmpipe-5               0.07  
   0.07\n\t  2 llvmpipe-6               0.07     0.07\n\t  2 llvmpipe-7               0.07     0.07\n\t  2 llvmpipe-8               0.07     0.07\n\t  2 llvmpipe-9               0.07     0.07\n\t  6 clang                    0.06     0.06\n\t  3 rocminfo                 0.01     0.03\n\t  1 lspci                    0.00     0.02\n\t  1 ps                       0.00     0.01\n\t 82 sh                       0.00     0.00\n\t 15 mnn                      0.00     0.00\n\t 13 gcc                      0.00     0.00\n\t 11 gsettings                0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  3 gmain                    0.00     0.00\n\t  2 cc                       0.00     0.00\n\t  2 dconf worker             0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 uname                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n0 processes running\n47 maximum 
processes\n<\/code><\/pre>\n\n\n\n<p>Core computation blocks<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      12168) mnn              cpu=2 start=6.75  finish=128.36\n        12169) benchmark.out    cpu=12 start=6.75  finish=128.35\n          12170) benchmark.out    cpu=13 start=6.90  finish=128.35\n          12171) benchmark.out    cpu=5 start=6.90  finish=128.35\n          12172) benchmark.out    cpu=0 start=6.90  finish=128.35\n          12173) benchmark.out    cpu=9 start=6.90  finish=128.35\n          12174) benchmark.out    cpu=11 start=6.90  finish=128.35\n          12175) benchmark.out    cpu=7 start=6.90  finish=128.35\n          12176) benchmark.out    cpu=3 start=6.90  finish=128.35\n          12177) benchmark.out    cpu=14 start=6.90  finish=128.35\n          12178) benchmark.out    cpu=8 start=6.90  finish=128.35\n          12179) benchmark.out    cpu=1 start=6.90  finish=128.35\n          12180) benchmark.out    cpu=15 start=6.90  finish=128.35\n          12181) benchmark.out    cpu=10 start=6.90  finish=128.35\n          12182) benchmark.out    cpu=4 start=6.90  finish=128.35\n          12183) benchmark.out    cpu=2 start=6.90  finish=128.35\n          12184) benchmark.out    cpu=14 start=6.90  finish=128.35\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>mnn is a neural network framework. This test runs eight different models and reports a geometric mean. The process overview suggests threads are consistently spawned on all cores. 
Topdown overview suggests certain phases of backend activity or frontend activity and a <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/mnn\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-781","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/781","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=781"}],"version-history":[{"count":1,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/781\/revisions"}],"predecessor-version":[{"id":785,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/781\/revisions\/785"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=781"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}