{"id":1328,"date":"2024-02-03T08:35:34","date_gmt":"2024-02-03T08:35:34","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=1328"},"modified":"2024-02-03T13:28:19","modified_gmt":"2024-02-03T13:28:19","slug":"rodinia","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/rodinia\/","title":{"rendered":"rodinia"},"content":{"rendered":"\n<p>An accelerator test suite that includes OpenMP, CUDA and OpenCL tests. Five tests are OpenMP and two are OpenCL. The OpenCL tests fail, so this is really five subtests. These tests take some time to settle down, and at least one is single-threaded.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-15.png\" alt=\"\" class=\"wp-image-1347\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-15.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-15-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-15-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>The topdown profile varies across the run: some portions have higher retirement rates and others have more backend stalls.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-15.png\" alt=\"\" class=\"wp-image-1349\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-15.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-15-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-15-768x576.png 768w\" sizes=\"auto, (max-width: 
1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics show floating-point code, almost entirely 128-bit AVX (323.5 per 1000 instructions), little L2 access (7.3 accesses per 1000 instructions), and backend stalls that come more from the CPU (21.7%) than from memory (4.9%). The branch rate is moderate at 101.9 branches per 1000 instructions.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              2493.966\non_cpu               0.540          # 8.64 \/ 16 cores\nutime                21518.722\nstime                21.889\nnvcsw                30584          # 13.26%\nnivcsw               200012         # 86.74%\ninblock              0              # 0.00\/sec\nonblock              1021736        # 409.68\/sec\ncpu-clock            21542010047079 # 21542.010 seconds\ntask-clock           21542202986873 # 21542.203 seconds\npage faults          8858867        # 411.233\/sec\ncontext switches     242783         # 11.270\/sec\ncpu migrations       7688           # 0.357\/sec\nmajor page faults    47             # 0.002\/sec\nminor page faults    8858820        # 411.231\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             16114747147990 # 101.889 branches per 1000 inst\nbranch misses        19153578824    # 0.12% branch miss\nconditional          10609186092769 # 67.079 conditional branches per 1000 inst\nindirect             1695742623714  # 10.722 indirect branches per 1000 inst\ncpu-cycles           87215208815204 # 2.22 GHz\ninstructions         178678953072578 # 2.05 IPC\nslots                174423368262540 #\nretiring             62401214102905 # 35.8% (56.0%) high\n-- ucode             131586618110   #     0.1%\n-- fastpath          62269627484795 #    35.7%\nfrontend             2321571740608  #  1.3% ( 2.1%) low\n-- latency           706428625452   #     0.4%\n-- bandwidth         1615143115156  #     0.9%\nbackend              46480039019914 # 26.6% (41.7%)\n-- cpu               37908047673805 #    21.7%\n-- memory            8571991346109  #     4.9%\nspeculation          299016986173   #  0.2% ( 0.3%) low\n-- branch 
mispredict 281309161062   #     0.2%\n-- pipeline restart  17707825111    #     0.0%\nsmt-contention       62921325104833 # 36.1% ( 0.0%)\ncpu-cycles           74673715961135 # 2.10 GHz\ninstructions         152456531522489 # 2.04 IPC\ninstructions         50818766143828 # 7.328 l2 access per 1000 inst\nl2 hit from l1       196442306356   # 23.41% l2 miss\nl2 miss from l1      13403485931    #\nl2 hit from l2 pf    102187178346   #\nl3 hit from l2 pf    45428238002    #\nl3 miss from l2 pf   28327299005    #\ninstructions         50821318031341 # 323.549 float per 1000 inst\nfloat 512            97             # 0.000 AVX-512 per 1000 inst\nfloat 256            630            # 0.000 AVX-256 per 1000 inst\nfloat 128            16443198436737 # 323.549 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\ninstructions         2670454        #\nopcache              988041         # 369.990 opcache per 1000 inst\nopcache miss         532227         # 53.9% opcache miss rate\nl1 dTLB miss         5238           # 1.961 L1 dTLB per 1000 inst\nl2 dTLB miss         1129           # 0.423 L2 dTLB per 1000 inst\ninstructions         2699471        #\nicache               1306392        # 483.944 icache per 1000 inst\nicache miss          110562         #  8.5% icache miss rate\nl1 iTLB miss         13             # 0.005 L1 iTLB per 1000 inst\nl2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst\ntlb flush            22             # 0.008 TLB flush per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              1645.268\non_cpu               0.715          # 11.44 \/ 16 cores\nutime                18811.581\nstime                15.025\nnvcsw                28926          # 16.89%\nnivcsw               142292         # 83.11%\ninblock              559936         # 340.33\/sec\nonblock              230440   
      # 140.06\/sec\ncpu-clock            18826064492000 # 18826.064 seconds\ntask-clock           18826151799361 # 18826.152 seconds\npage faults          8281354        # 439.886\/sec\ncontext switches     179211         # 9.519\/sec\ncpu migrations       19260          # 1.023\/sec\nmajor page faults    85             # 0.005\/sec\nminor page faults    8281269        # 439.881\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             13193076197482 # 112.251 branches per 1000 inst\nbranch misses        16326408207    # 0.12% branch miss\nconditional          13193076222218 # 112.251 conditional branches per 1000 inst\nindirect             4434825753165  # 37.733 indirect branches per 1000 inst\nslots                97563657619520 #\nretiring             59925790784692 # 61.4% (61.4%) high\n-- ucode             3166614337055  #     3.2%\n-- fastpath          56759176447637 #    58.2%\nfrontend             10212256156300 # 10.5% (10.5%)\n-- latency           8543550138460  #     8.8%\n-- bandwidth         1668706017840  #     1.7%\nbackend              26403767081345 # 27.1% (27.1%)\n-- cpu               14690616659153 #    15.1%\n-- memory            11713150422192 #    12.0%\nspeculation          815731012482   #  0.8% ( 0.8%) low\n-- branch mispredict 760996322204   #     0.8%\n-- pipeline restart  54734690278    #     0.1%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           50707273870129 # 1.92 GHz\ninstructions         105254985989712 # 2.08 IPC\nl2 access            223186481571   # 3.799 l2 access per 1000 inst\nl2 miss              76102210380    # 34.10% l2 miss\n<\/code><\/pre>\n\n\n\n<p>The process overview groups the processes by name. 
The lavaMD subtest took a while to settle and has the largest totals in both columns of the overview.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>979 processes\n\t 48 lavaMD               152420.00   109.60\n\t240 3D                   67068.45    21.12\n\t 48 leukocyte            48273.12   105.60\n\t 48 euler3d_cpu_dou      17002.74    51.68\n\t 48 sc_omp               12816.32     3.52\n\t204 clinfo                  58.31    18.29\n\t 38 vulkaninfo               1.33     1.33\n\t  9 OCL_particlefil          0.26     0.27\n\t  9 myocyte.out              0.20     0.26\n\t  4 vulkani:disk$0           0.14     0.14\n\t  6 glxinfo:gdrv0            0.14     0.07\n\t  6 glxinfo:gl0              0.14     0.07\n\t  6 php                      0.13     0.37\n\t  2 llvmpipe-0               0.07     0.07\n\t  2 llvmpipe-1               0.07     0.07\n\t  2 llvmpipe-10              0.07     0.07\n\t  2 llvmpipe-11              0.07     0.07\n\t  2 llvmpipe-12              0.07     0.07\n\t  2 llvmpipe-13              0.07     0.07\n\t  2 llvmpipe-14              0.07     0.07\n\t  2 llvmpipe-15              0.07     0.07\n\t  2 llvmpipe-2               0.07     0.07\n\t  2 llvmpipe-3               0.07     0.07\n\t  2 llvmpipe-4               0.07     0.07\n\t  2 llvmpipe-5               0.07     0.07\n\t  2 llvmpipe-6               0.07     0.07\n\t  2 llvmpipe-7               0.07     0.07\n\t  2 llvmpipe-8               0.07     0.07\n\t  2 llvmpipe-9               0.07     0.07\n\t  2 glxinfo                  0.07     0.04\n\t  2 glxinfo:cs0              0.06     0.03\n\t  2 glxinfo:disk$0           0.06     0.03\n\t  2 glxinfo:sh0              0.06     0.03\n\t  2 glxinfo:shlo0            0.06     0.03\n\t  6 clang                    0.04     0.04\n\t  3 rocminfo                 0.03     0.00\n\t  1 lspci                    0.01     0.02\n\t 94 sh                       0.00     0.00\n\t 33 rodinia                  0.00     0.00\n\t 13 gcc                      0.00     0.00\n\t  9 gsettings         
       0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  4 gmain                    0.00     0.00\n\t  3 dconf worker             0.00     0.00\n\t  2 cc                       0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 uname                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 ps                       0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n0 processes running\n47 maximum processes\n<\/code><\/pre>\n\n\n\n<p>Example computation block from lavaMD: the parent lavaMD process and its 15 child lavaMD tasks are spread across all 16 CPUs.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      222891) rodinia          cpu=2 start=6.65  finish=211.78\n        222892) lavaMD           cpu=1 start=6.65  finish=211.78\n          222895) lavaMD           cpu=13 start=11.00 finish=211.78\n          222896) lavaMD           cpu=8 start=11.00 finish=211.78\n          222897) lavaMD           cpu=15 start=11.00 finish=211.78\n          222898) lavaMD           cpu=0 start=11.00 finish=211.78\n     
     222899) lavaMD           cpu=9 start=11.00 finish=211.78\n          222900) lavaMD           cpu=10 start=11.00 finish=211.78\n          222901) lavaMD           cpu=12 start=11.00 finish=211.78\n          222902) lavaMD           cpu=11 start=11.00 finish=211.78\n          222903) lavaMD           cpu=5 start=11.00 finish=211.78\n          222904) lavaMD           cpu=6 start=11.00 finish=211.78\n          222905) lavaMD           cpu=3 start=11.00 finish=211.78\n          222906) lavaMD           cpu=2 start=11.00 finish=211.78\n          222907) lavaMD           cpu=14 start=11.00 finish=211.78\n          222908) lavaMD           cpu=7 start=11.00 finish=211.78\n          222909) lavaMD           cpu=4 start=11.00 finish=211.78\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>An accelerator test suite that includes OpenMP, CUDA and OpenCL tests. Five tests are OpenMP and two are OpenCL. The OpenCL tests fail, so this is really five subtests. These tests take some time to settle down, and at least one is single-threaded. 
Topdown profile <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/rodinia\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-1328","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1328","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=1328"}],"version-history":[{"count":3,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1328\/revisions"}],"predecessor-version":[{"id":1350,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1328\/revisions\/1350"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=1328"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}