{"id":1774,"date":"2024-02-23T01:39:04","date_gmt":"2024-02-23T01:39:04","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=1774"},"modified":"2024-02-27T00:40:26","modified_gmt":"2024-02-27T00:40:26","slug":"pennant","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/pennant\/","title":{"rendered":"pennant"},"content":{"rendered":"\n<p>Hydrodynamics on unstructured meshes. There are two workloads.  Overall, half the threads are kept busy.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-57.png\" alt=\"\" class=\"wp-image-1786\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-57.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-57-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-57-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>Topdown profile shows backend stalls dominating with low frontend stalls.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-59.png\" alt=\"\" class=\"wp-image-1788\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-59.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-59-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-59-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics show floating point code. Backend stalls are balanced between memory and CPU.  
Frontend stalls are low.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              348.985\non_cpu               0.440          # 7.03 \/ 16 cores\nutime                2447.057\nstime                7.693\nnvcsw                64303          # 90.24%\nnivcsw               6956           # 9.76%\ninblock              0              # 0.00\/sec\nonblock              100816         # 288.88\/sec\ncpu-clock            2454839428334  # 2454.839 seconds\ntask-clock           2454864590772  # 2454.865 seconds\npage faults          585693         # 238.585\/sec\ncontext switches     72808          # 29.659\/sec\ncpu migrations       4729           # 1.926\/sec\nmajor page faults    510            # 0.208\/sec\nminor page faults    585183         # 238.377\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             1405170392315  # 66.567 branches per 1000 inst\nbranch misses        2889441026     # 0.21% branch miss\nconditional          1268663493424  # 60.100 conditional branches per 1000 inst\nindirect             23984196334    # 1.136 indirect branches per 1000 inst\ncpu-cycles           10932622319963 # 1.92 GHz\ninstructions         21196256443065 # 1.94 IPC\nslots                21869456447430 #\nretiring             7385737678440  # 33.8% (33.8%)\n-- ucode             1290957661     #     0.0%\n-- fastpath          7384446720779  #    33.8%\nfrontend             805383992505   #  3.7% ( 3.7%) low\n-- latency           491365910592   #     2.2%\n-- bandwidth         314018081913   #     1.4%\nbackend              13420593631108 # 61.4% (61.4%)\n-- cpu               6892842816665  #    31.5%\n-- memory            6527750814443  #    29.8%\nspeculation          240024275753   #  1.1% ( 1.1%)\n-- branch mispredict 109870243527   #     0.5%\n-- pipeline restart  130154032226   #     0.6%\nsmt-contention       17709448967    #  0.1% ( 0.0%)\ncpu-cycles           10923049238973 # 1.96 
GHz\ninstructions         21070010698041 # 1.93 IPC\ninstructions         7025290484683  # 27.155 l2 access per 1000 inst\nl2 hit from l1       152693226788   # 17.98% l2 miss\nl2 miss from l1      8245427283     #\nl2 hit from l2 pf    12025097090    #\nl3 hit from l2 pf    1003945621     #\nl3 miss from l2 pf   25051373796    #\ninstructions         7021052031386  # 331.768 float per 1000 inst\nfloat 512            93             # 0.000 AVX-512 per 1000 inst\nfloat 256            888            # 0.000 AVX-256 per 1000 inst\nfloat 128            2329362650245  # 331.768 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\ninstructions         2384561        #\nopcache              896903         # 376.129 opcache per 1000 inst\nopcache miss         477070         # 53.2% opcache miss rate\nl1 dTLB miss         4104           # 1.721 L1 dTLB per 1000 inst\nl2 dTLB miss         973            # 0.408 L2 dTLB per 1000 inst\ninstructions         2422756        #\nicache               1198318        # 494.609 icache per 1000 inst\nicache miss          112292         #  9.4% icache miss rate\nl1 iTLB miss         8              # 0.003 L1 iTLB per 1000 inst\nl2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst\ntlb flush            19             # 0.008 TLB flush per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              2939.717\non_cpu               0.733          # 11.73 \/ 16 cores\nutime                34465.043\nstime                14.981\nnvcsw                179750         # 80.58%\nnivcsw               43316          # 19.42%\ninblock              22208          # 7.55\/sec\nonblock              185160         # 62.99\/sec\ncpu-clock            34480504457611 # 34480.504 seconds\ntask-clock           34480558245085 # 34480.558 seconds\npage faults          1314803        # 
38.132\/sec\ncontext switches     237469         # 6.887\/sec\ncpu migrations       34346          # 0.996\/sec\nmajor page faults    1445           # 0.042\/sec\nminor page faults    1313358        # 38.090\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             44244560364890 # 165.908 branches per 1000 inst\nbranch misses        113417834935   # 0.26% branch miss\nconditional          44244560391034 # 165.908 conditional branches per 1000 inst\nindirect             14350734555496 # 53.812 indirect branches per 1000 inst\nslots                258818070782372 #\nretiring             165943745858655 # 64.1% (64.1%) high\n-- ucode             11698884463456 #     4.5%\n-- fastpath          154244861395199 #    59.6%\nfrontend             22228896025978 #  8.6% ( 8.6%)\n-- latency           8941122571912  #     3.5%\n-- bandwidth         13287773454066 #     5.1%\nbackend              57414286431989 # 22.2% (22.2%)\n-- cpu               40082922132210 #    15.5%\n-- memory            17331364299779 #     6.7%\nspeculation          11415921070534 #  4.4% ( 4.4%)\n-- branch mispredict 7032646758570  #     2.7%\n-- pipeline restart  4383274311964  #     1.7%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           96461605636760 # 1.97 GHz\ninstructions         388850975926936 # 4.03 IPC high\nl2 access            351070041147   # 2.360 l2 access per 1000 inst\nl2 miss              138857980889   # 39.55% l2 miss\ncpu-cycles           37072612143813 #  7.2% memory latency\nload stalls          2501412244162  #  2.1% l1 bound\nl1 miss              1724915419158  #  1.9% l2 bound\nl2 miss              1037045631873  #  0.9% l3 bound\nl3 miss              705376648870   #  1.9% dram bound\nstore_stalls         158962328723   #  0.4% store bound\n<\/code><\/pre>\n\n\n\n<p>Process summary shows the pennant process driving the majority of the time.<\/p>\n\n\n\n<pre 
class=\"wp-block-code\"><code>428 processes\n\t150 pennant               7307.63    17.49\n\t 38 vulkaninfo               1.33     1.14\n\t 24 mpirun                   0.52     2.76\n\t  4 vulkani:disk$0           0.14     0.12\n\t  6 glxinfo:gdrv0            0.14     0.09\n\t  6 glxinfo:gl0              0.13     0.09\n\t  6 php                      0.07     0.15\n\t  2 llvmpipe-0               0.07     0.06\n\t  2 llvmpipe-1               0.07     0.06\n\t  2 llvmpipe-10              0.07     0.06\n\t  2 llvmpipe-11              0.07     0.06\n\t  2 llvmpipe-12              0.07     0.06\n\t  2 llvmpipe-13              0.07     0.06\n\t  2 llvmpipe-14              0.07     0.06\n\t  2 llvmpipe-15              0.07     0.06\n\t  2 llvmpipe-2               0.07     0.06\n\t  2 llvmpipe-3               0.07     0.06\n\t  2 llvmpipe-4               0.07     0.06\n\t  2 llvmpipe-5               0.07     0.06\n\t  2 llvmpipe-6               0.07     0.06\n\t  2 llvmpipe-7               0.07     0.06\n\t  2 llvmpipe-8               0.07     0.06\n\t  2 llvmpipe-9               0.07     0.06\n\t  2 glxinfo                  0.07     0.03\n\t  2 glxinfo:cs0              0.07     0.03\n\t  2 glxinfo:disk$0           0.07     0.03\n\t  2 glxinfo:sh0              0.07     0.03\n\t  2 glxinfo:shlo0            0.07     0.03\n\t  1 lspci                    0.02     0.01\n\t 70 sh                       0.00     0.00\n\t 13 gcc                      0.00     0.00\n\t  8 gsettings                0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  5 gmain                    0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  3 dconf worker             0.00     0.00\n\t  2 cc                       0.00     0.00\n\t  2 clinfo                   0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 uname                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset             
        0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 ps                       0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n0 processes running\n47 maximum processes\n<\/code><\/pre>\n\n\n\n<p>Process structure shows MPI being used to distribute pennant processes.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      124349) pennant          cpu=0 start=4.89  finish=65.66\n        124350) mpirun           cpu=3 start=4.89  finish=65.66\n          124352) mpirun           cpu=15 start=5.43  finish=65.65\n          124353) mpirun           cpu=7 start=5.92  finish=65.65\n          124354) mpirun           cpu=11 start=5.92  finish=65.66\n          124355) pennant          cpu=11 start=5.95  finish=65.65\n            124357) pennant          cpu=15 start=5.96  finish=65.64\n            124361) pennant          cpu=1 start=5.96  finish=65.64\n          124356) pennant          cpu=2 start=5.96  finish=65.65\n            124360) pennant          cpu=11 start=5.96  finish=65.64\n            124364) pennant          cpu=6 start=5.97  finish=65.64\n          124358) pennant          cpu=5 start=5.96  finish=65.65\n            124362) pennant          cpu=15 start=5.97  finish=65.64\n            124367) pennant       
   cpu=8 start=5.97  finish=65.64\n          124359) pennant          cpu=12 start=5.96  finish=65.65\n            124366) pennant          cpu=15 start=5.97  finish=65.64\n            124370) pennant          cpu=4 start=5.97  finish=65.64\n          124363) pennant          cpu=14 start=5.97  finish=65.64\n            124369) pennant          cpu=4 start=5.97  finish=65.64\n            124373) pennant          cpu=3 start=5.98  finish=65.64\n          124365) pennant          cpu=10 start=5.97  finish=65.65\n            124372) pennant          cpu=7 start=5.98  finish=65.64\n            124375) pennant          cpu=11 start=5.98  finish=65.64\n          124368) pennant          cpu=9 start=5.97  finish=65.64\n            124374) pennant          cpu=15 start=5.98  finish=65.64\n            124377) pennant          cpu=15 start=5.99  finish=65.64\n          124371) pennant          cpu=0 start=5.98  finish=65.64\n            124376) pennant          cpu=1 start=5.98  finish=65.64\n            124378) pennant          cpu=13 start=5.99  finish=65.64\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Hydrodynamics on unstructured meshes. There are two workloads. Overall, half the threads are kept busy. Topdown profile shows backend stalls dominating with low frontend stalls. AMD metrics show floating point code. Backend stalls are balanced between memory and CPU. 
<span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/pennant\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-1774","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1774","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=1774"}],"version-history":[{"count":3,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1774\/revisions"}],"predecessor-version":[{"id":1828,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1774\/revisions\/1828"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=1774"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}