{"id":2186,"date":"2024-03-24T20:19:28","date_gmt":"2024-03-24T20:19:28","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=2186"},"modified":"2024-03-25T01:18:27","modified_gmt":"2024-03-25T01:18:27","slug":"palabos","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/palabos\/","title":{"rendered":"palabos"},"content":{"rendered":"\n<p>A framework for general purpose Computational Fluid Dynamics (CFD). This has five workloads of the Cavity3d benchmark with different sizes. The first three work on the AMD system and the first two work on the Intel system. The other, larger ones fail. These appear to run on the physical (not hyperthreaded) cores, perhaps with MPI.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-43.png\" alt=\"\" class=\"wp-image-2210\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-43.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-43-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-43-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>The topdown profile shows a workload with increasing backend stalls and a lower retirement rate.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-45.png\" alt=\"\" class=\"wp-image-2212\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-45.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-45-1024x768.png 1024w, 
https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-45-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics show this as floating point code with a low level of frontend stalls or speculation stalls.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              836.698\non_cpu               0.435          # 6.96 \/ 16 cores\nutime                5702.901\nstime                123.549\nnvcsw                123143         # 87.25%\nnivcsw               17988          # 12.75%\ninblock              2196096        # 2624.72\/sec\nonblock              265480         # 317.29\/sec\ncpu-clock            5887597402841  # 5887.597 seconds\ntask-clock           5887670199615  # 5887.670 seconds\npage faults          53493667       # 9085.711\/sec\ncontext switches     226140         # 38.409\/sec\ncpu migrations       12308          # 2.090\/sec\nmajor page faults    17874          # 3.036\/sec\nminor page faults    53475762       # 9082.669\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             1869589369583  # 59.480 branches per 1000 inst\nbranch misses        10705318795    # 0.57% branch miss\nconditional          1042825476361  # 33.177 conditional branches per 1000 inst\nindirect             129681340683   # 4.126 indirect branches per 1000 inst\ncpu-cycles           26305189422647 # 1.94 GHz\ninstructions         31441737916631 # 1.20 IPC\nslots                52622345068602 #\nretiring             10889011000500 # 20.7% (20.7%)\n-- ucode             30457782863    #     0.1%\n-- fastpath          10858553217637 #    20.6%\nfrontend             1557951220896  #  3.0% ( 3.0%) low\n-- latency           979774237896   #     1.9%\n-- bandwidth         578176983000   #     1.1%\nbackend              40040356750516 # 76.1% (76.1%) high\n-- cpu               6363537379963  #    12.1%\n-- memory            
33676819370553 #    64.0%\nspeculation          100408613268   #  0.2% ( 0.2%) low\n-- branch mispredict 91032876463    #     0.2%\n-- pipeline restart  9375736805     #     0.0%\nsmt-contention       34598398476    #  0.1% ( 0.0%)\ncpu-cycles           26161990460125 # 1.94 GHz\ninstructions         31564444349673 # 1.21 IPC\ninstructions         10526717004116 # 17.343 l2 access per 1000 inst\nl2 hit from l1       128701953755   # 39.78% l2 miss\nl2 miss from l1      37747882576    #\nl2 hit from l2 pf    18980760670    #\nl3 hit from l2 pf    4459996432     #\nl3 miss from l2 pf   30423851490    #\ninstructions         10525823050089 # 357.908 float per 1000 inst\nfloat 512            126            # 0.000 AVX-512 per 1000 inst\nfloat 256            1080           # 0.000 AVX-256 per 1000 inst\nfloat 128            3767274378333  # 357.908 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\ninstructions         31499731796006 #\nopcache              3728490971511  # 118.366 opcache per 1000 inst\nopcache miss         92748095595    #  2.5% opcache miss rate\nl1 dTLB miss         12301153494    # 0.391 L1 dTLB per 1000 inst\nl2 dTLB miss         4858292548     # 0.154 L2 dTLB per 1000 inst\ninstructions         31580053403705 #\nicache               191823644658   # 6.074 icache per 1000 inst\nicache miss          6594453889     #  3.4% icache miss rate\nl1 iTLB miss         36650924       # 0.001 L1 iTLB per 1000 inst\nl2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst\ntlb flush            510515         # 0.000 TLB flush per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics show a roughly 30% DRAM-bound component in the backend stalls.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              1048.526\non_cpu               0.668          # 10.69 \/ 16 cores\nutime                10973.031\nstime                231.736\nnvcsw                139902 
        # 82.38%\nnivcsw               29919          # 17.62%\ninblock              719856         # 686.54\/sec\nonblock              254096         # 242.34\/sec\ncpu-clock            11324229583345 # 11324.230 seconds\ntask-clock           11324321973368 # 11324.322 seconds\npage faults          38708439       # 3418.168\/sec\ncontext switches     339854         # 30.011\/sec\ncpu migrations       45994          # 4.062\/sec\nmajor page faults    19561          # 1.727\/sec\nminor page faults    38688804       # 3416.434\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             5415329661932  # 102.842 branches per 1000 inst\nbranch misses        13712134509    # 0.25% branch miss\nconditional          5415329691340  # 102.842 conditional branches per 1000 inst\nindirect             1101290966057  # 20.914 indirect branches per 1000 inst\nslots                75572942608994 #\nretiring             30196634336577 # 40.0% (40.0%)\n-- ucode             2181580200259  #     2.9%\n-- fastpath          28015054136318 #    37.1%\nfrontend             2905024145432  #  3.8% ( 3.8%) low\n-- latency           1591010362308  #     2.1%\n-- bandwidth         1314013783124  #     1.7%\nbackend              40975150606718 # 54.2% (54.2%)\n-- cpu               10926288753421 #    14.5%\n-- memory            30048861853297 #    39.8%\nspeculation          2197782267550  #  2.9% ( 2.9%)\n-- branch mispredict 1871529610415  #     2.5%\n-- pipeline restart  326252657135   #     0.4%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           33605195706463 # 2.00 GHz\ninstructions         80442309203356 # 2.39 IPC\nl2 access            208504898413   # 6.909 l2 access per 1000 inst\nl2 miss              113006068651   # 54.20% l2 miss\ncpu-cycles           12599040970847 # 41.5% memory latency\nload stalls          5100446679052  #  0.6% l1 bound\nl1 miss              5028041939034  #  4.0% l2 
bound\nl2 miss              4528516333239  #  5.3% l3 bound\nl3 miss              3866423296808  # 30.7% dram bound\nstore_stalls         123166211851   #  1.0% store bound\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>A framework for general purpose Computational Fluid Dynamics (CFD). This has five workloads of the Cavity3d benchmark with different sizes. The first three work on the AMD system and the first two work on the Intel system. The other, larger ones fail. Looks <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/palabos\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-2186","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2186","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=2186"}],"version-history":[{"count":2,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2186\/revisions"}],"predecessor-version":[{"id":2213,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2186\/revisions\/2213"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=2186"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}