{"id":192,"date":"2024-01-03T19:24:04","date_gmt":"2024-01-03T19:24:04","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=192"},"modified":"2024-01-03T19:24:05","modified_gmt":"2024-01-03T19:24:05","slug":"gromacs","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/gromacs\/","title":{"rendered":"gromacs"},"content":{"rendered":"\n<p>Molecular dynamics package for chemical simulations.  Shows a high amount of time in the backend where both CPU and memory contribute.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-8.png\" alt=\"\" class=\"wp-image-193\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-8.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-8-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-8-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics show we only spend ~50% of the time on the CPU.  There is a moderate amount of I\/O, so the remaining time is likely spent waiting on it.  
There is some floating point but much less than other codes such as namd.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              482.655\non_cpu               0.476          # 7.61 \/ 16 cores\nutime                3565.437\nstime                108.786\nnvcsw                72583          # 86.38%\nnivcsw               11441          # 13.62%\ninblock              267168         # 553.54\/sec\nonblock              751504         # 1557.02\/sec\ncpu-clock            3674433961350  # 3674.434 seconds\ntask-clock           3674472081380  # 3674.472 seconds\npage faults          5786497        # 1574.783\/sec\ncontext switches     86216          # 23.464\/sec\ncpu migrations       2090           # 0.569\/sec\nmajor page faults    2125           # 0.578\/sec\nminor page faults    5784372        # 1574.205\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             944150288240   # 31.083 branches per 1000 inst\nbranch misses        13146281378    # 1.39% branch miss\nconditional          715415461727   # 23.552 conditional branches per 1000 inst\nindirect             36389905918    # 1.198 indirect branches per 1000 inst\ncpu-cycles           15950171457573 # 2.07 GHz\ninstructions         30275006717964 # 1.90 IPC\nslots                31897501627062 #\nretiring             10256456755117 # 32.2% (32.2%)\n-- ucode             11587443035    #     0.0%\n-- fastpath          10244869312082 #    32.1%\nfrontend             1782822670565  #  5.6% ( 5.6%)\n-- latency           1041193668954  #     3.3%\n-- bandwidth         741629001611   #     2.3%\nbackend              19397722726257 # 60.8% (60.8%)\n-- cpu               8131906328178  #    25.5%\n-- memory            11265816398079 #    35.3%\nspeculation          442612899648   #  1.4% ( 1.4%)\n-- branch mispredict 424617941550   #     1.3%\n-- pipeline restart  17994958098    #     0.1%\nsmt-contention       17876810980    #  0.1% ( 
0.0%)\ncpu-cycles           15914711735792 # 2.07 GHz\ninstructions         30248693256001 # 1.90 IPC\ninstructions         10090949159980 # 24.975 l2 access per 1000 inst\nl2 hit from l1       152624079844   # 18.65% l2 miss\nl2 miss from l1      16245693931    #\nl2 hit from l2 pf    68637530034    #\nl3 hit from l2 pf    3233438783     #\nl3 miss from l2 pf   27522248586    #\ninstructions         10078247012013 # 67.501 float per 1000 inst\nfloat 512            79             # 0.000 AVX-512 per 1000 inst\nfloat 256            124            # 0.000 AVX-256 per 1000 inst\nfloat 128            680293984159   # 67.501 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              832.937\non_cpu               0.731          # 11.69 \/ 16 cores\nutime                9542.274\nstime                198.469\nnvcsw                99850          # 79.04%\nnivcsw               26479          # 20.96%\ninblock              6552\nonblock              2191480\ncpu-clock            9741000680111  # 9741.001 seconds\ntask-clock           9741049746853  # 9741.050 seconds\npage faults          7930295        # 814.111\/sec\ncontext switches     130264         # 13.373\/sec\ncpu migrations       12544          # 1.288\/sec\nmajor page faults    764            # 0.078\/sec\nminor page faults    7929531        # 814.032\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             2682985873307  # 65.358 branches per 1000 inst\nbranch misses        16668425938    # 0.62% branch miss\nconditional          2682985889723  # 65.358 conditional branches per 1000 inst\nindirect             406967729044   # 9.914 indirect branches per 1000 inst\nslots                63845246869628 #\nretiring             34233072745943 # 53.6% 
(53.6%)\n-- ucode             1863060964672  #     2.9%\n-- fastpath          32370011781271 #    50.7%\nfrontend             8219305390172  # 12.9% (12.9%)\n-- latency           6780205223367  #    10.6%\n-- bandwidth         1439100166805  #     2.3%\nbackend              19478016926828 # 30.5% (30.5%)\n-- cpu               5033487906332  #     7.9%\n-- memory            14444529020496 #    22.6%\nspeculation          2127848416049  #  3.3% ( 3.3%)\n-- branch mispredict 2019288962047  #     3.2%\n-- pipeline restart  108559454002   #     0.2%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           22207184550713 # 1.67 GHz\ninstructions         32171673180975 # 1.45 IPC\nl2 access            432584423400   # 14.223 l2 access per 1000 inst\nl2 miss              150307005673   # 34.75% l2 miss<\/code><\/pre>\n\n\n\n<p>No process information as the program currently hangs with a parallel make.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Molecular dynamics package for chemical simulations. Shows a high amount of time in the backend where both CPU and memory contribute. AMD metrics show we only spend ~50% of the time on the CPU. 
There is a moderate amount of <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/gromacs\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-192","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/192","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=192"}],"version-history":[{"count":1,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/192\/revisions"}],"predecessor-version":[{"id":194,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/192\/revisions\/194"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=192"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}