{"id":680,"date":"2024-01-19T00:45:05","date_gmt":"2024-01-19T00:45:05","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=680"},"modified":"2024-02-09T10:07:41","modified_gmt":"2024-02-09T10:07:41","slug":"nwchem","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/nwchem\/","title":{"rendered":"nwchem"},"content":{"rendered":"\n<p>nwchem is a computational chemistry package. It does not run successfully on AMD, running for ~2000 seconds before giving an error:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>        The test quit with a non-zero exit status.\n        E: dlerror: libelf.so.0: cannot open shared object file: No such file or directory<\/code><\/pre>\n\n\n\n<p>I couldn&#8217;t find libelf.so.0 anywhere. I tried putting in a link to libelf.so.1 and then got a different error:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>        The test quit with a non-zero exit status.\n        E: MPI_ABORT was invoked on rank 3 in communicator MPI COMMUNICATOR 3 DUP FROM 0<\/code><\/pre>\n\n\n\n<p>It does run to completion with a successful result on the Intel CPU, taking a total of ~10,600 seconds.<\/p>\n\n\n\n<p>It also runs successfully on my AMD 5950X system, suggesting some form of software configuration issue.<\/p>\n\n\n\n<p>So below is a mixed report with Intel and AMD 5950X. 
Almost all of the time is spent running on half the cores.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-39.png\" alt=\"\" class=\"wp-image-1614\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-39.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-39-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-39-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>Topdown information was unavailable for the 5950X, so below is Intel. It looks like it goes through some phases and is generally higher on retirement, with some backend memory stalls.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-71.png\" alt=\"\" class=\"wp-image-682\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-71.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-71-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-71-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics. 
The way things crashed suggests we didn&#8217;t get an on-cpu metric.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              2107.596\non_cpu               0.000          # 0.00 \/ 16 cores\nutime                1.093\nstime                0.909\nnvcsw                3372           # 89.14%\nnivcsw               411            # 10.86%\ninblock              0              # 0.00\/sec\nonblock              33792          # 16.03\/sec\ncpu-clock            16785304831788 # 16785.305 seconds\ntask-clock           16785348796091 # 16785.349 seconds\npage faults          2128677        # 126.818\/sec\ncontext switches     48966          # 2.917\/sec\ncpu migrations       14815          # 0.883\/sec\nmajor page faults    239            # 0.014\/sec\nminor page faults    2128438        # 126.803\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             33479246026435 # 148.312 branches per 1000 inst\nbranch misses        127458036324   # 0.38% branch miss\nconditional          24030811610464 # 106.455 conditional branches per 1000 inst\nindirect             2579309668769  # 11.426 indirect branches per 1000 inst\ncpu-cycles           68037069124892 # 2.04 GHz\ninstructions         220813779609057 # 3.25 IPC\nslots                136078686006336 #\nretiring             73750537214329 # 54.2% (54.2%)\n-- ucode             73060690357    #     0.1%\n-- fastpath          73677476523972 #    54.1%\nfrontend             8061338973687  #  5.9% ( 5.9%)\n-- latency           3680832821736  #     2.7%\n-- bandwidth         4380506151951  #     3.2%\nbackend              51412720610197 # 37.8% (37.8%)\n-- cpu               11216925558536 #     8.2%\n-- memory            40195795051661 #    29.5%\nspeculation          2804155947285  #  2.1% ( 2.1%)\n-- branch mispredict 2735326057178  #     2.0%\n-- pipeline restart  68829890107    #     0.1%\nsmt-contention       49897448637    #  0.0% ( 
0.0%)\ncpu-cycles           69546700318571 # 2.05 GHz\ninstructions         226896874160851 # 3.26 IPC\ninstructions         75649382586142 # 18.173 l2 access per 1000 inst\nl2 hit from l1       874618649578   # 11.94% l2 miss\nl2 miss from l1      61119177290    #\nl2 hit from l2 pf    397180972301   #\nl3 hit from l2 pf    77188601452    #\nl3 miss from l2 pf   25794341549    #\ninstructions         75638056388884 # 127.044 float per 1000 inst\nfloat 512            62             # 0.000 AVX-512 per 1000 inst\nfloat 256            428            # 0.000 AVX-256 per 1000 inst\nfloat 128            9609391251278  # 127.044 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         6              # 0.000 scalar per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics which should be more reliable. This shows it running on all cores without hyperthreading. It is otherwise a high-IPC code for both AMD and Intel processors.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              10630.268\non_cpu               0.749          # 11.99 \/ 16 cores\nutime                127385.070\nstime                67.039\nnvcsw                31742          # 17.85%\nnivcsw               146085         # 82.15%\ninblock              772168         # 72.64\/sec\nonblock              2151752        # 202.42\/sec\ncpu-clock            127453492566651 # 127453.493 seconds\ntask-clock           127453630675965 # 127453.631 seconds\npage faults          3561720        # 27.945\/sec\ncontext switches     230770         # 1.811\/sec\ncpu migrations       55849          # 0.438\/sec\nmajor page faults    2836           # 0.022\/sec\nminor page faults    3558884        # 27.923\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             135096634141195 # 144.395 branches per 1000 inst\nbranch misses        757304758313   # 0.56% branch miss\nconditional          
135096634154123 # 144.395 conditional branches per 1000 inst\nindirect             48506429686542 # 51.845 indirect branches per 1000 inst\nslots                1029829342912016 #\nretiring             618366506443086 # 60.0% (60.0%)\n-- ucode             31525787099198 #     3.1%\n-- fastpath          586840719343888 #    57.0%\nfrontend             137399065336360 # 13.3% (13.3%)\n-- latency           29711672373497 #     2.9%\n-- bandwidth         107687392962863 #    10.5%\nbackend              227323923911572 # 22.1% (22.1%)\n-- cpu               82567509824558 #     8.0%\n-- memory            144756414087014 #    14.1%\nspeculation          39269377439642 #  3.8% ( 3.8%)\n-- branch mispredict 36835817310632 #     3.6%\n-- pipeline restart  2433560129010  #     0.2%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           515977732254096 # 2.19 GHz\ninstructions         2164504183925904 # 4.19 IPC\nl2 access            5524495591811  # 7.655 l2 access per 1000 inst\nl2 miss              1488213950053  # 26.94% l2 miss\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>nwchem is a computational chemistry package. It does not run successfully on AMD, running for ~2000 seconds before giving an error: I couldn&#8217;t find libelf.so.0 anywhere. 
I tried putting in a link to libelf.so.1 and then got a different error <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/nwchem\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-680","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/680","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=680"}],"version-history":[{"count":2,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/680\/revisions"}],"predecessor-version":[{"id":1616,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/680\/revisions\/1616"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=680"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}