{"id":188,"date":"2024-01-03T19:01:48","date_gmt":"2024-01-03T19:01:48","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=188"},"modified":"2024-01-03T19:02:55","modified_gmt":"2024-01-03T19:02:55","slug":"namd","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/namd\/","title":{"rendered":"namd"},"content":{"rendered":"\n<p>This molecular dynamics benchmark workload shows approximately 50% of the slots retiring.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-7.png\" alt=\"\" class=\"wp-image-189\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-7.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-7-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-7-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>The AMD metrics show a floating-point-intensive workload (2\/3 of instructions are 128-bit floating point) with few branches and few L2 accesses or misses. 
This results in slightly higher IPC and many instructions retired.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              329.070\non_cpu               0.952          # 15.24 \/ 16 cores\nutime                4502.819\nstime                511.863\nnvcsw                163532         # 76.29%\nnivcsw               50827          # 23.71%\ninblock              252136         # 766.21\/sec\nonblock              97000          # 294.77\/sec\ncpu-clock            5014981624535  # 5014.982 seconds\ntask-clock           5015066062349  # 5015.066 seconds\npage faults          1061669        # 211.696\/sec\ncontext switches     215829         # 43.036\/sec\ncpu migrations       361            # 0.072\/sec\nmajor page faults    357            # 0.071\/sec\nminor page faults    1061312        # 211.625\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             969457463749   # 27.144 branches per 1000 inst\nbranch misses        96604976909    # 9.96% branch miss\nconditional          598626875725   # 16.761 conditional branches per 1000 inst\nindirect             17759496963    # 0.497 indirect branches per 1000 inst\ncpu-cycles           17843610263885 # 3.86 GHz\ninstructions         33685765494786 # 1.89 IPC\nslots                35676676825020 #\nretiring             11466401474400 # 32.1% (51.7%)\n-- ucode             3261961494     #     0.0%\n-- fastpath          11463139512906 #    32.1%\nfrontend             2110839683999  #  5.9% ( 9.5%)\n-- latency           1773994098696  #     5.0%\n-- bandwidth         336845585303   #     0.9%\nbackend              8262319364493  # 23.2% (37.3%)\n-- cpu               4845859855349  #    13.6%\n-- memory            3416459509144  #     9.6%\nspeculation          326075144679   #  0.9% ( 1.5%)\n-- branch mispredict 322374538533   #     0.9%\n-- pipeline restart  3700606146     #     0.0%\nsmt-contention       13511001433399 # 37.9% ( 
0.0%)\ncpu-cycles           17808305476130 # 3.85 GHz\ninstructions         33671204786751 # 1.89 IPC\ninstructions         11222058246597 # 29.363 l2 access per 1000 inst\nl2 hit from l1       216808193838   # 3.37% l2 miss\nl2 miss from l1      3534469784     #\nl2 hit from l2 pf    105145731432   #\nl3 hit from l2 pf    1270687312     #\nl3 miss from l2 pf   6292933832     #\ninstructions         11221325446035 # 670.180 float per 1000 inst\nfloat 512            67             # 0.000 AVX-512 per 1000 inst\nfloat 256            976            # 0.000 AVX-256 per 1000 inst\nfloat 128            7520307722746  # 670.180 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         1008           # 0.000 scalar per 1000 inst<\/code><\/pre>\n\n\n\n<p>Intel metrics<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              354.788\non_cpu               0.958          # 15.32 \/ 16 cores\nutime                4737.499\nstime                699.276\nnvcsw                171768         # 79.34%\nnivcsw               44733          # 20.66%\ninblock              252000         # 710.28\/sec\nonblock              96864          # 273.02\/sec\ncpu-clock            5436887057539  # 5436.887 seconds\ntask-clock           5436946057116  # 5436.946 seconds\npage faults          1078757        # 198.412\/sec\ncontext switches     218112         # 40.117\/sec\ncpu migrations       392            # 0.072\/sec\nmajor page faults    322            # 0.059\/sec\nminor page faults    1078434        # 198.353\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             1210384718995  # 44.673 branches per 1000 inst\nbranch misses        23841288511    # 1.97% branch miss\nconditional          1210384729075  # 44.673 conditional branches per 1000 inst\nindirect             335853367292   # 12.396 indirect branches per 1000 inst\nslots                21038828152550 
#\nretiring             13684220792315 # 65.0% (65.0%)\n-- ucode             2099158012732  #    10.0%\n-- fastpath          11585062779583 #    55.1%\nfrontend             4772928176556  # 22.7% (22.7%)\n-- latency           3153746942703  #    15.0%\n-- bandwidth         1619181233853  #     7.7%\nbackend              2168742626552  # 10.3% (10.3%)\n-- cpu               1685615965776  #     8.0%\n-- memory            483126660776   #     2.3%\nspeculation          713253632021   #  3.4% ( 3.4%)\n-- branch mispredict 653289250107   #     3.1%\n-- pipeline restart  59964381914    #     0.3%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           7016247557601  # 1.38 GHz\ninstructions         13494156109832 # 1.92 IPC\nl2 access            218890285131   # 16.221 l2 access per 1000 inst\nl2 miss              32849187777    # 15.01% l2 miss<\/code><\/pre>\n\n\n\n<p>Summary information about the processes where most all time is spent in the namd process.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>333 processes\n\t 51 namd2                71554.11  2492.20\n\t 38 vulkaninfo               0.57     1.15\n\t  6 glxinfo:gdrv0            0.09     0.09\n\t  4 vulkani:disk$0           0.06     0.13\n\t  2 glxinfo                  0.06     0.03\n\t  2 glxinfo:cs0              0.06     0.03\n\t  6 php                      0.05     0.07\n\t  2 glxinfo:disk$0           0.05     0.03\n\t  2 glxinfo:sh0              0.05     0.03\n\t  2 glxinfo:shlo0            0.05     0.03\n\t  6 clang                    0.04     0.02\n\t  2 llvmpipe-0               0.03     0.07\n\t  2 llvmpipe-1               0.03     0.07\n\t  2 llvmpipe-10              0.03     0.07\n\t  2 llvmpipe-11              0.03     0.07\n\t  2 llvmpipe-12              0.03     0.07\n\t  2 llvmpipe-13              0.03     0.07\n\t  2 llvmpipe-14              0.03     0.07\n\t  2 llvmpipe-15              0.03     0.07\n\t  2 llvmpipe-2               0.03     0.07\n\t  2 llvmpipe-3               
0.03     0.07\n\t  2 llvmpipe-4               0.03     0.07\n\t  2 llvmpipe-5               0.03     0.07\n\t  2 llvmpipe-6               0.03     0.07\n\t  2 llvmpipe-7               0.03     0.07\n\t  2 llvmpipe-8               0.03     0.07\n\t  2 llvmpipe-9               0.03     0.07\n\t  1 lspci                    0.01     0.03\n\t 85 sh                       0.00     0.00\n\t 12 gcc                      0.00     0.00\n\t  9 stty                     0.00     0.00\n\t  8 gsettings                0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 gmain                    0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  3 dconf worker             0.00     0.00\n\t  3 namd                     0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 uname                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 cc                       0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 ps                       0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n0 processes running\n47 maximum processes<\/code><\/pre>\n\n\n\n<p>We run one process per core in 
the computation blocks<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      499497) namd start=4.84  finish=95.80\n        499498) namd2 start=4.84  finish=95.73\n          499499) namd2 start=4.85  finish=95.73\n          499500) namd2 start=4.85  finish=95.73\n          499501) namd2 start=4.85  finish=95.73\n          499502) namd2 start=4.85  finish=95.73\n          499503) namd2 start=4.85  finish=95.73\n          499504) namd2 start=4.85  finish=95.73\n          499505) namd2 start=4.85  finish=95.73\n          499506) namd2 start=4.85  finish=95.73\n          499507) namd2 start=4.85  finish=95.73\n          499508) namd2 start=4.85  finish=95.73\n          499509) namd2 start=4.85  finish=95.73\n          499510) namd2 start=4.85  finish=95.73\n          499511) namd2 start=4.85  finish=95.73\n          499512) namd2 start=4.85  finish=95.73\n          499513) namd2 start=4.85  finish=95.73\n          499514) namd2 start=4.86  finish=95.73<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Benchmark workload of molecular dynamics shows approximately 50% of the slots are retiring. AMD metrics shows floating point intensive (2\/3 of instructions are 128-bit floating point) without many branches and a small number of L2 access\/miss. 
This results in slightly <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/namd\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-188","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/188","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=188"}],"version-history":[{"count":1,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/188\/revisions"}],"predecessor-version":[{"id":190,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/188\/revisions\/190"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=188"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}