{"id":2312,"date":"2024-06-02T19:25:10","date_gmt":"2024-06-02T19:25:10","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=2312"},"modified":"2024-06-02T23:32:10","modified_gmt":"2024-06-02T23:32:10","slug":"508-namd_r","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/cpu2017\/508-namd_r\/","title":{"rendered":"508.namd_r"},"content":{"rendered":"\n<p>508.namd_r is a SPEC CPU(R) benchmark described&nbsp;<a href=\"https:\/\/www.spec.org\/cpu2017\/Docs\/benchmarks\/508.namd_r.html\">here<\/a>. This C++ workload runs consistently on all logical cores.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-7.png\" alt=\"\" class=\"wp-image-2327\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-7.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-7-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-7-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>Topdown profile shows a high retirement rate with some backend stalls.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-8.png\" alt=\"\" class=\"wp-image-2328\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-8.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-8-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-8-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics show backend stalls are 
more cpu stalls than memory stalls. While there are ~60 L2 accesses per 1000 instructions, the L2 miss rate is low. The opcache has a very low miss rate.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              705.983\non_cpu               0.988          # 15.81 \/ 16 cores\nutime                11155.218\nstime                8.698\nnvcsw                16557          # 12.97%\nnivcsw               111119         # 87.03%\ninblock              0              # 0.00\/sec\nonblock              31776          # 45.01\/sec\ncpu-clock            11164522824524 # 11164.523 seconds\ntask-clock           11164605623683 # 11164.606 seconds\npage faults          2678594        # 239.918\/sec\ncontext switches     127114         # 11.385\/sec\ncpu migrations       164            # 0.015\/sec\nmajor page faults    1011           # 0.091\/sec\nminor page faults    2677583        # 239.828\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             2064334682529  # 26.587 branches per 1000 inst\nbranch misses        90764635496    # 4.40% branch miss\nconditional          1883524841799  # 24.258 conditional branches per 1000 inst\nindirect             1615455461     # 0.021 indirect branches per 1000 inst\ncpu-cycles           42377952252120 # 3.74 GHz\ninstructions         77659634049358 # 1.83 IPC\nslots                84735322972980 #\nretiring             26508259685731 # 31.3% (51.3%)\n-- ucode             499617852      #     0.0%\n-- fastpath          26507760067879 #    31.3%\nfrontend             2468780097885  #  2.9% ( 4.8%) low\n-- latency           1830776219160  #     2.2%\n-- bandwidth         638003878725   #     0.8%\nbackend              21111048379065 # 24.9% (40.9%)\n-- cpu               14580511723887 #    17.2%\n-- memory            6530536655178  #     7.7%\nspeculation          1549961155223  #  1.8% ( 3.0%)\n-- branch mispredict 1535782887883  #     1.8%\n-- pipeline restart  
14178267340    #     0.0%\nsmt-contention       33097226855820 # 39.1% ( 0.0%)\ncpu-cycles           42363963062501 # 3.74 GHz\ninstructions         77682975707638 # 1.83 IPC\ninstructions         25886326183295 # 63.010 l2 access per 1000 inst\nl2 hit from l1       1142360918883  # 1.35% l2 miss\nl2 miss from l1      4871222345     #\nl2 hit from l2 pf    471669934470   #\nl3 hit from l2 pf    3651612350     #\nl3 miss from l2 pf   13423911862    #\ninstructions         25865520397898 # 395.622 float per 1000 inst\nfloat 512            137            # 0.000 AVX-512 per 1000 inst\nfloat 256            16868          # 0.000 AVX-256 per 1000 inst\nfloat 128            10232976318229 # 395.622 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\ninstructions         77643808589000 #\nopcache              7329400605488  # 94.398 opcache per 1000 inst\nopcache miss         17462126951    #  0.2% opcache miss rate\nl1 dTLB miss         10400596229    # 0.134 L1 dTLB per 1000 inst\nl2 dTLB miss         757260290      # 0.010 L2 dTLB per 1000 inst\ninstructions         77643844021012 #\nicache               28943916756    # 0.373 icache per 1000 inst\nicache miss          5435095843     # 18.8% icache miss rate\nl1 iTLB miss         238400540      # 0.003 L1 iTLB per 1000 inst\nl2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst\ntlb flush            78639          # 0.000 TLB flush per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>The process overview shows almost all time spent in namd_r_base.mev.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>579 processes\n\t 48 namd_r_base.mev      11114.35     4.70\n\t 69 specperl                12.33     1.47\n\t  1 clang++                  0.01     0.00\n\t  1 lsb_release              0.01     0.00\n\t 10 ps                       0.00     0.01\n\t172 sh                       0.00     0.00\n\t 54 specrxp                  0.00    
 0.00\n\t 48 bash                     0.00     0.00\n\t 41 specinvoke               0.00     0.00\n\t 21 grep                     0.00     0.00\n\t 20 cat                      0.00     0.00\n\t 12 uniq                     0.00     0.00\n\t 11 sort                     0.00     0.00\n\t 10 expand                   0.00     0.00\n\t  6 pwd                      0.00     0.00\n\t  5 basename                 0.00     0.00\n\t  5 specmake                 0.00     0.00\n\t  5 systemctl                0.00     0.00\n\t  4 specpp                   0.00     0.00\n\t  4 uname                    0.00     0.00\n\t  3 dirname                  0.00     0.00\n\t  3 dmidecode                0.00     0.00\n\t  3 lscpu                    0.00     0.00\n\t  2 df                       0.00     0.00\n\t  2 dpkg                     0.00     0.00\n\t  2 rm                       0.00     0.00\n\t  2 runcpu                   0.00     0.00\n\t  2 specsha512sum            0.00     0.00\n\t  2 specxz                   0.00     0.00\n\t  2 who                      0.00     0.00\n\t  1 cpupower                 0.00     0.00\n\t  1 head                     0.00     0.00\n\t  1 logname                  0.00     0.00\n\t  1 ls                       0.00     0.00\n\t  1 numactl                  0.00     0.00\n\t  1 sysctl                   0.00     0.00\n\t  1 w                        0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 which                    0.00     0.00\n0 processes running\n53 maximum processes\n<\/code><\/pre>\n\n\n\n<p>Specinvoke fires up separate processes for each core.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>    377379) specinvoke       cpu=14 start=3.25  finish=235.52\n      377381) sh               cpu=4 start=3.25  finish=235.10\n        377387) bash             cpu=0 start=3.25  finish=235.10\n          377412) namd_r_base.mev  cpu=0 start=3.26  finish=235.09\n      377382) sh               cpu=1 start=3.25  finish=234.94\n        377392) bash      
       cpu=1 start=3.25  finish=234.94\n          377416) namd_r_base.mev  cpu=1 start=3.26  finish=234.92\n      377383) sh               cpu=10 start=3.25  finish=234.73\n        377389) bash             cpu=2 start=3.25  finish=234.73\n          377411) namd_r_base.mev  cpu=2 start=3.26  finish=234.72\n      377384) sh               cpu=9 start=3.25  finish=235.38\n        377394) bash             cpu=3 start=3.25  finish=235.38\n          377415) namd_r_base.mev  cpu=3 start=3.26  finish=235.37\n      377385) sh               cpu=9 start=3.25  finish=234.94\n        377395) bash             cpu=4 start=3.25  finish=234.94\n          377418) namd_r_base.mev  cpu=4 start=3.26  finish=234.92\n      377386) sh               cpu=9 start=3.25  finish=235.37\n        377397) bash             cpu=5 start=3.25  finish=235.37\n          377417) namd_r_base.mev  cpu=5 start=3.26  finish=235.35\n      377388) sh               cpu=4 start=3.25  finish=235.25\n        377396) bash             cpu=6 start=3.25  finish=235.25\n          377420) namd_r_base.mev  cpu=6 start=3.26  finish=235.24\n      377390) sh               cpu=12 start=3.25  finish=235.12\n        377400) bash             cpu=7 start=3.25  finish=235.12\n          377419) namd_r_base.mev  cpu=7 start=3.26  finish=235.11\n      377391) sh               cpu=1 start=3.25  finish=235.02\n        377402) bash             cpu=8 start=3.26  finish=235.01\n          377421) namd_r_base.mev  cpu=8 start=3.26  finish=235.00\n      377393) sh               cpu=9 start=3.25  finish=234.92\n        377404) bash             cpu=9 start=3.26  finish=234.92\n          377422) namd_r_base.mev  cpu=9 start=3.26  finish=234.90\n      377398) sh               cpu=10 start=3.25  finish=234.24\n        377407) bash             cpu=10 start=3.26  finish=234.24\n          377424) namd_r_base.mev  cpu=10 start=3.26  finish=234.22\n      377399) sh               cpu=8 start=3.25  finish=235.36\n        377408) bash             cpu=11 
start=3.26  finish=235.36\n          377423) namd_r_base.mev  cpu=11 start=3.26  finish=235.34\n      377401) sh               cpu=4 start=3.26  finish=235.10\n        377409) bash             cpu=12 start=3.26  finish=235.10\n          377425) namd_r_base.mev  cpu=12 start=3.26  finish=235.09\n      377403) sh               cpu=11 start=3.26  finish=235.52\n        377410) bash             cpu=13 start=3.26  finish=235.52\n          377426) namd_r_base.mev  cpu=13 start=3.26  finish=235.51\n      377405) sh               cpu=10 start=3.26  finish=234.90\n        377413) bash             cpu=14 start=3.26  finish=234.90\n          377427) namd_r_base.mev  cpu=14 start=3.26  finish=234.88\n      377406) sh               cpu=15 start=3.26  finish=235.15\n        377414) bash             cpu=15 start=3.26  finish=235.15\n          377428) namd_r_base.mev  cpu=15 start=3.26  finish=235.14\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>508.namd_r is a SPEC CPU(R) benchmark described&nbsp;here. This C++ workload runs consistently on all logical cores. Topdown profile shows a high retirement rate with some backend stalls. AMD metrics show backend stalls are more cpu stalls than memory stalls. 
While <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/cpu2017\/508-namd_r\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":2297,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-2312","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2312","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=2312"}],"version-history":[{"count":2,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2312\/revisions"}],"predecessor-version":[{"id":2330,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2312\/revisions\/2330"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2297"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=2312"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}