{"id":1736,"date":"2024-02-11T14:51:31","date_gmt":"2024-02-11T14:51:31","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=1736"},"modified":"2024-03-02T00:58:15","modified_gmt":"2024-03-02T00:58:15","slug":"cp2k","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/cp2k\/","title":{"rendered":"cp2k"},"content":{"rendered":"\n<p>cp2k is a molecular dynamics package for quantum chemistry and solid state physics.  There are three workloads but the second one fails. These run with MPI and appear to run only on one thread per hyper-threaded core.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-54.png\" alt=\"\" class=\"wp-image-1743\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-54.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-54-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-54-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>Topdown profile shows slightly different profiles for workloads but generally with high amounts of backend stalls and moderate retirement rate.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-56.png\" alt=\"\" class=\"wp-image-1745\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-56.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-56-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-56-768x576.png 768w\" sizes=\"auto, 
(max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics show floating point code and low level of L2 cache access.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              314.596\non_cpu               0.403          # 6.44 \/ 16 cores\nutime                1950.571\nstime                76.618\nnvcsw                37942          # 85.96%\nnivcsw               6195           # 14.04%\ninblock              1393752        # 4430.29\/sec\nonblock              4130936        # 13130.91\/sec\ncpu-clock            2250486500161  # 2250.487 seconds\ntask-clock           2250512015123  # 2250.512 seconds\npage faults          17296215       # 7685.458\/sec\ncontext switches     59023          # 26.226\/sec\ncpu migrations       3878           # 1.723\/sec\nmajor page faults    16478          # 7.322\/sec\nminor page faults    17279723       # 7678.130\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             2797017555498  # 118.729 branches per 1000 inst\nbranch misses        16332501305    # 0.58% branch miss\nconditional          2093263099042  # 88.856 conditional branches per 1000 inst\nindirect             103483474302   # 4.393 indirect branches per 1000 inst\ncpu-cycles           9417383943085  # 1.86 GHz\ninstructions         24068965945112 # 2.56 IPC\nslots                18839421656568 #\nretiring             8130515809241  # 43.2% (43.2%)\n-- ucode             18296888116    #     0.1%\n-- fastpath          8112218921125  #    43.1%\nfrontend             1702876357108  #  9.0% ( 9.1%)\n-- latency           812901365928   #     4.3%\n-- bandwidth         889974991180   #     4.7%\nbackend              8497846096167  # 45.1% (45.2%)\n-- cpu               2735878442045  #    14.5%\n-- memory            5761967654122  #    30.6%\nspeculation          483239296612   #  2.6% ( 2.6%)\n-- branch mispredict 466874614374   #     2.5%\n-- pipeline restart  16364682238    #     
0.1%\nsmt-contention       24935254612    #  0.1% ( 0.0%)\ncpu-cycles           9414659187497  # 1.86 GHz\ninstructions         23998277195786 # 2.55 IPC\ninstructions         7994669047541  # 14.114 l2 access per 1000 inst\nl2 hit from l1       85551514714    # 17.78% l2 miss\nl2 miss from l1      8305630626     #\nl2 hit from l2 pf    15533727367    #\nl3 hit from l2 pf    4669322426     #\nl3 miss from l2 pf   7084179657     #\ninstructions         7998409850474  # 230.289 float per 1000 inst\nfloat 512            92             # 0.000 AVX-512 per 1000 inst\nfloat 256            2409186741     # 0.301 AVX-256 per 1000 inst\nfloat 128            1839534443120  # 229.988 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\ninstructions         23840085961713 #\nopcache              3733895544700  # 156.623 opcache per 1000 inst\nopcache miss         139454228115   #  3.7% opcache miss rate\nl1 dTLB miss         18022412303    # 0.756 L1 dTLB per 1000 inst\nl2 dTLB miss         1346924796     # 0.056 L2 dTLB per 1000 inst\ninstructions         23808482127795 #\nicache               215006604128   # 9.031 icache per 1000 inst\nicache miss          35602086813    # 16.6% icache miss rate\nl1 iTLB miss         1346273170     # 0.057 L1 iTLB per 1000 inst\nl2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst\ntlb flush            509622         # 0.000 TLB flush per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Process overview shows cp2k.popt as the primary working process.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>329 processes\n\t 72 cp2k.popt             6240.32   291.46\n\t 38 vulkaninfo               0.95     1.52\n\t 12 mpirun                   0.49     3.61\n\t  6 glxinfo:gdrv0            0.15     0.16\n\t  4 vulkani:disk$0           0.10     0.16\n\t  2 glxinfo                  0.08     0.06\n\t  2 glxinfo:cs0              0.08     0.06\n\t  2 
glxinfo:disk$0           0.08     0.06\n\t  2 glxinfo:shlo0            0.08     0.06\n\t  2 glxinfo:sh0              0.07     0.06\n\t  6 php                      0.06     3.52\n\t  2 llvmpipe-0               0.05     0.08\n\t  2 llvmpipe-1               0.05     0.08\n\t  2 llvmpipe-10              0.05     0.08\n\t  2 llvmpipe-11              0.05     0.08\n\t  2 llvmpipe-12              0.05     0.08\n\t  2 llvmpipe-13              0.05     0.08\n\t  2 llvmpipe-14              0.05     0.08\n\t  2 llvmpipe-15              0.05     0.08\n\t  2 llvmpipe-2               0.05     0.08\n\t  2 llvmpipe-3               0.05     0.08\n\t  2 llvmpipe-4               0.05     0.08\n\t  2 llvmpipe-5               0.05     0.08\n\t  2 llvmpipe-6               0.05     0.08\n\t  2 llvmpipe-7               0.05     0.08\n\t  2 llvmpipe-8               0.05     0.08\n\t  2 llvmpipe-9               0.05     0.08\n\t  1 lspci                    0.00     0.02\n\t  1 ps                       0.00     0.01\n\t 65 sh                       0.00     0.00\n\t 13 gcc                      0.00     0.00\n\t  9 gsettings                0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  5 gmain                    0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  3 cp2k                     0.00     0.00\n\t  2 cc                       0.00     0.00\n\t  2 dconf worker             0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 uname                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                 
   0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 python3                  0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n0 processes running\n47 maximum processes\n<\/code><\/pre>\n\n\n\n<p>Computation blocks look as follows<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      461986) cp2k             cpu=5 start=5.29  finish=102.53\n        461987) mpirun           cpu=13 start=5.29  finish=102.53\n          461988) mpirun           cpu=8 start=5.84  finish=102.53\n          461989) mpirun           cpu=7 start=6.33  finish=102.53\n          461990) mpirun           cpu=10 start=6.33  finish=102.53\n          461991) cp2k.popt        cpu=6 start=6.33  finish=102.50\n            461996) cp2k.popt        cpu=0 start=6.35  finish=102.50\n            462000) cp2k.popt        cpu=7 start=6.36  finish=102.50\n          461992) cp2k.popt        cpu=1 start=6.34  finish=102.50\n            461998) cp2k.popt        cpu=4 start=6.35  finish=102.50\n            462002) cp2k.popt        cpu=4 start=6.36  finish=102.50\n          461993) cp2k.popt        cpu=12 start=6.34  finish=102.50\n            462001) cp2k.popt        cpu=5 start=6.36  finish=102.50\n            462005) cp2k.popt        cpu=0 start=6.36  finish=102.50\n          461994) cp2k.popt        cpu=11 start=6.34  finish=102.50\n            462004) cp2k.popt        cpu=3 start=6.36  finish=102.50\n            462007) cp2k.popt        cpu=9 start=6.37  finish=102.50\n          461995) cp2k.popt        cpu=2 start=6.35  finish=102.50\n            462006) cp2k.popt        cpu=12 start=6.37  finish=102.50\n            
462009) cp2k.popt        cpu=8 start=6.37  finish=102.50\n          461997) cp2k.popt        cpu=14 start=6.35  finish=102.50\n            462008) cp2k.popt        cpu=8 start=6.37  finish=102.50\n            462011) cp2k.popt        cpu=1 start=6.37  finish=102.50\n          461999) cp2k.popt        cpu=5 start=6.36  finish=102.50\n            462010) cp2k.popt        cpu=0 start=6.37  finish=102.50\n            462013) cp2k.popt        cpu=10 start=6.38  finish=102.49\n          462003) cp2k.popt        cpu=15 start=6.36  finish=102.50\n            462012) cp2k.popt        cpu=8 start=6.38  finish=102.50\n            462014) cp2k.popt        cpu=3 start=6.38  finish=102.49\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>cp2k is a molecular dynamics package for quantum chemistry and solid state physics. There are three workloads but the second one fails. These run with MPI and appear to run only on one thread per hyper-threaded core. Topdown profile shows <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/cp2k\/\"><span class=\"more-msg\">Continue reading 
&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-1736","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1736","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=1736"}],"version-history":[{"count":3,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1736\/revisions"}],"predecessor-version":[{"id":1893,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1736\/revisions\/1893"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=1736"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}