{"id":2116,"date":"2024-03-20T12:13:21","date_gmt":"2024-03-20T12:13:21","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=2116"},"modified":"2024-03-21T12:28:57","modified_gmt":"2024-03-21T12:28:57","slug":"gpaw","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/gpaw\/","title":{"rendered":"gpaw"},"content":{"rendered":"\n<p>GPAW is a density functional theory (DFT) Python code that uses the projector-augmented wave (PAW) method for atomic simulations. There is one workload. It runs on half the hardware threads, one per physical core.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-34.png\" alt=\"\" class=\"wp-image-2131\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-34.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-34-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-34-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>The topdown profile shows a high level of backend stalls, with a retirement rate of ~25%.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-36.png\" alt=\"\" class=\"wp-image-2133\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-36.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-36-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-36-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics show a 
moderate amount of floating point code and some L2 misses.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              879.790\non_cpu               0.489          # 7.83 \/ 16 cores\nutime                6817.162\nstime                73.081\nnvcsw                36307          # 68.18%\nnivcsw               16948          # 31.82%\ninblock              1288           # 1.46\/sec\nonblock              63168          # 71.80\/sec\ncpu-clock            6890744986177  # 6890.745 seconds\ntask-clock           6890812005155  # 6890.812 seconds\npage faults          2689926        # 390.364\/sec\ncontext switches     57455          # 8.338\/sec\ncpu migrations       10220          # 1.483\/sec\nmajor page faults    454            # 0.066\/sec\nminor page faults    2689472        # 390.298\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             5067382619636  # 88.637 branches per 1000 inst\nbranch misses        21956030233    # 0.43% branch miss\nconditional          4570353532552  # 79.943 conditional branches per 1000 inst\nindirect             96880742006    # 1.695 indirect branches per 1000 inst\ncpu-cycles           28645796778898 # 2.03 GHz\ninstructions         57307273149574 # 2.00 IPC\nslots                57301193688450 #\nretiring             18074275342804 # 31.5% (31.6%)\n-- ucode             4569462167     #     0.0%\n-- fastpath          18069705880637 #    31.5%\nfrontend             5577092011479  #  9.7% ( 9.7%)\n-- latency           1689665033616  #     2.9%\n-- bandwidth         3887426977863  #     6.8%\nbackend              33176548791633 # 57.9% (57.9%)\n-- cpu               7819262233119  #    13.6%\n-- memory            25357286558514 #    44.3%\nspeculation          450755148897   #  0.8% ( 0.8%) low\n-- branch mispredict 427854179817   #     0.7%\n-- pipeline restart  22900969080    #     0.0%\nsmt-contention       22505648420    #  0.0% ( 0.0%)\ncpu-cycles         
  28658532272169 # 2.03 GHz\ninstructions         57400761976266 # 2.00 IPC\ninstructions         19129140187890 # 41.908 l2 access per 1000 inst\nl2 hit from l1       455236816481   # 21.61% l2 miss\nl2 miss from l1      27900360554    #\nl2 hit from l2 pf    201075435807   #\nl3 hit from l2 pf    67459643835    #\nl3 miss from l2 pf   77893717347    #\ninstructions         19136457221452 # 86.698 float per 1000 inst\nfloat 512            58             # 0.000 AVX-512 per 1000 inst\nfloat 256            7550759        # 0.000 AVX-256 per 1000 inst\nfloat 128            1659094467361  # 86.698 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         290            # 0.000 scalar per 1000 inst\ninstructions         57209474789614 #\nopcache              8398454124632  # 146.802 opcache per 1000 inst\nopcache miss         95647728925    #  1.1% opcache miss rate\nl1 dTLB miss         27140697084    # 0.474 L1 dTLB per 1000 inst\nl2 dTLB miss         675797055      # 0.012 L2 dTLB per 1000 inst\ninstructions         56987401832589 #\nicache               152362894146   # 2.674 icache per 1000 inst\nicache miss          40512482340    # 26.6% icache miss rate\nl1 iTLB miss         2868021287     # 0.050 L1 iTLB per 1000 inst\nl2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst\ntlb flush            91117          # 0.000 TLB flush per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics also show some L2-bound and DRAM-bound stalls.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              1652.916\non_cpu               0.742          # 11.87 \/ 16 cores\nutime                19511.614\nstime                104.588\nnvcsw                62814          # 61.27%\nnivcsw               39708          # 38.73%\ninblock              57208          # 34.61\/sec\nonblock              51880          # 31.39\/sec\ncpu-clock            19616663802704 # 19616.664 seconds\ntask-clock           19616720866028 # 19616.721 
seconds\npage faults          2343849        # 119.482\/sec\ncontext switches     110562         # 5.636\/sec\ncpu migrations       12467          # 0.636\/sec\nmajor page faults    1055           # 0.054\/sec\nminor page faults    2342794        # 119.428\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             12046494816063 # 85.081 branches per 1000 inst\nbranch misses        35635764744    # 0.30% branch miss\nconditional          12046494831071 # 85.081 conditional branches per 1000 inst\nindirect             2827703316046  # 19.971 indirect branches per 1000 inst\nslots                116572969610660 #\nretiring             72209870930905 # 61.9% (61.9%) high\n-- ucode             3323646371248  #     2.9%\n-- fastpath          68886224559657 #    59.1%\nfrontend             8583540422122  #  7.4% ( 7.4%)\n-- latency           3414647768949  #     2.9%\n-- bandwidth         5168892653173  #     4.4%\nbackend              29797674212233 # 25.6% (25.6%)\n-- cpu               13703383213043 #    11.8%\n-- memory            16094290999190 #    13.8%\nspeculation          6757381760784  #  5.8% ( 5.8%)\n-- branch mispredict 6310528243422  #     5.4%\n-- pipeline restart  446853517362   #     0.4%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           56388787765085 # 2.13 GHz\ninstructions         214541993236726 # 3.80 IPC high\nl2 access            955320349641   # 12.984 l2 access per 1000 inst\nl2 miss              180229145145   # 18.87% l2 miss\ncpu-cycles           19388196618177 # 19.7% memory latency\nload stalls          3472088794438  #  0.0% l1 bound\nl1 miss              4326972554796  # 11.4% l2 bound\nl2 miss              2116470705514  #  2.3% l3 bound\nl3 miss              1675077840237  #  8.6% dram bound\nstore_stalls         345715962901   #  1.8% store bound\n<\/code><\/pre>\n\n\n\n<p>The process overview shows this as Python code run under 
MPI<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>433 processes\n\t 73 python3              20507.82   206.30\n\t 68 clinfo                  16.52     6.01\n\t 38 vulkaninfo               1.49     1.03\n\t 18 mpirun                   1.05     2.22\n\t  4 vulkani:disk$0           0.15     0.11\n\t  2 llvmpipe-0               0.08     0.06\n\t  2 llvmpipe-1               0.08     0.06\n\t  2 llvmpipe-10              0.08     0.06\n\t  2 llvmpipe-11              0.08     0.06\n\t  2 llvmpipe-12              0.08     0.06\n\t  2 llvmpipe-13              0.08     0.06\n\t  2 llvmpipe-14              0.08     0.06\n\t  2 llvmpipe-15              0.08     0.06\n\t  2 llvmpipe-2               0.08     0.06\n\t  2 llvmpipe-3               0.08     0.06\n\t  2 llvmpipe-4               0.08     0.06\n\t  2 llvmpipe-5               0.08     0.06\n\t  2 llvmpipe-6               0.08     0.06\n\t  2 llvmpipe-7               0.08     0.06\n\t  2 llvmpipe-8               0.08     0.06\n\t  2 llvmpipe-9               0.08     0.06\n\t  6 php                      0.06     0.56\n\t  6 clang                    0.05     0.07\n\t  3 rocminfo                 0.03     0.00\n\t  1 lspci                    0.00     0.02\n\t  1 ps                       0.00     0.01\n\t 85 sh                       0.00     0.00\n\t 13 gcc                      0.00     0.00\n\t  9 gsettings                0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  4 glxinfo                  0.00     0.00\n\t  4 gmain                    0.00     0.00\n\t  3 cat                      0.00     0.00\n\t  3 dconf worker             0.00     0.00\n\t  3 gpaw                     0.00     0.00\n\t  3 rm                       0.00     0.00\n\t  2 cc                       0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 setterm                  0.00     
0.00\n\t  2 uname                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 python                   0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n0 processes running\n47 maximum processes\n<\/code><\/pre>\n\n\n\n<p>Computation blocks for one run: mpirun starts eight python3 processes, each of which starts two child python3 processes.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      159587) gpaw             cpu=4 start=297.53 finish=584.66\n        159588) rm               cpu=5 start=297.53 finish=297.53\n        159589) mpirun           cpu=8 start=297.54 finish=584.63\n          159595) mpirun           cpu=4 start=297.77 finish=584.63\n          159596) mpirun           cpu=9 start=297.77 finish=297.77\n          159597) mpirun           cpu=0 start=297.79 finish=584.63\n          159599) mpirun           cpu=15 start=297.90 finish=584.63\n          159600) mpirun           cpu=9 start=297.90 finish=584.63\n          159601) python3          cpu=10 start=297.93 finish=584.60\n            159609) python3          cpu=7 start=298.02 finish=584.60\n            159612) python3          cpu=6 start=298.03 finish=584.60\n          159602) python3          cpu=6 start=297.93 finish=584.60\n            159610) python3          cpu=15 start=298.03 
finish=584.60\n            159613) python3          cpu=7 start=298.03 finish=584.60\n          159603) python3          cpu=1 start=297.94 finish=584.60\n            159611) python3          cpu=15 start=298.03 finish=584.60\n            159614) python3          cpu=11 start=298.03 finish=584.60\n          159604) python3          cpu=5 start=297.94 finish=584.60\n            159615) python3          cpu=13 start=298.04 finish=584.60\n            159616) python3          cpu=8 start=298.04 finish=584.60\n          159605) python3          cpu=12 start=297.95 finish=584.60\n            159617) python3          cpu=0 start=298.04 finish=584.60\n            159619) python3          cpu=4 start=298.05 finish=584.60\n          159606) python3          cpu=2 start=297.95 finish=584.60\n            159618) python3          cpu=8 start=298.05 finish=584.60\n            159620) python3          cpu=13 start=298.05 finish=584.60\n          159607) python3          cpu=3 start=297.96 finish=584.60\n            159621) python3          cpu=9 start=298.05 finish=584.60\n            159622) python3          cpu=9 start=298.06 finish=584.60\n          159608) python3          cpu=4 start=297.96 finish=584.60\n            159623) python3          cpu=7 start=298.06 finish=584.60\n            159624) python3          cpu=15 start=298.06 finish=584.60\n        159629) cat              cpu=5 start=584.65 finish=584.66\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Density functional theory (DFT) Python code using the projector-augmented wave (PAW) method for an atomic simulation. There is one workload. This runs on half the threads, one per non-hyperthreaded core. 
Topdown profile shows higher level of backend stalls with a <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/gpaw\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-2116","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2116","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=2116"}],"version-history":[{"count":2,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2116\/revisions"}],"predecessor-version":[{"id":2134,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2116\/revisions\/2134"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=2116"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}