{"id":2072,"date":"2024-03-08T12:20:09","date_gmt":"2024-03-08T12:20:09","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=2072"},"modified":"2024-03-09T12:23:29","modified_gmt":"2024-03-09T12:23:29","slug":"kripke","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/kripke\/","title":{"rendered":"kripke"},"content":{"rendered":"\n<p>An example of particle transport code. This test fails on my Intel processor because it shows 12 cores and this number does not evenly divide the 192 subdomains.  It runs on AMD with one test reporting throughput.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-30.png\" alt=\"\" class=\"wp-image-2075\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-30.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-30-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-30-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>Topdown profile shows high backend stalls<\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-30.png\" alt=\"\" class=\"wp-image-2077\" style=\"width:1180px;height:auto\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-30.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-30-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-30-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD 
metrics confirm high levels of backend stalls and a low retirement rate. This is floating-point code with moderate L2 access. Frontend stalls are low, with low opcache and icache miss rates.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              338.975\non_cpu               0.904          # 14.46 \/ 16 cores\nutime                4868.298\nstime                33.381\nnvcsw                38377          # 47.37%\nnivcsw               42643          # 52.63%\ninblock              8              # 0.02\/sec\nonblock              64688          # 190.83\/sec\ncpu-clock            4903440773300  # 4903.441 seconds\ntask-clock           4903610508341  # 4903.611 seconds\npage faults          12944947       # 2639.881\/sec\ncontext switches     82519          # 16.828\/sec\ncpu migrations       1575           # 0.321\/sec\nmajor page faults    247            # 0.050\/sec\nminor page faults    12944700       # 2639.830\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             1570374683161  # 127.305 branches per 1000 inst\nbranch misses        3334562600     # 0.21% branch miss\nconditional          1489339157977  # 120.735 conditional branches per 1000 inst\nindirect             17684445861    # 1.434 indirect branches per 1000 inst\ncpu-cycles           21425296984683 # 3.96 GHz\ninstructions         12365832850477 # 0.58 IPC low\nslots                42847795462728 #\nretiring             3768422249309  #  8.8% ( 9.8%) low\n-- ucode             2331708814     #     0.0%\n-- fastpath          3766090540495  #     8.8%\nfrontend             2271128871110  #  5.3% ( 5.9%)\n-- latency           1030938149040  #     2.4%\n-- bandwidth         1240190722070  #     2.9%\nbackend              32215012757623 # 75.2% (84.1%) high\n-- cpu               3230412385624  #     7.5%\n-- memory            28984600371999 #    67.6%\nspeculation          53466143550    #  0.1% ( 0.1%) low\n-- branch 
mispredict 42049729003    #     0.1%\n-- pipeline restart  11416414547    #     0.0%\nsmt-contention       4539745528168  # 10.6% ( 0.0%)\ncpu-cycles           21287261506351 # 3.95 GHz\ninstructions         12490617528632 # 0.59 IPC low\ninstructions         4162705943333  # 75.198 l2 access per 1000 inst\nl2 hit from l1       159347998760   # 29.04% l2 miss\nl2 miss from l1      4011291175     #\nl2 hit from l2 pf    66785684621    #\nl3 hit from l2 pf    45675249390    #\nl3 miss from l2 pf   41218623098    #\ninstructions         4160899813361  # 320.435 float per 1000 inst\nfloat 512            85             # 0.000 AVX-512 per 1000 inst\nfloat 256            518            # 0.000 AVX-256 per 1000 inst\nfloat 128            1333298242721  # 320.435 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\ninstructions         12441394927557 #\nopcache              624303591988   # 50.180 opcache per 1000 inst\nopcache miss         18711598680    #  3.0% opcache miss rate\nl1 dTLB miss         6646693200     # 0.534 L1 dTLB per 1000 inst\nl2 dTLB miss         2584799514     # 0.208 L2 dTLB per 1000 inst\ninstructions         12383667677284 #\nicache               37128517492    # 2.998 icache per 1000 inst\nicache miss          1944446624     #  5.2% icache miss rate\nl1 iTLB miss         10303586       # 0.001 L1 iTLB per 1000 inst\nl2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst\ntlb flush            85556          # 0.000 TLB flush per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Process overview shows that the kripke.exe processes dominate the run.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>933 processes\n\t480 kripke.exe           92076.97   608.11\n\t 68 clinfo                  14.55     4.67\n\t 90 mpirun                   4.40    11.17\n\t 38 vulkaninfo               0.96     0.96\n\t  6 php                      0.30     0.83\n\t  4 vulkani:disk$0           
0.11     0.11\n\t  6 glxinfo:gdrv0            0.08     0.07\n\t  6 glxinfo:gl0              0.08     0.07\n\t  2 llvmpipe-0               0.06     0.06\n\t  2 llvmpipe-10              0.06     0.06\n\t  2 llvmpipe-11              0.06     0.06\n\t  2 llvmpipe-12              0.06     0.06\n\t  2 llvmpipe-13              0.06     0.06\n\t  2 llvmpipe-14              0.06     0.06\n\t  2 llvmpipe-15              0.06     0.06\n\t  2 llvmpipe-2               0.06     0.06\n\t  2 llvmpipe-3               0.06     0.06\n\t  2 llvmpipe-4               0.06     0.06\n\t  2 llvmpipe-5               0.06     0.06\n\t  2 llvmpipe-6               0.06     0.06\n\t  2 llvmpipe-7               0.06     0.06\n\t  2 llvmpipe-8               0.06     0.06\n\t  2 llvmpipe-9               0.06     0.06\n\t  2 llvmpipe-1               0.05     0.06\n\t  6 clang                    0.04     0.05\n\t  2 glxinfo                  0.04     0.03\n\t  2 glxinfo:cs0              0.04     0.03\n\t  2 glxinfo:disk$0           0.04     0.03\n\t  2 glxinfo:sh0              0.04     0.03\n\t  2 glxinfo:shlo0            0.04     0.03\n\t  1 lspci                    0.00     0.02\n\t 82 sh                       0.00     0.00\n\t 15 kripke                   0.00     0.00\n\t 13 gcc                      0.00     0.00\n\t  8 gsettings                0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 gmain                    0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  3 dconf worker             0.00     0.00\n\t  3 rocminfo                 0.00     0.00\n\t  2 cc                       0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 uname                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 
dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 ps                       0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n0 processes running\n47 maximum processes\n<\/code><\/pre>\n\n\n\n<p>Computation blocks<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      1068858) kripke           cpu=1 start=5.33  finish=146.13\n        1068859) mpirun           cpu=9 start=5.33  finish=146.10\n          1068942) mpirun           cpu=8 start=5.89  finish=146.10\n          1068943) mpirun           cpu=1 start=5.89  finish=5.89 \n          1068944) mpirun           cpu=9 start=5.90  finish=146.09\n          1069134) mpirun           cpu=9 start=6.38  finish=146.09\n          1069138) mpirun           cpu=9 start=6.39  finish=146.09\n          1069204) kripke.exe       cpu=0 start=6.46  finish=146.02\n            1069207) kripke.exe       cpu=13 start=6.47  finish=145.69\n            1069211) kripke.exe       cpu=15 start=6.47  finish=145.68\n            1069402) kripke.exe       cpu=0 start=6.81  finish=146.02\n          1069206) kripke.exe       cpu=14 start=6.46  finish=146.04\n            1069212) kripke.exe       cpu=3 start=6.48  finish=145.69\n            1069226) kripke.exe       cpu=14 start=6.52  finish=145.68\n            1069696) kripke.exe       cpu=6 start=7.62  
finish=146.05\n          1069209) kripke.exe       cpu=11 start=6.47  finish=146.02\n            1069216) kripke.exe       cpu=11 start=6.49  finish=145.68\n            1069221) kripke.exe       cpu=15 start=6.50  finish=145.68\n            1069674) kripke.exe       cpu=13 start=7.57  finish=146.02\n          1069215) kripke.exe       cpu=5 start=6.48  finish=146.06\n            1069222) kripke.exe       cpu=5 start=6.50  finish=145.69\n            1069233) kripke.exe       cpu=15 start=6.52  finish=145.68\n            1069676) kripke.exe       cpu=10 start=7.58  finish=146.06\n          1069220) kripke.exe       cpu=15 start=6.50  finish=146.08\n            1069224) kripke.exe       cpu=9 start=6.51  finish=145.69\n            1069231) kripke.exe       cpu=1 start=6.52  finish=145.68\n            1069681) kripke.exe       cpu=15 start=7.59  finish=146.08\n          1069223) kripke.exe       cpu=13 start=6.51  finish=146.02\n            1069241) kripke.exe       cpu=13 start=6.53  finish=145.68\n            1069246) kripke.exe       cpu=13 start=6.54  finish=145.68\n            1069686) kripke.exe       cpu=13 start=7.60  finish=146.02\n          1069228) kripke.exe       cpu=2 start=6.52  finish=146.03\n            1069240) kripke.exe       cpu=6 start=6.53  finish=145.68\n            1069249) kripke.exe       cpu=15 start=6.54  finish=145.68\n            1069683) kripke.exe       cpu=4 start=7.59  finish=146.04\n          1069238) kripke.exe       cpu=9 start=6.52  finish=146.03\n            1069248) kripke.exe       cpu=13 start=6.54  finish=145.69\n            1069253) kripke.exe       cpu=1 start=6.55  finish=145.68\n            1069707) kripke.exe       cpu=8 start=7.65  finish=146.03\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>An example of particle transport code. This test fails on my Intel processor because it shows 12 cores and this number does not evenly divide the 192 subdomains. It runs on AMD with one test reporting throughput. 
Topdown profile shows <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/kripke\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-2072","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2072","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=2072"}],"version-history":[{"count":2,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2072\/revisions"}],"predecessor-version":[{"id":2078,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2072\/revisions\/2078"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=2072"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}