{"id":1657,"date":"2024-02-10T08:08:26","date_gmt":"2024-02-10T08:08:26","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=1657"},"modified":"2024-02-10T21:27:47","modified_gmt":"2024-02-10T21:27:47","slug":"hpcg","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/hpcg\/","title":{"rendered":"hpcg"},"content":{"rendered":"\n<p>High performance conjugate gradient (HPCG) benchmark &#8211; <a href=\"https:\/\/www.hpcg-benchmark.org\/\" data-type=\"link\" data-id=\"https:\/\/www.hpcg-benchmark.org\/\">link<\/a> &#8211; used to benchmark supercomputers. Problem sizes range from 104x104x104 to 192x192x192, and there are two run durations, 60 and 1800 seconds; 1800 seconds is the duration used for official score submissions. I picked the 144x144x144 size. It reports a metric in GFLOP\/s. The workload runs consistently on eight threads (on_cpu averages 7.81 of 16 cores), suggesting an MPI run.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-49.png\" alt=\"\" class=\"wp-image-1676\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-49.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-49-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-49-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>The topdown profile is mostly backend bound, with a low retirement rate.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-51.png\" alt=\"\" class=\"wp-image-1677\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-51.png 1280w, 
https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-51-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-51-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics show ~146 floating-point instructions per 1000 instructions, nearly all of them 128-bit (AVX-128) operations. They also confirm the low retirement rate, heavy backend memory stalls and negligible bad speculation.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              875.134\non_cpu               0.488          # 7.81 \/ 16 cores\nutime                6797.617\nstime                41.324\nnvcsw                35941          # 66.67%\nnivcsw               17964          # 33.33%\ninblock              6880           # 7.86\/sec\nonblock              62312          # 71.20\/sec\ncpu-clock            6840078255450  # 6840.078 seconds\ntask-clock           6840172325218  # 6840.172 seconds\npage faults          16133100       # 2358.581\/sec\ncontext switches     58112          # 8.496\/sec\ncpu migrations       9711           # 1.420\/sec\nmajor page faults    331            # 0.048\/sec\nminor page faults    16132769       # 2358.533\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             2634432109888  # 143.289 branches per 1000 inst\nbranch misses        3672582283     # 0.14% branch miss\nconditional          2568043704731  # 139.678 conditional branches per 1000 inst\nindirect             6032629780     # 0.328 indirect branches per 1000 inst\ncpu-cycles           30968079590724 # 2.21 GHz\ninstructions         18395333044799 # 0.59 IPC low\nslots                61921102171122 #\nretiring             6078821159224  #  9.8% ( 9.8%) low\n-- ucode             839491495      #     0.0%\n-- fastpath          6077981667729  #     9.8%\nfrontend             4307346648297  #  7.0% ( 7.0%)\n-- latency           1798699792212  #     2.9%\n-- bandwidth         2508646856085  #     
4.1%\nbackend              51459872447555 # 83.1% (83.2%) high\n-- cpu               8352863384115  #    13.5%\n-- memory            43107009063440 #    69.6%\nspeculation          26131829493    #  0.0% ( 0.0%) low\n-- branch mispredict 24835216381    #     0.0%\n-- pipeline restart  1296613112     #     0.0%\nsmt-contention       48912860156    #  0.1% ( 0.0%)\ncpu-cycles           30958977523557 # 2.21 GHz\ninstructions         18352926259460 # 0.59 IPC low\ninstructions         6114645283233  # 65.650 l2 access per 1000 inst\nl2 hit from l1       224097131601   # 42.88% l2 miss\nl2 miss from l1      5729334541     #\nl2 hit from l2 pf    10927118257    #\nl3 hit from l2 pf    778865718      #\nl3 miss from l2 pf   165624449263   #\ninstructions         6111909121253  # 145.740 float per 1000 inst\nfloat 512            49             # 0.000 AVX-512 per 1000 inst\nfloat 256            772            # 0.000 AVX-256 per 1000 inst\nfloat 128            890750308112   # 145.740 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         45             # 0.000 scalar per 1000 inst\ninstructions         2664431        #\nopcache              983447         # 369.102 opcache per 1000 inst\nopcache miss         528168         # 53.7% opcache miss rate\nl1 dTLB miss         5740           # 2.154 L1 dTLB per 1000 inst\nl2 dTLB miss         1194           # 0.448 L2 dTLB per 1000 inst\ninstructions         2681848        #\nicache               1285641        # 479.386 icache per 1000 inst\nicache miss          107266         #  8.3% icache miss rate\nl1 iTLB miss         15             # 0.006 L1 iTLB per 1000 inst\nl2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst\ntlb flush            20             # 0.007 TLB flush per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Process overview shows xhpcg taking almost all the time.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>447 processes\n\t 72 xhpcg                
20371.80   101.43\n\t 68 clinfo                  17.54     5.97\n\t 18 mpirun                   0.93     2.65\n\t 38 vulkaninfo               0.91     1.34\n\t  6 php                      0.11     1.11\n\t  4 vulkani:disk$0           0.09     0.15\n\t  6 glxinfo:gdrv0            0.08     0.10\n\t  6 glxinfo:gl0              0.08     0.09\n\t  6 clang                    0.06     0.06\n\t  2 llvmpipe-0               0.05     0.07\n\t  2 llvmpipe-1               0.05     0.07\n\t  2 llvmpipe-10              0.05     0.07\n\t  2 llvmpipe-11              0.05     0.07\n\t  2 llvmpipe-12              0.05     0.07\n\t  2 llvmpipe-13              0.05     0.07\n\t  2 llvmpipe-14              0.05     0.07\n\t  2 llvmpipe-15              0.05     0.07\n\t  2 llvmpipe-2               0.05     0.07\n\t  2 llvmpipe-3               0.05     0.07\n\t  2 llvmpipe-4               0.05     0.07\n\t  2 llvmpipe-5               0.05     0.07\n\t  2 llvmpipe-6               0.05     0.07\n\t  2 llvmpipe-7               0.05     0.07\n\t  2 llvmpipe-8               0.05     0.07\n\t  2 llvmpipe-9               0.05     0.07\n\t  2 glxinfo                  0.04     0.04\n\t  2 glxinfo:cs0              0.04     0.04\n\t  2 glxinfo:disk$0           0.04     0.04\n\t  2 glxinfo:sh0              0.04     0.04\n\t  2 glxinfo:shlo0            0.04     0.04\n\t  3 rocminfo                 0.03     0.00\n\t  1 lspci                    0.00     0.01\n\t  1 ps                       0.00     0.01\n\t 82 sh                       0.00     0.00\n\t 13 gcc                      0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  7 gsettings                0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 gmain                    0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  4 dconf worker             0.00     0.00\n\t  3 cat                      0.00     0.00\n\t  3 hpcg                     0.00     0.00\n\t 
 3 rm                       0.00     0.00\n\t  2 cc                       0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 uname                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n0 processes running\n47 maximum processes\n<\/code><\/pre>\n\n\n\n<p>Computation blocks confirm this is run with MPI: mpirun launches eight xhpcg processes.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      402268) hpcg             cpu=4 start=6.71  finish=292.31\n        402269) rm               cpu=6 start=6.71  finish=6.72 \n        402270) mpirun           cpu=9 start=6.72  finish=292.28\n          402274) mpirun           cpu=5 start=7.30  finish=292.28\n          402275) mpirun           cpu=14 start=7.30  finish=7.30 \n          402276) mpirun           cpu=15 start=7.32  finish=292.27\n          402278) mpirun           cpu=10 start=7.80  finish=292.27\n          402279) mpirun           cpu=3 start=7.80  finish=292.27\n          402280) xhpcg            cpu=6 start=7.83  finish=292.06\n            402282) xhpcg            cpu=6 start=7.84  finish=292.00\n            
402285) xhpcg            cpu=11 start=7.85  finish=291.99\n          402281) xhpcg            cpu=12 start=7.84  finish=292.05\n            402284) xhpcg            cpu=6 start=7.85  finish=291.99\n            402288) xhpcg            cpu=7 start=7.85  finish=291.99\n          402283) xhpcg            cpu=2 start=7.84  finish=292.06\n            402287) xhpcg            cpu=4 start=7.85  finish=291.99\n            402290) xhpcg            cpu=14 start=7.86  finish=291.99\n          402286) xhpcg            cpu=15 start=7.85  finish=292.04\n            402291) xhpcg            cpu=4 start=7.86  finish=291.99\n            402294) xhpcg            cpu=13 start=7.86  finish=291.99\n          402289) xhpcg            cpu=1 start=7.85  finish=292.04\n            402293) xhpcg            cpu=6 start=7.86  finish=291.99\n            402298) xhpcg            cpu=8 start=7.87  finish=291.99\n          402292) xhpcg            cpu=3 start=7.86  finish=292.05\n            402296) xhpcg            cpu=6 start=7.87  finish=291.99\n            402300) xhpcg            cpu=4 start=7.87  finish=291.99\n          402295) xhpcg            cpu=5 start=7.86  finish=292.05\n            402299) xhpcg            cpu=11 start=7.87  finish=291.99\n            402302) xhpcg            cpu=9 start=7.88  finish=291.99\n          402297) xhpcg            cpu=8 start=7.87  finish=292.05\n            402301) xhpcg            cpu=11 start=7.87  finish=291.99\n            402303) xhpcg            cpu=2 start=7.88  finish=291.99\n        402308) cat              cpu=7 start=292.30 finish=292.31\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>High performance conjugate gradient benchmark. &#8211; link &#8211; used to measure super computers. There are multiple sizes from 104x104x104 to 192x192x192 and two runtimes from 60 to 1800 where 1800 is what is used to officially submit scores. 
I picked <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/hpcg\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-1657","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1657","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=1657"}],"version-history":[{"count":2,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1657\/revisions"}],"predecessor-version":[{"id":1678,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1657\/revisions\/1678"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=1657"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}