{"id":2307,"date":"2024-06-02T19:16:49","date_gmt":"2024-06-02T19:16:49","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=2307"},"modified":"2024-06-02T19:21:59","modified_gmt":"2024-06-02T19:21:59","slug":"507-cactubssn_r","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/cpu2017\/507-cactubssn_r\/","title":{"rendered":"507.cactuBSSN_r"},"content":{"rendered":"\n<p>cactuBSSN is a SPEC CPU(R) benchmark described <a href=\"https:\/\/www.spec.org\/cpu2017\/Docs\/benchmarks\/507.cactuBSSN_r.html\">here<\/a> and written in C, C++ and Fortran. The workload runs on all logical cores.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-6.png\" alt=\"\" class=\"wp-image-2308\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-6.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-6-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-6-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>Topdown profile shows this is a backend-bound workload.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-7.png\" alt=\"\" class=\"wp-image-2309\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-7.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-7-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-7-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics 
confirm the benchmark is memory bound. Only ~55 of every 1000 instructions are floating point, and branches are infrequent at ~49 per 1000 instructions. L2 traffic is heavy, at ~476 L2 accesses per 1000 instructions with an L2 miss rate of ~11%.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              609.922\non_cpu               0.981          # 15.69 \/ 16 cores\nutime                9531.490\nstime                37.821\nnvcsw                15443          # 11.89%\nnivcsw               114431         # 88.11%\ninblock              0              # 0.00\/sec\nonblock              302096         # 495.30\/sec\ncpu-clock            9571161124089  # 9571.161 seconds\ntask-clock           9571308947593  # 9571.309 seconds\npage faults          10865926       # 1135.260\/sec\ncontext switches     129126         # 13.491\/sec\ncpu migrations       250            # 0.026\/sec\nmajor page faults    1384           # 0.145\/sec\nminor page faults    10864542       # 1135.116\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             458027626592   # 48.787 branches per 1000 inst\nbranch misses        2930723831     # 0.64% branch miss\nconditional          313602888445   # 33.403 conditional branches per 1000 inst\nindirect             35057805161    # 3.734 indirect branches per 1000 inst\ncpu-cycles           41339157355858 # 4.21 GHz\ninstructions         9387529160935  # 0.23 IPC low\nslots                82669018078590 #\nretiring             3387165858961  #  4.1% ( 4.3%) low\n-- ucode             385306245      #     0.0%\n-- fastpath          3386780552716  #     4.1%\nfrontend             3820302772231  #  4.6% ( 4.8%) low\n-- latency           2938260448878  #     3.6%\n-- bandwidth         882042323353   #     1.1%\nbackend              72418746717494 # 87.6% (90.9%) high\n-- cpu               7511969518602  #     9.1%\n-- memory            64906777198892 #    78.5%\nspeculation          54605694491    #  0.1% ( 0.1%) low\n-- branch mispredict 
33703731722    #     0.0%\n-- pipeline restart  20901962769    #     0.0%\nsmt-contention       2988168305306  #  3.6% ( 0.0%)\ncpu-cycles           41172189685093 # 4.21 GHz\ninstructions         9393665948727  # 0.23 IPC low\ninstructions         3125620100025  # 476.467 l2 access per 1000 inst\nl2 hit from l1       1102457333145  # 10.76% l2 miss\nl2 miss from l1      96542777657    #\nl2 hit from l2 pf    323167259778   #\nl3 hit from l2 pf    24513021077    #\nl3 miss from l2 pf   39118477073    #\ninstructions         3128526163741  # 55.472 float per 1000 inst\nfloat 512            308            # 0.000 AVX-512 per 1000 inst\nfloat 256            1236272        # 0.000 AVX-256 per 1000 inst\nfloat 128            173544139570   # 55.472 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         1              # 0.000 scalar per 1000 inst\ninstructions         9383775361054  #\nopcache              1267171993600  # 135.039 opcache per 1000 inst\nopcache miss         604865222155   # 47.7% opcache miss rate\nl1 dTLB miss         534977958244   # 57.011 L1 dTLB per 1000 inst\nl2 dTLB miss         7159514212     # 0.763 L2 dTLB per 1000 inst\ninstructions         9383603598466  #\nicache               648567980402   # 69.117 icache per 1000 inst\nicache miss          271389699675   # 41.8% icache miss rate\nl1 iTLB miss         569831957      # 0.061 L1 iTLB per 1000 inst\nl2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst\ntlb flush            106503         # 0.000 TLB flush per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Process overview shows cactusBSSN_r_ba as the primary process.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>775 processes\n\t 48 cactusBSSN_r_ba       9598.22    27.75\n\t165 specperl                25.72     3.22\n\t 41 specinvoke               0.01     0.00\n\t  1 clang++                  0.01     0.00\n\t  1 flang                    0.01     0.00\n\t  1 lsb_release              0.01 
    0.00\n\t  1 clang                    0.00     0.01\n\t270 sh                       0.00     0.00\n\t 54 specrxp                  0.00     0.00\n\t 48 bash                     0.00     0.00\n\t 21 grep                     0.00     0.00\n\t 20 cat                      0.00     0.00\n\t 12 uniq                     0.00     0.00\n\t 11 sort                     0.00     0.00\n\t 10 expand                   0.00     0.00\n\t 10 ps                       0.00     0.00\n\t  6 pwd                      0.00     0.00\n\t  5 basename                 0.00     0.00\n\t  5 specmake                 0.00     0.00\n\t  5 systemctl                0.00     0.00\n\t  4 specpp                   0.00     0.00\n\t  4 uname                    0.00     0.00\n\t  3 dirname                  0.00     0.00\n\t  3 dmidecode                0.00     0.00\n\t  3 lscpu                    0.00     0.00\n\t  2 df                       0.00     0.00\n\t  2 dpkg                     0.00     0.00\n\t  2 rm                       0.00     0.00\n\t  2 runcpu                   0.00     0.00\n\t  2 specsha512sum            0.00     0.00\n\t  2 specxz                   0.00     0.00\n\t  2 who                      0.00     0.00\n\t  1 cpupower                 0.00     0.00\n\t  1 head                     0.00     0.00\n\t  1 logname                  0.00     0.00\n\t  1 ls                       0.00     0.00\n\t  1 numactl                  0.00     0.00\n\t  1 sysctl                   0.00     0.00\n\t  1 w                        0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 which                    0.00     0.00\n0 processes running\n53 maximum processes\n<\/code><\/pre>\n\n\n\n<p>Computation blocks are simple with the spec harness invoking one copy on each logical core.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>    371157) specinvoke       cpu=10 start=3.69  finish=206.29\n      371159) sh               cpu=5 start=3.69  finish=205.12\n        371166) bash             cpu=0 
start=3.69  finish=205.12\n          371191) cactusBSSN_r_ba  cpu=0 start=3.69  finish=205.05\n      371160) sh               cpu=9 start=3.69  finish=204.08\n        371168) bash             cpu=1 start=3.69  finish=204.08\n          371197) cactusBSSN_r_ba  cpu=1 start=3.69  finish=203.99\n      371161) sh               cpu=2 start=3.69  finish=203.74\n        371169) bash             cpu=2 start=3.69  finish=203.74\n          371192) cactusBSSN_r_ba  cpu=2 start=3.69  finish=203.61\n      371162) sh               cpu=7 start=3.69  finish=203.55\n        371174) bash             cpu=3 start=3.69  finish=203.55\n          371193) cactusBSSN_r_ba  cpu=3 start=3.69  finish=203.45\n      371163) sh               cpu=13 start=3.69  finish=206.29\n        371175) bash             cpu=4 start=3.69  finish=206.29\n          371196) cactusBSSN_r_ba  cpu=4 start=3.69  finish=206.23\n      371164) sh               cpu=9 start=3.69  finish=204.12\n        371178) bash             cpu=5 start=3.69  finish=204.12\n          371194) cactusBSSN_r_ba  cpu=5 start=3.69  finish=204.02\n      371165) sh               cpu=10 start=3.69  finish=204.49\n        371179) bash             cpu=6 start=3.69  finish=204.49\n          371195) cactusBSSN_r_ba  cpu=6 start=3.69  finish=204.40\n      371167) sh               cpu=7 start=3.69  finish=202.92\n        371176) bash             cpu=7 start=3.69  finish=202.92\n          371199) cactusBSSN_r_ba  cpu=7 start=3.69  finish=202.80\n      371170) sh               cpu=9 start=3.69  finish=205.12\n        371181) bash             cpu=8 start=3.69  finish=205.12\n          371198) cactusBSSN_r_ba  cpu=8 start=3.69  finish=205.05\n      371171) sh               cpu=10 start=3.69  finish=204.07\n        371182) bash             cpu=9 start=3.69  finish=204.07\n          371201) cactusBSSN_r_ba  cpu=9 start=3.69  finish=203.97\n      371172) sh               cpu=3 start=3.69  finish=203.74\n        371185) bash             cpu=10 start=3.69  
finish=203.74\n          371202) cactusBSSN_r_ba  cpu=10 start=3.69  finish=203.61\n      371173) sh               cpu=11 start=3.69  finish=203.56\n        371186) bash             cpu=11 start=3.69  finish=203.55\n          371200) cactusBSSN_r_ba  cpu=11 start=3.69  finish=203.45\n      371177) sh               cpu=10 start=3.69  finish=206.29\n        371187) bash             cpu=12 start=3.69  finish=206.29\n          371203) cactusBSSN_r_ba  cpu=12 start=3.69  finish=206.23\n      371180) sh               cpu=10 start=3.69  finish=204.12\n        371188) bash             cpu=13 start=3.69  finish=204.12\n          371204) cactusBSSN_r_ba  cpu=13 start=3.70  finish=204.02\n      371183) sh               cpu=1 start=3.69  finish=204.48\n        371189) bash             cpu=14 start=3.69  finish=204.48\n          371205) cactusBSSN_r_ba  cpu=14 start=3.70  finish=204.40\n      371184) sh               cpu=14 start=3.69  finish=204.69\n        371190) bash             cpu=15 start=3.69  finish=204.69\n          371206) cactusBSSN_r_ba  cpu=15 start=3.70  finish=204.62\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>cactuBSSN is a SPEC CPU(R) benchmark described here and written in C, C++ and Fortran. The workload runs on all logical cores. Topdown profile shows this is a backend-bound workload. AMD metrics confirm the benchmark is memory bound. 
Only ~55 <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/cpu2017\/507-cactubssn_r\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":2297,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-2307","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2307","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=2307"}],"version-history":[{"count":2,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2307\/revisions"}],"predecessor-version":[{"id":2314,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2307\/revisions\/2314"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2297"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=2307"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}