{"id":2300,"date":"2024-06-01T13:55:32","date_gmt":"2024-06-01T13:55:32","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=2300"},"modified":"2024-06-02T16:52:07","modified_gmt":"2024-06-02T16:52:07","slug":"503-bwaves_r","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/cpu2017\/503-bwaves_r\/","title":{"rendered":"503.bwaves_r"},"content":{"rendered":"\n<p>bwaves is a SPEC CPU(R) benchmark described <a href=\"https:\/\/spec.org\/cpu2017\/Docs\/benchmarks\/503.bwaves_r.html\">here<\/a>. This Fortran workload runs consistently on all logical cores.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-5.png\" alt=\"\" class=\"wp-image-2303\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-5.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-5-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-5-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>The topdown profile shows this as a backend-bound workload.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-6.png\" alt=\"\" class=\"wp-image-2305\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-6.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-6-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-6-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics for a 7840 processor confirm a 
backend-bound and particularly memory-bound process.  The L2 access rate is ~133 per 1000 instructions and half of these are misses. There are very few branches.  Approximately 1\/4 of the instructions are floating point.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              4632.332\non_cpu               0.990          # 15.85 \/ 16 cores\nutime                73220.919\nstime                181.889\nnvcsw                96986          # 11.93%\nnivcsw               715815         # 88.07%\ninblock              0              # 0.00\/sec\nonblock              21864          # 4.72\/sec\ncpu-clock            73420734604801 # 73420.735 seconds\ntask-clock           73422148131842 # 73422.148 seconds\npage faults          41516197       # 565.445\/sec\ncontext switches     811557         # 11.053\/sec\ncpu migrations       521            # 0.007\/sec\nmajor page faults    1855           # 0.025\/sec\nminor page faults    41514342       # 565.420\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             833760239357   # 19.172 branches per 1000 inst\nbranch misses        11492503371    # 1.38% branch miss\nconditional          651248492065   # 14.975 conditional branches per 1000 inst\nindirect             60811077930    # 1.398 indirect branches per 1000 inst\ncpu-cycles           332144753197546 # 4.47 GHz\ninstructions         43479218645688 # 0.13 IPC low\nslots                664161383718216 #\nretiring             14604192045965 #  2.2% ( 2.3%) low\n-- ucode             3068808550     #     0.0%\n-- fastpath          14601123237415 #     2.2%\nfrontend             8982526992280  #  1.4% ( 1.4%) low\n-- latency           7953765511854  #     1.2%\n-- bandwidth         1028761480426  #     0.2%\nbackend              619986995494093 # 93.3% (96.3%) high\n-- cpu               45180566558648 #     6.8%\n-- memory            574806428935445 #    86.5%\nspeculation          473785697223  
 #  0.1% ( 0.1%) low\n-- branch mispredict 231332716080   #     0.0%\n-- pipeline restart  242452981143   #     0.0%\nsmt-contention       20113692014147 #  3.0% ( 0.0%)\ncpu-cycles           332893296051422 # 4.47 GHz\ninstructions         43483758640393 # 0.13 IPC low\ninstructions         14498863832586 # 132.988 l2 access per 1000 inst\nl2 hit from l1       1503844106419  # 49.20% l2 miss\nl2 miss from l1      707659574769   #\nl2 hit from l2 pf    183385007792   #\nl3 hit from l2 pf    9300133264     #\nl3 miss from l2 pf   231652380877   #\ninstructions         14491614030746 # 260.237 float per 1000 inst\nfloat 512            580            # 0.000 AVX-512 per 1000 inst\nfloat 256            2082           # 0.000 AVX-256 per 1000 inst\nfloat 128            3771247626425  # 260.237 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         2              # 0.000 scalar per 1000 inst\ninstructions         43471275755268 #\nopcache              5086289745292  # 117.003 opcache per 1000 inst\nopcache miss         495051701654   #  9.7% opcache miss rate\nl1 dTLB miss         79670864160    # 1.833 L1 dTLB per 1000 inst\nl2 dTLB miss         48345344515    # 1.112 L2 dTLB per 1000 inst\ninstructions         43470965049493 #\nicache               600576925212   # 13.816 icache per 1000 inst\nicache miss          39215141772    #  6.5% icache miss rate\nl1 iTLB miss         578741019      # 0.013 L1 iTLB per 1000 inst\nl2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst\ntlb flush            272150         # 0.000 TLB flush per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>The process overview shows bwaves_r as the primary process, with the small remainder being the SPEC harness.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>1016 processes\n\t144 bwaves_r_base.m      53541.39    89.25\n\t142 specperl                23.53     4.73\n\t  1 lsb_release              0.01     0.00\n\t 33 specinvoke               0.00 
    0.09\n\t144 bash                     0.00     0.03\n\t 10 ps                       0.00     0.02\n\t  1 flang                    0.00     0.02\n\t348 sh                       0.00     0.00\n\t 21 grep                     0.00     0.00\n\t 20 cat                      0.00     0.00\n\t 12 uniq                     0.00     0.00\n\t 11 sort                     0.00     0.00\n\t 10 expand                   0.00     0.00\n\t  6 pwd                      0.00     0.00\n\t  5 basename                 0.00     0.00\n\t  5 specmake                 0.00     0.00\n\t  5 specrxp                  0.00     0.00\n\t  5 systemctl                0.00     0.00\n\t  4 specpp                   0.00     0.00\n\t  4 uname                    0.00     0.00\n\t  3 dirname                  0.00     0.00\n\t  3 dmidecode                0.00     0.00\n\t  3 lscpu                    0.00     0.00\n\t  2 df                       0.00     0.00\n\t  2 dpkg                     0.00     0.00\n\t  2 rm                       0.00     0.00\n\t  2 runcpu                   0.00     0.00\n\t  2 specsha512sum            0.00     0.00\n\t  2 specxz                   0.00     0.00\n\t  2 who                      0.00     0.00\n\t  1 cpupower                 0.00     0.00\n\t  1 head                     0.00     0.00\n\t  1 logname                  0.00     0.00\n\t  1 ls                       0.00     0.00\n\t  1 numactl                  0.00     0.00\n\t  1 sysctl                   0.00     0.00\n\t  1 w                        0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 which                    0.00     0.00\n53 processes running\n53 maximum processes\n<\/code><\/pre>\n\n\n\n<p>Specinvoke fires up separate processes for each core.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>    356101) specinvoke       cpu=7 start=3.24  finish=1546.35\n      356103) sh               cpu=0 start=3.24  finish=295.94\n        356109) bash             cpu=0 start=3.24  finish=295.94\n          
356135) bwaves_r_base.m  cpu=0 start=3.24  finish=295.69\n      356104) sh               cpu=1 start=3.24  finish=289.88\n        356110) bash             cpu=1 start=3.24  finish=289.88\n          356133) bwaves_r_base.m  cpu=1 start=3.24  finish=289.62\n      356105) sh               cpu=2 start=3.24  finish=294.24\n        356114) bash             cpu=2 start=3.24  finish=294.24\n          356137) bwaves_r_base.m  cpu=2 start=3.24  finish=294.02\n      356106) sh               cpu=3 start=3.24  finish=294.03\n        356117) bash             cpu=3 start=3.24  finish=294.03\n          356140) bwaves_r_base.m  cpu=3 start=3.24  finish=293.80\n      356107) sh               cpu=4 start=3.24  finish=287.40\n        356115) bash             cpu=4 start=3.24  finish=287.39\n          356138) bwaves_r_base.m  cpu=4 start=3.24  finish=287.17\n      356108) sh               cpu=5 start=3.24  finish=293.24\n        356113) bash             cpu=5 start=3.24  finish=293.23\n          356139) bwaves_r_base.m  cpu=5 start=3.24  finish=292.99\n      356111) sh               cpu=6 start=3.24  finish=303.56\n        356120) bash             cpu=6 start=3.24  finish=303.56\n          356141) bwaves_r_base.m  cpu=6 start=3.24  finish=303.33\n      356112) sh               cpu=7 start=3.24  finish=295.06\n        356122) bash             cpu=7 start=3.24  finish=295.06\n          356142) bwaves_r_base.m  cpu=7 start=3.24  finish=294.80\n      356116) sh               cpu=8 start=3.24  finish=291.15\n        356125) bash             cpu=8 start=3.24  finish=291.15\n          356143) bwaves_r_base.m  cpu=8 start=3.25  finish=290.90\n      356118) sh               cpu=9 start=3.24  finish=291.24\n        356127) bash             cpu=9 start=3.24  finish=291.24\n          356147) bwaves_r_base.m  cpu=9 start=3.25  finish=291.03\n      356119) sh               cpu=10 start=3.24  finish=296.70\n        356128) bash             cpu=10 start=3.24  finish=296.70\n          356149) 
bwaves_r_base.m  cpu=10 start=3.25  finish=296.45\n      356121) sh               cpu=11 start=3.24  finish=294.39\n        356130) bash             cpu=11 start=3.24  finish=294.39\n          356148) bwaves_r_base.m  cpu=11 start=3.25  finish=294.19\n      356123) sh               cpu=12 start=3.24  finish=288.30\n        356131) bash             cpu=12 start=3.24  finish=288.30\n          356146) bwaves_r_base.m  cpu=12 start=3.25  finish=288.11\n      356124) sh               cpu=13 start=3.24  finish=293.74\n        356132) bash             cpu=13 start=3.24  finish=293.74\n          356145) bwaves_r_base.m  cpu=13 start=3.25  finish=293.56\n      356126) sh               cpu=14 start=3.24  finish=298.69\n        356134) bash             cpu=14 start=3.24  finish=298.69\n          356144) bwaves_r_base.m  cpu=14 start=3.25  finish=298.44\n      356129) sh               cpu=15 start=3.24  finish=302.56\n        356136) bash             cpu=15 start=3.24  finish=302.56\n          356150) bwaves_r_base.m  cpu=15 start=3.25  finish=302.30\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>bwaves is a SPEC CPU(R) benchmark described here. This Fortran workload runs consistently on all logical cores. Topdown profile shows this as a backend-bound workload. AMD metrics for a 7840 processor confirms a backend-bound and particularly memory-bound process. 
The L2 <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/cpu2017\/503-bwaves_r\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":2297,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-2300","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2300","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=2300"}],"version-history":[{"count":2,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2300\/revisions"}],"predecessor-version":[{"id":2306,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2300\/revisions\/2306"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2297"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=2300"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}