{"id":2331,"date":"2024-06-02T23:37:33","date_gmt":"2024-06-02T23:37:33","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=2331"},"modified":"2024-06-04T08:45:55","modified_gmt":"2024-06-04T08:45:55","slug":"521-wrf_r","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/cpu2017\/521-wrf_r\/","title":{"rendered":"521.wrf_r"},"content":{"rendered":"\n<p>wrf is a SPEC CPU(R) benchmark described <a href=\"https:\/\/spec.org\/cpu2017\/Docs\/benchmarks\/521.wrf_r.html\">here<\/a> and written in C and Fortran. The workload runs on all logical cores.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-11.png\" alt=\"\" class=\"wp-image-2367\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-11.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-11-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-11-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>The topdown profile shows this is a backend-bound workload.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-12.png\" alt=\"\" class=\"wp-image-2368\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-12.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-12-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-12-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics on the 7840 processor show 
memory stalls as the largest issue.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              1953.561\non_cpu               0.987          # 15.79 \/ 16 cores\nutime                30800.669\nstime                53.643\nnvcsw                41538          # 11.04%\nnivcsw               334706         # 88.96%\ninblock              0              # 0.00\/sec\nonblock              2841304        # 1454.42\/sec\ncpu-clock            30864242737605 # 30864.243 seconds\ntask-clock           30864977164171 # 30864.977 seconds\npage faults          6126037        # 198.479\/sec\ncontext switches     375594         # 12.169\/sec\ncpu migrations       215            # 0.007\/sec\nmajor page faults    1090           # 0.035\/sec\nminor page faults    6124947        # 198.443\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             6636680282416  # 113.379 branches per 1000 inst\nbranch misses        59682724436    # 0.90% branch miss\nconditional          4546842780846  # 77.677 conditional branches per 1000 inst\nindirect             816253071028   # 13.945 indirect branches per 1000 inst\ncpu-cycles           136220022944715 # 4.35 GHz\ninstructions         58535248389856 # 0.43 IPC low\nslots                272403659326590 #\nretiring             20112540460746 #  7.4% ( 7.8%) low\n-- ucode             2521696306     #     0.0%\n-- fastpath          20110018764440 #     7.4%\nfrontend             20130144250554 #  7.4% ( 7.9%)\n-- latency           16833110598090 #     6.2%\n-- bandwidth         3297033652464  #     1.2%\nbackend              214918740039030 # 78.9% (83.8%) high\n-- cpu               42070191486014 #    15.4%\n-- memory            172848548553016 #    63.5%\nspeculation          1185885235261  #  0.4% ( 0.5%) low\n-- branch mispredict 1078145850915  #     0.4%\n-- pipeline restart  107739384346   #     0.0%\nsmt-contention       16056201938033 #  5.9% ( 0.0%)\ncpu-cycles     
      136075231851736 # 4.34 GHz\ninstructions         58546847147625 # 0.43 IPC low\ninstructions         19516296439585 # 77.326 l2 access per 1000 inst\nl2 hit from l1       1160015123136  # 26.15% l2 miss\nl2 miss from l1      126776457921   #\nl2 hit from l2 pf    81216020790    #\nl3 hit from l2 pf    60953233587    #\nl3 miss from l2 pf   206927874633   #\ninstructions         19507693906773 # 278.072 float per 1000 inst\nfloat 512            220            # 0.000 AVX-512 per 1000 inst\nfloat 256            42784064091    # 2.193 AVX-256 per 1000 inst\nfloat 128            5381755505920  # 275.879 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         3              # 0.000 scalar per 1000 inst\ninstructions         58532587204494 #\nopcache              9384365925865  # 160.327 opcache per 1000 inst\nopcache miss         287236589767   #  3.1% opcache miss rate\nl1 dTLB miss         186729836520   # 3.190 L1 dTLB per 1000 inst\nl2 dTLB miss         13005504047    # 0.222 L2 dTLB per 1000 inst\ninstructions         58532845110136 #\nicache               380662582609   # 6.503 icache per 1000 inst\nicache miss          103844206789   # 27.3% icache miss rate\nl1 iTLB miss         1719496360     # 0.029 L1 iTLB per 1000 inst\nl2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst\ntlb flush            613828         # 0.000 TLB flush per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Process overview shows time spent in wrf_r_base.mev<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>691 processes\n\t 48 wrf_r_base.mev-      30711.47    40.93\n\t 71 specperl                25.07     6.50\n\t 48 diffwrf_521_bas          4.92     0.33\n\t  2 clang                    0.02     0.00\n\t  2 flang                    0.01     0.02\n\t  1 lsb_release              0.01     0.00\n\t 10 ps                       0.00     0.01\n\t224 sh                       0.00     0.00\n\t 54 specrxp                  0.00     0.00\n\t 48 
bash                     0.00     0.00\n\t 41 specinvoke               0.00     0.00\n\t 22 cat                      0.00     0.00\n\t 21 grep                     0.00     0.00\n\t 12 uniq                     0.00     0.00\n\t 11 sort                     0.00     0.00\n\t 10 expand                   0.00     0.00\n\t  7 specmake                 0.00     0.00\n\t  6 pwd                      0.00     0.00\n\t  5 basename                 0.00     0.00\n\t  5 systemctl                0.00     0.00\n\t  4 rm                       0.00     0.00\n\t  4 specpp                   0.00     0.00\n\t  4 uname                    0.00     0.00\n\t  3 dirname                  0.00     0.00\n\t  3 dmidecode                0.00     0.00\n\t  3 lscpu                    0.00     0.00\n\t  2 df                       0.00     0.00\n\t  2 dpkg                     0.00     0.00\n\t  2 runcpu                   0.00     0.00\n\t  2 specsha512sum            0.00     0.00\n\t  2 specxz                   0.00     0.00\n\t  2 who                      0.00     0.00\n\t  1 cpupower                 0.00     0.00\n\t  1 head                     0.00     0.00\n\t  1 logname                  0.00     0.00\n\t  1 ls                       0.00     0.00\n\t  1 numactl                  0.00     0.00\n\t  1 sysctl                   0.00     0.00\n\t  1 w                        0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 which                    0.00     0.00\n1 processes running\n54 maximum processes\n<\/code><\/pre>\n\n\n\n<p>specinvoke fires off separate copies on each logical core.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>    420317) specinvoke       cpu=1 start=4.39  finish=652.66\n      420319) sh               cpu=7 start=4.40  finish=648.02\n        420330) bash             cpu=0 start=4.40  finish=648.02\n          420353) wrf_r_base.mev-  cpu=0 start=4.40  finish=648.00\n      420320) ?? 
cpu=0 start=4.40  finish=0.00 \n        420329) bash             cpu=1 start=4.40  finish=643.64\n          420355) wrf_r_base.mev-  cpu=1 start=4.40  finish=643.60\n      420321) sh               cpu=12 start=4.40  finish=646.09\n        420331) bash             cpu=2 start=4.40  finish=646.08\n          420352) wrf_r_base.mev-  cpu=2 start=4.40  finish=646.06\n      420322) sh               cpu=12 start=4.40  finish=652.66\n        420333) bash             cpu=3 start=4.40  finish=652.66\n          420351) wrf_r_base.mev-  cpu=3 start=4.40  finish=652.65\n      420323) sh               cpu=8 start=4.40  finish=644.40\n        420334) bash             cpu=4 start=4.40  finish=644.40\n          420354) wrf_r_base.mev-  cpu=4 start=4.40  finish=644.38\n      420324) sh               cpu=8 start=4.40  finish=650.22\n        420343) bash             cpu=5 start=4.40  finish=650.22\n          420357) wrf_r_base.mev-  cpu=5 start=4.40  finish=650.21\n      420325) sh               cpu=1 start=4.40  finish=646.64\n        420344) bash             cpu=6 start=4.40  finish=646.64\n          420358) wrf_r_base.mev-  cpu=6 start=4.40  finish=646.62\n      420326) sh               cpu=8 start=4.40  finish=646.89\n        420336) bash             cpu=7 start=4.40  finish=646.89\n          420359) wrf_r_base.mev-  cpu=7 start=4.40  finish=646.88\n      420327) sh               cpu=14 start=4.40  finish=643.50\n        420338) bash             cpu=8 start=4.40  finish=643.50\n          420363) wrf_r_base.mev-  cpu=8 start=4.40  finish=643.46\n      420328) sh               cpu=1 start=4.40  finish=643.75\n        420340) bash             cpu=9 start=4.40  finish=643.75\n          420356) wrf_r_base.mev-  cpu=9 start=4.40  finish=643.72\n      420332) sh               cpu=1 start=4.40  finish=646.08\n        420341) bash             cpu=10 start=4.40  finish=646.08\n          420360) wrf_r_base.mev-  cpu=10 start=4.40  finish=646.06\n      420335) sh               cpu=12 
start=4.40  finish=652.19\n        420346) bash             cpu=11 start=4.40  finish=652.19\n          420361) wrf_r_base.mev-  cpu=11 start=4.40  finish=652.17\n      420337) sh               cpu=12 start=4.40  finish=643.75\n        420347) bash             cpu=12 start=4.40  finish=643.75\n          420362) wrf_r_base.mev-  cpu=12 start=4.40  finish=643.72\n      420339) sh               cpu=0 start=4.40  finish=649.31\n        420348) bash             cpu=13 start=4.40  finish=649.31\n          420364) wrf_r_base.mev-  cpu=13 start=4.40  finish=649.29\n      420342) sh               cpu=14 start=4.40  finish=643.44\n        420349) bash             cpu=14 start=4.40  finish=643.44\n          420365) wrf_r_base.mev-  cpu=14 start=4.40  finish=643.39\n      420345) sh               cpu=15 start=4.40  finish=646.88\n        420350) bash             cpu=15 start=4.40  finish=646.88\n          420366) wrf_r_base.mev-  cpu=15 start=4.40  finish=646.85\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>wrf is a SPEC CPU(R) benchmark described&nbsp;&nbsp;here and written in C and Fortran. The workload runs on all logical cores. Topdown profile shows this is backend-bound workload. AMD metrics on 7840 processor show memory stalls as the largest issue. 
Process <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/cpu2017\/521-wrf_r\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":2297,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-2331","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2331","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=2331"}],"version-history":[{"count":2,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2331\/revisions"}],"predecessor-version":[{"id":2370,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2331\/revisions\/2370"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2297"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=2331"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}