{"id":2391,"date":"2024-06-04T12:29:35","date_gmt":"2024-06-04T12:29:35","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=2391"},"modified":"2024-06-07T00:56:34","modified_gmt":"2024-06-07T00:56:34","slug":"541-leela_r","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/cpu2017\/541-leela_r\/","title":{"rendered":"541.leela_r"},"content":{"rendered":"\n<p>leela is a SPEC CPU(R) benchmark written in C++ and described <a href=\"https:\/\/spec.org\/cpu2017\/Docs\/benchmarks\/541.leela_r.html\">here<\/a>. The workload runs on all logical cores.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-25.png\" alt=\"\" class=\"wp-image-2466\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-25.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-25-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-25-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>Topdown profile shows a high rate of frontend stalls and lower backend stalls. 
Branch misprediction is also surprisingly high.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-26.png\" alt=\"\" class=\"wp-image-2467\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-26.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-26-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-26-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics on the 7840 confirm a moderate branch count. It is unclear exactly why the frontend stalls are as high as they are, other than the 12% branch misprediction rate.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              1062.974\non_cpu               0.985          # 15.76 \/ 16 cores\nutime                16746.664\nstime                10.843\nnvcsw                23905          # 13.58%\nnivcsw               152068         # 86.42%\ninblock              0              # 0.00\/sec\nonblock              153096         # 144.03\/sec\ncpu-clock            16757954170204 # 16757.954 seconds\ntask-clock           16758041250804 # 16758.041 seconds\npage faults          2621721        # 156.446\/sec\ncontext switches     175408         # 10.467\/sec\ncpu migrations       136            # 0.008\/sec\nmajor page faults    1134           # 0.068\/sec\nminor page faults    2620587        # 156.378\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             10306429534367 # 141.333 branches per 1000 inst\nbranch misses        1254180631762  # 12.17% branch miss\nconditional          8665643643389  # 118.833 conditional branches per 1000 inst\nindirect             13613050061    # 0.187 indirect branches per 1000 
inst\ncpu-cycles           69276351926448 # 4.07 GHz\ninstructions         72927455351535 # 1.05 IPC\nslots                138563812553184 #\nretiring             25252882198147 # 18.2% (23.6%)\n-- ucode             365836893      #     0.0%\n-- fastpath          25252516361254 #    18.2%\nfrontend             53606931164145 # 38.7% (50.0%) high\n-- latency           38042108363622 #    27.5%\n-- bandwidth         15564822800523 #    11.2%\nbackend              13591797279860 #  9.8% (12.7%) low\n-- cpu               4113205360771  #     3.0%\n-- memory            9478591919089  #     6.8%\nspeculation          14685401691972 # 10.6% (13.7%) high\n-- branch mispredict 14528567000506 #    10.5%\n-- pipeline restart  156834691466   #     0.1%\nsmt-contention       31426664933396 # 22.7% ( 0.0%)\ncpu-cycles           69260152059467 # 4.08 GHz\ninstructions         72944694306125 # 1.05 IPC\ninstructions         24309175729662 # 18.021 l2 access per 1000 inst\nl2 hit from l1       319328782407   # 4.14% l2 miss\nl2 miss from l1      9776623105     #\nl2 hit from l2 pf    110379924818   #\nl3 hit from l2 pf    6132748151     #\nl3 miss from l2 pf   2234093578     #\ninstructions         24301522652778 # 81.302 float per 1000 inst\nfloat 512            217            # 0.000 AVX-512 per 1000 inst\nfloat 256            2388431966     # 0.098 AVX-256 per 1000 inst\nfloat 128            1973374916489  # 81.204 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         20             # 0.000 scalar per 1000 inst\ninstructions         72921672042387 #\nopcache              19636741457450 # 269.285 opcache per 1000 inst\nopcache miss         503412573061   #  2.6% opcache miss rate\nl1 dTLB miss         71515941686    # 0.981 L1 dTLB per 1000 inst\nl2 dTLB miss         3218929660     # 0.044 L2 dTLB per 1000 inst\ninstructions         72921543862704 #\nicache               572134287027   # 7.846 icache per 1000 inst\nicache miss  
        47743495211    #  8.3% icache miss rate\nl1 iTLB miss         255447231      # 0.004 L1 iTLB per 1000 inst\nl2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst\ntlb flush            101123         # 0.000 TLB flush per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Process profile shows time spent in leela_r_base.me<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>581 processes\n\t 48 leela_r_base.me      16706.75     7.13\n\t 69 specperl                 9.26     1.64\n\t  1 clang++                  0.01     0.00\n\t  1 lsb_release              0.01     0.00\n\t 11 ps                       0.00     0.01\n\t173 sh                       0.00     0.00\n\t 54 specrxp                  0.00     0.00\n\t 48 bash                     0.00     0.00\n\t 41 specinvoke               0.00     0.00\n\t 21 grep                     0.00     0.00\n\t 20 cat                      0.00     0.00\n\t 12 uniq                     0.00     0.00\n\t 11 sort                     0.00     0.00\n\t 10 expand                   0.00     0.00\n\t  6 pwd                      0.00     0.00\n\t  5 basename                 0.00     0.00\n\t  5 specmake                 0.00     0.00\n\t  5 systemctl                0.00     0.00\n\t  4 specpp                   0.00     0.00\n\t  4 uname                    0.00     0.00\n\t  3 dirname                  0.00     0.00\n\t  3 dmidecode                0.00     0.00\n\t  3 lscpu                    0.00     0.00\n\t  2 df                       0.00     0.00\n\t  2 dpkg                     0.00     0.00\n\t  2 rm                       0.00     0.00\n\t  2 runcpu                   0.00     0.00\n\t  2 specsha512sum            0.00     0.00\n\t  2 specxz                   0.00     0.00\n\t  2 who                      0.00     0.00\n\t  1 cpupower                 0.00     0.00\n\t  1 head                     0.00     0.00\n\t  1 logname                  0.00     0.00\n\t  1 ls                       0.00     0.00\n\t  1 numactl                  0.00     
0.00\n\t  1 sysctl                   0.00     0.00\n\t  1 w                        0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 which                    0.00     0.00\n0 processes running\n53 maximum processes\n\n<\/code><\/pre>\n\n\n\n<p>specinvoke fires up separate copies on each logical processor<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>    58477) specinvoke       cpu=5 start=3.33  finish=354.44\n      58479) sh               cpu=9 start=3.33  finish=351.88\n        58488) bash             cpu=0 start=3.33  finish=351.88\n          58513) leela_r_base.me  cpu=0 start=3.33  finish=351.87\n      58480) sh               cpu=4 start=3.33  finish=351.79\n        58490) bash             cpu=1 start=3.33  finish=351.79\n          58511) leela_r_base.me  cpu=1 start=3.33  finish=351.79\n      58481) sh               cpu=5 start=3.33  finish=354.44\n        58491) bash             cpu=2 start=3.33  finish=354.44\n          58517) leela_r_base.me  cpu=2 start=3.33  finish=354.44\n      58482) sh               cpu=3 start=3.33  finish=350.76\n        58494) bash             cpu=3 start=3.33  finish=350.76\n          58515) leela_r_base.me  cpu=3 start=3.33  finish=350.75\n      58483) sh               cpu=3 start=3.33  finish=350.83\n        58505) bash             cpu=4 start=3.33  finish=350.83\n          58518) leela_r_base.me  cpu=4 start=3.33  finish=350.83\n      58484) sh               cpu=5 start=3.33  finish=351.25\n        58492) bash             cpu=5 start=3.33  finish=351.25\n          58516) leela_r_base.me  cpu=5 start=3.33  finish=351.25\n      58485) sh               cpu=14 start=3.33  finish=352.85\n        58496) bash             cpu=6 start=3.33  finish=352.85\n          58512) leela_r_base.me  cpu=6 start=3.33  finish=352.85\n      58486) sh               cpu=9 start=3.33  finish=352.07\n        58503) bash             cpu=7 start=3.33  finish=352.07\n          58520) leela_r_base.me  cpu=7 start=3.33  finish=352.06\n      
58487) sh               cpu=12 start=3.33  finish=352.24\n        58499) bash             cpu=8 start=3.33  finish=352.24\n          58514) leela_r_base.me  cpu=8 start=3.33  finish=352.24\n      58489) sh               cpu=13 start=3.33  finish=351.74\n        58502) bash             cpu=9 start=3.33  finish=351.74\n          58519) leela_r_base.me  cpu=9 start=3.33  finish=351.73\n      58493) sh               cpu=4 start=3.33  finish=351.78\n        58501) bash             cpu=10 start=3.33  finish=351.78\n          58521) leela_r_base.me  cpu=10 start=3.33  finish=351.77\n      58495) sh               cpu=4 start=3.33  finish=351.60\n        58506) bash             cpu=11 start=3.33  finish=351.60\n          58522) leela_r_base.me  cpu=11 start=3.33  finish=351.60\n      58497) sh               cpu=4 start=3.33  finish=352.05\n        58507) bash             cpu=12 start=3.33  finish=352.05\n          58523) leela_r_base.me  cpu=12 start=3.33  finish=352.04\n      58498) sh               cpu=3 start=3.33  finish=350.95\n        58508) bash             cpu=13 start=3.33  finish=350.95\n          58525) leela_r_base.me  cpu=13 start=3.33  finish=350.94\n      58500) sh               cpu=0 start=3.33  finish=352.84\n        58509) bash             cpu=14 start=3.33  finish=352.84\n          58524) leela_r_base.me  cpu=14 start=3.33  finish=352.83\n      58504) sh               cpu=15 start=3.33  finish=352.67\n        58510) bash             cpu=15 start=3.33  finish=352.67\n          58526) leela_r_base.me  cpu=15 start=3.33  finish=352.66\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>leela is a SPEC CPU(R) benchmark written in C++ and described here. The workload runs on all logical cores. Topdown profile shows a high rate of frontend stalls and lower backend stalls. Branch misprediction is also surprisingly high. 
AMD metrics <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/cpu2017\/541-leela_r\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":2297,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-2391","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2391","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=2391"}],"version-history":[{"count":2,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2391\/revisions"}],"predecessor-version":[{"id":2469,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2391\/revisions\/2469"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2297"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=2391"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}