{"id":2385,"date":"2024-06-04T12:25:07","date_gmt":"2024-06-04T12:25:07","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=2385"},"modified":"2024-06-06T11:32:11","modified_gmt":"2024-06-06T11:32:11","slug":"523-xalancbmk_r","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/cpu2017\/523-xalancbmk_r\/","title":{"rendered":"523.xalancbmk_r"},"content":{"rendered":"\n<p>xalancbmk is a SPEC CPU(R) benchmark written in C++ and described <a href=\"https:\/\/spec.org\/cpu2017\/Docs\/benchmarks\/523.xalancbmk_r.html\">here<\/a>. The workload runs on all logical cores.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-22.png\" alt=\"\" class=\"wp-image-2454\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-22.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-22-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-22-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>The topdown profile shows two distinct workload regions, the first with higher backend stalls.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-23.png\" alt=\"\" class=\"wp-image-2456\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-23.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-23-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-23-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" 
\/><\/figure>\n\n\n\n<p>AMD metrics on 7840 show over 1\/4 of instructions are branches and that memory stalls are ~40% of the time.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              716.422\non_cpu               0.983          # 15.72 \/ 16 cores\nutime                11200.976\nstime                62.076\nnvcsw                16868          # 14.21%\nnivcsw               101844         # 85.79%\ninblock              0              # 0.00\/sec\nonblock              5941288        # 8293.00\/sec\ncpu-clock            11264034913550 # 11264.035 seconds\ntask-clock           11264141447655 # 11264.141 seconds\npage faults          10479547       # 930.346\/sec\ncontext switches     118151         # 10.489\/sec\ncpu migrations       156            # 0.014\/sec\nmajor page faults    1111           # 0.099\/sec\nminor page faults    10478436       # 930.247\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             10322703995232 # 267.659 branches per 1000 inst\nbranch misses        34592086095    # 0.34% branch miss\nconditional          9047084912417  # 234.583 conditional branches per 1000 inst\nindirect             287264629665   # 7.449 indirect branches per 1000 inst\ncpu-cycles           49888709082342 # 4.32 GHz\ninstructions         38564460869272 # 0.77 IPC\nslots                99786843927960 #\nretiring             12243245308066 # 12.3% (19.0%)\n-- ucode             71983786453    #     0.1%\n-- fastpath          12171261521613 #    12.2%\nfrontend             5763112454235  #  5.8% ( 8.9%)\n-- latency           3553925492466  #     3.6%\n-- bandwidth         2209186961769  #     2.2%\nbackend              45856010835439 # 46.0% (71.1%) high\n-- cpu               2998476011384  #     3.0%\n-- memory            42857534824055 #    42.9%\nspeculation          590816854271   #  0.6% ( 0.9%) low\n-- branch mispredict 530059406767   #     0.5%\n-- pipeline restart  
60757447504    #     0.1%\nsmt-contention       35333602096762 # 35.4% ( 0.0%)\ncpu-cycles           49821898939519 # 4.31 GHz\ninstructions         38564401740635 # 0.77 IPC\ninstructions         12856131091504 # 75.229 l2 access per 1000 inst\nl2 hit from l1       817320093125   # 15.96% l2 miss\nl2 miss from l1      49304847483    #\nl2 hit from l2 pf    44749019911    #\nl3 hit from l2 pf    33478037235    #\nl3 miss from l2 pf   71608977929    #\ninstructions         12853296360811 # 34.341 float per 1000 inst\nfloat 512            183            # 0.000 AVX-512 per 1000 inst\nfloat 256            334329         # 0.000 AVX-256 per 1000 inst\nfloat 128            441397062278   # 34.341 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         7              # 0.000 scalar per 1000 inst\ninstructions         38562362441417 #\nopcache              5403832937953  # 140.132 opcache per 1000 inst\nopcache miss         214599945034   #  4.0% opcache miss rate\nl1 dTLB miss         245187730687   # 6.358 L1 dTLB per 1000 inst\nl2 dTLB miss         7784051070     # 0.202 L2 dTLB per 1000 inst\ninstructions         38561867880756 #\nicache               295547014102   # 7.664 icache per 1000 inst\nicache miss          61133657855    # 20.7% icache miss rate\nl1 iTLB miss         61677389585    # 1.599 L1 iTLB per 1000 inst\nl2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst\ntlb flush            87745          # 0.000 TLB flush per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Process overview shows computation primarily in cpuxalan_r_base.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>581 processes\n\t 48 cpuxalan_r_base      11171.98    53.53\n\t 69 specperl                10.35     3.47\n\t  1 clang++                  0.01     0.00\n\t  1 lsb_release              0.01     0.00\n\t 11 ps                       0.00     0.01\n\t173 sh                       0.00     0.00\n\t 54 specrxp                  0.00     
0.00\n\t 48 bash                     0.00     0.00\n\t 41 specinvoke               0.00     0.00\n\t 21 grep                     0.00     0.00\n\t 20 cat                      0.00     0.00\n\t 12 uniq                     0.00     0.00\n\t 11 sort                     0.00     0.00\n\t 10 expand                   0.00     0.00\n\t  6 pwd                      0.00     0.00\n\t  5 basename                 0.00     0.00\n\t  5 specmake                 0.00     0.00\n\t  5 systemctl                0.00     0.00\n\t  4 specpp                   0.00     0.00\n\t  4 uname                    0.00     0.00\n\t  3 dirname                  0.00     0.00\n\t  3 dmidecode                0.00     0.00\n\t  3 lscpu                    0.00     0.00\n\t  2 df                       0.00     0.00\n\t  2 dpkg                     0.00     0.00\n\t  2 rm                       0.00     0.00\n\t  2 runcpu                   0.00     0.00\n\t  2 specsha512sum            0.00     0.00\n\t  2 specxz                   0.00     0.00\n\t  2 who                      0.00     0.00\n\t  1 cpupower                 0.00     0.00\n\t  1 head                     0.00     0.00\n\t  1 logname                  0.00     0.00\n\t  1 ls                       0.00     0.00\n\t  1 numactl                  0.00     0.00\n\t  1 sysctl                   0.00     0.00\n\t  1 w                        0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 which                    0.00     0.00\n0 processes running\n53 maximum processes\n<\/code><\/pre>\n\n\n\n<p>specinvoke fires up separate copies on each logical core.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>    47048) specinvoke       cpu=8 start=3.51  finish=239.04\n      47050) sh               cpu=2 start=3.51  finish=235.55\n        47056) bash             cpu=0 start=3.51  finish=235.54\n          47082) cpuxalan_r_base  cpu=0 start=3.51  finish=235.48\n      47051) sh               cpu=3 start=3.51  finish=237.87\n        47058) bash          
   cpu=1 start=3.51  finish=237.87\n          47083) cpuxalan_r_base  cpu=1 start=3.51  finish=237.83\n      47052) sh               cpu=2 start=3.51  finish=235.00\n        47059) bash             cpu=2 start=3.51  finish=235.00\n          47080) cpuxalan_r_base  cpu=2 start=3.51  finish=234.93\n      47053) sh               cpu=0 start=3.51  finish=236.37\n        47060) bash             cpu=3 start=3.51  finish=236.37\n          47081) cpuxalan_r_base  cpu=3 start=3.51  finish=236.30\n      47054) sh               cpu=8 start=3.51  finish=236.94\n        47068) bash             cpu=4 start=3.51  finish=236.94\n          47090) cpuxalan_r_base  cpu=4 start=3.52  finish=236.88\n      47055) sh               cpu=4 start=3.51  finish=238.19\n        47066) bash             cpu=5 start=3.51  finish=238.19\n          47088) cpuxalan_r_base  cpu=5 start=3.52  finish=238.15\n      47057) sh               cpu=8 start=3.51  finish=236.78\n        47063) bash             cpu=6 start=3.51  finish=236.77\n          47086) cpuxalan_r_base  cpu=6 start=3.52  finish=236.71\n      47061) sh               cpu=0 start=3.51  finish=239.04\n        47067) bash             cpu=7 start=3.51  finish=239.04\n          47087) cpuxalan_r_base  cpu=7 start=3.52  finish=239.01\n      47062) sh               cpu=8 start=3.51  finish=235.89\n        47072) bash             cpu=8 start=3.51  finish=235.88\n          47093) cpuxalan_r_base  cpu=8 start=3.52  finish=235.83\n      47064) sh               cpu=8 start=3.51  finish=236.63\n        47071) bash             cpu=9 start=3.51  finish=236.63\n          47089) cpuxalan_r_base  cpu=9 start=3.52  finish=236.56\n      47065) sh               cpu=12 start=3.51  finish=237.21\n        47075) bash             cpu=10 start=3.51  finish=237.21\n          47091) cpuxalan_r_base  cpu=10 start=3.52  finish=237.16\n      47069) sh               cpu=15 start=3.51  finish=237.53\n        47077) bash             cpu=11 start=3.51  finish=237.53\n         
 47092) cpuxalan_r_base  cpu=11 start=3.52  finish=237.49\n      47070) sh               cpu=0 start=3.51  finish=236.97\n        47078) bash             cpu=12 start=3.51  finish=236.97\n          47094) cpuxalan_r_base  cpu=12 start=3.52  finish=236.92\n      47073) sh               cpu=15 start=3.51  finish=237.90\n        47079) bash             cpu=13 start=3.51  finish=237.90\n          47096) cpuxalan_r_base  cpu=13 start=3.52  finish=237.84\n      47074) sh               cpu=12 start=3.51  finish=237.30\n        47085) bash             cpu=14 start=3.51  finish=237.30\n          47097) cpuxalan_r_base  cpu=14 start=3.52  finish=237.26\n      47076) sh               cpu=0 start=3.51  finish=237.23\n        47084) bash             cpu=15 start=3.51  finish=237.23\n          47095) cpuxalan_r_base  cpu=15 start=3.52  finish=237.17\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>xalancbmk is a SPEC CPU(R) benchmark written in C++ and described here. The workload runs on all logical cores. 
Topdown profile shows two different workload regions with the first with higher backend stalls AMD metrics on 7840 show over 1\/4 <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/cpu2017\/523-xalancbmk_r\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":2297,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-2385","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2385","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=2385"}],"version-history":[{"count":3,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2385\/revisions"}],"predecessor-version":[{"id":2457,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2385\/revisions\/2457"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2297"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=2385"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}