{"id":2393,"date":"2024-06-04T12:31:07","date_gmt":"2024-06-04T12:31:07","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=2393"},"modified":"2024-06-07T01:01:11","modified_gmt":"2024-06-07T01:01:11","slug":"548-exchange2_r","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/cpu2017\/548-exchange2_r\/","title":{"rendered":"548.exchange2_r"},"content":{"rendered":"\n<p>exchange2 is a SPEC CPU(R) benchmark written in Fortran and described <a href=\"https:\/\/spec.org\/cpu2017\/Docs\/benchmarks\/548.exchange2_r.html\">here<\/a>. The workload runs on all logical cores.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-26.png\" alt=\"\" class=\"wp-image-2470\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-26.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-26-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-26-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>Topdown profile shows a higher retirement rate with some frontend stalls.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-27.png\" alt=\"\" class=\"wp-image-2471\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-27.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-27-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-27-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD 
metrics on the 7840 confirm high frontend stalls. There is almost no L2 access and, for a specint benchmark, a relatively high amount of floating point. There are also many conditional branches.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              557.819\non_cpu               0.985          # 15.77 \/ 16 cores\nutime                8791.937\nstime                3.393\nnvcsw                13463          # 14.26%\nnivcsw               80950          # 85.74%\ninblock              0              # 0.00\/sec\nonblock              13256          # 23.76\/sec\ncpu-clock            8795402064565  # 8795.402 seconds\ntask-clock           8795445283596  # 8795.445 seconds\npage faults          624893         # 71.047\/sec\ncontext switches     93851          # 10.670\/sec\ncpu migrations       147            # 0.017\/sec\nmajor page faults    1009           # 0.115\/sec\nminor page faults    623884         # 70.933\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             11217717609323 # 165.361 branches per 1000 inst\nbranch misses        145504068187   # 1.30% branch miss\nconditional          10697240158366 # 157.689 conditional branches per 1000 inst\nindirect             69326381734    # 1.022 indirect branches per 1000 inst\ncpu-cycles           35854583839515 # 3.99 GHz\ninstructions         67837522004252 # 1.89 IPC\nslots                71717533143000 #\nretiring             23163952966436 # 32.3% (46.4%)\n-- ucode             259468890      #     0.0%\n-- fastpath          23163693497546 #    32.3%\nfrontend             18418772344064 # 25.7% (36.9%)\n-- latency           8637724239246  #    12.0%\n-- bandwidth         9781048104818  #    13.6%\nbackend              7099767779884  #  9.9% (14.2%) low\n-- cpu               3272086047054  #     4.6%\n-- memory            3827681732830  #     5.3%\nspeculation          1281839646420  #  1.8% ( 2.6%)\n-- branch mispredict 1276995326057  #     1.8%\n-- pipeline 
restart  4844320363     #     0.0%\nsmt-contention       21753082387430 # 30.3% ( 0.0%)\ncpu-cycles           35884277065587 # 4.00 GHz\ninstructions         67837542981099 # 1.89 IPC\ninstructions         22616818444610 # 0.827 l2 access per 1000 inst\nl2 hit from l1       18636602936    # 0.76% l2 miss\nl2 miss from l1      110746788      #\nl2 hit from l2 pf    33961066       #\nl3 hit from l2 pf    19084265       #\nl3 miss from l2 pf   12431161       #\ninstructions         22610660127757 # 126.182 float per 1000 inst\nfloat 512            193            # 0.000 AVX-512 per 1000 inst\nfloat 256            7254           # 0.000 AVX-256 per 1000 inst\nfloat 128            2853049989985  # 126.182 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         31             # 0.000 scalar per 1000 inst\ninstructions         67837487855656 #\nopcache              13230204734504 # 195.028 opcache per 1000 inst\nopcache miss         207974013352   #  1.6% opcache miss rate\nl1 dTLB miss         580068533      # 0.009 L1 dTLB per 1000 inst\nl2 dTLB miss         26646136       # 0.000 L2 dTLB per 1000 inst\ninstructions         67837507613280 #\nicache               289194679273   # 4.263 icache per 1000 inst\nicache miss          51247209309    # 17.7% icache miss rate\nl1 iTLB miss         884245096      # 0.013 L1 iTLB per 1000 inst\nl2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst\ntlb flush            57707          # 0.000 TLB flush per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Process summary shows time spent in exchange2_r_bas<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>581 processes\n\t 48 exchange2_r_bas       8758.33     0.49\n\t 69 specperl                 7.61     1.29\n\t  1 flang                    0.01     0.00\n\t  1 lsb_release              0.01     0.00\n\t 11 ps                       0.00     0.03\n\t173 sh                       0.00     0.00\n\t 54 specrxp                  0.00     
0.00\n\t 48 bash                     0.00     0.00\n\t 41 specinvoke               0.00     0.00\n\t 21 grep                     0.00     0.00\n\t 20 cat                      0.00     0.00\n\t 12 uniq                     0.00     0.00\n\t 11 sort                     0.00     0.00\n\t 10 expand                   0.00     0.00\n\t  6 pwd                      0.00     0.00\n\t  5 basename                 0.00     0.00\n\t  5 specmake                 0.00     0.00\n\t  5 systemctl                0.00     0.00\n\t  4 specpp                   0.00     0.00\n\t  4 uname                    0.00     0.00\n\t  3 dirname                  0.00     0.00\n\t  3 dmidecode                0.00     0.00\n\t  3 lscpu                    0.00     0.00\n\t  2 df                       0.00     0.00\n\t  2 dpkg                     0.00     0.00\n\t  2 rm                       0.00     0.00\n\t  2 runcpu                   0.00     0.00\n\t  2 specsha512sum            0.00     0.00\n\t  2 specxz                   0.00     0.00\n\t  2 who                      0.00     0.00\n\t  1 cpupower                 0.00     0.00\n\t  1 head                     0.00     0.00\n\t  1 logname                  0.00     0.00\n\t  1 ls                       0.00     0.00\n\t  1 numactl                  0.00     0.00\n\t  1 sysctl                   0.00     0.00\n\t  1 w                        0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 which                    0.00     0.00\n0 processes running\n53 maximum processes\n<\/code><\/pre>\n\n\n\n<p>specinvoke fires up separate copies on each logical processor<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>    63700) specinvoke       cpu=13 start=3.09  finish=186.40\n      63702) sh               cpu=10 start=3.10  finish=185.68\n        63708) bash             cpu=0 start=3.10  finish=185.68\n          63737) exchange2_r_bas  cpu=0 start=3.10  finish=185.68\n      63703) sh               cpu=1 start=3.10  finish=185.77\n        63710) bash    
         cpu=1 start=3.10  finish=185.77\n          63734) exchange2_r_bas  cpu=1 start=3.10  finish=185.77\n      63704) sh               cpu=2 start=3.10  finish=185.21\n        63716) bash             cpu=2 start=3.10  finish=185.21\n          63735) exchange2_r_bas  cpu=2 start=3.10  finish=185.21\n      63705) sh               cpu=3 start=3.10  finish=185.40\n        63719) bash             cpu=3 start=3.10  finish=185.40\n          63738) exchange2_r_bas  cpu=3 start=3.10  finish=185.40\n      63706) sh               cpu=6 start=3.10  finish=185.58\n        63720) bash             cpu=4 start=3.10  finish=185.58\n          63742) exchange2_r_bas  cpu=4 start=3.10  finish=185.57\n      63707) sh               cpu=6 start=3.10  finish=185.31\n        63715) bash             cpu=5 start=3.10  finish=185.31\n          63740) exchange2_r_bas  cpu=5 start=3.10  finish=185.31\n      63709) sh               cpu=12 start=3.10  finish=185.05\n        63722) bash             cpu=6 start=3.10  finish=185.05\n          63739) exchange2_r_bas  cpu=6 start=3.10  finish=185.05\n      63711) sh               cpu=2 start=3.10  finish=185.41\n        63718) bash             cpu=7 start=3.10  finish=185.41\n          63736) exchange2_r_bas  cpu=7 start=3.10  finish=185.41\n      63712) sh               cpu=12 start=3.10  finish=185.21\n        63726) bash             cpu=8 start=3.10  finish=185.21\n          63747) exchange2_r_bas  cpu=8 start=3.10  finish=185.21\n      63713) sh               cpu=15 start=3.10  finish=186.40\n        63725) bash             cpu=9 start=3.10  finish=186.40\n          63741) exchange2_r_bas  cpu=9 start=3.10  finish=186.40\n      63714) sh               cpu=13 start=3.10  finish=185.39\n        63728) bash             cpu=10 start=3.10  finish=185.39\n          63744) exchange2_r_bas  cpu=10 start=3.10  finish=185.39\n      63717) sh               cpu=10 start=3.10  finish=185.51\n        63729) bash             cpu=11 start=3.10  
finish=185.51\n          63743) exchange2_r_bas  cpu=11 start=3.10  finish=185.51\n      63721) sh               cpu=12 start=3.10  finish=185.02\n        63730) bash             cpu=12 start=3.10  finish=185.02\n          63745) exchange2_r_bas  cpu=12 start=3.10  finish=185.02\n      63723) sh               cpu=8 start=3.10  finish=185.28\n        63731) bash             cpu=13 start=3.10  finish=185.28\n          63749) exchange2_r_bas  cpu=13 start=3.10  finish=185.28\n      63724) sh               cpu=13 start=3.10  finish=185.44\n        63732) bash             cpu=14 start=3.10  finish=185.44\n          63746) exchange2_r_bas  cpu=14 start=3.10  finish=185.44\n      63727) sh               cpu=15 start=3.10  finish=185.38\n        63733) bash             cpu=15 start=3.10  finish=185.38\n          63748) exchange2_r_bas  cpu=15 start=3.10  finish=185.38\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>exchange2 is a SPEC CPU(R) benchmark written in Fortran and described here. The workload runs on all logical cores. Topdown profile shows a higher retirement rate with some frontend stalls. AMD metrics on 7840 confirm a high frontend stalls. 
Almost <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/cpu2017\/548-exchange2_r\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":2297,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-2393","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2393","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=2393"}],"version-history":[{"count":2,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2393\/revisions"}],"predecessor-version":[{"id":2473,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2393\/revisions\/2473"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2297"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=2393"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}