{"id":2395,"date":"2024-06-04T12:32:24","date_gmt":"2024-06-04T12:32:24","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=2395"},"modified":"2024-06-07T01:06:00","modified_gmt":"2024-06-07T01:06:00","slug":"557-xz_r","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/cpu2017\/557-xz_r\/","title":{"rendered":"557.xz_r"},"content":{"rendered":"\n<p>xz is a SPEC CPU(R) benchmark written in C and described <a href=\"https:\/\/spec.org\/cpu2017\/Docs\/benchmarks\/557.xz_r.html\">here<\/a>. The workload runs on all logical cores.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-27.png\" alt=\"\" class=\"wp-image-2474\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-27.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-27-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-27-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>The topdown profile shows a mixture of patterns, possibly corresponding to separate invocations. 
Several are dominated by backend stalls; others have a more moderate retirement rate.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-28.png\" alt=\"\" class=\"wp-image-2475\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-28.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-28-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-28-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics on the 7840 confirm that roughly 40% of pipeline slots are lost to backend stalls, almost all of them memory-bound.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              1413.262\non_cpu               0.985          # 15.76 \/ 16 cores\nutime                22190.676\nstime                80.712\nnvcsw                32429          # 15.28%\nnivcsw               179827         # 84.72%\ninblock              0              # 0.00\/sec\nonblock              16224          # 11.48\/sec\ncpu-clock            22272829155088 # 22272.829 seconds\ntask-clock           22272971535008 # 22272.972 seconds\npage faults          30366431       # 1363.376\/sec\ncontext switches     211245         # 9.484\/sec\ncpu migrations       308            # 0.014\/sec\nmajor page faults    1859           # 0.083\/sec\nminor page faults    30364572       # 1363.292\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             8835980080090  # 115.607 branches per 1000 inst\nbranch misses        400050420474   # 4.53% branch miss\nconditional          8020681407191  # 104.940 conditional branches per 1000 inst\nindirect             102437988137   # 1.340 indirect branches per 1000 inst\ncpu-cycles           99542805444678 # 4.41 
GHz\ninstructions         76459027618205 # 0.77 IPC\nslots                199034406547422 #\nretiring             25007933242215 # 12.6% (19.4%)\n-- ucode             9082417521     #     0.0%\n-- fastpath          24998850824694 #    12.6%\nfrontend             12250748557591 #  6.2% ( 9.5%)\n-- latency           8927773104210  #     4.5%\n-- bandwidth         3322975453381  #     1.7%\nbackend              83594205106690 # 42.0% (64.8%)\n-- cpu               4410158936378  #     2.2%\n-- memory            79184046170312 #    39.8%\nspeculation          8145268353751  #  4.1% ( 6.3%)\n-- branch mispredict 8111088189146  #     4.1%\n-- pipeline restart  34180164605    #     0.0%\nsmt-contention       70036180616910 # 35.2% ( 0.0%)\ncpu-cycles           99654865202476 # 4.38 GHz\ninstructions         76432260626044 # 0.77 IPC\ninstructions         25476490458593 # 23.253 l2 access per 1000 inst\nl2 hit from l1       463932078609   # 30.11% l2 miss\nl2 miss from l1      105562068767   #\nl2 hit from l2 pf    55657371236    #\nl3 hit from l2 pf    20795496614    #\nl3 miss from l2 pf   52023516791    #\ninstructions         25468766780522 # 21.356 float per 1000 inst\nfloat 512            494            # 0.000 AVX-512 per 1000 inst\nfloat 256            7138           # 0.000 AVX-256 per 1000 inst\nfloat 128            543923656222   # 21.356 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         62             # 0.000 scalar per 1000 inst\ninstructions         76418964558481 #\nopcache              11659264508468 # 152.570 opcache per 1000 inst\nopcache miss         120217644516   #  1.0% opcache miss rate\nl1 dTLB miss         605368087508   # 7.922 L1 dTLB per 1000 inst\nl2 dTLB miss         103705778625   # 1.357 L2 dTLB per 1000 inst\ninstructions         76419112519088 #\nicache               208931580918   # 2.734 icache per 1000 inst\nicache miss          37140932573    # 17.8% icache miss rate\nl1 iTLB miss 
        335027956      # 0.004 L1 iTLB per 1000 inst\nl2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst\ntlb flush            127274         # 0.000 TLB flush per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Process overview shows three copies per benchmark, corresponding to profiles above.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>1059 processes\n\t143 xz_r_base.mev-a      21915.42    63.60\n\t164 specperl                20.24     3.10\n\t  1 clang                    0.01     0.00\n\t 41 specinvoke               0.00     0.04\n\t 11 ps                       0.00     0.02\n\t362 sh                       0.00     0.00\n\t142 bash                     0.00     0.00\n\t 54 specrxp                  0.00     0.00\n\t 21 grep                     0.00     0.00\n\t 20 cat                      0.00     0.00\n\t 12 uniq                     0.00     0.00\n\t 11 sort                     0.00     0.00\n\t 10 expand                   0.00     0.00\n\t  6 pwd                      0.00     0.00\n\t  5 basename                 0.00     0.00\n\t  5 specmake                 0.00     0.00\n\t  5 systemctl                0.00     0.00\n\t  4 specpp                   0.00     0.00\n\t  4 uname                    0.00     0.00\n\t  3 dirname                  0.00     0.00\n\t  3 dmidecode                0.00     0.00\n\t  3 lscpu                    0.00     0.00\n\t  2 df                       0.00     0.00\n\t  2 dpkg                     0.00     0.00\n\t  2 rm                       0.00     0.00\n\t  2 runcpu                   0.00     0.00\n\t  2 specsha512sum            0.00     0.00\n\t  2 specxz                   0.00     0.00\n\t  2 who                      0.00     0.00\n\t  1 cpupower                 0.00     0.00\n\t  1 head                     0.00     0.00\n\t  1 logname                  0.00     0.00\n\t  1 ls                       0.00     0.00\n\t  1 lsb_release              0.00     0.00\n\t  1 numactl                  0.00     0.00\n\t  1 sysctl               
    0.00     0.00\n\t  1 w                        0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 which                    0.00     0.00\n2 processes running\n54 maximum processes\n<\/code><\/pre>\n\n\n\n<p>specinvoke fires off separate copies on each logical core<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>    72891) specinvoke       cpu=2 start=3.38  finish=467.09\n      72893) sh               cpu=0 start=3.38  finish=197.18\n        72899) bash             cpu=0 start=3.38  finish=197.18\n          72924) xz_r_base.mev-a  cpu=0 start=3.38  finish=197.11\n      72894) sh               cpu=1 start=3.38  finish=198.36\n        72901) bash             cpu=1 start=3.38  finish=198.36\n          72925) xz_r_base.mev-a  cpu=1 start=3.38  finish=198.29\n      72895) sh               cpu=2 start=3.38  finish=197.79\n        72906) bash             cpu=2 start=3.38  finish=197.79\n          72929) xz_r_base.mev-a  cpu=2 start=3.38  finish=197.72\n      72896) sh               cpu=3 start=3.38  finish=199.71\n        72902) bash             cpu=3 start=3.38  finish=199.71\n          72923) xz_r_base.mev-a  cpu=3 start=3.38  finish=199.64\n      72897) sh               cpu=4 start=3.38  finish=200.78\n        72909) ?? 
cpu=0 start=3.38  finish=0.00 \n          72931) xz_r_base.mev-a  cpu=4 start=3.38  finish=200.71\n      72898) sh               cpu=5 start=3.38  finish=196.56\n        72904) bash             cpu=5 start=3.38  finish=196.56\n          72928) xz_r_base.mev-a  cpu=5 start=3.38  finish=196.49\n      72900) sh               cpu=6 start=3.38  finish=200.04\n        72907) bash             cpu=6 start=3.38  finish=200.04\n          72930) xz_r_base.mev-a  cpu=6 start=3.38  finish=199.96\n      72903) sh               cpu=7 start=3.38  finish=199.53\n        72911) bash             cpu=7 start=3.38  finish=199.53\n          72932) xz_r_base.mev-a  cpu=7 start=3.38  finish=199.46\n      72905) sh               cpu=8 start=3.38  finish=197.07\n        72914) bash             cpu=8 start=3.38  finish=197.07\n          72933) xz_r_base.mev-a  cpu=8 start=3.38  finish=197.00\n      72908) sh               cpu=9 start=3.38  finish=198.40\n        72917) bash             cpu=9 start=3.38  finish=198.40\n          72934) xz_r_base.mev-a  cpu=9 start=3.38  finish=198.34\n      72910) sh               cpu=10 start=3.38  finish=197.73\n        72919) bash             cpu=10 start=3.38  finish=197.73\n          72938) xz_r_base.mev-a  cpu=10 start=3.38  finish=197.67\n      72912) sh               cpu=11 start=3.38  finish=201.10\n        72920) bash             cpu=11 start=3.38  finish=201.10\n          72937) xz_r_base.mev-a  cpu=11 start=3.38  finish=201.03\n      72913) sh               cpu=12 start=3.38  finish=200.53\n        72921) bash             cpu=12 start=3.38  finish=200.53\n          72935) xz_r_base.mev-a  cpu=12 start=3.38  finish=200.46\n      72915) sh               cpu=13 start=3.38  finish=196.95\n        72922) bash             cpu=13 start=3.38  finish=196.95\n          72936) xz_r_base.mev-a  cpu=13 start=3.38  finish=196.88\n      72916) sh               cpu=14 start=3.38  finish=201.27\n        72926) bash             cpu=14 start=3.38  finish=201.27\n    
      72939) xz_r_base.mev-a  cpu=14 start=3.38  finish=201.20\n      72918) sh               cpu=15 start=3.38  finish=199.85\n        72927) bash             cpu=15 start=3.38  finish=199.85\n          72940) xz_r_base.mev-a  cpu=15 start=3.38  finish=199.78\n      72942) sh               cpu=5 start=196.56 finish=330.05\n        72943) bash             cpu=5 start=196.56 finish=330.05\n          72944) xz_r_base.mev-a  cpu=5 start=196.56 finish=329.98\n      72945) sh               cpu=13 start=196.95 finish=327.88\n        72946) bash             cpu=13 start=196.96 finish=327.88\n          72947) xz_r_base.mev-a  cpu=13 start=196.96 finish=327.81\n      72948) sh               cpu=8 start=197.07 finish=329.51\n        72949) bash             cpu=8 start=197.07 finish=329.51\n          72950) xz_r_base.mev-a  cpu=8 start=197.07 finish=329.44\n      72951) sh               cpu=0 start=197.18 finish=329.49\n        72952) bash             cpu=0 start=197.18 finish=329.49\n          72953) xz_r_base.mev-a  cpu=0 start=197.18 finish=329.42\n      72954) sh               cpu=10 start=197.74 finish=324.20\n        72955) bash             cpu=10 start=197.74 finish=324.20\n          72956) xz_r_base.mev-a  cpu=10 start=197.74 finish=324.13\n      72957) sh               cpu=2 start=197.79 finish=328.08\n        72958) bash             cpu=2 start=197.79 finish=328.08\n          72959) xz_r_base.mev-a  cpu=2 start=197.79 finish=328.02\n      72960) sh               cpu=1 start=198.36 finish=328.48\n        72961) bash             cpu=1 start=198.36 finish=328.48\n          72962) xz_r_base.mev-a  cpu=1 start=198.36 finish=328.42\n      72963) sh               cpu=9 start=198.40 finish=327.91\n        72964) bash             cpu=9 start=198.40 finish=327.91\n          72965) xz_r_base.mev-a  cpu=9 start=198.41 finish=327.84\n      72966) sh               cpu=7 start=199.53 finish=330.72\n        72967) bash             cpu=7 start=199.53 finish=330.72\n          72968) 
xz_r_base.mev-a  cpu=7 start=199.54 finish=330.64\n      72969) sh               cpu=3 start=199.71 finish=331.64\n        72970) bash             cpu=3 start=199.71 finish=331.64\n          72971) xz_r_base.mev-a  cpu=3 start=199.72 finish=331.57\n      72972) sh               cpu=15 start=199.85 finish=330.73\n        72973) bash             cpu=15 start=199.85 finish=330.73\n          72974) xz_r_base.mev-a  cpu=15 start=199.85 finish=330.67\n      72975) sh               cpu=6 start=200.04 finish=329.46\n        72976) bash             cpu=6 start=200.04 finish=329.46\n          72977) xz_r_base.mev-a  cpu=6 start=200.04 finish=329.39\n      72978) sh               cpu=12 start=200.53 finish=331.00\n        72979) bash             cpu=12 start=200.53 finish=331.00\n          72980) xz_r_base.mev-a  cpu=12 start=200.54 finish=330.93\n      72981) sh               cpu=4 start=200.78 finish=331.31\n        72982) bash             cpu=4 start=200.78 finish=331.31\n          72983) xz_r_base.mev-a  cpu=4 start=200.78 finish=331.25\n      72984) sh               cpu=11 start=201.10 finish=332.58\n        72985) bash             cpu=11 start=201.10 finish=332.58\n          72986) xz_r_base.mev-a  cpu=11 start=201.10 finish=332.51\n      72987) sh               cpu=14 start=201.27 finish=327.62\n        72988) bash             cpu=14 start=201.27 finish=327.62\n          72989) xz_r_base.mev-a  cpu=14 start=201.28 finish=327.56\n      72990) sh               cpu=10 start=324.20 finish=458.38\n        72991) bash             cpu=10 start=324.20 finish=458.38\n          72992) xz_r_base.mev-a  cpu=10 start=324.20 finish=458.32\n      72993) sh               cpu=14 start=327.62 finish=463.25\n        72994) bash             cpu=14 start=327.62 finish=463.25\n          72995) xz_r_base.mev-a  cpu=14 start=327.62 finish=463.18\n      72996) sh               cpu=13 start=327.88 finish=464.83\n        72997) bash             cpu=13 start=327.88 finish=464.83\n          72998) 
xz_r_base.mev-a  cpu=13 start=327.88 finish=464.76\n      72999) sh               cpu=9 start=327.91 finish=462.41\n        73000) bash             cpu=9 start=327.91 finish=462.41\n          73001) xz_r_base.mev-a  cpu=9 start=327.92 finish=462.34\n      73002) sh               cpu=2 start=328.08 finish=461.42\n        73003) bash             cpu=2 start=328.08 finish=461.42\n          73004) xz_r_base.mev-a  cpu=2 start=328.09 finish=461.37\n      73005) sh               cpu=1 start=328.48 finish=462.96\n        73006) bash             cpu=1 start=328.48 finish=462.96\n          73007) xz_r_base.mev-a  cpu=1 start=328.49 finish=462.90\n      73008) sh               cpu=6 start=329.46 finish=464.42\n        73009) bash             cpu=6 start=329.46 finish=464.42\n          73010) xz_r_base.mev-a  cpu=6 start=329.46 finish=464.37\n      73011) sh               cpu=0 start=329.49 finish=463.63\n        73012) bash             cpu=0 start=329.49 finish=463.63\n          73013) xz_r_base.mev-a  cpu=0 start=329.49 finish=463.56\n      73014) sh               cpu=8 start=329.51 finish=464.08\n        73015) bash             cpu=8 start=329.51 finish=464.08\n          73016) xz_r_base.mev-a  cpu=8 start=329.51 finish=464.03\n      73017) sh               cpu=5 start=330.05 finish=466.11\n        73018) bash             cpu=5 start=330.05 finish=466.11\n          73019) xz_r_base.mev-a  cpu=5 start=330.05 finish=466.06\n      73020) sh               cpu=7 start=330.72 finish=466.35\n        73021) bash             cpu=7 start=330.72 finish=466.35\n          73022) xz_r_base.mev-a  cpu=7 start=330.72 finish=466.29\n      73023) sh               cpu=15 start=330.73 finish=466.33\n        73024) bash             cpu=15 start=330.73 finish=466.33\n          73025) xz_r_base.mev-a  cpu=15 start=330.74 finish=466.26\n      73026) sh               cpu=12 start=331.00 finish=465.28\n        73027) bash             cpu=12 start=331.00 finish=465.28\n          73028) 
xz_r_base.mev-a  cpu=12 start=331.00 finish=465.23\n      73029) sh               cpu=4 start=331.31 finish=465.58\n        73030) bash             cpu=4 start=331.32 finish=465.58\n          73031) xz_r_base.mev-a  cpu=4 start=331.32 finish=465.54\n      73032) sh               cpu=3 start=331.64 finish=466.62\n        73033) bash             cpu=3 start=331.64 finish=466.61\n          73034) xz_r_base.mev-a  cpu=3 start=331.65 finish=466.56\n      73035) sh               cpu=11 start=332.58 finish=467.09\n        73036) bash             cpu=11 start=332.58 finish=467.09\n          73037) xz_r_base.mev-a  cpu=11 start=332.58 finish=467.05\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>xz is a SPEC CPU(R) benchmark written in C and described here. The workload runs on all logical cores. Topdown profile shows a mixture of patterns, separate invocations? Several are dominated by backend stalls, others have more moderate retirement rate. <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/cpu2017\/557-xz_r\/\"><span class=\"more-msg\">Continue reading 
&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":2297,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-2395","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2395","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=2395"}],"version-history":[{"count":2,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2395\/revisions"}],"predecessor-version":[{"id":2477,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2395\/revisions\/2477"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2297"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=2395"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}