{"id":2362,"date":"2024-06-04T00:03:23","date_gmt":"2024-06-04T00:03:23","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=2362"},"modified":"2024-06-05T00:39:44","modified_gmt":"2024-06-05T00:39:44","slug":"554-roms_r","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/cpu2017\/554-roms_r\/","title":{"rendered":"554.roms_r"},"content":{"rendered":"\n<p>roms is a SPEC CPU(R) benchmark described <a href=\"https:\/\/spec.org\/cpu2017\/Docs\/benchmarks\/554.roms_r.html\">here<\/a> and written in Fortran. The workload runs on all logical cores.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-16.png\" alt=\"\" class=\"wp-image-2422\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-16.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-16-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-16-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>Topdown profile shows a backend bound application.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-17.png\" alt=\"\" class=\"wp-image-2424\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-17.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-17-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-17-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics confirm almost 200 L2 accesses per 1000 instructions and over 35% miss rate.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              2913.604\non_cpu               0.989          # 15.82 \/ 16 cores\nutime                45952.410\nstime                146.539\nnvcsw                60191          # 11.00%\nnivcsw               486901         # 89.00%\ninblock              0              # 0.00\/sec\nonblock              23672          # 8.12\/sec\ncpu-clock            46110694195743 # 46110.694 seconds\ntask-clock           46111603372414 # 46111.603 seconds\npage faults          27680650       # 600.297\/sec\ncontext switches     546536         # 11.852\/sec\ncpu migrations       206            # 0.004\/sec\nmajor page faults    736            # 0.016\/sec\nminor page faults    27679914       # 600.281\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             2529052541972  # 76.566 branches per 1000 inst\nbranch misses        10962054534    # 0.43% branch miss\nconditional          1893553882793  # 57.327 conditional branches per 1000 inst\nindirect             220110756070   # 6.664 indirect branches per 1000 inst\ncpu-cycles           209575823621539 # 4.45 GHz\ninstructions         33028652913423 # 0.16 IPC low\nslots                419100802566912 #\nretiring             11541439578728 #  2.8% ( 2.8%) low\n-- ucode             2996339981     #     0.0%\n-- fastpath          11538443238747 #     2.8%\nfrontend             9239383775568  #  2.2% ( 2.3%) low\n-- latency           7530373522626  #     1.8%\n-- bandwidth         1709010252942  #     0.4%\nbackend              389455334744808 # 92.9% (94.9%) high\n-- cpu               30852021068967 #     7.4%\n-- memory            358603313675841 #    85.6%\nspeculation          311627888135   #  0.1% ( 0.1%) low\n-- branch mispredict 194460337886   #     0.0%\n-- pipeline restart  117167550249   #     0.0%\nsmt-contention       8552850550540  #  2.0% ( 0.0%)\ncpu-cycles           208979424152334 # 4.46 GHz\ninstructions         33022374135409 # 0.16 IPC low\ninstructions         11010464374416 # 196.455 l2 access per 1000 inst\nl2 hit from l1       1690964709861  # 36.36% l2 miss\nl2 miss from l1      463953169216   #\nl2 hit from l2 pf    149577757193   #\nl3 hit from l2 pf    137767015603   #\nl3 miss from l2 pf   184753487636   #\ninstructions         11009639724002 # 129.670 float per 1000 inst\nfloat 512            178            # 0.000 AVX-512 per 1000 inst\nfloat 256            9876442032     # 0.897 AVX-256 per 1000 inst\nfloat 128            1417741198942  # 128.773 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         1              # 0.000 scalar per 1000 inst\ninstructions         33019001845309 #\nopcache              4407926813123  # 133.497 opcache per 1000 inst\nopcache miss         193858813401   #  4.4% opcache miss rate\nl1 dTLB miss         1096430967354  # 33.206 L1 dTLB per 1000 inst\nl2 dTLB miss         99268449419    # 3.006 L2 dTLB per 1000 inst\ninstructions         33019153737204 #\nicache               248333528124   # 7.521 icache per 1000 inst\nicache miss          53829148554    # 21.7% icache miss rate\nl1 iTLB miss         421656863      # 0.013 L1 iTLB per 1000 inst\nl2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst\ntlb flush            296659         # 0.000 TLB flush per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Process overview shows time spent in roms_r_base.mev<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>581 processes\n\t 48 roms_r_base.mev      46122.14   129.71\n\t 69 specperl                26.28     5.91\n\t  1 flang                    0.01     0.00\n\t  1 lsb_release              0.01     0.00\n\t 11 ps                       0.00     0.02\n\t173 sh                       0.00     0.00\n\t 54 specrxp                  0.00     0.00\n\t 48 bash                     0.00     0.00\n\t 41 specinvoke               0.00     0.00\n\t 21 grep                     0.00     0.00\n\t 20 cat                      0.00     0.00\n\t 12 uniq                     0.00     0.00\n\t 11 sort                     0.00     0.00\n\t 10 expand                   0.00     0.00\n\t  6 pwd                      0.00     0.00\n\t  5 basename                 0.00     0.00\n\t  5 specmake                 0.00     0.00\n\t  5 systemctl                0.00     0.00\n\t  4 specpp                   0.00     0.00\n\t  4 uname                    0.00     0.00\n\t  3 dirname                  0.00     0.00\n\t  3 dmidecode                0.00     0.00\n\t  3 lscpu                    0.00     0.00\n\t  2 df                       0.00     0.00\n\t  2 dpkg                     0.00     0.00\n\t  2 rm                       0.00     0.00\n\t  2 runcpu                   0.00     0.00\n\t  2 specsha512sum            0.00     0.00\n\t  2 specxz                   0.00     0.00\n\t  2 who                      0.00     0.00\n\t  1 cpupower                 0.00     0.00\n\t  1 head                     0.00     0.00\n\t  1 logname                  0.00     0.00\n\t  1 ls                       0.00     0.00\n\t  1 numactl                  0.00     0.00\n\t  1 sysctl                   0.00     0.00\n\t  1 w                        0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 which                    0.00     0.00\n0 processes running\n53 maximum processes\n<\/code><\/pre>\n\n\n\n<p>specinvoke launches separate copies on each logical core.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>    461157) specinvoke       cpu=8 start=3.29  finish=970.82\n      461159) sh               cpu=0 start=3.29  finish=963.96\n        461164) bash             cpu=0 start=3.29  finish=963.96\n          461188) roms_r_base.mev  cpu=0 start=3.29  finish=963.71\n      461160) sh               cpu=11 start=3.29  finish=970.82\n        461168) bash             cpu=1 start=3.29  finish=970.82\n          461194) roms_r_base.mev  cpu=1 start=3.30  finish=970.77\n      461161) sh               cpu=5 start=3.29  finish=970.29\n        461170) bash             cpu=2 start=3.29  finish=970.29\n          461191) roms_r_base.mev  cpu=2 start=3.30  finish=970.22\n      461162) sh               cpu=3 start=3.29  finish=967.90\n        461172) bash             cpu=3 start=3.29  finish=967.90\n          461199) roms_r_base.mev  cpu=3 start=3.30  finish=967.76\n      461163) sh               cpu=8 start=3.29  finish=967.84\n        461174) bash             cpu=4 start=3.29  finish=967.84\n          461196) roms_r_base.mev  cpu=4 start=3.30  finish=967.66\n      461165) sh               cpu=15 start=3.29  finish=969.88\n        461173) bash             cpu=5 start=3.29  finish=969.88\n          461195) roms_r_base.mev  cpu=5 start=3.30  finish=969.75\n      461166) sh               cpu=6 start=3.29  finish=969.56\n        461171) bash             cpu=6 start=3.29  finish=969.55\n          461193) roms_r_base.mev  cpu=6 start=3.30  finish=969.46\n      461167) sh               cpu=4 start=3.29  finish=970.25\n        461176) bash             cpu=7 start=3.29  finish=970.25\n          461197) roms_r_base.mev  cpu=7 start=3.30  finish=970.17\n      461169) sh               cpu=0 start=3.29  finish=966.43\n        461179) bash             cpu=8 start=3.29  finish=966.43\n          461198) roms_r_base.mev  cpu=8 start=3.30  finish=966.27\n      461175) sh               cpu=15 start=3.29  finish=970.28\n        461182) bash             cpu=9 start=3.29  finish=970.28\n          461200) roms_r_base.mev  cpu=9 start=3.30  finish=970.18\n      461177) sh               cpu=11 start=3.29  finish=968.03\n        461185) bash             cpu=10 start=3.29  finish=968.03\n          461201) roms_r_base.mev  cpu=10 start=3.30  finish=967.85\n      461178) sh               cpu=0 start=3.29  finish=966.25\n        461186) bash             cpu=11 start=3.29  finish=966.25\n          461203) roms_r_base.mev  cpu=11 start=3.30  finish=966.01\n      461180) sh               cpu=10 start=3.29  finish=968.99\n        461187) bash             cpu=12 start=3.29  finish=968.99\n          461202) roms_r_base.mev  cpu=12 start=3.30  finish=968.87\n      461181) sh               cpu=10 start=3.29  finish=969.91\n        461189) bash             cpu=13 start=3.29  finish=969.91\n          461204) roms_r_base.mev  cpu=13 start=3.30  finish=969.81\n      461183) sh               cpu=8 start=3.29  finish=968.30\n        461190) bash             cpu=14 start=3.30  finish=968.30\n          461206) roms_r_base.mev  cpu=14 start=3.30  finish=968.15\n      461184) sh               cpu=3 start=3.29  finish=969.02\n        461192) bash             cpu=15 start=3.30  finish=969.02\n          461205) roms_r_base.mev  cpu=15 start=3.30  finish=968.87\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>roms is a SPEC CPU(R) benchmark described here and written in Fortran. The workload runs on all logical cores. Topdown profile shows a backend bound application. AMD metrics confirm almost 200 L2 accesses per 1000 instructions and over 35% miss <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/cpu2017\/554-roms_r\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":2297,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-2362","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2362","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=2362"}],"version-history":[{"count":2,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2362\/revisions"}],"predecessor-version":[{"id":2425,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2362\/revisions\/2425"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2297"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=2362"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}