{"id":2341,"date":"2024-06-03T10:26:47","date_gmt":"2024-06-03T10:26:47","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=2341"},"modified":"2024-06-05T00:26:25","modified_gmt":"2024-06-05T00:26:25","slug":"527-cam4_r","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/cpu2017\/527-cam4_r\/","title":{"rendered":"527.cam4_r"},"content":{"rendered":"\n<p>cam4 is a SPEC CPU(R) benchmark described <a href=\"https:\/\/spec.org\/cpu2017\/Docs\/benchmarks\/527.cam4_r.html\">here<\/a> and written in Fortran and C. The workload runs on all logical cores.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-13.png\" alt=\"\" class=\"wp-image-2410\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-13.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-13-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-13-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>Topdown profile shows it is dominated by backend stalls but with a varying profile over time.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-14.png\" alt=\"\" class=\"wp-image-2411\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-14.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-14-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-14-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD 
metrics confirm this, showing the workload is ~40% memory bound and ~20% CPU bound. There are only ~60 L2 accesses per 1000 instructions, with a ~20% miss rate.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              1312.508\non_cpu               0.988          # 15.81 \/ 16 cores\nutime                20528.728\nstime                225.953\nnvcsw                29409          # 12.94%\nnivcsw               197920         # 87.06%\ninblock              9840           # 7.50\/sec\nonblock              1228368        # 935.89\/sec\ncpu-clock            20757774966166 # 20757.775 seconds\ntask-clock           20757987236593 # 20757.987 seconds\npage faults          75899772       # 3656.413\/sec\ncontext switches     226656         # 10.919\/sec\ncpu migrations       203            # 0.010\/sec\nmajor page faults    1395           # 0.067\/sec\nminor page faults    75898377       # 3656.346\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             9015331689309  # 124.383 branches per 1000 inst\nbranch misses        103625466297   # 1.15% branch miss\nconditional          6474016811013  # 89.321 conditional branches per 1000 inst\nindirect             635818303619   # 8.772 indirect branches per 1000 inst\ncpu-cycles           85797916011836 # 4.07 GHz\ninstructions         72472521907425 # 0.84 IPC\nslots                171567233970366 #\nretiring             24958749784066 # 14.5% (16.4%)\n-- ucode             5475009509     #     0.0%\n-- fastpath          24953274774557 #    14.5%\nfrontend             24472498750726 # 14.3% (16.0%)\n-- latency           15983869130694 #     9.3%\n-- bandwidth         8488629620032  #     4.9%\nbackend              102011177142788 # 59.5% (66.9%)\n-- cpu               33453535278844 #    19.5%\n-- memory            68557641863944 #    40.0%\nspeculation          1135201453063  #  0.7% ( 0.7%) low\n-- branch mispredict 1086579188867  #     0.6%\n-- pipeline restart  48622264196    # 
    0.0%\nsmt-contention       18989527666963 # 11.1% ( 0.0%)\ncpu-cycles           86324947438235 # 4.07 GHz\ninstructions         72455714789886 # 0.84 IPC\ninstructions         24154762289200 # 62.695 l2 access per 1000 inst\nl2 hit from l1       1211423015562  # 20.78% l2 miss\nl2 miss from l1      176222621478   #\nl2 hit from l2 pf    164446062366   #\nl3 hit from l2 pf    60313265001    #\nl3 miss from l2 pf   78199571898    #\ninstructions         24154893211351 # 189.588 float per 1000 inst\nfloat 512            299            # 0.000 AVX-512 per 1000 inst\nfloat 256            23880669948    # 0.989 AVX-256 per 1000 inst\nfloat 128            4555597500473  # 188.599 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\ninstructions         72455795340647 #\nopcache              11296118454852 # 155.904 opcache per 1000 inst\nopcache miss         898107672809   #  8.0% opcache miss rate\nl1 dTLB miss         224971601225   # 3.105 L1 dTLB per 1000 inst\nl2 dTLB miss         9665338857     # 0.133 L2 dTLB per 1000 inst\ninstructions         72441840668917 #\nicache               1522351345670  # 21.015 icache per 1000 inst\nicache miss          394640134744   # 25.9% icache miss rate\nl1 iTLB miss         13022382440    # 0.180 L1 iTLB per 1000 inst\nl2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst\ntlb flush            260298         # 0.000 TLB flush per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Process overview shows time spent in cam4_r_base.mev<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>693 processes\n\t 48 cam4_r_base.mev      20627.37   219.18\n\t 71 specperl                13.77     2.95\n\t 48 cam4_validate_5          1.01     0.41\n\t  2 clang                    0.01     0.01\n\t  2 flang                    0.01     0.01\n\t  1 lsb_release              0.01     0.00\n\t 11 ps                       0.00     0.01\n\t226 sh               
        0.00     0.00\n\t 54 specrxp                  0.00     0.00\n\t 48 bash                     0.00     0.00\n\t 41 specinvoke               0.00     0.00\n\t 22 cat                      0.00     0.00\n\t 21 grep                     0.00     0.00\n\t 12 uniq                     0.00     0.00\n\t 11 sort                     0.00     0.00\n\t 10 expand                   0.00     0.00\n\t  7 specmake                 0.00     0.00\n\t  6 pwd                      0.00     0.00\n\t  5 basename                 0.00     0.00\n\t  5 systemctl                0.00     0.00\n\t  4 rm                       0.00     0.00\n\t  4 specpp                   0.00     0.00\n\t  4 uname                    0.00     0.00\n\t  3 dirname                  0.00     0.00\n\t  3 dmidecode                0.00     0.00\n\t  3 lscpu                    0.00     0.00\n\t  2 df                       0.00     0.00\n\t  2 dpkg                     0.00     0.00\n\t  2 runcpu                   0.00     0.00\n\t  2 specsha512sum            0.00     0.00\n\t  2 specxz                   0.00     0.00\n\t  2 who                      0.00     0.00\n\t  1 cpupower                 0.00     0.00\n\t  1 head                     0.00     0.00\n\t  1 logname                  0.00     0.00\n\t  1 ls                       0.00     0.00\n\t  1 numactl                  0.00     0.00\n\t  1 sysctl                   0.00     0.00\n\t  1 w                        0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 which                    0.00     0.00\n0 processes running\n53 maximum processes\n<\/code><\/pre>\n\n\n\n<p>specinvoke runs separate copies on each core.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>440422) specinvoke       cpu=14 start=4.62  finish=438.07\n  440424) sh               cpu=13 start=4.62  finish=435.22\n    440430) bash             cpu=0 start=4.63  finish=435.22\n      440455) cam4_r_base.mev  cpu=0 start=4.63  finish=435.11\n  440425) sh               cpu=8 start=4.63  
finish=434.75\n    440431) bash             cpu=1 start=4.63  finish=434.75\n      440456) cam4_r_base.mev  cpu=1 start=4.63  finish=434.62\n  440426) sh               cpu=10 start=4.63  finish=436.92\n    440433) bash             cpu=2 start=4.63  finish=436.91\n      440457) cam4_r_base.mev  cpu=2 start=4.63  finish=436.84\n  440427) sh               cpu=9 start=4.63  finish=437.71\n    440437) bash             cpu=3 start=4.63  finish=437.71\n      440458) cam4_r_base.mev  cpu=3 start=4.63  finish=437.64\n  440428) sh               cpu=4 start=4.63  finish=438.07\n    440438) bash             cpu=4 start=4.63  finish=438.07\n      440462) cam4_r_base.mev  cpu=4 start=4.63  finish=437.99\n  440429) sh               cpu=13 start=4.63  finish=436.44\n    440440) bash             cpu=5 start=4.63  finish=436.44\n      440459) cam4_r_base.mev  cpu=5 start=4.63  finish=436.37\n  440432) sh               cpu=14 start=4.63  finish=435.07\n    440439) bash             cpu=6 start=4.63  finish=435.07\n      440460) cam4_r_base.mev  cpu=6 start=4.63  finish=434.98\n  440434) sh               cpu=7 start=4.63  finish=437.53\n    440443) bash             cpu=7 start=4.63  finish=437.53\n      440464) cam4_r_base.mev  cpu=7 start=4.63  finish=437.43\n  440435) sh               cpu=8 start=4.63  finish=432.12\n    440446) bash             cpu=8 start=4.63  finish=432.12\n      440463) cam4_r_base.mev  cpu=8 start=4.63  finish=431.97\n  440436) sh               cpu=1 start=4.63  finish=435.17\n    440448) bash             cpu=9 start=4.63  finish=435.17\n      440466) cam4_r_base.mev  cpu=9 start=4.63  finish=435.07\n  440441) sh               cpu=9 start=4.63  finish=436.55\n    440450) bash             cpu=10 start=4.63  finish=436.55\n      440465) cam4_r_base.mev  cpu=10 start=4.63  finish=436.46\n  440442) sh               cpu=5 start=4.63  finish=437.18\n    440451) bash             cpu=11 start=4.63  finish=437.18\n      440469) cam4_r_base.mev  cpu=11 start=4.63  
finish=437.08\n  440444) sh               cpu=12 start=4.63  finish=438.04\n    440452) bash             cpu=12 start=4.63  finish=438.04\n      440468) cam4_r_base.mev  cpu=12 start=4.63  finish=437.94\n  440445) sh               cpu=14 start=4.63  finish=435.20\n    440453) bash             cpu=13 start=4.63  finish=435.20\n      440467) cam4_r_base.mev  cpu=13 start=4.63  finish=435.07\n  440447) sh               cpu=8 start=4.63  finish=432.74\n    440454) bash             cpu=14 start=4.63  finish=432.74\n      440470) cam4_r_base.mev  cpu=14 start=4.63  finish=432.62\n  440449) sh               cpu=15 start=4.63  finish=437.93\n    440461) bash             cpu=15 start=4.63  finish=437.92\n      440471) cam4_r_base.mev  cpu=15 start=4.63  finish=437.86<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>cam4 is a SPEC CPU(R) benchmark described here and written in Fortran and C. The workload runs on all logical cores. Topdown profile shows it is dominated by backend stalls but with a varying profile over time. 
AMD metrics confirm <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/cpu2017\/527-cam4_r\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":2297,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-2341","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2341","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=2341"}],"version-history":[{"count":2,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2341\/revisions"}],"predecessor-version":[{"id":2413,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2341\/revisions\/2413"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2297"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=2341"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}