{"id":2356,"date":"2024-06-03T23:59:11","date_gmt":"2024-06-03T23:59:11","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=2356"},"modified":"2024-06-05T00:35:56","modified_gmt":"2024-06-05T00:35:56","slug":"544-nab_r","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/cpu2017\/544-nab_r\/","title":{"rendered":"544.nab_r"},"content":{"rendered":"\n<p>nab is a SPEC CPU(R) benchmark described <a href=\"https:\/\/spec.org\/cpu2017\/Docs\/benchmarks\/544.nab_r.html\">here<\/a> and written in C. The workload runs on all logical cores.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-15.png\" alt=\"\" class=\"wp-image-2418\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-15.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-15-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-15-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>Topdown profile shows medium-level backend-bound stalls with a ~35% retirement rate.  
There is a period at the end with a different characteristic.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-16.png\" alt=\"\" class=\"wp-image-2419\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-16.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-16-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-16-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics on the 7840 processor show more CPU stalls than memory stalls. Overall memory traffic is light: only about 50 L2 accesses per 1000 instructions, with a ~5% L2 miss rate.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              581.066\non_cpu               0.970          # 15.53 \/ 16 cores\nutime                9013.279\nstime                8.575\nnvcsw                13853          # 13.67%\nnivcsw               87522          # 86.33%\ninblock              0              # 0.00\/sec\nonblock              13480          # 23.20\/sec\ncpu-clock            9022371301663  # 9022.371 seconds\ntask-clock           9022438159718  # 9022.438 seconds\npage faults          2719530        # 301.419\/sec\ncontext switches     100813         # 11.174\/sec\ncpu migrations       151            # 0.017\/sec\nmajor page faults    761            # 0.084\/sec\nminor page faults    2718769        # 301.334\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             3705536285749  # 83.289 branches per 1000 inst\nbranch misses        48320502361    # 1.30% branch miss\nconditional          3211079075725  # 72.175 conditional branches per 1000 inst\nindirect             78933292050    # 1.774 indirect branches per 1000 inst\ncpu-cycles 
          35173554892555 # 3.77 GHz\ninstructions         44502941930878 # 1.27 IPC\nslots                70335260112720 #\nretiring             17957481268465 # 25.5% (36.9%)\n-- ucode             49152961542    #     0.1%\n-- fastpath          17908328306923 #    25.5%\nfrontend             4243956072338  #  6.0% ( 8.7%)\n-- latency           3490353506760  #     5.0%\n-- bandwidth         753602565578   #     1.1%\nbackend              25363097670219 # 36.1% (52.1%)\n-- cpu               16028170538565 #    22.8%\n-- memory            9334927131654  #    13.3%\nspeculation          1119054942785  #  1.6% ( 2.3%)\n-- branch mispredict 1084377989807  #     1.5%\n-- pipeline restart  34676952978    #     0.0%\nsmt-contention       21651631104051 # 30.8% ( 0.0%)\ncpu-cycles           35197471408100 # 3.77 GHz\ninstructions         44501705968789 # 1.26 IPC\ninstructions         14829986452623 # 52.578 l2 access per 1000 inst\nl2 hit from l1       563992892841   # 4.94% l2 miss\nl2 miss from l1      11308778556    #\nl2 hit from l2 pf    188511013533   #\nl3 hit from l2 pf    10720306269    #\nl3 miss from l2 pf   16503721139    #\ninstructions         14826609038424 # 318.530 float per 1000 inst\nfloat 512            249            # 0.000 AVX-512 per 1000 inst\nfloat 256            24571061449    # 1.657 AVX-256 per 1000 inst\nfloat 128            4698147760247  # 316.873 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\ninstructions         44489194256996 #\nopcache              4756323710148  # 106.910 opcache per 1000 inst\nopcache miss         66379189344    #  1.4% opcache miss rate\nl1 dTLB miss         41837658678    # 0.940 L1 dTLB per 1000 inst\nl2 dTLB miss         1090929870     # 0.025 L2 dTLB per 1000 inst\ninstructions         44489204211481 #\nicache               80661600392    # 1.813 icache per 1000 inst\nicache miss          12168515583    # 
15.1% icache miss rate\nl1 iTLB miss         173686412      # 0.004 L1 iTLB per 1000 inst\nl2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst\ntlb flush            182039         # 0.000 TLB flush per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Process overview shows time spent in nab_r_base.mev-<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>581 processes\n\t 48 nab_r_base.mev-       9011.77     4.93\n\t 69 specperl                 8.59     1.55\n\t  1 clang                    0.01     0.00\n\t 11 ps                       0.00     0.02\n\t  1 lsb_release              0.00     0.01\n\t173 sh                       0.00     0.00\n\t 54 specrxp                  0.00     0.00\n\t 48 bash                     0.00     0.00\n\t 41 specinvoke               0.00     0.00\n\t 21 grep                     0.00     0.00\n\t 20 cat                      0.00     0.00\n\t 12 uniq                     0.00     0.00\n\t 11 sort                     0.00     0.00\n\t 10 expand                   0.00     0.00\n\t  6 pwd                      0.00     0.00\n\t  5 basename                 0.00     0.00\n\t  5 specmake                 0.00     0.00\n\t  5 systemctl                0.00     0.00\n\t  4 specpp                   0.00     0.00\n\t  4 uname                    0.00     0.00\n\t  3 dirname                  0.00     0.00\n\t  3 dmidecode                0.00     0.00\n\t  3 lscpu                    0.00     0.00\n\t  2 df                       0.00     0.00\n\t  2 dpkg                     0.00     0.00\n\t  2 rm                       0.00     0.00\n\t  2 runcpu                   0.00     0.00\n\t  2 specsha512sum            0.00     0.00\n\t  2 specxz                   0.00     0.00\n\t  2 who                      0.00     0.00\n\t  1 cpupower                 0.00     0.00\n\t  1 head                     0.00     0.00\n\t  1 logname                  0.00     0.00\n\t  1 ls                       0.00     0.00\n\t  1 numactl                  0.00     0.00\n\t  1 sysctl          
         0.00     0.00\n\t  1 w                        0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 which                    0.00     0.00\n0 processes running\n53 maximum processes\n<\/code><\/pre>\n\n\n\n<p>specinvoke starts separate processes on each logical core.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>    452134) specinvoke       cpu=1 start=3.26  finish=195.57\n      452136) sh               cpu=0 start=3.26  finish=191.47\n        452143) bash             cpu=0 start=3.26  finish=191.47\n          452167) nab_r_base.mev-  cpu=0 start=3.26  finish=191.46\n      452137) sh               cpu=10 start=3.26  finish=191.13\n        452144) bash             cpu=1 start=3.26  finish=191.13\n          452165) nab_r_base.mev-  cpu=1 start=3.26  finish=191.12\n      452138) sh               cpu=10 start=3.26  finish=191.07\n        452147) bash             cpu=2 start=3.26  finish=191.07\n          452170) nab_r_base.mev-  cpu=2 start=3.27  finish=191.06\n      452139) sh               cpu=9 start=3.26  finish=191.56\n        452150) bash             cpu=3 start=3.26  finish=191.56\n          452174) nab_r_base.mev-  cpu=3 start=3.27  finish=191.54\n      452140) sh               cpu=9 start=3.26  finish=195.48\n        452154) bash             cpu=4 start=3.26  finish=195.48\n          452173) nab_r_base.mev-  cpu=4 start=3.27  finish=195.47\n      452141) sh               cpu=5 start=3.26  finish=189.13\n        452152) bash             cpu=5 start=3.26  finish=189.13\n          452172) nab_r_base.mev-  cpu=5 start=3.27  finish=189.11\n      452142) sh               cpu=7 start=3.26  finish=191.44\n        452149) bash             cpu=6 start=3.26  finish=191.44\n          452171) nab_r_base.mev-  cpu=6 start=3.27  finish=191.43\n      452145) sh               cpu=11 start=3.26  finish=191.33\n        452156) bash             cpu=7 start=3.26  finish=191.33\n          452175) nab_r_base.mev-  cpu=7 start=3.27  finish=191.31\n      452146) 
sh               cpu=2 start=3.26  finish=191.26\n        452158) bash             cpu=8 start=3.26  finish=191.26\n          452177) nab_r_base.mev-  cpu=8 start=3.27  finish=191.25\n      452148) sh               cpu=10 start=3.26  finish=191.28\n        452160) bash             cpu=9 start=3.26  finish=191.28\n          452179) nab_r_base.mev-  cpu=9 start=3.27  finish=191.27\n      452151) sh               cpu=5 start=3.26  finish=190.86\n        452162) bash             cpu=10 start=3.26  finish=190.86\n          452178) nab_r_base.mev-  cpu=10 start=3.27  finish=190.84\n      452153) sh               cpu=8 start=3.26  finish=191.28\n        452163) bash             cpu=11 start=3.26  finish=191.28\n          452176) nab_r_base.mev-  cpu=11 start=3.27  finish=191.26\n      452155) sh               cpu=11 start=3.26  finish=195.57\n        452164) bash             cpu=12 start=3.26  finish=195.57\n          452180) nab_r_base.mev-  cpu=12 start=3.27  finish=195.56\n      452157) sh               cpu=5 start=3.26  finish=189.76\n        452166) bash             cpu=13 start=3.26  finish=189.76\n          452181) nab_r_base.mev-  cpu=13 start=3.27  finish=189.75\n      452159) sh               cpu=13 start=3.26  finish=191.16\n        452168) bash             cpu=14 start=3.26  finish=191.16\n          452182) nab_r_base.mev-  cpu=14 start=3.27  finish=191.14\n      452161) sh               cpu=11 start=3.26  finish=191.66\n        452169) bash             cpu=15 start=3.27  finish=191.66\n          452183) nab_r_base.mev-  cpu=15 start=3.27  finish=191.65\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>nab is a SPEC CPU(R) benchmark described here and written in C. The workload runs on all logical cores. Topdown profile shows medium-level backend-bound stalls with a ~35% retirement rate. There is a period at the end with a different characteristic. 
AMD <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/cpu2017\/544-nab_r\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":2297,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-2356","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2356","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=2356"}],"version-history":[{"count":3,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2356\/revisions"}],"predecessor-version":[{"id":2421,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2356\/revisions\/2421"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2297"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=2356"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}