{"id":2321,"date":"2024-06-02T19:33:30","date_gmt":"2024-06-02T19:33:30","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=2321"},"modified":"2024-06-03T23:56:34","modified_gmt":"2024-06-03T23:56:34","slug":"519-lbm_r","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/cpu2017\/519-lbm_r\/","title":{"rendered":"519.lbm_r"},"content":{"rendered":"\n<p>lbm is a SPEC CPU(R) 2017 benchmark written in C and described <a href=\"https:\/\/spec.org\/cpu2017\/Docs\/benchmarks\/519.lbm_r.html\">here<\/a>.\u00a0The workload runs on all 16 logical cores, with a slight decrease at the end of the run.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-10.png\" alt=\"\" class=\"wp-image-2352\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-10.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-10-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-10-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>The topdown profile shows this as a backend-bound workload.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-11.png\" alt=\"\" class=\"wp-image-2353\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-11.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-11-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-11-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics 
confirm this is backend-bound, with stalls dominated by memory (88.9% of slots). There are roughly 173 L2 accesses per 1000 instructions and a 24.7% L2 miss rate.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              1416.599\non_cpu               0.970          # 15.53 \/ 16 cores\nutime                21972.055\nstime                23.825\nnvcsw                30569          # 13.09%\nnivcsw               202931         # 86.91%\ninblock              0              # 0.00\/sec\nonblock              10096          # 7.13\/sec\ncpu-clock            22000514289041 # 22000.514 seconds\ntask-clock           22000903635495 # 22000.904 seconds\npage faults          5653553        # 256.969\/sec\ncontext switches     232950         # 10.588\/sec\ncpu migrations       159            # 0.007\/sec\nmajor page faults    967            # 0.044\/sec\nminor page faults    5652586        # 256.925\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             3618987222588  # 138.203 branches per 1000 inst\nbranch misses        2898200956     # 0.08% branch miss\nconditional          3606565747126  # 137.728 conditional branches per 1000 inst\nindirect             676460163      # 0.026 indirect branches per 1000 inst\ncpu-cycles           100871453520770 # 4.42 GHz\ninstructions         26185191570506 # 0.26 IPC low\nslots                201718434237312 #\nretiring             9174297118733  #  4.5% ( 4.7%) low\n-- ucode             275434458      #     0.0%\n-- fastpath          9174021684275  #     4.5%\nfrontend             4164703812385  #  2.1% ( 2.1%) low\n-- latency           2149956707448  #     1.1%\n-- bandwidth         2014747104937  #     1.0%\nbackend              183730128254661 # 91.1% (93.2%) high\n-- cpu               4395824123476  #     2.2%\n-- memory            179334304131185 #    88.9%\nspeculation          79406452189    #  0.0% ( 0.0%) low\n-- branch mispredict 43379310136    #     0.0%\n-- 
pipeline restart  36027142053    #     0.0%\nsmt-contention       4569836071252  #  2.3% ( 0.0%)\ncpu-cycles           101787311448863 # 4.45 GHz\ninstructions         26186712477699 # 0.26 IPC low\ninstructions         8732031506039  # 172.647 l2 access per 1000 inst\nl2 hit from l1       1295773889975  # 24.72% l2 miss\nl2 miss from l1      213380182619   #\nl2 hit from l2 pf    52568231880    #\nl3 hit from l2 pf    107403392082   #\nl3 miss from l2 pf   51815243240    #\ninstructions         8727315001190  # 51.354 float per 1000 inst\nfloat 512            211            # 0.000 AVX-512 per 1000 inst\nfloat 256            428894495414   # 49.144 AVX-256 per 1000 inst\nfloat 128            19289027765    # 2.210 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         1              # 0.000 scalar per 1000 inst\ninstructions         26183673451995 #\nopcache              1202720385896  # 45.934 opcache per 1000 inst\nopcache miss         183734119286   # 15.3% opcache miss rate\nl1 dTLB miss         344774133836   # 13.168 L1 dTLB per 1000 inst\nl2 dTLB miss         14788093995    # 0.565 L2 dTLB per 1000 inst\ninstructions         26184096721565 #\nicache               210993424923   # 8.058 icache per 1000 inst\nicache miss          12611848176    #  6.0% icache miss rate\nl1 iTLB miss         158643848      # 0.006 L1 iTLB per 1000 inst\nl2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst\ntlb flush            87594          # 0.000 TLB flush per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>The process overview shows almost all time spent in lbm_r_base.mev.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>573 processes\n\t 48 lbm_r_base.mev-      22168.48    14.21\n\t 69 specperl                15.44     3.37\n\t  1 clang                    0.01     0.00\n\t  7 ps                       0.00     0.01\n\t  1 lsb_release              0.00     0.01\n\t169 sh                       0.00     0.00\n\t 54 specrxp        
           0.00     0.00\n\t 48 bash                     0.00     0.00\n\t 41 specinvoke               0.00     0.00\n\t 21 grep                     0.00     0.00\n\t 20 cat                      0.00     0.00\n\t 12 uniq                     0.00     0.00\n\t 11 sort                     0.00     0.00\n\t 10 expand                   0.00     0.00\n\t  6 pwd                      0.00     0.00\n\t  5 basename                 0.00     0.00\n\t  5 specmake                 0.00     0.00\n\t  5 systemctl                0.00     0.00\n\t  4 specpp                   0.00     0.00\n\t  4 uname                    0.00     0.00\n\t  3 dirname                  0.00     0.00\n\t  3 dmidecode                0.00     0.00\n\t  3 lscpu                    0.00     0.00\n\t  2 df                       0.00     0.00\n\t  2 dpkg                     0.00     0.00\n\t  2 rm                       0.00     0.00\n\t  2 runcpu                   0.00     0.00\n\t  2 specsha512sum            0.00     0.00\n\t  2 specxz                   0.00     0.00\n\t  2 who                      0.00     0.00\n\t  1 cpupower                 0.00     0.00\n\t  1 head                     0.00     0.00\n\t  1 logname                  0.00     0.00\n\t  1 ls                       0.00     0.00\n\t  1 numactl                  0.00     0.00\n\t  1 sysctl                   0.00     0.00\n\t  1 w                        0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 which                    0.00     0.00\n0 processes running\n53 maximum processes\n<\/code><\/pre>\n\n\n\n<p>The computation blocks show specinvoke launching a separate copy of the benchmark on each logical core.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>    407711) specinvoke       cpu=1 start=3.12  finish=474.61\n      407713) sh               cpu=5 start=3.12  finish=469.35\n        407719) bash             cpu=0 start=3.12  finish=469.35\n          407745) lbm_r_base.mev-  cpu=0 start=3.12  finish=469.26\n      407714) sh               cpu=15 
start=3.12  finish=437.70\n        407724) bash             cpu=1 start=3.12  finish=437.69\n          407747) lbm_r_base.mev-  cpu=1 start=3.12  finish=437.56\n      407715) sh               cpu=2 start=3.12  finish=453.48\n        407725) bash             cpu=2 start=3.12  finish=453.48\n          407750) lbm_r_base.mev-  cpu=2 start=3.12  finish=453.36\n      407716) sh               cpu=1 start=3.12  finish=465.12\n        407726) bash             cpu=3 start=3.12  finish=465.12\n          407746) lbm_r_base.mev-  cpu=3 start=3.12  finish=465.02\n      407717) sh               cpu=10 start=3.12  finish=471.82\n        407728) bash             cpu=4 start=3.12  finish=471.82\n          407749) lbm_r_base.mev-  cpu=4 start=3.12  finish=471.77\n      407718) sh               cpu=10 start=3.12  finish=468.23\n        407731) bash             cpu=5 start=3.12  finish=468.23\n          407748) lbm_r_base.mev-  cpu=5 start=3.12  finish=468.16\n      407720) sh               cpu=14 start=3.12  finish=474.61\n        407730) bash             cpu=6 start=3.12  finish=474.61\n          407753) lbm_r_base.mev-  cpu=6 start=3.12  finish=474.58\n      407721) sh               cpu=7 start=3.12  finish=473.20\n        407733) bash             cpu=7 start=3.12  finish=473.20\n          407751) lbm_r_base.mev-  cpu=7 start=3.12  finish=473.16\n      407722) sh               cpu=3 start=3.12  finish=473.33\n        407735) bash             cpu=8 start=3.12  finish=473.33\n          407752) lbm_r_base.mev-  cpu=8 start=3.12  finish=473.30\n      407723) sh               cpu=9 start=3.12  finish=471.65\n        407737) bash             cpu=9 start=3.12  finish=471.65\n          407754) lbm_r_base.mev-  cpu=9 start=3.12  finish=471.60\n      407727) sh               cpu=10 start=3.12  finish=461.01\n        407739) bash             cpu=10 start=3.12  finish=461.01\n          407755) lbm_r_base.mev-  cpu=10 start=3.12  finish=460.94\n      407729) sh               cpu=1 start=3.12  
finish=471.46\n        407740) bash             cpu=11 start=3.12  finish=471.46\n          407756) lbm_r_base.mev-  cpu=11 start=3.12  finish=471.40\n      407732) sh               cpu=15 start=3.12  finish=445.70\n        407741) bash             cpu=12 start=3.12  finish=445.70\n          407760) lbm_r_base.mev-  cpu=12 start=3.12  finish=445.57\n      407734) sh               cpu=15 start=3.12  finish=447.07\n        407742) bash             cpu=13 start=3.12  finish=447.07\n          407757) lbm_r_base.mev-  cpu=13 start=3.12  finish=446.95\n      407736) sh               cpu=13 start=3.12  finish=473.05\n        407744) bash             cpu=14 start=3.12  finish=473.05\n          407759) lbm_r_base.mev-  cpu=14 start=3.12  finish=472.98\n      407738) sh               cpu=15 start=3.12  finish=431.52\n        407743) bash             cpu=15 start=3.12  finish=431.52\n          407758) lbm_r_base.mev-  cpu=15 start=3.12  finish=431.38\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>lbm is a SPEC CPU(R) 2017 benchmark written in C and described here.\u00a0The workload runs on all 16 logical cores, with a slight decrease at the end of the run. The topdown profile shows this as a backend-bound workload. 
AMD metrics confirm this <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/cpu2017\/519-lbm_r\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":2297,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-2321","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2321","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=2321"}],"version-history":[{"count":2,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2321\/revisions"}],"predecessor-version":[{"id":2355,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2321\/revisions\/2355"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2297"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=2321"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}