{"id":1487,"date":"2024-02-04T10:17:10","date_gmt":"2024-02-04T10:17:10","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=1487"},"modified":"2024-02-08T10:48:48","modified_gmt":"2024-02-08T10:48:48","slug":"build-eigen","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/build-eigen\/","title":{"rendered":"build-eigen"},"content":{"rendered":"\n<p>Measuring the time to build the Eigen examples. This is a single-threaded workload.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-35.png\" alt=\"\" class=\"wp-image-1591\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-35.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-35-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-35-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>Topdown profile has frontend stalls highest, retirement medium and backend stalls lower.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-37.png\" alt=\"\" class=\"wp-image-1593\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-37.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-37-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-37-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics confirm the single-threaded nature. There is not much floating point and a higher amount of branches. The page fault rate is high.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              192.066\non_cpu               0.058          # 0.93 \/ 16 cores\nutime                155.424\nstime                22.525\nnvcsw                8485           # 81.00%\nnivcsw               1990           # 19.00%\ninblock              0              # 0.00\/sec\nonblock              1036120        # 5394.60\/sec\ncpu-clock            177932719483   # 177.933 seconds\ntask-clock           177939721084   # 177.940 seconds\npage faults          10960537       # 61596.910\/sec\ncontext switches     10634          # 59.762\/sec\ncpu migrations       340            # 1.911\/sec\nmajor page faults    3              # 0.017\/sec\nminor page faults    10960534       # 61596.893\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             304119200242   # 212.340 branches per 1000 inst\nbranch misses        7766876223     # 2.55% branch miss\nconditional          233491815928   # 163.027 conditional branches per 1000 inst\nindirect             6247171032     # 4.362 indirect branches per 1000 inst\ncpu-cycles           809822141694   # 0.27 GHz\ninstructions         1420863958147  # 1.75 IPC\nslots                1632435207282  #\nretiring             461919959768   # 28.3% (28.3%)\n-- ucode             409300568      #     0.0%\n-- fastpath          461510659200   #    28.3%\nfrontend             688181256897   # 42.2% (42.2%)\n-- latency           489140540280   #    30.0%\n-- bandwidth         199040716617   #    12.2%\nbackend              372531805791   # 22.8% (22.8%)\n-- cpu               48735530739    #     3.0%\n-- memory            323796275052   #    19.8%\nspeculation          109563846712   #  6.7% ( 6.7%)\n-- branch mispredict 107887873935   #     6.6%\n-- pipeline restart  1675972777     #     0.1%\nsmt-contention       237974821      #  0.0% ( 0.0%)\ncpu-cycles           810688764115   # 0.27 GHz\ninstructions         1419010189378  # 1.75 IPC\ninstructions         475183462637   # 48.484 l2 access per 1000 inst\nl2 hit from l1       20764952358    # 19.77% l2 miss\nl2 miss from l1      3424321293     #\nl2 hit from l2 pf    1144198001     #\nl3 hit from l2 pf    771547943      #\nl3 miss from l2 pf   358295900      #\ninstructions         477486479175   # 18.548 float per 1000 inst\nfloat 512            172            # 0.000 AVX-512 per 1000 inst\nfloat 256            257086         # 0.001 AVX-256 per 1000 inst\nfloat 128            8856160036     # 18.547 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\ninstructions         2692214        #\nopcache              995150         # 369.640 opcache per 1000 inst\nopcache miss         535421         # 53.8% opcache miss rate\nl1 dTLB miss         6949           # 2.581 L1 dTLB per 1000 inst\nl2 dTLB miss         1265           # 0.470 L2 dTLB per 1000 inst\ninstructions         2738628        #\nicache               1321546        # 482.558 icache per 1000 inst\nicache miss          110332         #  8.3% icache miss rate\nl1 iTLB miss         7              # 0.003 L1 iTLB per 1000 inst\nl2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst\ntlb flush            19             # 0.007 TLB flush per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              208.928\non_cpu               0.058          # 0.93 \/ 16 cores\nutime                178.327\nstime                15.590\nnvcsw                11389          # 84.90%\nnivcsw               2026           # 15.10%\ninblock              47560          # 227.64\/sec\nonblock              1024896        # 4905.50\/sec\ncpu-clock            193849295915   # 193.849 seconds\ntask-clock           193858763133   # 193.859 seconds\npage faults          10950372       # 56486.340\/sec\ncontext switches     13663          # 70.479\/sec\ncpu migrations       422            # 2.177\/sec\nmajor page faults    267            # 1.377\/sec\nminor page faults    10950105       # 56484.963\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             300412569450   # 210.542 branches per 1000 inst\nbranch misses        5055599168     # 1.68% branch miss\nconditional          300412601450   # 210.542 conditional branches per 1000 inst\nindirect             6253736930     # 4.383 indirect branches per 1000 inst\nslots                4349057855702  #\nretiring             1330460791228  # 30.6% (30.6%)\n-- ucode             96253339275    #     2.2%\n-- fastpath          1234207451953  #    28.4%\nfrontend             1618202986437  # 37.2% (37.2%)\n-- latency           857127981988   #    19.7%\n-- bandwidth         761075004449   #    17.5%\nbackend              810368367002   # 18.6% (18.6%)\n-- cpu               282251961137   #     6.5%\n-- memory            528116405865   #    12.1%\nspeculation          598085359038   # 13.8% (13.8%) high\n-- branch mispredict 570655432259   #    13.1%\n-- pipeline restart  27429926779    #     0.6%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           726240052738   # 0.22 GHz\ninstructions         1422860814808  # 1.96 IPC\nl2 access            62747872551    # 44.119 l2 access per 1000 inst\nl2 miss              16238642601    # 25.88% l2 miss\ncpu-cycles           725580737432   # 22.6% memory latency\nload stalls          158209076663   #  1.9% l1 bound\nl1 miss              144170544495   # 10.3% l2 bound\nl2 miss              69226808624    #  4.1% l3 bound\nl3 miss              39673165073    #  5.5% dram bound\nstore_stalls         5452926964     #  0.8% store bound\n<\/code><\/pre>\n\n\n\n<p>Process summary suggests c++ code with time spent in the front end.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>981 processes\n\t204 cc1plus                147.93    16.46\n\t 68 clinfo                  18.51     7.32\n\t204 as                       3.55     0.81\n\t 38 vulkaninfo               1.43     1.33\n\t  3 bzip2                    0.51     0.01\n\t  4 vulkani:disk$0           0.16     0.14\n\t  6 glxinfo:gdrv0            0.13     0.07\n\t  6 glxinfo:gl0              0.13     0.07\n\t  2 llvmpipe-0               0.09     0.07\n\t  2 llvmpipe-1               0.09     0.07\n\t  2 llvmpipe-10              0.09     0.07\n\t  2 llvmpipe-11              0.09     0.07\n\t  2 llvmpipe-12              0.09     0.07\n\t  2 llvmpipe-13              0.09     0.07\n\t  2 llvmpipe-14              0.09     0.07\n\t  2 llvmpipe-15              0.09     0.07\n\t  2 llvmpipe-2               0.09     0.07\n\t  2 llvmpipe-3               0.09     0.07\n\t  2 llvmpipe-4               0.09     0.07\n\t  2 llvmpipe-5               0.09     0.07\n\t  2 llvmpipe-6               0.09     0.07\n\t  2 llvmpipe-7               0.09     0.07\n\t  2 llvmpipe-8               0.09     0.07\n\t  2 llvmpipe-9               0.09     0.07\n\t  6 php                      0.08     0.07\n\t  2 glxinfo                  0.07     0.03\n\t  2 glxinfo:cs0              0.07     0.03\n\t  2 glxinfo:disk$0           0.07     0.03\n\t  2 glxinfo:sh0              0.07     0.03\n\t  2 glxinfo:shlo0            0.07     0.03\n\t  6 clang                    0.05     0.07\n\t  3 rocminfo                 0.03     0.00\n\t  1 lspci                    0.01     0.02\n\t  3 tar                      0.00     0.16\n\t  4 rm                       0.00     0.09\n\t  3 build-eigen              0.00     0.03\n\t  1 ps                       0.00     0.01\n\t204 c++                      0.00     0.00\n\t 86 sh                       0.00     0.00\n\t 13 gcc                      0.00     0.00\n\t 11 gsettings                0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  4 bash                     0.00     0.00\n\t  4 gmain                    0.00     0.00\n\t  2 cc                       0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 uname                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dconf worker             0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n0 processes running\n47 maximum processes\n<\/code><\/pre>\n\n\n\n<p>Straightforward computation with the pattern of compiling one file after the next. I suspect there is a way to make these parallel since there isn&#8217;t a link step.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      877731) build-eigen      cpu=15 start=5.76  finish=64.25\n        877732) c++              cpu=13 start=5.76  finish=6.44 \n          877733) cc1plus          cpu=8 start=5.76  finish=6.41 \n          877734) as               cpu=6 start=6.42  finish=6.44 \n        877735) c++              cpu=8 start=6.44  finish=6.95 \n          877736) cc1plus          cpu=2 start=6.44  finish=6.93 \n          877737) as               cpu=1 start=6.94  finish=6.95 \n        877738) c++              cpu=2 start=6.95  finish=7.38 \n          877739) cc1plus          cpu=3 start=6.95  finish=7.36 \n          877740) as               cpu=4 start=7.37  finish=7.38 \n        877741) c++              cpu=0 start=7.38  finish=7.80 \n          877742) cc1plus          cpu=9 start=7.38  finish=7.79 \n          877744) as               cpu=10 start=7.80  finish=7.80 \n        877745) c++              cpu=9 start=7.80  finish=8.25 \n          877746) cc1plus          cpu=4 start=7.81  finish=8.24 \n          877747) as               cpu=10 start=8.24  finish=8.25 \n        877748) c++              cpu=0 start=8.25  finish=8.71 \n          877749) cc1plus          cpu=3 start=8.26  finish=8.69 \n          877750) as               cpu=9 start=8.70  finish=8.71 \n        877751) c++              cpu=10 start=8.71  finish=9.18 \n          877752) cc1plus          cpu=11 start=8.71  finish=9.16 \n          877753) as               cpu=4 start=9.16  finish=9.17 \n        877754) c++              cpu=0 start=9.18  finish=9.64 \n          877755) cc1plus          cpu=3 start=9.18  finish=9.62 \n          877756) as               cpu=10 start=9.63  finish=9.64 \n        877757) c++              cpu=3 start=9.64  finish=10.12\n          877758) cc1plus          cpu=4 start=9.64  finish=10.10\n          877759) as               cpu=5 start=10.11 finish=10.12\n        877760) c++              cpu=0 start=10.12 finish=10.76\n          877761) cc1plus          cpu=10 start=10.12 finish=10.74\n          877762) as               cpu=3 start=10.75 finish=10.76\n        877763) c++              cpu=4 start=10.77 finish=11.41\n          877764) cc1plus          cpu=5 start=10.77 finish=11.38\n          877765) as               cpu=14 start=11.39 finish=11.41\n        877766) c++              cpu=8 start=11.41 finish=11.95\n          877767) cc1plus          cpu=2 start=11.41 finish=11.93\n          877768) as               cpu=12 start=11.94 finish=11.95\n        877769) c++              cpu=2 start=11.95 finish=12.48\n          877770) cc1plus          cpu=5 start=11.95 finish=12.46\n          877771) as               cpu=11 start=12.47 finish=12.48\n        877772) c++              cpu=11 start=12.48 finish=13.85\n          877773) cc1plus          cpu=4 start=12.48 finish=13.77\n          877774) as               cpu=13 start=13.78 finish=13.85\n        877775) c++              cpu=8 start=13.85 finish=14.96\n          877776) cc1plus          cpu=10 start=13.85 finish=14.89\n          877777) as               cpu=3 start=14.91 finish=14.95\n        877778) c++              cpu=10 start=14.96 finish=16.21\n          877779) cc1plus          cpu=11 start=14.96 finish=16.14\n          877780) as               cpu=12 start=16.16 finish=16.21\n        877781) c++              cpu=0 start=16.21 finish=19.44\n          877782) cc1plus          cpu=3 start=16.21 finish=19.25\n          877783) as               cpu=2 start=19.27 finish=19.43\n        877784) c++              cpu=3 start=19.44 finish=20.60\n          877785) cc1plus          cpu=4 start=19.44 finish=20.53\n          877786) as               cpu=5 start=20.55 finish=20.60\n        877787) c++              cpu=0 start=20.60 finish=21.20\n          877788) cc1plus          cpu=10 start=20.60 finish=21.18\n          877789) as               cpu=3 start=21.19 finish=21.20\n        877790) c++              cpu=10 start=21.20 finish=22.39\n          877791) cc1plus          cpu=4 start=21.20 finish=22.32\n          877792) as               cpu=11 start=22.34 finish=22.38\n        877793) c++              cpu=8 start=22.39 finish=29.74\n          877794) cc1plus          cpu=11 start=22.39 finish=29.28\n          877795) as               cpu=1 start=29.32 finish=29.73\n        877796) c++              cpu=15 start=29.74 finish=31.55\n          877797) cc1plus          cpu=10 start=29.74 finish=31.44\n          877798) as               cpu=3 start=31.46 finish=31.55\n        877799) c++              cpu=8 start=31.55 finish=32.32\n          877800) cc1plus          cpu=10 start=31.55 finish=32.29\n          877801) as               cpu=10 start=32.30 finish=32.32\n        877802) c++              cpu=3 start=32.32 finish=32.81\n          877803) cc1plus          cpu=12 start=32.33 finish=32.80\n          877804) as               cpu=5 start=32.81 finish=32.81\n        877805) c++              cpu=8 start=32.81 finish=33.35\n          877806) cc1plus          cpu=2 start=32.82 finish=33.33\n          877807) as               cpu=3 start=33.34 finish=33.35\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Measuring the time to build the Eigen examples. This is a single-threaded workload. Topdown profile has frontend stalls highest, retirement medium and backend stalls lower. AMD metrics confirm the single-threaded nature. There is not much floating point and a higher <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/build-eigen\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-1487","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1487","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=1487"}],"version-history":[{"count":3,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1487\/revisions"}],"predecessor-version":[{"id":1594,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1487\/revisions\/1594"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=1487"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}