{"id":808,"date":"2024-01-22T10:56:29","date_gmt":"2024-01-22T10:56:29","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=808"},"modified":"2024-01-24T23:40:44","modified_gmt":"2024-01-24T23:40:44","slug":"build-nodejs","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/build-nodejs\/","title":{"rendered":"build-nodejs"},"content":{"rendered":"\n<p>A test of building the Node.js JavaScript runtime. This build takes longer than several of the other &#8220;build-*&#8221; workloads, though not as long as build-gcc or build-llvm. As with other build workloads, there is a high number of processes and a high proportion of frontend stalls. In contrast to some of the others, backend stalls are somewhat higher. The workload has &#8220;cleanup&#8221; phases before each run, and these also show up in the profiles. Compilation looks mostly parallel, with one serializing step (possibly a link) about halfway through and more serialization toward the end of the workload.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-59.png\" alt=\"\" class=\"wp-image-880\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-59.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-59-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-59-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>The topdown profile shows the same general periods of higher frontend stalls. 
The &#8220;cleanup&#8221; phase before each run also appears to have a different profile.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-97.png\" alt=\"\" class=\"wp-image-881\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-97.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-97-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-97-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>The AMD metrics show that roughly 1 in 5 instructions is a branch (about 212 per 1000 instructions), with little floating-point code (about 15 per 1000 instructions).<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              2144.566\non_cpu               0.941          # 15.05 \/ 16 cores\nutime                30188.340\nstime                2091.837\nnvcsw                807175         # 44.88%\nnivcsw               991151         # 55.12%\ninblock              0              # 0.00\/sec\nonblock              12258648       # 5716.14\/sec\ncpu-clock            32279798809815 # 32279.799 seconds\ntask-clock           32280195435292 # 32280.195 seconds\npage faults          642525533      # 19904.636\/sec\ncontext switches     1568781        # 48.599\/sec\ncpu migrations       73009          # 2.262\/sec\nmajor page faults    1676           # 0.052\/sec\nminor page faults    642523857      # 19904.584\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             21649570910307 # 211.978 branches per 1000 inst\nbranch misses        542079184612   # 2.50% branch miss\nconditional          16952172427646 # 165.984 conditional branches per 1000 inst\nindirect             430406042887   # 4.214 indirect branches per 1000 inst\ncpu-cycles           
133548357863790 # 3.89 GHz\ninstructions         101428319812801 # 0.76 IPC\nslots                268650189068820 #\nretiring             32676831786963 # 12.2% (16.3%)\n-- ucode             21498857071    #     0.0%\n-- fastpath          32655332929892 #    12.2%\nfrontend             69755359369146 # 26.0% (34.8%)\n-- latency           52488811107576 #    19.5%\n-- bandwidth         17266548261570 #     6.4%\nbackend              92552250002478 # 34.5% (46.1%)\n-- cpu               6556257702259  #     2.4%\n-- memory            85995992300219 #    32.0%\nspeculation          5716339764160  #  2.1% ( 2.8%)\n-- branch mispredict 5651531115706  #     2.1%\n-- pipeline restart  64808648454    #     0.0%\nsmt-contention       67949097952459 # 25.3% ( 0.0%)\ncpu-cycles           133519263543635 # 3.88 GHz\ninstructions         101430105523798 # 0.76 IPC\ninstructions         33965954644499 # 69.306 l2 access per 1000 inst\nl2 hit from l1       1830068894706  # 22.58% l2 miss\nl2 miss from l1      323908834364   #\nl2 hit from l2 pf    316444081333   #\nl3 hit from l2 pf    101163396524   #\nl3 miss from l2 pf   106365283407   #\ninstructions         33946845258090 # 15.153 float per 1000 inst\nfloat 512            81194          # 0.000 AVX-512 per 1000 inst\nfloat 256            2345789        # 0.000 AVX-256 per 1000 inst\nfloat 128            514401099938   # 15.153 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         8              # 0.000 scalar per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              2512.218\non_cpu               0.945          # 15.12 \/ 16 cores\nutime                36117.910\nstime                1854.539\nnvcsw                878304         # 45.90%\nnivcsw               1035202        # 54.10%\ninblock              102096         # 40.64\/sec\nonblock              12247440       # 4875.15\/sec\ncpu-clock            
37972118857909 # 37972.119 seconds\ntask-clock           37972545348380 # 37972.545 seconds\npage faults          642330574      # 16915.658\/sec\ncontext switches     1691066        # 44.534\/sec\ncpu migrations       84049          # 2.213\/sec\nmajor page faults    1076           # 0.028\/sec\nminor page faults    642329498      # 16915.629\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             21432898419363 # 210.396 branches per 1000 inst\nbranch misses        411236524227   # 1.92% branch miss\nconditional          21432906380163 # 210.396 conditional branches per 1000 inst\nindirect             3716689790845  # 36.485 indirect branches per 1000 inst\nslots                182247267068792 #\nretiring             54162825882165 # 29.7% (29.7%)\n-- ucode             4366997381232  #     2.4%\n-- fastpath          49795828500933 #    27.3%\nfrontend             55027280352127 # 30.2% (30.2%)\n-- latency           33598971511576 #    18.4%\n-- bandwidth         21428308840551 #    11.8%\nbackend              56316877227342 # 30.9% (30.9%)\n-- cpu               12654035624989 #     6.9%\n-- memory            43662841602353 #    24.0%\nspeculation          17013502249974 #  9.3% ( 9.3%)\n-- branch mispredict 16365820969817 #     9.0%\n-- pipeline restart  647681280157   #     0.4%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           81430819711470 # 2.03 GHz\ninstructions         79248849727315 # 0.97 IPC\nl2 access            3923871673155  # 68.114 l2 access per 1000 inst\nl2 miss              1327323041163  # 33.83% l2 miss\n<\/code><\/pre>\n\n\n\n<p>The process overview suggests roughly 2\/3 C++ compilation and 1\/3 C compilation by file count (7231 cc1plus versus 3882 cc1 processes), but far more time is spent in C++ compilation. 
Overall, roughly a quarter million processes are created.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>248492 processes\n\t7231 cc1plus              29340.31  1605.19\n\t3882 cc1                    464.31    56.24\n\t 36 ld                      23.74     9.48\n\t11158 as                      23.09     1.92\n\t 65 clinfo                  18.13     6.67\n\t  6 make                    11.97    18.56\n\t  3 torque                  10.78     0.23\n\t  3 xz                       4.32     0.43\n\t 37 python                   4.17     0.46\n\t 36 python3.10               3.14     0.24\n\t 18 node_mksnapshot          3.12     0.54\n\t  3 mksnapshot               2.86     0.23\n\t117 ar                       1.70     1.71\n\t 38 vulkaninfo               1.13     1.52\n\t  3 genccode                 0.49     0.29\n\t  6 php                      0.21     0.53\n\t  6 glxinfo:gdrv0            0.19     0.07\n\t  3 tar                      0.12     2.75\n\t  4 vulkani:disk$0           0.12     0.16\n\t  2 glxinfo                  0.09     0.03\n\t  2 glxinfo:cs0              0.09     0.03\n\t  2 glxinfo:disk$0           0.09     0.03\n\t  2 glxinfo:sh0              0.09     0.03\n\t  2 glxinfo:shlo0            0.09     0.03\n\t  2 llvmpipe-0               0.06     0.08\n\t  2 llvmpipe-1               0.06     0.08\n\t  2 llvmpipe-10              0.06     0.08\n\t  2 llvmpipe-11              0.06     0.08\n\t  2 llvmpipe-12              0.06     0.08\n\t  2 llvmpipe-13              0.06     0.08\n\t  2 llvmpipe-14              0.06     0.08\n\t  2 llvmpipe-15              0.06     0.08\n\t  2 llvmpipe-2               0.06     0.08\n\t  2 llvmpipe-3               0.06     0.08\n\t  2 llvmpipe-4               0.06     0.08\n\t  2 llvmpipe-5               0.06     0.08\n\t  2 llvmpipe-6               0.06     0.08\n\t  2 llvmpipe-7               0.06     0.08\n\t  2 llvmpipe-8               0.06     0.08\n\t  2 llvmpipe-9               0.06     0.08\n\t11325 rm                       0.05     2.43\n\t  3 
icupkg                   0.05     0.19\n\t  6 clang                    0.05     0.07\n\t 45 V8 DefaultWorke          0.00    42.90\n\t  3 cp                       0.00     0.11\n\t3942 cc                       0.00     0.03\n\t  3 find                     0.00     0.03\n\t  3 rocminfo                 0.00     0.03\n\t  1 lspci                    0.00     0.03\n\t  1 ps                       0.00     0.01\n\t123565 sh                       0.00     0.00\n\t33622 sed                      0.00     0.00\n\t11553 mkdir                    0.00     0.00\n\t11376 printf                   0.00     0.00\n\t11296 touch                    0.00     0.00\n\t11208 grep                     0.00     0.00\n\t7263 g++                      0.00     0.00\n\t117 dirname                  0.00     0.00\n\t 36 collect2                 0.00     0.00\n\t 22 gcc                      0.00     0.00\n\t 17 uname                    0.00     0.00\n\t 15 tr                       0.00     0.00\n\t 10 gsettings                0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  4 bash                     0.00     0.00\n\t  3 build-nodejs             0.00     0.00\n\t  3 bytecode_builti          0.00     0.00\n\t  3 dconf worker             0.00     0.00\n\t  3 gen-regexp-spec          0.00     0.00\n\t  3 gmain                    0.00     0.00\n\t  3 ld.gold                  0.00     0.00\n\t  3 ln                       0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp 
                  0.00     0.00\n\t  1 python3                  0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n189 processes running\n236 maximum processes\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>A test of building the Node.js JavaScript runtime. This build takes longer than several of the other &#8220;build-*&#8221; workloads, though not as long as build-gcc or build-llvm. As with other build workloads, there is a high number of processes and <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/build-nodejs\/\"><span class=\"more-msg\">Continue reading 
&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-808","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/808","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=808"}],"version-history":[{"count":2,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/808\/revisions"}],"predecessor-version":[{"id":882,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/808\/revisions\/882"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=808"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}