{"id":1489,"date":"2024-02-04T10:18:03","date_gmt":"2024-02-04T10:18:03","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=1489"},"modified":"2024-02-08T10:55:33","modified_gmt":"2024-02-08T10:55:33","slug":"build-erlang","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/build-erlang\/","title":{"rendered":"build-erlang"},"content":{"rendered":"\n<p>This workload measures the time to compile Erlang\/OTP. The number of runnable processes spikes at times, and CPU core usage is variable.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-36.png\" alt=\"\" class=\"wp-image-1595\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-36.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-36-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-36-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>The topdown profile shows a higher rate of stalls, with frontend stalls and retirement at roughly similar levels.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-38.png\" alt=\"\" class=\"wp-image-1597\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-38.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-38-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-38-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics show that, on average, about half of the cores are busy. 
Frontend stalls are the highest. There is not much floating point.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              374.136\non_cpu               0.520          # 8.33 \/ 16 cores\nutime                2600.996\nstime                513.696\nnvcsw                7826857        # 26.55%\nnivcsw               21651959       # 73.45%\ninblock              24             # 0.06\/sec\nonblock              7827320        # 20921.03\/sec\ncpu-clock            3122273753608  # 3122.274 seconds\ntask-clock           3122941278301  # 3122.941 seconds\npage faults          88926824       # 28475.343\/sec\ncontext switches     29413373       # 9418.484\/sec\ncpu migrations       517461         # 165.697\/sec\nmajor page faults    2910           # 0.932\/sec\nminor page faults    88923651       # 28474.327\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             2646561802828  # 192.873 branches per 1000 inst\nbranch misses        91839791917    # 3.47% branch miss\nconditional          1924017982101  # 140.216 conditional branches per 1000 inst\nindirect             112928353961   # 8.230 indirect branches per 1000 inst\ncpu-cycles           11752414130889 # 1.98 GHz\ninstructions         13299671577609 # 1.13 IPC\nslots                24883400890320 #\nretiring             4545808476867  # 18.3% (22.4%)\n-- ucode             19058779784    #     0.1%\n-- fastpath          4526749697083  #    18.2%\nfrontend             8930881982420  # 35.9% (43.9%)\n-- latency           6557542868286  #    26.4%\n-- bandwidth         2373339114134  #     9.5%\nbackend              6017703339481  # 24.2% (29.6%)\n-- cpu               853507402202   #     3.4%\n-- memory            5164195937279  #    20.8%\nspeculation          831290685150   #  3.3% ( 4.1%)\n-- branch mispredict 815429325807   #     3.3%\n-- pipeline restart  15861359343    #     0.1%\nsmt-contention       4557505158750  # 18.3% ( 
0.0%)\ncpu-cycles           11730454882014 # 1.97 GHz\ninstructions         13294674228204 # 1.13 IPC\ninstructions         4558616129744  # 36.897 l2 access per 1000 inst\nl2 hit from l1       144877796872   # 19.30% l2 miss\nl2 miss from l1      19728155814    #\nl2 hit from l2 pf    10591886123    #\nl3 hit from l2 pf    6041907329     #\nl3 miss from l2 pf   6689503149     #\ninstructions         4551855078110  # 31.314 float per 1000 inst\nfloat 512            27337          # 0.000 AVX-512 per 1000 inst\nfloat 256            189660         # 0.000 AVX-256 per 1000 inst\nfloat 128            142535017390   # 31.314 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         5847           # 0.000 scalar per 1000 inst\ninstructions         2705980        #\nopcache              1009880        # 373.203 opcache per 1000 inst\nopcache miss         542791         # 53.7% opcache miss rate\nl1 dTLB miss         6952           # 2.569 L1 dTLB per 1000 inst\nl2 dTLB miss         1329           # 0.491 L2 dTLB per 1000 inst\ninstructions         2721500        #\nicache               1326721        # 487.496 icache per 1000 inst\nicache miss          112354         #  8.5% icache miss rate\nl1 iTLB miss         9              # 0.003 L1 iTLB per 1000 inst\nl2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst\ntlb flush            19             # 0.007 TLB flush per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              395.889\non_cpu               0.533          # 8.53 \/ 16 cores\nutime                3054.027\nstime                323.352\nnvcsw                7022544        # 24.22%\nnivcsw               21975293       # 75.78%\ninblock              71304          # 180.11\/sec\nonblock              7816912        # 19745.22\/sec\ncpu-clock            3380383176672  # 3380.383 seconds\ntask-clock           3381121509150  # 3381.122 
seconds\npage faults          88938503       # 26304.439\/sec\ncontext switches     28933721       # 8557.433\/sec\ncpu migrations       542782         # 160.533\/sec\nmajor page faults    1469           # 0.434\/sec\nminor page faults    88936766       # 26303.925\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             2571256188919  # 189.076 branches per 1000 inst\nbranch misses        61958336618    # 2.41% branch miss\nconditional          2571261108567  # 189.076 conditional branches per 1000 inst\nindirect             571366916746   # 42.015 indirect branches per 1000 inst\nslots                23248461525260 #\nretiring             7749937132128  # 33.3% (33.3%)\n-- ucode             552996012166   #     2.4%\n-- fastpath          7196941119962  #    31.0%\nfrontend             7269498380742  # 31.3% (31.3%)\n-- latency           4040354222203  #    17.4%\n-- bandwidth         3229144158539  #    13.9%\nbackend              5019972335993  # 21.6% (21.6%)\n-- cpu               3131106952317  #    13.5%\n-- memory            1888865383676  #     8.1%\nspeculation          3276124690314  # 14.1% (14.1%) high\n-- branch mispredict 3156167726549  #    13.6%\n-- pipeline restart  119956963765   #     0.5%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           8559345730142  # 1.37 GHz\ninstructions         11195665887023 # 1.31 IPC\nl2 access            285577713652   # 35.058 l2 access per 1000 inst\nl2 miss              78765912679    # 27.58% l2 miss\ncpu-cycles           6242576212630  # 23.8% memory latency\nload stalls          1419074135749  #  6.3% l1 bound\nl1 miss              1024870800867  #  6.3% l2 bound\nl2 miss              629646656380   #  2.3% l3 bound\nl3 miss              486990186219   #  7.8% dram bound\nstore_stalls         65398513932    #  1.0% store bound\n<\/code><\/pre>\n\n\n\n<p>The process overview shows the set of processes that consume user time. 
Over 150,000 processes ran in total, and it looks like we missed some.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>153290 processes\n\t3948 beam.smp              1349.21   277.09\n\t3948 sys_sig_dispatc       1349.16   277.07\n\t3948 sys_msg_dispatc       1349.11   277.04\n\t3948 1_scheduler           1349.03   277.02\n\t3948 1_dirty_cpu_sch       1348.95   277.00\n\t3948 1_dirty_io_sche       1348.90   276.98\n\t3948 2_dirty_io_sche       1348.84   276.98\n\t3948 3_dirty_io_sche       1348.82   276.97\n\t3948 4_dirty_io_sche       1348.80   276.95\n\t3948 5_dirty_io_sche       1348.73   276.95\n\t3948 6_dirty_io_sche       1348.67   276.91\n\t3948 7_dirty_io_sche       1348.60   276.90\n\t3948 8_dirty_io_sche       1348.56   276.88\n\t3948 9_dirty_io_sche       1348.53   276.88\n\t3948 10_dirty_io_sch       1348.47   276.87\n\t3948 1_aux                 1348.44   276.84\n\t3948 0_poller              1348.39   276.83\n\t3948 async_1               1347.00   276.47\n\t186 javac                 1089.11    40.06\n\t 96 Finalizer              544.81    20.08\n\t 96 Common-Cleaner         544.79    20.09\n\t3795 cc1                    353.47    31.04\n\t288 cc1plus                157.15    20.61\n\t2937 as                      25.94     2.92\n\t 68 clinfo                  16.54     5.98\n\t 30 10_scheduler             7.89     2.01\n\t 30 11_scheduler             7.89     2.01\n\t 30 2_scheduler              7.89     2.01\n\t 30 3_scheduler              7.89     2.01\n\t 30 4_scheduler              7.89     2.01\n\t 30 5_scheduler              7.89     2.01\n\t 30 6_scheduler              7.89     2.01\n\t 30 7_scheduler              7.89     2.01\n\t 30 8_scheduler              7.89     2.01\n\t 30 9_scheduler              7.89     2.01\n\t 30 12_scheduler             7.88     2.01\n\t 30 13_scheduler             7.88     2.01\n\t 30 14_scheduler             7.87     2.01\n\t 30 15_scheduler             7.87     2.01\n\t 30 16_scheduler             7.87     2.01\n\t 30 
2_dirty_cpu_sch          7.87     2.01\n\t 30 3_dirty_cpu_sch          7.87     2.01\n\t 30 4_dirty_cpu_sch          7.87     2.01\n\t 30 5_dirty_cpu_sch          7.87     2.01\n\t 30 6_dirty_cpu_sch          7.87     2.01\n\t 30 10_dirty_cpu_sc          7.87     2.00\n\t 30 11_dirty_cpu_sc          7.87     2.00\n\t 30 12_dirty_cpu_sc          7.87     2.00\n\t 30 13_dirty_cpu_sc          7.87     2.00\n\t 30 14_dirty_cpu_sc          7.87     2.00\n\t 30 15_dirty_cpu_sc          7.87     2.00\n\t 30 16_dirty_cpu_sc          7.87     2.00\n\t 30 7_dirty_cpu_sch          7.87     2.00\n\t 30 8_dirty_cpu_sch          7.87     2.00\n\t 30 9_dirty_cpu_sch          7.87     2.00\n\t  9 yielding_c_fun           2.93     0.54\n\t930 ld                       2.65     0.90\n\t  3 gzip                     2.05     0.10\n\t 93 G1 Main Marker           2.00     0.00\n\t 93 G1 Young RemSet          2.00     0.00\n\t 93 VM Periodic Tas          2.00     0.00\n\t6670 bash                     1.15     3.70\n\t 38 vulkaninfo               1.15     1.33\n\t801 make                     0.88     0.44\n\t  6 jar                      0.54     0.10\n\t 72 perl                     0.46     0.00\n\t  6 php                      0.14     0.22\n\t  4 vulkani:disk$0           0.13     0.14\n\t  6 glxinfo:gdrv0            0.12     0.06\n\t  6 glxinfo:gl0              0.12     0.06\n\t  3 tar                      0.07     1.33\n\t  2 llvmpipe-0               0.07     0.07\n\t  2 llvmpipe-1               0.07     0.07\n\t  2 llvmpipe-10              0.07     0.07\n\t  2 llvmpipe-11              0.07     0.07\n\t  2 llvmpipe-12              0.07     0.07\n\t  2 llvmpipe-13              0.07     0.07\n\t  2 llvmpipe-14              0.07     0.07\n\t  2 llvmpipe-15              0.07     0.07\n\t  2 llvmpipe-2               0.07     0.07\n\t  2 llvmpipe-3               0.07     0.07\n\t  2 llvmpipe-4               0.07     0.07\n\t  2 llvmpipe-5               0.07     0.07\n\t  2 llvmpipe-6           
    0.07     0.07\n\t  2 llvmpipe-7               0.07     0.07\n\t  2 llvmpipe-8               0.07     0.07\n\t  2 llvmpipe-9               0.07     0.07\n\t  6 clang                    0.07     0.05\n\t  2 glxinfo                  0.07     0.02\n\t  2 glxinfo:cs0              0.06     0.02\n\t  2 glxinfo:disk$0           0.06     0.02\n\t  2 glxinfo:sh0              0.06     0.02\n\t  2 glxinfo:shlo0            0.06     0.02\n\t 42 flex                     0.06     0.00\n\t5506 rm                       0.03     1.00\n\t7645 sh                       0.03     0.07\n\t3259 gcc                      0.03     0.05\n\t3221 sed                      0.03     0.00\n\t  3 rocminfo                 0.03     0.00\n\t 24 ranlib                   0.02     0.80\n\t 24 ar                       0.02     0.78\n\t307 configure                0.02     0.00\n\t 12 m4                       0.01     0.00\n\t804 C2 CompilerThre          0.00  4156.98\n\t426 C1 CompilerThre          0.00  2211.08\n\t 96 GC Thread#0              0.00   544.82\n\t 96 G1 Conc#0                0.00   544.81\n\t 96 Service Thread           0.00   544.81\n\t 96 VM Thread                0.00   544.81\n\t 96 Reference Handl          0.00   544.80\n\t 96 Sweeper thread           0.00   544.79\n\t 96 Signal Dispatch          0.00   544.61\n\t 93 G1 Conc#1                0.00   544.52\n\t 93 G1 Conc#2                0.00   544.52\n\t 93 GC Thread#1              0.00   544.51\n\t 93 G1 Refine#0              0.00   539.18\n\t924 install                  0.00     0.09\n\t  1 lspci                    0.00     0.02\n\t11774 dirname                  0.00     0.00\n\t5148 basename                 0.00     0.00\n\t3948 erl_child_setup          0.00     0.00\n\t3885 dyn_erl                  0.00     0.00\n\t3573 cat                      0.00     0.00\n\t2475 config.sub               0.00     0.00\n\t1044 mkdir                    0.00     0.00\n\t930 collect2                 0.00     0.00\n\t817 grep                     0.00  
   0.00\n\t786 awk                      0.00     0.00\n\t552 mv                       0.00     0.00\n\t453 expr                     0.00     0.00\n\t294 uname                    0.00     0.00\n\t276 g++                      0.00     0.00\n\t261 cp                       0.00     0.00\n\t192 ls                       0.00     0.00\n\t119 conftest                 0.00     0.00\n\t114 ln                       0.00     0.00\n\t 92 cc                       0.00     0.00\n\t 82 sort                     0.00     0.00\n\t 81 hostname                 0.00     0.00\n\t 63 chmod                    0.00     0.00\n\t 61 mktemp                   0.00     0.00\n\t 60 inet_gethost             0.00     0.00\n\t 57 rmdir                    0.00     0.00\n\t 54 arch                     0.00     0.00\n\t 45 diff                     0.00     0.00\n\t 45 find                     0.00     0.00\n\t 42 tr                       0.00     0.00\n\t 27 getconf                  0.00     0.00\n\t 15 config.guess             0.00     0.00\n\t 12 gen_git_version          0.00     0.00\n\t 12 touch                    0.00     0.00\n\t 11 gsettings                0.00     0.00\n\t  9 echo                     0.00     0.00\n\t  9 otp_build                0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  6 pkg-config               0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  4 date                     0.00     0.00\n\t  3 build-erlang             0.00     0.00\n\t  3 git                      0.00     0.00\n\t  3 gmain                    0.00     0.00\n\t  3 snmp-v2tov1              0.00     0.00\n\t  2 dconf worker             0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 
ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 ps                       0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Measuring the time to compile Erlang\/OTP. The number of runnable processes jumps high and CPU core usage is variable. Topdown profile has a higher rate of topdown stalls with frontend stalls and retirement roughly similar. AMD metrics show on average <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/build-erlang\/\"><span class=\"more-msg\">Continue reading 
&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-1489","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1489","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=1489"}],"version-history":[{"count":3,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1489\/revisions"}],"predecessor-version":[{"id":1598,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1489\/revisions\/1598"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=1489"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}