{"id":1485,"date":"2024-02-04T10:16:07","date_gmt":"2024-02-04T10:16:07","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=1485"},"modified":"2024-02-08T10:42:31","modified_gmt":"2024-02-08T10:42:31","slug":"build-clash","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/build-clash\/","title":{"rendered":"build-clash"},"content":{"rendered":"\n<p>Time to build the clash-lang Haskell to VHDL\/Verilog\/SystemVerilog compiler. The build uses only a subset of the cores, and the number of runnable processes varies. There is a noticeable number of interrupts.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-34.png\" alt=\"\" class=\"wp-image-1587\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-34.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-34-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-34-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>The topdown profile shows frontend stalls as the largest category, with backend stalls next.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-36.png\" alt=\"\" class=\"wp-image-1589\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-36.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-36-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-36-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD 
metrics show two cores on average, no floating point. One in five instructions is a branch.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              943.254\non_cpu               0.128          # 2.05 \/ 16 cores\nutime                1350.714\nstime                579.751\nnvcsw                529369         # 98.05%\nnivcsw               10509          # 1.95%\ninblock              32             # 0.03\/sec\nonblock              9080656        # 9626.95\/sec\ncpu-clock            1925510750508  # 1925.511 seconds\ntask-clock           1926048324360  # 1926.048 seconds\npage faults          35179854       # 18265.302\/sec\ncontext switches     509918         # 264.748\/sec\ncpu migrations       8138           # 4.225\/sec\nmajor page faults    55             # 0.029\/sec\nminor page faults    35179709       # 18265.227\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             2043377572043  # 199.807 branches per 1000 inst\nbranch misses        128313051498   # 6.28% branch miss\nconditional          1276610592489  # 124.831 conditional branches per 1000 inst\nindirect             217093841014   # 21.228 indirect branches per 1000 inst\ncpu-cycles           8313673121795  # 0.55 GHz\ninstructions         10094826462900 # 1.21 IPC\nslots                16943857322868 #\nretiring             3523609338335  # 20.8% (20.8%)\n-- ucode             12290443004    #     0.1%\n-- fastpath          3511318895331  #    20.7%\nfrontend             8039710358232  # 47.4% (47.5%) high\n-- latency           6205587226386  #    36.6%\n-- bandwidth         1834123131846  #    10.8%\nbackend              4512534563896  # 26.6% (26.7%)\n-- cpu               625024650200   #     3.7%\n-- memory            3887509913696  #    22.9%\nspeculation          835901972795   #  4.9% ( 4.9%)\n-- branch mispredict 831878338020   #     4.9%\n-- pipeline restart  4023634775     #     0.0%\nsmt-contention       
32090722809    #  0.2% ( 0.0%)\ncpu-cycles           8354160492929  # 0.55 GHz\ninstructions         10123650554383 # 1.21 IPC\ninstructions         3412681911672  # 34.982 l2 access per 1000 inst\nl2 hit from l1       96205174518    # 26.78% l2 miss\nl2 miss from l1      15673704213    #\nl2 hit from l2 pf    6879988344     #\nl3 hit from l2 pf    4338307940     #\nl3 miss from l2 pf   11959796785    #\ninstructions         3409729480674  # 3.918 float per 1000 inst\nfloat 512            6370           # 0.000 AVX-512 per 1000 inst\nfloat 256            10224585       # 0.003 AVX-256 per 1000 inst\nfloat 128            13350692007    # 3.915 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\ninstructions         2711526        #\nopcache              1002139        # 369.585 opcache per 1000 inst\nopcache miss         537719         # 53.7% opcache miss rate\nl1 dTLB miss         5687           # 2.097 L1 dTLB per 1000 inst\nl2 dTLB miss         1209           # 0.446 L2 dTLB per 1000 inst\ninstructions         2743858        #\nicache               1331569        # 485.291 icache per 1000 inst\nicache miss          112012         #  8.4% icache miss rate\nl1 iTLB miss         8              # 0.003 L1 iTLB per 1000 inst\nl2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst\ntlb flush            19             # 0.007 TLB flush per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              933.262\non_cpu               0.120          # 1.93 \/ 16 cores\nutime                1377.167\nstime                421.546\nnvcsw                481761         # 95.77%\nnivcsw               21269          # 4.23%\ninblock              11560          # 12.39\/sec\nonblock              9069416        # 9717.98\/sec\ncpu-clock            1789592121062  # 1789.592 seconds\ntask-clock           1790141697509  
# 1790.142 seconds\npage faults          35173091       # 19648.216\/sec\ncontext switches     475446         # 265.591\/sec\ncpu migrations       15612          # 8.721\/sec\nmajor page faults    106            # 0.059\/sec\nminor page faults    35172940       # 19648.132\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             2167976430602  # 185.975 branches per 1000 inst\nbranch misses        46314803455    # 2.14% branch miss\nconditional          2167977636842  # 185.975 conditional branches per 1000 inst\nindirect             236751245693   # 20.309 indirect branches per 1000 inst\nslots                38620667784266 #\nretiring             10062018842874 # 26.1% (26.1%)\n-- ucode             807698917570   #     2.1%\n-- fastpath          9254319925304  #    24.0%\nfrontend             13551174827531 # 35.1% (35.1%)\n-- latency           8548975097340  #    22.1%\n-- bandwidth         5002199730191  #    13.0%\nbackend              9175517911992  # 23.8% (23.8%)\n-- cpu               3393190446142  #     8.8%\n-- memory            5782327465850  #    15.0%\nspeculation          5780856758211  # 15.0% (15.0%) high\n-- branch mispredict 5692519034679  #    14.7%\n-- pipeline restart  88337723532    #     0.2%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           6645969571438  # 0.45 GHz\ninstructions         11758840162591 # 1.77 IPC\nl2 access            390896034086   # 34.055 l2 access per 1000 inst\nl2 miss              145819187216   # 37.30% l2 miss\ncpu-cycles           6464808070216  # 23.7% memory latency\nload stalls          1447783462483  #  2.0% l1 bound\nl1 miss              1317212090663  #  9.1% l2 bound\nl2 miss              731698414701   #  2.4% l3 bound\nl3 miss              576204429564   #  8.9% dram bound\nstore_stalls         84132000633    #  1.3% store bound\n<\/code><\/pre>\n\n\n\n<p>The process overview shows most of the time in ghc (the Haskell 
compiler). Numerically, &#8220;cc&#8221; is the most common process (22487 of 37688), but it shows no time allocated.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>37688 processes\n\t721 ghc:w                15652.67  6544.89\n\t165 ghc_ticker            1185.23   498.89\n\t120 ghc                   1182.36   497.01\n\t740 ld.gold                 51.83     9.18\n\t 45 nix-shell               27.91     6.15\n\t2328 as                      26.61     7.12\n\t 68 clinfo                  15.54     7.31\n\t 18 nix-instantiate          8.90     1.90\n\t108 Setup:w                  8.65     5.58\n\t1107 cc1                      8.39     4.35\n\t4363 gcc                      4.93     0.65\n\t154 bash                     3.91     1.66\n\t 51 strip                    3.87     5.62\n\t 36 Setup                    2.89     1.86\n\t  9 ranlib                   2.55     3.87\n\t 45 nix-store                1.72     2.82\n\t 63 ghc-pkg                  1.52     0.32\n\t 38 vulkaninfo               1.11     1.33\n\t 72 jq                       0.69     0.00\n\t 18 ar                       0.24     0.33\n\t  4 vulkani:disk$0           0.11     0.14\n\t 27 haddock:w                0.11     0.12\n\t  6 glxinfo:gdrv0            0.10     0.06\n\t  6 glxinfo:gl0              0.10     0.06\n\t  6 php                      0.06     0.20\n\t  2 llvmpipe-0               0.06     0.07\n\t  2 llvmpipe-1               0.06     0.07\n\t  2 llvmpipe-10              0.06     0.07\n\t  2 llvmpipe-11              0.06     0.07\n\t  2 llvmpipe-12              0.06     0.07\n\t  2 llvmpipe-13              0.06     0.07\n\t  2 llvmpipe-14              0.06     0.07\n\t  2 llvmpipe-15              0.06     0.07\n\t  2 llvmpipe-2               0.06     0.07\n\t  2 llvmpipe-3               0.06     0.07\n\t  2 llvmpipe-4               0.06     0.07\n\t  2 llvmpipe-5               0.06     0.07\n\t  2 llvmpipe-6               0.06     0.07\n\t  2 llvmpipe-7               0.06     0.07\n\t  2 llvmpipe-8               0.06     0.07\n\t  2 
llvmpipe-9               0.06     0.07\n\t  2 glxinfo                  0.06     0.03\n\t  2 glxinfo:cs0              0.06     0.03\n\t  2 glxinfo:disk$0           0.06     0.03\n\t  2 glxinfo:sh0              0.06     0.03\n\t  2 glxinfo:shlo0            0.06     0.03\n\t  6 clang                    0.04     0.08\n\t  9 haddock                  0.04     0.04\n\t 30 patchelf                 0.00     0.90\n\t  1 lspci                    0.00     0.02\n\t  1 ps                       0.00     0.01\n\t22487 cc                       0.00     0.00\n\t1427 readlink                 0.00     0.00\n\t1272 expand-response          0.00     0.00\n\t795 cp                       0.00     0.00\n\t713 collect2                 0.00     0.00\n\t 87 find                     0.00     0.00\n\t 82 sh                       0.00     0.00\n\t 78 touch                    0.00     0.00\n\t 39 mkdir                    0.00     0.00\n\t 37 grep                     0.00     0.00\n\t 21 chmod                    0.00     0.00\n\t 13 sort                     0.00     0.00\n\t 12 xargs                    0.00     0.00\n\t 11 gsettings                0.00     0.00\n\t  9 \/nix\/store\/871g          0.00     0.00\n\t  9 HsColour                 0.00     0.00\n\t  9 basename                 0.00     0.00\n\t  9 cut                      0.00     0.00\n\t  9 gawk                     0.00     0.00\n\t  9 head                     0.00     0.00\n\t  9 hpc                      0.00     0.00\n\t  9 hsc2hs                   0.00     0.00\n\t  9 jailbreak-cabal          0.00     0.00\n\t  9 mv                       0.00     0.00\n\t  9 runghc                   0.00     0.00\n\t  9 tail                     0.00     0.00\n\t  9 tar                      0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  7 realpath                 0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  6 tr                       0.00     0.00\n\t  5 
phoronix-test-s          0.00     0.00\n\t  4 dirname                  0.00     0.00\n\t  4 gmain                    0.00     0.00\n\t  4 sed                      0.00     0.00\n\t  3 echo                     0.00     0.00\n\t  3 rm                       0.00     0.00\n\t  3 rocminfo                 0.00     0.00\n\t  3 seq                      0.00     0.00\n\t  3 tee                      0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 uname                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dconf worker             0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n1 processes running\n49 maximum processes\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Time to build the clash-lang Haskell to VHDL\/Verilog\/SystemVerilog compiler. The build uses only a subset of the cores, and the number of runnable processes varies. There is a noticeable number of interrupts. 
Topdown profile shows frontend stalls as high with backend stalls next <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/build-clash\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-1485","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1485","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=1485"}],"version-history":[{"count":3,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1485\/revisions"}],"predecessor-version":[{"id":1590,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1485\/revisions\/1590"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=1485"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}