{"id":806,"date":"2024-01-22T10:55:44","date_gmt":"2024-01-22T10:55:44","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=806"},"modified":"2024-01-23T11:25:40","modified_gmt":"2024-01-23T11:25:40","slug":"build-llvm","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/build-llvm\/","title":{"rendered":"build-llvm"},"content":{"rendered":"\n<p>A test of building the LLVM compiler stack. The entire stack is built twice, once with Ninja and then with Unix Makefiles. Overall time is slightly faster for Ninja. The overall profile shows that, except for periods towards the end of each build, almost all cores are kept busy.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-54.png\" alt=\"\" class=\"wp-image-838\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-54.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-54-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-54-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>The overall topdown profile shows a workload dominated by frontend stalls, with a lower retirement rate. 
This is similar to other build-* workloads, though backend\/memory stalls are slightly higher with LLVM and frontend stalls are slightly lower.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-92.png\" alt=\"\" class=\"wp-image-840\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-92.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-92-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-92-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD topdown metrics show little floating point, a high amount of branches and a moderate amount of L2 access\/miss<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              5163.931\non_cpu               0.945          # 15.11 \/ 16 cores\nutime                72481.356\nstime                5561.869\nnvcsw                690828         # 26.70%\nnivcsw               1896731        # 73.30%\ninblock              752            # 0.15\/sec\nonblock              72693768       # 14077.21\/sec\ncpu-clock            78056877516054 # 78056.878 seconds\ntask-clock           78058299825553 # 78058.300 seconds\npage faults          1723272985     # 22076.742\/sec\ncontext switches     2501754        # 32.050\/sec\ncpu migrations       84464          # 1.082\/sec\nmajor page faults    7702           # 0.099\/sec\nminor page faults    1723265283     # 22076.644\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             53968965816670 # 209.027 branches per 1000 inst\nbranch misses        1624267586008  # 3.01% branch miss\nconditional          41377495312890 # 160.259 conditional branches per 1000 inst\nindirect        
     1239736919294  # 4.802 indirect branches per 1000 inst\ncpu-cycles           318616249965597 # 3.88 GHz\ninstructions         257239532203885 # 0.81 IPC\nslots                638462591892552 #\nretiring             83608634229281 # 13.1% (16.4%)\n-- ucode             98629507536    #     0.0%\n-- fastpath          83510004721745 #    13.1%\nfrontend             200104012133847 # 31.3% (39.2%)\n-- latency           150156168729132 #    23.5%\n-- bandwidth         49947843404715 #     7.8%\nbackend              209812204890246 # 32.9% (41.1%)\n-- cpu               17481000115336 #     2.7%\n-- memory            192331204774910 #    30.1%\nspeculation          16989232947648 #  2.7% ( 3.3%)\n-- branch mispredict 16820144024682 #     2.6%\n-- pipeline restart  169088922966   #     0.0%\nsmt-contention       127947932556371 # 20.0% ( 0.0%)\ncpu-cycles           318995838054107 # 3.88 GHz\ninstructions         257262438648603 # 0.81 IPC\ninstructions         85844563756236 # 57.868 l2 access per 1000 inst\nl2 hit from l1       4340327891636  # 21.96% l2 miss\nl2 miss from l1      756553839542   #\nl2 hit from l2 pf    293116256893   #\nl3 hit from l2 pf    124042424042   #\nl3 miss from l2 pf   210191796922   #\ninstructions         85821948979048 # 18.776 float per 1000 inst\nfloat 512            38882          # 0.000 AVX-512 per 1000 inst\nfloat 256            18532849       # 0.000 AVX-256 per 1000 inst\nfloat 128            1611376882605  # 18.776 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         5              # 0.000 scalar per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics. 
The events are perhaps counted slightly differently, but this workload, like the others, shows higher IPC, lower GHz and a higher retirement rate than AMD.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              6082.871\non_cpu               0.952          # 15.23 \/ 16 cores\nutime                87827.392\nstime                4786.786\nnvcsw                920851         # 30.45%\nnivcsw               2103588        # 69.55%\ninblock              423848         # 69.68\/sec\nonblock              72678008       # 11947.98\/sec\ncpu-clock            92626124571056 # 92626.125 seconds\ntask-clock           92627957812326 # 92627.958 seconds\npage faults          1715040925     # 18515.370\/sec\ncontext switches     2950846        # 31.857\/sec\ncpu migrations       96215          # 1.039\/sec\nmajor page faults    2083           # 0.022\/sec\nminor page faults    1715038842     # 18515.348\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             53412577346061 # 207.493 branches per 1000 inst\nbranch misses        1337965435641  # 2.50% branch miss\nconditional          53412581027341 # 207.493 conditional branches per 1000 inst\nindirect             10008132965242 # 38.879 indirect branches per 1000 inst\nslots                437772678130796 #\nretiring             133128924746572 # 30.4% (30.4%)\n-- ucode             10347229360575 #     2.4%\n-- fastpath          122781695385997 #    28.0%\nfrontend             154233918698731 # 35.2% (35.2%)\n-- latency           89782197058371 #    20.5%\n-- bandwidth         64451721640360 #    14.7%\nbackend              97914185694265 # 22.4% (22.4%)\n-- cpu               24835061651941 #     5.7%\n-- memory            73079124042324 #    16.7%\nspeculation          53004289903871 # 12.1% (12.1%)\n-- branch mispredict 51316773822425 #    11.7%\n-- pipeline restart  1687516081446  #     0.4%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles          
 194627005685821 # 2.00 GHz\ninstructions         194392697442484 # 1.00 IPC\nl2 access            8477188822805  # 59.773 l2 access per 1000 inst\nl2 miss              2579404345601  # 30.43% l2 miss\n<\/code><\/pre>\n\n\n\n<p>The process summary is incomplete: Ninja completes, but the Unix Makefiles build dies partway through. Still, at least half of the run is sufficient to get a good profile. Overall CPU time is dominated by the C++ front end (cc1plus), with a lesser amount in the llvm-tblgen program. There are many invocations of the assembler, which take almost no time.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>78068 processes\n\t12328 cc1plus              48003.58  2933.97\n\t1342 llvm-tblgen           2043.24   144.27\n\t901 ld                     423.71   113.10\n\t12793 as                      84.78     7.24\n\t548 cc1                     14.60     1.46\n\t  5 xz                      12.57     0.90\n\t 34 clinfo                   9.59     3.66\n\t7885 cmake                    9.49    10.41\n\t1443 ninja                    3.77     2.85\n\t 19 vulkaninfo               0.73     0.73\n\t850 ranlib                   0.69     5.26\n\t1573 gmake                    0.60     0.61\n\t849 ar                       0.59     5.16\n\t 10 tar                      0.54     9.47\n\t 80 python3.10               0.50     0.04\n\t 13 rm                       0.09     6.98\n\t  2 vulkani:disk$0           0.07     0.07\n\t  3 glxinfo:gdrv0            0.07     0.04\n\t  6 clang                    0.06     0.06\n\t  1 llvmpipe-0               0.04     0.04\n\t  1 llvmpipe-1               0.04     0.04\n\t  1 llvmpipe-10              0.04     0.04\n\t  1 llvmpipe-11              0.04     0.04\n\t  1 llvmpipe-12              0.04     0.04\n\t  1 llvmpipe-13              0.04     0.04\n\t  1 llvmpipe-14              0.04     0.04\n\t  1 llvmpipe-15              0.04     0.04\n\t  1 llvmpipe-2               0.04     0.04\n\t  1 llvmpipe-3               0.04     0.04\n\t  1 llvmpipe-4               0.04     0.04\n\t  1 llvmpipe-5               
0.04     0.04\n\t  1 llvmpipe-6               0.04     0.04\n\t  1 llvmpipe-7               0.04     0.04\n\t  1 llvmpipe-8               0.04     0.04\n\t  1 llvmpipe-9               0.04     0.04\n\t  5 py3versions              0.04     0.01\n\t  1 glxinfo                  0.03     0.02\n\t  1 glxinfo:cs0              0.03     0.02\n\t  1 glxinfo:disk$0           0.03     0.02\n\t  1 glxinfo:sh0              0.03     0.02\n\t  1 glxinfo:shlo0            0.03     0.02\n\t  1 ps                       0.00     0.01\n\t22348 sh                       0.00     0.00\n\t12872 c++                      0.00     0.00\n\t949 cc                       0.00     0.00\n\t901 collect2                 0.00     0.00\n\t 30 uname                    0.00     0.00\n\t 25 git                      0.00     0.00\n\t 20 pkg-config               0.00     0.00\n\t 13 gcc                      0.00     0.00\n\t 10 gsettings                0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  7 stat                     0.00     0.00\n\t  6 bash                     0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  6 sed                      0.00     0.00\n\t  5 mkdir                    0.00     0.00\n\t  5 mv                       0.00     0.00\n\t  4 build-llvm               0.00     0.00\n\t  4 phoronix-test-s          0.00     0.00\n\t  3 gmain                    0.00     0.00\n\t  2 dconf worker             0.00     0.00\n\t  2 which                    0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lscpu                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 python                   0.00     0.00\n\t  1 python3                  0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t 
 1 realpath                 0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n\t  1 xset                     0.00     0.00\n83 processes running\n99 maximum processes<\/code><\/pre>\n\n\n\n<p>The following pattern is used for compilation<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>            58628) ninja            cpu=3 start=9.98  finish=9.98 \n            58629) ninja            cpu=5 start=9.99  finish=9.99 \n            58630) ninja            cpu=11 start=9.99  finish=9.99 \n            58631) ninja            cpu=4 start=9.99  finish=10.03\n              58632) sh               cpu=15 start=10.00 finish=10.01\n                58633) cc               cpu=0 start=10.00 finish=10.01\n                  58634) cc1              cpu=1 start=10.00 finish=10.01\n                  58635) as               cpu=14 start=10.01 finish=10.01\n              58636) sh               cpu=5 start=10.01 finish=10.03\n                58637) cc               cpu=14 start=10.02 finish=10.03\n                  58638) collect2         cpu=0 start=10.02 finish=10.03\n                    58639) ld               cpu=15 start=10.02 finish=10.03\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>A test of building the llvm compiler stack. The entire stack is built twice, once with Ninja and then with Unix Makefiles. Overall time is slightly faster for Ninja. 
The overall profile shows except for periods towards end of each <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/build-llvm\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-806","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/806","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=806"}],"version-history":[{"count":2,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/806\/revisions"}],"predecessor-version":[{"id":841,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/806\/revisions\/841"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=806"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}