{"id":615,"date":"2024-01-16T05:10:46","date_gmt":"2024-01-16T05:10:46","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=615"},"modified":"2024-01-16T05:36:08","modified_gmt":"2024-01-16T05:36:08","slug":"build-linux-kernel","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/build-linux-kernel\/","title":{"rendered":"build-linux-kernel"},"content":{"rendered":"\n<p>Time to build the Linux kernel and all the modules. The first three sections are the small workload that builds the kernel; the three long ones build all the modules. The run queue seems to top out at 16 runnable processes very consistently, with occasional spikes to 45, and almost 100% of the time is spent in compilation.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-25.png\" alt=\"\" class=\"wp-image-618\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-25.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-25-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-25-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>The topdown profile is similar to other compiler tasks, with a large number of short-lived processes and a high proportion of frontend stalls.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-62.png\" alt=\"\" class=\"wp-image-620\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-62.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-62-1024x768.png 1024w, 
https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-62-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics show that ~20% of instructions are branches, and on_cpu shows that all cores are kept busy.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              4384.994\non_cpu               0.948          # 15.17 \/ 16 cores\nutime                57169.074\nstime                9330.941\nnvcsw                16234616       # 51.42%\nnivcsw               15340066       # 48.58%\ninblock              1508792        # 344.08\/sec\nonblock              71794720       # 16372.82\/sec\ncpu-clock            66453691923913 # 66453.692 seconds\ntask-clock           66455107133213 # 66455.107 seconds\npage faults          2366940384     # 35617.133\/sec\ncontext switches     29689799       # 446.765\/sec\ncpu migrations       1286096        # 19.353\/sec\nmajor page faults    40348          # 0.607\/sec\nminor page faults    2366900036     # 35616.526\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             60287753558225 # 209.069 branches per 1000 inst\nbranch misses        1900198031594  # 3.15% branch miss\nconditional          45363845907509 # 157.315 conditional branches per 1000 inst\nindirect             1394625009817  # 4.836 indirect branches per 1000 inst\ncpu-cycles           251756576312197 # 3.60 GHz\ninstructions         281890310273018 # 1.12 IPC\nslots                521153828888532 #\nretiring             94928942347433 # 18.2% (21.8%)\n-- ucode             107890432544   #     0.0%\n-- fastpath          94821051914889 #    18.2%\nfrontend             202492707510017 # 38.9% (46.4%)\n-- latency           152779147446612 #    29.3%\n-- bandwidth         49713560063405 #     9.5%\nbackend              121649005397933 # 23.3% (27.9%)\n-- cpu               17504743131630 #     3.4%\n-- memory            104144262266303 #    
20.0%\nspeculation          17279631076873 #  3.3% ( 4.0%)\n-- branch mispredict 17082830645741 #     3.3%\n-- pipeline restart  196800431132   #     0.0%\nsmt-contention       84801816758218 # 16.3% ( 0.0%)\n<\/code><\/pre>\n\n\n\n<p>Intel metrics<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              5364.641\non_cpu               0.940          # 15.04 \/ 16 cores\nutime                73062.973\nstime                7600.396\nnvcsw                16447956       # 52.21%\nnivcsw               15057053       # 47.79%\ninblock              12795984       # 2385.25\/sec\nonblock              71783544       # 13380.87\/sec\ncpu-clock            80624919765176 # 80624.920 seconds\ntask-clock           80625961666563 # 80625.962 seconds\npage faults          2364809417     # 29330.620\/sec\ncontext switches     29702086       # 368.394\/sec\ncpu migrations       1405666        # 17.434\/sec\nmajor page faults    36674          # 0.455\/sec\nminor page faults    2364772743     # 29330.165\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             59134611424110 # 206.302 branches per 1000 inst\nbranch misses       
 1373270026422  # 2.32% branch miss\nconditional          59134676880558 # 206.302 conditional branches per 1000 inst\nindirect             12260753772451 # 42.774 indirect branches per 1000 inst\nslots                359500793544890 #\nretiring             139217767267262 # 38.7% (38.7%)\n-- ucode             10907139025969 #     3.0%\n-- fastpath          128310628241293 #    35.7%\nfrontend             136579197555329 # 38.0% (38.0%)\n-- latency           65433801874971 #    18.2%\n-- bandwidth         71145395680358 #    19.8%\nbackend              29896581085111 #  8.3% ( 8.3%)\n-- cpu               12871473352899 #     3.6%\n-- memory            17025107732212 #     4.7%\nspeculation          54440246559477 # 15.1% (15.1%)\n-- branch mispredict 52519725725341 #    14.6%\n-- pipeline restart  1920520834136  #     0.5%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           146212049839027 # 1.72 GHz\ninstructions         188401538776691 # 1.29 IPC\nl2 access            6914129112934  # 47.173 l2 access per 1000 inst\nl2 miss              1426172948897  # 20.63% l2 miss<\/code><\/pre>\n\n\n\n<p>The process summary shows that most of the time is spent in cc1, the GCC C compiler proper (not a complete count, since the run crashed in the second build, but still a lot of processes&#8230;)<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>1687569 processes\n\t96078 cc1                  41429.35  4508.53\n\t10782 genksyms              1708.71   109.67\n\t18256 make                   222.14    72.39\n\t82007 as                      82.24     4.88\n\t30098 objtool                 61.47    21.11\n\t  5 xz                      52.47     1.15\n\t  8 gzip                    25.86     0.32\n\t24802 ld                      23.54    14.13\n\t  5 modpost                 17.03     2.31\n\t 34 clinfo                  10.25     2.99\n\t  3 openssl                  5.96     0.02\n\t 10 kallsyms                 5.55     0.20\n\t84426 fixdep                   3.15    23.83\n\t 84 find           
          2.89     0.66\n\t117164 nm                       2.17     0.37\n\t3885 ar                       0.97     3.10\n\t 20 m4                       0.79     0.00\n\t148 objcopy                  0.77     1.25\n\t 19 vulkaninfo               0.76     0.57\n\t96533 gcc                      0.57     0.00\n\t14121 sed                      0.57     0.00\n\t  5 tar                      0.44     6.66\n\t  4 conf                     0.41     0.32\n\t 16 relocs                   0.35     0.46\n\t1854 xargs                    0.29     4.03\n\t 12 sorttable                0.26     0.02\n\t55191 grep                     0.23     0.14\n\t699860 sh                       0.19     1.99\n\t21982 perl                     0.15     0.01\n\t  5 build                    0.14     0.08\n\t  3 glxinfo:gdrv0            0.10     0.03\n\t  3 cpio                     0.08     1.85\n\t  2 vulkani:disk$0           0.08     0.06\n\t  6 clang                    0.07     0.05\n\t92612 rm                       0.05     8.84\n\t  1 llvmpipe-0               0.04     0.03\n\t  1 llvmpipe-1               0.04     0.03\n\t  1 llvmpipe-10              0.04     0.03\n\t  1 llvmpipe-11              0.04     0.03\n\t  1 llvmpipe-12              0.04     0.03\n\t  1 llvmpipe-13              0.04     0.03\n\t  1 llvmpipe-14              0.04     0.03\n\t  1 llvmpipe-15              0.04     0.03\n\t  1 llvmpipe-2               0.04     0.03\n\t  1 llvmpipe-3               0.04     0.03\n\t  1 llvmpipe-4               0.04     0.03\n\t  1 llvmpipe-5               0.04     0.03\n\t  1 llvmpipe-6               0.04     0.03\n\t  1 llvmpipe-7               0.04     0.03\n\t  1 llvmpipe-8               0.04     0.03\n\t  1 llvmpipe-9               0.04     0.03\n\t  1 glxinfo                  0.04     0.02\n\t  1 glxinfo:cs0              0.04     0.02\n\t  1 glxinfo:disk$0           0.04     0.02\n\t  1 glxinfo:sh0              0.04     0.01\n\t  1 glxinfo:shlo0            0.04     0.01\n\t 19 ls                 
      0.01     0.12\n\t6923 cat                      0.00     0.30\n\t125407 check-local-exp          0.00     0.00\n\t84807 awk                      0.00     0.00\n\t7543 mkdir                    0.00     0.00\n\t2934 touch                    0.00     0.00\n\t2865 unifdef                  0.00     0.00\n\t2194 wc                       0.00     0.00\n\t2069 tr                       0.00     0.00\n\t413 pahole-flags.sh          0.00     0.00\n\t223 collect2                 0.00     0.00\n\t169 mv                       0.00     0.00\n\t124 dtc                      0.00     0.00\n\t 95 objdump                  0.00     0.00\n\t 80 strip                    0.00     0.00\n\t 72 diff                     0.00     0.00\n\t 66 pkg-config               0.00     0.00\n\t 63 asn1_compiler            0.00     0.00\n\t 58 head                     0.00     0.00\n\t 58 uname                    0.00     0.00\n\t 54 getconf                  0.00     0.00\n\t 40 cmp                      0.00     0.00\n\t 30 flex                     0.00     0.00\n\t 30 mkregtable               0.00     0.00\n\t 30 sort                     0.00     0.00\n\t 28 basename                 0.00     0.00\n\t 23 dirname                  0.00     0.00\n\t 20 cut                      0.00     0.00\n\t 18 mkcompile_h              0.00     0.00\n\t 17 sha1sum                  0.00     0.00\n\t 16 cc-can-link.sh           0.00     0.00\n\t 16 cc-version.sh            0.00     0.00\n\t 16 uniq                     0.00     0.00\n\t 15 bash                     0.00     0.00\n\t 15 bin2c                    0.00     0.00\n\t 15 expr                     0.00     0.00\n\t 15 extract-cert             0.00     0.00\n\t 15 vdso2c                   0.00     0.00\n\t 13 bison                    0.00     0.00\n\t 12 as-version.sh            0.00     0.00\n\t 12 gsettings                0.00     0.00\n\t 12 ld-version.sh            0.00     0.00\n\t 12 min-tool-versio          0.00     0.00\n\t 12 tail                     0.00 
    0.00\n\t 11 mktemp                   0.00     0.00\n\t 10 git                      0.00     0.00\n\t  9 md5sum                   0.00     0.00\n\t  9 pnmtologo                0.00     0.00\n\t  8 gcc-x86_32-has-          0.00     0.00\n\t  8 gcc-x86_64-has-          0.00     0.00\n\t  8 pahole-version.          0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  8 which                    0.00     0.00\n\t  7 stat                     0.00     0.00\n\t  6 bc                       0.00     0.00\n\t  6 conmakehash              0.00     0.00\n\t  6 date                     0.00     0.00\n\t  6 gen_crc32table           0.00     0.00\n\t  6 gen_init_cpio            0.00     0.00\n\t  6 genheaders               0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  6 remove-stale-fi          0.00     0.00\n\t  6 whoami                   0.00     0.00\n\t  5 build-linux-ker          0.00     0.00\n\t  5 build-version            0.00     0.00\n\t  5 link-vmlinux.sh          0.00     0.00\n\t  5 ln                       0.00     0.00\n\t  5 mk_elfconfig             0.00     0.00\n\t  5 mkcpustr                 0.00     0.00\n\t  5 mkpiggy                  0.00     0.00\n\t  4 fdtoverlay               0.00     0.00\n\t  4 phoronix-test-s          0.00     0.00\n\t  4 readlink                 0.00     0.00\n\t  4 rust_is_availab          0.00     0.00\n\t  4 tools-support-r          0.00     0.00\n\t  3 gen_crc64table           0.00     0.00\n\t  3 mktables                 0.00     0.00\n\t  2 genmap                   0.00     0.00\n\t  2 gmain                    0.00     0.00\n\t  2 makemapdata              0.00     0.00\n\t  1 cc                       0.00     0.00\n\t  1 dconf worker             0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lscpu                    0.00     0.00\n\t  1 ps                       0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 
realpath                 0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 xrandr                   0.00     0.00\n\t  1 xset                     0.00     0.00\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Time to build the Linux kernel and all the modules. The first three sections are the small workload that builds the kernel; the three long ones build all the modules. The run queue seems to top out at 16 runnable processes <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/build-linux-kernel\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-615","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/615","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=615"}],"version-history":[{"count":3,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/615\/revisions"}],"predecessor-version":[{"id":624,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/615\/revisions\/624"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=615"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true
}]}}