{"id":224,"date":"2024-01-05T10:08:53","date_gmt":"2024-01-05T10:08:53","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=224"},"modified":"2024-01-05T20:20:38","modified_gmt":"2024-01-05T20:20:38","slug":"build-gcc-2","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/build-gcc-2\/","title":{"rendered":"build-gcc"},"content":{"rendered":"\n<p>Classic compiler build, running for approximately an hour and providing a good stress test. Compile workloads seem to have some of the highest frontend metrics (green) and many small, quick processes. Branch misprediction can also be higher (yellow), and some time is memory bound (blue).<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-20.png\" alt=\"\" class=\"wp-image-285\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-20.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-20-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-20-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>The AMD metrics show a lot of branches and, at times, a higher branch miss rate.
The &#8220;on_cpu&#8221; amount also reflects waiting for disk and parts where the compilation is not as multi-threaded.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              3427.385\non_cpu               0.624          # 9.99 \/ 16 cores\nutime                32190.169\nstime                2045.712\nnvcsw                1985244        # 48.41%\nnivcsw               2115761        # 51.59%\ninblock              516424         # 150.68\/sec\nonblock              98507112       # 28741.19\/sec\ncpu-clock            34213116381288 # 34213.116 seconds\ntask-clock           34214907130799 # 34214.907 seconds\npage faults          513204835      # 14999.451\/sec\ncontext switches     3464477        # 101.256\/sec\ncpu migrations       239642         # 7.004\/sec\nmajor page faults    17576          # 0.514\/sec\nminor page faults    513187259      # 14998.938\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             30568358827057 # 192.709 branches per 1000 inst\nbranch misses        701201423712   # 2.29% branch miss\nconditional          17638912593304 # 111.199 conditional branches per 1000 inst\nindirect             487258108842   # 3.072 indirect branches per 1000 inst\ncpu-cycles           138568869943175 # 2.53 GHz\ninstructions         156207628466024 # 1.13 IPC\nslots                283067952990174 #\nretiring             55133090858931 # 19.5% (23.1%)\n-- ucode             32000982449    #     0.0%\n-- fastpath          55101089876482 #    19.5%\nfrontend             115172179636077 # 40.7% (48.2%)\n-- latency           86764652204442 #    30.7%\n-- bandwidth         28407527431635 #    10.0%\nbackend              57849381391632 # 20.4% (24.2%)\n-- cpu               6077068920462  #     2.1%\n-- memory            51772312471170 #    18.3%\nspeculation          10655334197850 #  3.8% ( 4.5%)\n-- branch mispredict 8804136631900  #     3.1%\n-- pipeline restart  1851197565950  #  
   0.7%\nsmt-contention       44257340647327 # 15.6% ( 0.0%)\ncpu-cycles           138652693363066 # 2.52 GHz\ninstructions         156238086746318 # 1.13 IPC\ninstructions         52819253385890 # 40.743 l2 access per 1000 inst\nl2 hit from l1       1931591401779  # 16.62% l2 miss\nl2 miss from l1      244520309408   #\nl2 hit from l2 pf    107325558661   #\nl3 hit from l2 pf    59042583282    #\nl3 miss from l2 pf   54059169410    #\ninstructions         52796829647902 # 7.270 float per 1000 inst\nfloat 512            179380         # 0.000 AVX-512 per 1000 inst\nfloat 256            21703524       # 0.000 AVX-256 per 1000 inst\nfloat 128            383802719307   # 7.269 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst<\/code><\/pre>\n\n\n\n<p>The Intel metrics show about 10% of pipeline slots attributed to branch misprediction.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              3634.438\non_cpu               0.676          # 10.82 \/ 16 cores\nutime                37835.081\nstime                1472.287\nnvcsw                1922325        # 49.28%\nnivcsw               1978190        # 50.72%\ninblock              614832         # 169.17\/sec\nonblock              98598064       # 27128.83\/sec\ncpu-clock            39269125917324 # 39269.126 seconds\ntask-clock           39271771617236 # 39271.772 seconds\npage faults          512960631      # 13061.815\/sec\ncontext switches     3281468        # 83.558\/sec\ncpu migrations       389949         # 9.929\/sec\nmajor page faults    12097          # 0.308\/sec\nminor page faults    512948534      # 13061.507\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             30254370267193 # 191.071 branches per 1000 inst\nbranch misses        455639475516   # 1.51% branch miss\nconditional          30254391920953 # 191.071 conditional branches per 
1000 inst\nindirect             8710920825064  # 55.014 indirect branches per 1000 inst\nslots                220536156479210 #\nretiring             90522893095036 # 41.0% (41.0%)\n-- ucode             7820401372828  #     3.5%\n-- fastpath          82702491722208 #    37.5%\nfrontend             81438050633294 # 36.9% (36.9%)\n-- latency           38745936530923 #    17.6%\n-- bandwidth         42692114102371 #    19.4%\nbackend              25344106231891 # 11.5% (11.5%)\n-- cpu               11855996482089 #     5.4%\n-- memory            13488109749802 #     6.1%\nspeculation          23721466183852 # 10.8% (10.8%)\n-- branch mispredict 22607620934670 #    10.3%\n-- pipeline restart  1113845249182  #     0.5%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           84130561222309 # 1.46 GHz\ninstructions         121342432799798 # 1.44 IPC\nl2 access            3608782396433  # 39.796 l2 access per 1000 inst\nl2 miss              782438848933   # 21.68% l2 miss<\/code><\/pre>\n\n\n\n<p>Over 600,000 processes provide a good stress test for the CPU tracing. The 500+ processes marked as running are ones where we somehow received the fork event but missed the exit event that would close out the process.
The largest fraction of overall time is spent in the compiler processes.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>676238 processes\n\t14002 cc1plus              27709.21   796.22\n\t27073 cc1                   2320.38   152.84\n\t38430 as                     441.70    51.52\n\t  9 genattrtab             245.76     0.94\n\t8420 ld                     184.42    32.95\n\t  9 genautomata            136.13     3.53\n\t227581 bash                    37.29    62.66\n\t  9 genoutput               34.05     0.48\n\t 27 genpreds                22.00     0.35\n\t  9 genrecog                19.84     0.58\n\t144 msgmerge                17.94     0.16\n\t  9 genextract              11.62     0.16\n\t  9 gencodes                11.49     0.12\n\t908 make                    11.28     4.12\n\t  9 genemit                 11.23     0.27\n\t  9 genconfig               11.20     0.12\n\t  9 genopinit               10.97     0.16\n\t  9 genpeep                 10.91     0.17\n\t  9 genattr                 10.84     0.18\n\t  9 genflags                10.80     0.21\n\t  9 genattr-common          10.68     0.16\n\t  9 gentarget-def           10.68     0.14\n\t1111 f951                     7.18     0.70\n\t 15 pod2man                  7.08     0.02\n\t  9 genconditions            6.90     0.11\n\t468 fixincl                  5.12     0.19\n\t1122 mawk                     3.41     0.02\n\t  1 xz                       3.34     0.18\n\t 24 perl                     2.93     0.21\n\t 18 gengtype                 2.71     0.20\n\t 24 genchecksum              1.27     0.21\n\t 18 genmatch                 1.13     0.12\n\t71119 sed                      0.84     0.00\n\t 38 vulkaninfo               0.76     1.06\n\t579 ar                       0.69     6.85\n\t261 print                    0.65     0.06\n\t 33 cc1obj                   0.42     0.05\n\t  6 php                      0.28     1.25\n\t297 ranlib                   0.27     6.04\n\t46226 cat                      0.26     2.29\n\t 13 
tar                      0.21     2.89\n\t  9 genenums                 0.12     0.00\n\t  9 genconstants             0.11     0.00\n\t57476 rm                       0.09     5.62\n\t  4 vulkani:disk$0           0.08     0.11\n\t  9 genmddeps                0.08     0.01\n\t  2 llvmpipe-0               0.04     0.06\n\t  2 llvmpipe-1               0.04     0.06\n\t  2 llvmpipe-10              0.04     0.06\n\t  2 llvmpipe-11              0.04     0.06\n\t  2 llvmpipe-12              0.04     0.06\n\t  2 llvmpipe-13              0.04     0.06\n\t  2 llvmpipe-14              0.04     0.06\n\t  2 llvmpipe-15              0.04     0.06\n\t  2 llvmpipe-2               0.04     0.06\n\t  2 llvmpipe-3               0.04     0.06\n\t  2 llvmpipe-4               0.04     0.06\n\t  2 llvmpipe-5               0.04     0.06\n\t  2 llvmpipe-6               0.04     0.06\n\t  2 llvmpipe-7               0.04     0.06\n\t  2 llvmpipe-8               0.04     0.06\n\t  2 llvmpipe-9               0.04     0.06\n\t  6 clang                    0.04     0.02\n\t7030 xg++                     0.00     0.23\n\t197 find                     0.00     0.20\n\t4574 mkdir                    0.00     0.09\n\t396 nm                       0.00     0.09\n\t3216 g++                      0.00     0.06\n\t  1 lspci                    0.00     0.03\n\t28728 xgcc                     0.00     0.02\n\t39074 basename                 0.00     0.00\n\t22391 mv                       0.00     0.00\n\t12805 grep                     0.00     0.00\n\t7569 collect2                 0.00     0.00\n\t6644 rmdir                    0.00     0.00\n\t5727 cp                       0.00     0.00\n\t5502 expr                     0.00     0.00\n\t5136 dirname                  0.00     0.00\n\t4566 strip                    0.00     0.00\n\t4459 gcc                      0.00     0.00\n\t3663 ln                       0.00     0.00\n\t3199 uname                    0.00     0.00\n\t2295 cmp                      0.00     
0.00\n\t1626 chmod                    0.00     0.00\n\t1372 sort                     0.00     0.00\n\t1120 gfortran                 0.00     0.00\n\t1057 tr                       0.00     0.00\n\t1023 conftest                 0.00     0.00\n\t541 hostname                 0.00     0.00\n\t420 diff                     0.00     0.00\n\t408 ls                       0.00     0.00\n\t312 arch                     0.00     0.00\n\t291 awk                      0.00     0.00\n\t248 sh                       0.00     0.00\n\t231 install                  0.00     0.00\n\t207 touch                    0.00     0.00\n\t198 objdump                  0.00     0.00\n\t165 echo                     0.00     0.00\n\t160 mktemp                   0.00     0.00\n\t140 tmpmultilib3             0.00     0.00\n\t135 missing                  0.00     0.00\n\t 93 getconf                  0.00     0.00\n\t 84 file                     0.00     0.00\n\t 84 true                     0.00     0.00\n\t 84 uniq                     0.00     0.00\n\t 82 tmpmultilib4             0.00     0.00\n\t 72 sleep                    0.00     0.00\n\t 72 tmpmultilib              0.00     0.00\n\t 54 msgfmt                   0.00     0.00\n\t 48 configure                0.00     0.00\n\t 47 cc                       0.00     0.00\n\t 44 move-if-change           0.00     0.00\n\t 36 genhooks                 0.00     0.00\n\t 36 genmodes                 0.00     0.00\n\t 36 xgettext                 0.00     0.00\n\t 20 which                    0.00     0.00\n\t 18 bison                    0.00     0.00\n\t 18 gencfn-macros            0.00     0.00\n\t 18 ld.gold                  0.00     0.00\n\t 15 compare-debug            0.00     0.00\n\t 15 tmpmultilib2             0.00     0.00\n\t 10 date                     0.00     0.00\n\t  9 c++filt                  0.00     0.00\n\t  9 gencheck                 0.00     0.00\n\t  9 gencondmd                0.00     0.00\n\t  9 gengenrtl                0.00     0.00\n\t  9 
genversion               0.00     0.00\n\t  9 git                      0.00     0.00\n\t  9 mkheader.sh              0.00     0.00\n\t  9 objcopy                  0.00     0.00\n\t  9 python3                  0.00     0.00\n\t  9 readelf                  0.00     0.00\n\t  9 stty                     0.00     0.00\n\t  9 tail                     0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  7 gsettings                0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  6 pwd                      0.00     0.00\n\t  5 glxinfo                  0.00     0.00\n\t  5 gmain                    0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  4 dconf worker             0.00     0.00\n\t  3 build-gcc\n\t  3 xargs                    0.00     0.00\n\t  2 clinfo                   0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 setterm                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 ps                       0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n573 processes running\n640 maximum processes\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Classic compiler build, running approximately an hour and providing a good stress test. Compile workloads seem to have some of the highest frontend metrics (green) and many small quick processes. 
Branch prediction can also be higher (yellow) and there is <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/build-gcc-2\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-224","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/224","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=224"}],"version-history":[{"count":3,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/224\/revisions"}],"predecessor-version":[{"id":286,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/224\/revisions\/286"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=224"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}
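Several of the derived columns in the summaries can be reproduced from the raw counters. The minimal Python sketch below recomputes a few of them from the AMD run. The formulas are assumptions inferred by matching the printed values; they are not taken from the tracing tool itself:

- on_cpu as (utime + stime) / elapsed, divided by 16 cores.
- IPC as instructions / cycles.
- Branch miss rate as branch misses / branches.
- The parenthesized topdown percentages as each category divided by slots minus smt-contention.

The function names are hypothetical.

```python
"""Recompute some derived columns of the build-gcc summaries from raw counters.

The formulas are inferred from the printed numbers; they are assumptions,
not the reporting tool's actual code. Function names are hypothetical.
"""


def cpu_utilization(utime: float, stime: float, elapsed: float, cores: int) -> tuple[float, float]:
    """Return (average busy cores, fraction of all cores busy)."""
    busy = (utime + stime) / elapsed
    return busy, busy / cores


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


def topdown(slots: int, smt_contention: int, categories: dict[str, int]) -> dict[str, tuple[float, float]]:
    """Map each category to (% of all slots, % of slots not lost to SMT contention)."""
    usable = slots - smt_contention
    return {name: (percent(n, slots), percent(n, usable)) for name, n in categories.items()}


# Raw counters copied from the AMD summary.
amd_busy, amd_frac = cpu_utilization(32190.169, 2045.712, 3427.385, cores=16)
amd_ipc = 156207628466024 / 138568869943175  # instructions / cpu-cycles
amd_branch_miss = percent(701201423712, 30568358827057)  # branch misses / branches
amd_td = topdown(
    slots=283067952990174,
    smt_contention=44257340647327,
    categories={
        "retiring": 55133090858931,
        "frontend": 115172179636077,
        "backend": 57849381391632,
        "speculation": 10655334197850,
    },
)

if __name__ == "__main__":
    print(f"on_cpu {amd_frac:.3f}  # {amd_busy:.2f} / 16 cores")
    print(f"IPC {amd_ipc:.2f}, branch miss {amd_branch_miss:.2f}%")
    for name, (all_slots, no_smt) in amd_td.items():
        print(f"{name:<12} {all_slots:4.1f}% ({no_smt:4.1f}%)")
```

On the Intel run, smt-contention is 0, so the two topdown columns coincide. This matches the identical pairs such as 41.0% (41.0%) in that summary.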