{"id":1497,"date":"2024-02-04T10:21:49","date_gmt":"2024-02-04T10:21:49","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=1497"},"modified":"2024-02-09T23:34:13","modified_gmt":"2024-02-09T23:34:13","slug":"build-mesa","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/build-mesa\/","title":{"rendered":"build-mesa"},"content":{"rendered":"\n<p>This workload builds mesa with Meson\/Ninja. There is one build and it completes within a minute. Looks like a classic parallel compile followed by a link step.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-42.png\" alt=\"\" class=\"wp-image-1631\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-42.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-42-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-42-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>Topdown profile shows a mix of frontend and backend stalls and a relatively low retirement rate.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-44.png\" alt=\"\" class=\"wp-image-1633\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-44.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-44-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-44-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics show little floating 
point and closely matched frontend and backend stalls.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              163.906\non_cpu               0.790          # 12.63 \/ 16 cores\nutime                1886.195\nstime                184.596\nnvcsw                56848          # 30.87%\nnivcsw               127332         # 69.13%\ninblock              0              # 0.00\/sec\nonblock              1388432        # 8470.91\/sec\ncpu-clock            2071048415957  # 2071.048 seconds\ntask-clock           2071106212462  # 2071.106 seconds\npage faults          46543416       # 22472.733\/sec\ncontext switches     175524         # 84.749\/sec\ncpu migrations       10132          # 4.892\/sec\nmajor page faults    227            # 0.110\/sec\nminor page faults    46543189       # 22472.623\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             1908130912209  # 209.412 branches per 1000 inst\nbranch misses        40968023376    # 2.15% branch miss\nconditional          1480230238727  # 162.451 conditional branches per 1000 inst\nindirect             40444519371    # 4.439 indirect branches per 1000 inst\ncpu-cycles           8202524114175  # 3.13 GHz\ninstructions         9044347511904  # 1.10 IPC\nslots                16590148343382 #\nretiring             2965706690130  # 17.9% (22.1%)\n-- ucode             3755874833     #     0.0%\n-- fastpath          2961950815297  #    17.9%\nfrontend             5025041411180  # 30.3% (37.4%)\n-- latency           3597710893644  #    21.7%\n-- bandwidth         1427330517536  #     8.6%\nbackend              5007168083459  # 30.2% (37.3%)\n-- cpu               516986356812   #     3.1%\n-- memory            4490181726647  #    27.1%\nspeculation          426799538910   #  2.6% ( 3.2%)\n-- branch mispredict 420334664273   #     2.5%\n-- pipeline restart  6464874637     #     0.0%\nsmt-contention       3165401829088  # 19.1% ( 0.0%)\ncpu-cycles           
8203295377763  # 3.12 GHz\ninstructions         9043102000155  # 1.10 IPC\ninstructions         3031177228591  # 42.544 l2 access per 1000 inst\nl2 hit from l1       110789101776   # 18.69% l2 miss\nl2 miss from l1      14472622849    #\nl2 hit from l2 pf    8540442057     #\nl3 hit from l2 pf    4207616137     #\nl3 miss from l2 pf   5421033171     #\ninstructions         3032329413889  # 27.021 float per 1000 inst\nfloat 512            3821           # 0.000 AVX-512 per 1000 inst\nfloat 256            21829          # 0.000 AVX-256 per 1000 inst\nfloat 128            81936088066    # 27.021 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         242            # 0.000 scalar per 1000 inst\ninstructions         2700323        #\nopcache              994769         # 368.389 opcache per 1000 inst\nopcache miss         529578         # 53.2% opcache miss rate\nl1 dTLB miss         7107           # 2.632 L1 dTLB per 1000 inst\nl2 dTLB miss         1281           # 0.474 L2 dTLB per 1000 inst\ninstructions         2728917        #\nicache               1334652        # 489.078 icache per 1000 inst\nicache miss          113690         #  8.5% icache miss rate\nl1 iTLB miss         9              # 0.003 L1 iTLB per 1000 inst\nl2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst\ntlb flush            19             # 0.007 TLB flush per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              184.657\non_cpu               0.809          # 12.94 \/ 16 cores\nutime                2241.613\nstime                148.765\nnvcsw                62568          # 32.87%\nnivcsw               127758         # 67.13%\ninblock              80336          # 435.05\/sec\nonblock              1376936        # 7456.72\/sec\ncpu-clock            2390480847282  # 2390.481 seconds\ntask-clock           2390536091411  # 2390.536 seconds\npage faults          
46530259       # 19464.362\/sec\ncontext switches     181420         # 75.891\/sec\ncpu migrations       9724           # 4.068\/sec\nmajor page faults    356            # 0.149\/sec\nminor page faults    46529903       # 19464.213\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             1887470376627  # 207.923 branches per 1000 inst\nbranch misses        31867530246    # 1.69% branch miss\nconditional          1887470750355  # 207.923 conditional branches per 1000 inst\nindirect             345061524941   # 38.012 indirect branches per 1000 inst\nslots                11549890613120 #\nretiring             4718273251825  # 40.9% (40.9%)\n-- ucode             332523494430   #     2.9%\n-- fastpath          4385749757395  #    38.0%\nfrontend             3969971068077  # 34.4% (34.4%)\n-- latency           1939167474365  #    16.8%\n-- bandwidth         2030803593712  #    17.6%\nbackend              1474722638981  # 12.8% (12.8%) low\n-- cpu               517748601643   #     4.5%\n-- memory            956974037338   #     8.3%\nspeculation          1398329649193  # 12.1% (12.1%) high\n-- branch mispredict 1345271527912  #    11.6%\n-- pipeline restart  53058121281    #     0.5%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           4624134244003  # 1.59 GHz\ninstructions         6236380120473  # 1.35 IPC\nl2 access            223293941969   # 44.844 l2 access per 1000 inst\nl2 miss              58309315082    # 26.11% l2 miss\ncpu-cycles           3684561267847  # 32.2% memory latency\nload stalls          1147424510193  #  7.3% l1 bound\nl1 miss              878932569967   #  9.4% l2 bound\nl2 miss              534380656444   #  3.0% l3 bound\nl3 miss              424413292847   # 11.5% dram bound\nstore_stalls         37966466834    #  1.0% store bound\n<\/code><\/pre>\n\n\n\n<p>Process overview shows most time in the compiler front ends with more C than 
C++.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>11662 processes\n\t2192 cc1                   1436.56    95.98\n\t380 cc1plus                340.45    35.68\n\t138 python3                 38.39     1.64\n\t114 meson                   25.38     2.85\n\t 68 clinfo                  18.50     5.34\n\t2560 as                       6.20     0.43\n\t106 dpkg-architectu          2.42     0.26\n\t 18 m4                       2.13     0.00\n\t 38 vulkaninfo               1.49     0.95\n\t 75 ld                       1.02     0.58\n\t  1 xz                       0.79     0.03\n\t 16 ninja                    0.52     0.75\n\t  4 vulkani:disk$0           0.15     0.10\n\t  6 glxinfo:gdrv0            0.10     0.04\n\t  6 glxinfo:gl0              0.10     0.04\n\t  4 cmake                    0.09     0.04\n\t 46 ar                       0.08     0.24\n\t  2 llvmpipe-0               0.08     0.05\n\t  2 llvmpipe-1               0.08     0.05\n\t  2 llvmpipe-10              0.08     0.05\n\t  2 llvmpipe-11              0.08     0.05\n\t  2 llvmpipe-12              0.08     0.05\n\t  2 llvmpipe-13              0.08     0.05\n\t  2 llvmpipe-14              0.08     0.05\n\t  2 llvmpipe-15              0.08     0.05\n\t  2 llvmpipe-2               0.08     0.05\n\t  2 llvmpipe-3               0.08     0.05\n\t  2 llvmpipe-4               0.08     0.05\n\t  2 llvmpipe-5               0.08     0.05\n\t  2 llvmpipe-6               0.08     0.05\n\t  2 llvmpipe-7               0.08     0.05\n\t  2 llvmpipe-8               0.08     0.05\n\t  2 llvmpipe-9               0.08     0.05\n\t  6 php                      0.07     0.11\n\t 10 bison                    0.07     0.00\n\t  6 clang                    0.06     0.06\n\t  2 glxinfo                  0.06     0.02\n\t  2 glxinfo:cs0              0.05     0.02\n\t  2 glxinfo:disk$0           0.05     0.02\n\t  2 glxinfo:sh0              0.05     0.02\n\t  2 glxinfo:shlo0            0.05     0.02\n\t  3 rocminfo                 0.03     
0.00\n\t  1 tar                      0.02     0.40\n\t  1 lspci                    0.01     0.02\n\t 46 rm                       0.00     0.25\n\t  3 cp                       0.00     0.02\n\t2753 sh                       0.00     0.00\n\t2211 cc                       0.00     0.00\n\t393 c++                      0.00     0.00\n\t119 gcc                      0.00     0.00\n\t 75 collect2                 0.00     0.00\n\t 47 pkg-config               0.00     0.00\n\t 46 gcc-ar                   0.00     0.00\n\t 27 flex                     0.00     0.00\n\t  8 gsettings                0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  8 uname                    0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 gmain                    0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  4 bash                     0.00     0.00\n\t  3 build-mesa               0.00     0.00\n\t  3 dconf worker             0.00     0.00\n\t  3 git                      0.00     0.00\n\t  3 nm                       0.00     0.00\n\t  3 readelf                  0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 output.exe               0.00     0.00\n\t  2 python                   0.00     0.00\n\t  2 sanitycheckc.ex          0.00     0.00\n\t  2 sanitycheckcpp.          
0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 ps                       0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n0 processes running\n66 maximum processes\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>This workload builds mesa with Meson\/Ninja. There is one build and it completes within a minute. Looks like a classic parallel compile followed by a link step. 
Topdown profile shows a mix of frontend and backend stalls and a relatively <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/build-mesa\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-1497","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1497","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=1497"}],"version-history":[{"count":3,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1497\/revisions"}],"predecessor-version":[{"id":1634,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1497\/revisions\/1634"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=1497"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}