A test of building the llvm compiler stack. The entire stack is built twice, once with Ninja and then with Unix Makefiles. Overall time is slightly faster for Ninja. The overall profile shows except for periods towards end of each build, almost all cores are kept busy.

Overall topdown profile is a workload dominated by frontend stalls with a lower retirement rate. This is similar to other build-* workloads, though backend/memory stalls are slightly higher with LLVM and frontend stalls are slightly lower.

AMD topdown metrics show little floating point, a high amount of branches and a moderate amount of L2 access/miss

elapsed              5163.931
on_cpu               0.945          # 15.11 / 16 cores
utime                72481.356
stime                5561.869
nvcsw                690828         # 26.70%
nivcsw               1896731        # 73.30%
inblock              752            # 0.15/sec
onblock              72693768       # 14077.21/sec
cpu-clock            78056877516054 # 78056.878 seconds
task-clock           78058299825553 # 78058.300 seconds
page faults          1723272985     # 22076.742/sec
context switches     2501754        # 32.050/sec
cpu migrations       84464          # 1.082/sec
major page faults    7702           # 0.099/sec
minor page faults    1723265283     # 22076.644/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             53968965816670 # 209.027 branches per 1000 inst
branch misses        1624267586008  # 3.01% branch miss
conditional          41377495312890 # 160.259 conditional branches per 1000 inst
indirect             1239736919294  # 4.802 indirect branches per 1000 inst
cpu-cycles           318616249965597 # 3.88 GHz
instructions         257239532203885 # 0.81 IPC
slots                638462591892552 #
retiring             83608634229281 # 13.1% (16.4%)
-- ucode             98629507536    #     0.0%
-- fastpath          83510004721745 #    13.1%
frontend             200104012133847 # 31.3% (39.2%)
-- latency           150156168729132 #    23.5%
-- bandwidth         49947843404715 #     7.8%
backend              209812204890246 # 32.9% (41.1%)
-- cpu               17481000115336 #     2.7%
-- memory            192331204774910 #    30.1%
speculation          16989232947648 #  2.7% ( 3.3%)
-- branch mispredict 16820144024682 #     2.6%
-- pipeline restart  169088922966   #     0.0%
smt-contention       127947932556371 # 20.0% ( 0.0%)
cpu-cycles           318995838054107 # 3.88 GHz
instructions         257262438648603 # 0.81 IPC
instructions         85844563756236 # 57.868 l2 access per 1000 inst
l2 hit from l1       4340327891636  # 21.96% l2 miss
l2 miss from l1      756553839542   #
l2 hit from l2 pf    293116256893   #
l3 hit from l2 pf    124042424042   #
l3 miss from l2 pf   210191796922   #
instructions         85821948979048 # 18.776 float per 1000 inst
float 512            38882          # 0.000 AVX-512 per 1000 inst
float 256            18532849       # 0.000 AVX-256 per 1000 inst
float 128            1611376882605  # 18.776 AVX-128 per 1000 inst
float MMX            0              # 0.000 MMX per 1000 inst
float scalar         5              # 0.000 scalar per 1000 inst

Intel metrics. Perhaps counted slightly differently but this workload as others shows higher IPC, lower GHz and then higher retirement rate than AMD.

elapsed              6082.871
on_cpu               0.952          # 15.23 / 16 cores
utime                87827.392
stime                4786.786
nvcsw                920851         # 30.45%
nivcsw               2103588        # 69.55%
inblock              423848         # 69.68/sec
onblock              72678008       # 11947.98/sec
cpu-clock            92626124571056 # 92626.125 seconds
task-clock           92627957812326 # 92627.958 seconds
page faults          1715040925     # 18515.370/sec
context switches     2950846        # 31.857/sec
cpu migrations       96215          # 1.039/sec
major page faults    2083           # 0.022/sec
minor page faults    1715038842     # 18515.348/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             53412577346061 # 207.493 branches per 1000 inst
branch misses        1337965435641  # 2.50% branch miss
conditional          53412581027341 # 207.493 conditional branches per 1000 inst
indirect             10008132965242 # 38.879 indirect branches per 1000 inst
slots                437772678130796 #
retiring             133128924746572 # 30.4% (30.4%)
-- ucode             10347229360575 #     2.4%
-- fastpath          122781695385997 #    28.0%
frontend             154233918698731 # 35.2% (35.2%)
-- latency           89782197058371 #    20.5%
-- bandwidth         64451721640360 #    14.7%
backend              97914185694265 # 22.4% (22.4%)
-- cpu               24835061651941 #     5.7%
-- memory            73079124042324 #    16.7%
speculation          53004289903871 # 12.1% (12.1%)
-- branch mispredict 51316773822425 #    11.7%
-- pipeline restart  1687516081446  #     0.4%
smt-contention       0              #  0.0% ( 0.0%)
cpu-cycles           194627005685821 # 2.00 GHz
instructions         194392697442484 # 1.00 IPC
l2 access            8477188822805  # 59.773 l2 access per 1000 inst
l2 miss              2579404345601  # 30.43% l2 miss

Process summary is incomplete. Ninja completes, Unix makefiles dies part way through, but at least half is sufficient to get a good profile. Overall cpu time is dominated by the C++ front end, with a lesser amount in tblgen program. Many invocations of the assembler with almost no time.

78068 processes
	12328 cc1plus              48003.58  2933.97
	1342 llvm-tblgen           2043.24   144.27
	901 ld                     423.71   113.10
	12793 as                      84.78     7.24
	548 cc1                     14.60     1.46
	  5 xz                      12.57     0.90
	 34 clinfo                   9.59     3.66
	7885 cmake                    9.49    10.41
	1443 ninja                    3.77     2.85
	 19 vulkaninfo               0.73     0.73
	850 ranlib                   0.69     5.26
	1573 gmake                    0.60     0.61
	849 ar                       0.59     5.16
	 10 tar                      0.54     9.47
	 80 python3.10               0.50     0.04
	 13 rm                       0.09     6.98
	  2 vulkani:disk$0           0.07     0.07
	  3 glxinfo:gdrv0            0.07     0.04
	  6 clang                    0.06     0.06
	  1 llvmpipe-0               0.04     0.04
	  1 llvmpipe-1               0.04     0.04
	  1 llvmpipe-10              0.04     0.04
	  1 llvmpipe-11              0.04     0.04
	  1 llvmpipe-12              0.04     0.04
	  1 llvmpipe-13              0.04     0.04
	  1 llvmpipe-14              0.04     0.04
	  1 llvmpipe-15              0.04     0.04
	  1 llvmpipe-2               0.04     0.04
	  1 llvmpipe-3               0.04     0.04
	  1 llvmpipe-4               0.04     0.04
	  1 llvmpipe-5               0.04     0.04
	  1 llvmpipe-6               0.04     0.04
	  1 llvmpipe-7               0.04     0.04
	  1 llvmpipe-8               0.04     0.04
	  1 llvmpipe-9               0.04     0.04
	  5 py3versions              0.04     0.01
	  1 glxinfo                  0.03     0.02
	  1 glxinfo:cs0              0.03     0.02
	  1 glxinfo:disk$0           0.03     0.02
	  1 glxinfo:sh0              0.03     0.02
	  1 glxinfo:shlo0            0.03     0.02
	  1 ps                       0.00     0.01
	22348 sh                       0.00     0.00
	12872 c++                      0.00     0.00
	949 cc                       0.00     0.00
	901 collect2                 0.00     0.00
	 30 uname                    0.00     0.00
	 25 git                      0.00     0.00
	 20 pkg-config               0.00     0.00
	 13 gcc                      0.00     0.00
	 10 gsettings                0.00     0.00
	  8 systemd-detect-          0.00     0.00
	  7 stat                     0.00     0.00
	  6 bash                     0.00     0.00
	  6 llvm-link                0.00     0.00
	  6 sed                      0.00     0.00
	  5 mkdir                    0.00     0.00
	  5 mv                       0.00     0.00
	  4 build-llvm               0.00     0.00
	  4 phoronix-test-s          0.00     0.00
	  3 gmain                    0.00     0.00
	  2 dconf worker             0.00     0.00
	  2 which                    0.00     0.00
	  1 date                     0.00     0.00
	  1 dirname                  0.00     0.00
	  1 grep                     0.00     0.00
	  1 ifconfig                 0.00     0.00
	  1 ip                       0.00     0.00
	  1 lscpu                    0.00     0.00
	  1 mktemp                   0.00     0.00
	  1 python                   0.00     0.00
	  1 python3                  0.00     0.00
	  1 qdbus                    0.00     0.00
	  1 readlink                 0.00     0.00
	  1 realpath                 0.00     0.00
	  1 sort                     0.00     0.00
	  1 stty                     0.00     0.00
	  1 systemctl                0.00     0.00
	  1 template.sh              0.00     0.00
	  1 wc                       0.00     0.00
	  1 xrandr                   0.00     0.00
	  1 xset                     0.00     0.00
83 processes running
99 maximum processes

The following pattern is used for compilation

            58628) ninja            cpu=3 start=9.98  finish=9.98 
            58629) ninja            cpu=5 start=9.99  finish=9.99 
            58630) ninja            cpu=11 start=9.99  finish=9.99 
            58631) ninja            cpu=4 start=9.99  finish=10.03
              58632) sh               cpu=15 start=10.00 finish=10.01
                58633) cc               cpu=0 start=10.00 finish=10.01
                  58634) cc1              cpu=1 start=10.00 finish=10.01
                  58635) as               cpu=14 start=10.01 finish=10.01
              58636) sh               cpu=5 start=10.01 finish=10.03
                58637) cc               cpu=14 start=10.02 finish=10.03
                  58638) collect2         cpu=0 start=10.02 finish=10.03
                    58639) ld               cpu=15 start=10.02 finish=10.03