This measures the time to build the Eigen examples, a single-threaded workload.

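In the topdown profile, frontend stalls are the largest category (42.2% of slots on AMD, 37.2% on Intel), followed by retirement (28.3% and 30.6%) and then backend stalls (22.8% and 18.6%).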

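The AMD metrics confirm the single-threaded nature: on average only 0.93 of the 16 cores is busy. There is little floating point (about 18.5 per 1000 instructions) and a high proportion of branches (about 212 per 1000 instructions). The page fault rate is high, at roughly 61,600 per second.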

elapsed              192.066
on_cpu               0.058          # 0.93 / 16 cores
utime                155.424
stime                22.525
nvcsw                8485           # 81.00%
nivcsw               1990           # 19.00%
inblock              0              # 0.00/sec
onblock              1036120        # 5394.60/sec
cpu-clock            177932719483   # 177.933 seconds
task-clock           177939721084   # 177.940 seconds
page faults          10960537       # 61596.910/sec
context switches     10634          # 59.762/sec
cpu migrations       340            # 1.911/sec
major page faults    3              # 0.017/sec
minor page faults    10960534       # 61596.893/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             304119200242   # 212.340 branches per 1000 inst
branch misses        7766876223     # 2.55% branch miss
conditional          233491815928   # 163.027 conditional branches per 1000 inst
indirect             6247171032     # 4.362 indirect branches per 1000 inst
cpu-cycles           809822141694   # 0.27 GHz
instructions         1420863958147  # 1.75 IPC
slots                1632435207282  #
retiring             461919959768   # 28.3% (28.3%)
-- ucode             409300568      #     0.0%
-- fastpath          461510659200   #    28.3%
frontend             688181256897   # 42.2% (42.2%)
-- latency           489140540280   #    30.0%
-- bandwidth         199040716617   #    12.2%
backend              372531805791   # 22.8% (22.8%)
-- cpu               48735530739    #     3.0%
-- memory            323796275052   #    19.8%
speculation          109563846712   #  6.7% ( 6.7%)
-- branch mispredict 107887873935   #     6.6%
-- pipeline restart  1675972777     #     0.1%
smt-contention       237974821      #  0.0% ( 0.0%)
cpu-cycles           810688764115   # 0.27 GHz
instructions         1419010189378  # 1.75 IPC
instructions         475183462637   # 48.484 l2 access per 1000 inst
l2 hit from l1       20764952358    # 19.77% l2 miss
l2 miss from l1      3424321293     #
l2 hit from l2 pf    1144198001     #
l3 hit from l2 pf    771547943      #
l3 miss from l2 pf   358295900      #
instructions         477486479175   # 18.548 float per 1000 inst
float 512            172            # 0.000 AVX-512 per 1000 inst
float 256            257086         # 0.001 AVX-256 per 1000 inst
float 128            8856160036     # 18.547 AVX-128 per 1000 inst
float MMX            0              # 0.000 MMX per 1000 inst
float scalar         0              # 0.000 scalar per 1000 inst
instructions         2692214        #
opcache              995150         # 369.640 opcache per 1000 inst
opcache miss         535421         # 53.8% opcache miss rate
l1 dTLB miss         6949           # 2.581 L1 dTLB per 1000 inst
l2 dTLB miss         1265           # 0.470 L2 dTLB per 1000 inst
instructions         2738628        #
icache               1321546        # 482.558 icache per 1000 inst
icache miss          110332         #  8.3% icache miss rate
l1 iTLB miss         7              # 0.003 L1 iTLB per 1000 inst
l2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst
tlb flush            19             # 0.007 TLB flush per 1000 inst
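
As a sanity check, a few of the derived figures in the AMD listing can be recomputed from the raw counters. This is a minimal sketch: the raw values are copied from the listing, but the formula for on_cpu (user plus system time over elapsed time, divided by the core count) is an assumption that happens to match the printed 0.93 and 0.058, not something the tool output states.

```python
# Raw values copied from the AMD run above.
elapsed = 192.066                  # wall-clock seconds
utime = 155.424                    # user CPU seconds
stime = 22.525                     # system CPU seconds
cores = 16
task_clock_ns = 177_939_721_084    # listing shows this as 177.940 seconds
page_faults = 10_960_537
cycles = 809_822_141_694
instructions = 1_420_863_958_147
branches = 304_119_200_242
branch_misses = 7_766_876_223

# Assumed formula: average number of busy cores, then as a fraction of the machine.
busy_cores = (utime + stime) / elapsed
on_cpu = busy_cores / cores

ipc = instructions / cycles
branch_miss_pct = 100 * branch_misses / branches
faults_per_sec = page_faults / (task_clock_ns / 1e9)

print(f"busy cores      {busy_cores:.2f} of {cores}")
print(f"on_cpu          {on_cpu:.3f}")
print(f"IPC             {ipc:.2f}")
print(f"branch miss     {branch_miss_pct:.2f}%")
print(f"page faults/sec {faults_per_sec:.1f}")
```

The "per 1000 inst" figures are left out on purpose: the listing prints several different instruction counts, so those ratios appear to use a per-group count rather than the total used here.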

Intel metrics

elapsed              208.928
on_cpu               0.058          # 0.93 / 16 cores
utime                178.327
stime                15.590
nvcsw                11389          # 84.90%
nivcsw               2026           # 15.10%
inblock              47560          # 227.64/sec
onblock              1024896        # 4905.50/sec
cpu-clock            193849295915   # 193.849 seconds
task-clock           193858763133   # 193.859 seconds
page faults          10950372       # 56486.340/sec
context switches     13663          # 70.479/sec
cpu migrations       422            # 2.177/sec
major page faults    267            # 1.377/sec
minor page faults    10950105       # 56484.963/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             300412569450   # 210.542 branches per 1000 inst
branch misses        5055599168     # 1.68% branch miss
conditional          300412601450   # 210.542 conditional branches per 1000 inst
indirect             6253736930     # 4.383 indirect branches per 1000 inst
slots                4349057855702  #
retiring             1330460791228  # 30.6% (30.6%)
-- ucode             96253339275    #     2.2%
-- fastpath          1234207451953  #    28.4%
frontend             1618202986437  # 37.2% (37.2%)
-- latency           857127981988   #    19.7%
-- bandwidth         761075004449   #    17.5%
backend              810368367002   # 18.6% (18.6%)
-- cpu               282251961137   #     6.5%
-- memory            528116405865   #    12.1%
speculation          598085359038   # 13.8% (13.8%) high
-- branch mispredict 570655432259   #    13.1%
-- pipeline restart  27429926779    #     0.6%
smt-contention       0              #  0.0% ( 0.0%)
cpu-cycles           726240052738   # 0.22 GHz
instructions         1422860814808  # 1.96 IPC
l2 access            62747872551    # 44.119 l2 access per 1000 inst
l2 miss              16238642601    # 25.88% l2 miss
cpu-cycles           725580737432   # 22.6% memory latency
load stalls          158209076663   #  1.9% l1 bound
l1 miss              144170544495   # 10.3% l2 bound
l2 miss              69226808624    #  4.1% l3 bound
l3 miss              39673165073    #  5.5% dram bound
store_stalls         5452926964     #  0.8% store bound
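
The two machines report very different slot totals, so the topdown categories are easier to compare as shares of slots than as raw counts. The sketch below recomputes the top-level shares from the counters in the two listings above and ranks them; the dictionary layout and the function name are illustrative only.

```python
# Top-level topdown counters copied from the AMD and Intel listings above.
TOPDOWN = {
    "AMD": {
        "slots": 1_632_435_207_282,
        "retiring": 461_919_959_768,
        "frontend": 688_181_256_897,
        "backend": 372_531_805_791,
        "speculation": 109_563_846_712,
    },
    "Intel": {
        "slots": 4_349_057_855_702,
        "retiring": 1_330_460_791_228,
        "frontend": 1_618_202_986_437,
        "backend": 810_368_367_002,
        "speculation": 598_085_359_038,
    },
}


def topdown_shares(counters):
    """Return each category as a percentage of total slots."""
    slots = counters["slots"]
    return {name: 100 * value / slots
            for name, value in counters.items() if name != "slots"}


for vendor, counters in TOPDOWN.items():
    shares = topdown_shares(counters)
    # Largest category first.
    ranked = sorted(shares.items(), key=lambda kv: kv[1], reverse=True)
    print(vendor, ", ".join(f"{name} {pct:.1f}%" for name, pct in ranked))
```

Both runs rank frontend first and retiring second; the main difference is the larger speculation share on Intel.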

The process summary suggests C++ code, with nearly all of the time spent in cc1plus, the compiler proper, rather than in the assembler (as) or the c++ driver.

981 processes
	204 cc1plus                147.93    16.46
	 68 clinfo                  18.51     7.32
	204 as                       3.55     0.81
	 38 vulkaninfo               1.43     1.33
	  3 bzip2                    0.51     0.01
	  4 vulkani:disk$0           0.16     0.14
	  6 glxinfo:gdrv0            0.13     0.07
	  6 glxinfo:gl0              0.13     0.07
	  2 llvmpipe-0               0.09     0.07
	  2 llvmpipe-1               0.09     0.07
	  2 llvmpipe-10              0.09     0.07
	  2 llvmpipe-11              0.09     0.07
	  2 llvmpipe-12              0.09     0.07
	  2 llvmpipe-13              0.09     0.07
	  2 llvmpipe-14              0.09     0.07
	  2 llvmpipe-15              0.09     0.07
	  2 llvmpipe-2               0.09     0.07
	  2 llvmpipe-3               0.09     0.07
	  2 llvmpipe-4               0.09     0.07
	  2 llvmpipe-5               0.09     0.07
	  2 llvmpipe-6               0.09     0.07
	  2 llvmpipe-7               0.09     0.07
	  2 llvmpipe-8               0.09     0.07
	  2 llvmpipe-9               0.09     0.07
	  6 php                      0.08     0.07
	  2 glxinfo                  0.07     0.03
	  2 glxinfo:cs0              0.07     0.03
	  2 glxinfo:disk$0           0.07     0.03
	  2 glxinfo:sh0              0.07     0.03
	  2 glxinfo:shlo0            0.07     0.03
	  6 clang                    0.05     0.07
	  3 rocminfo                 0.03     0.00
	  1 lspci                    0.01     0.02
	  3 tar                      0.00     0.16
	  4 rm                       0.00     0.09
	  3 build-eigen              0.00     0.03
	  1 ps                       0.00     0.01
	204 c++                      0.00     0.00
	 86 sh                       0.00     0.00
	 13 gcc                      0.00     0.00
	 11 gsettings                0.00     0.00
	  8 stat                     0.00     0.00
	  8 systemd-detect-          0.00     0.00
	  6 llvm-link                0.00     0.00
	  5 phoronix-test-s          0.00     0.00
	  4 bash                     0.00     0.00
	  4 gmain                    0.00     0.00
	  2 cc                       0.00     0.00
	  2 lscpu                    0.00     0.00
	  2 uname                    0.00     0.00
	  2 which                    0.00     0.00
	  2 xset                     0.00     0.00
	  1 date                     0.00     0.00
	  1 dconf worker             0.00     0.00
	  1 dirname                  0.00     0.00
	  1 dmesg                    0.00     0.00
	  1 dmidecode                0.00     0.00
	  1 grep                     0.00     0.00
	  1 ifconfig                 0.00     0.00
	  1 ip                       0.00     0.00
	  1 lsmod                    0.00     0.00
	  1 mktemp                   0.00     0.00
	  1 qdbus                    0.00     0.00
	  1 readlink                 0.00     0.00
	  1 realpath                 0.00     0.00
	  1 sed                      0.00     0.00
	  1 sort                     0.00     0.00
	  1 stty                     0.00     0.00
	  1 systemctl                0.00     0.00
	  1 template.sh              0.00     0.00
	  1 wc                       0.00     0.00
	  1 xrandr                   0.00     0.00
0 processes running
47 maximum processes
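
To put a number on how dominant cc1plus is, the summary rows can be parsed and totalled. This is a rough sketch on a handful of rows copied from the listing above; the listing has no header, so treating the two numeric columns as user and system seconds is an assumption, and the helper names are made up for illustration.

```python
import re

# A few rows copied from the process summary above.
# Assumption: columns are count, name, user seconds, system seconds.
SUMMARY = """\
204 cc1plus                147.93    16.46
 68 clinfo                  18.51     7.32
204 as                       3.55     0.81
 38 vulkaninfo               1.43     1.33
  1 dconf worker             0.00     0.00
"""

# Names may contain spaces ("dconf worker"), so match the name lazily
# and anchor on the two trailing decimal columns.
ROW = re.compile(r"^\s*(\d+)\s+(.+?)\s+(\d+\.\d+)\s+(\d+\.\d+)\s*$")


def parse_summary(text):
    """Return a list of (name, count, first_column, second_column) tuples."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            count, name, first, second = match.groups()
            rows.append((name, int(count), float(first), float(second)))
    return rows


def share_of_first_column(rows, name):
    """Fraction of the first time column attributed to the named process."""
    total = sum(row[2] for row in rows)
    return sum(row[2] for row in rows if row[0] == name) / total


rows = parse_summary(SUMMARY)
print(f"cc1plus share of sampled rows: {share_of_first_column(rows, 'cc1plus'):.1%}")
```

Feeding the full listing into `parse_summary` instead of the sample would give the share over all 981 processes.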

The computation is straightforward: files are compiled one after the next, with each c++ driver running cc1plus and then as before the next one starts. I suspect these could be run in parallel, since there is no link step.
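
If the compiles really are independent, one way to run them concurrently is a small worker pool that launches one compiler process per source file. The sketch below is only an illustration of that idea: the benchmark's actual build script and compiler flags are not shown here, so the `c++ -O2 -c` command line and the function names are assumptions.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Hypothetical command line; the real flags used by the build are not known.
DEFAULT_COMPILER = ("c++", "-O2", "-c")


def compile_one(source, compiler=DEFAULT_COMPILER):
    """Compile one translation unit; return (source, exit code)."""
    result = subprocess.run([*compiler, source], capture_output=True, text=True)
    return source, result.returncode


def compile_all(sources, jobs, compiler=DEFAULT_COMPILER):
    """Run up to `jobs` independent compiles at once and collect exit codes.

    Threads are enough here because the work happens in the child
    compiler processes, not in Python.
    """
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return dict(pool.map(lambda src: compile_one(src, compiler), sources))
```

A call such as `compile_all(["a.cpp", "b.cpp"], jobs=16)` (file names hypothetical) would return a mapping from each source to its exit code. With no link step, nothing has to wait for all objects to exist. The trace below shows the serial pattern this would replace.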

      877731) build-eigen      cpu=15 start=5.76  finish=64.25
        877732) c++              cpu=13 start=5.76  finish=6.44 
          877733) cc1plus          cpu=8 start=5.76  finish=6.41 
          877734) as               cpu=6 start=6.42  finish=6.44 
        877735) c++              cpu=8 start=6.44  finish=6.95 
          877736) cc1plus          cpu=2 start=6.44  finish=6.93 
          877737) as               cpu=1 start=6.94  finish=6.95 
        877738) c++              cpu=2 start=6.95  finish=7.38 
          877739) cc1plus          cpu=3 start=6.95  finish=7.36 
          877740) as               cpu=4 start=7.37  finish=7.38 
        877741) c++              cpu=0 start=7.38  finish=7.80 
          877742) cc1plus          cpu=9 start=7.38  finish=7.79 
          877744) as               cpu=10 start=7.80  finish=7.80 
        877745) c++              cpu=9 start=7.80  finish=8.25 
          877746) cc1plus          cpu=4 start=7.81  finish=8.24 
          877747) as               cpu=10 start=8.24  finish=8.25 
        877748) c++              cpu=0 start=8.25  finish=8.71 
          877749) cc1plus          cpu=3 start=8.26  finish=8.69 
          877750) as               cpu=9 start=8.70  finish=8.71 
        877751) c++              cpu=10 start=8.71  finish=9.18 
          877752) cc1plus          cpu=11 start=8.71  finish=9.16 
          877753) as               cpu=4 start=9.16  finish=9.17 
        877754) c++              cpu=0 start=9.18  finish=9.64 
          877755) cc1plus          cpu=3 start=9.18  finish=9.62 
          877756) as               cpu=10 start=9.63  finish=9.64 
        877757) c++              cpu=3 start=9.64  finish=10.12
          877758) cc1plus          cpu=4 start=9.64  finish=10.10
          877759) as               cpu=5 start=10.11 finish=10.12
        877760) c++              cpu=0 start=10.12 finish=10.76
          877761) cc1plus          cpu=10 start=10.12 finish=10.74
          877762) as               cpu=3 start=10.75 finish=10.76
        877763) c++              cpu=4 start=10.77 finish=11.41
          877764) cc1plus          cpu=5 start=10.77 finish=11.38
          877765) as               cpu=14 start=11.39 finish=11.41
        877766) c++              cpu=8 start=11.41 finish=11.95
          877767) cc1plus          cpu=2 start=11.41 finish=11.93
          877768) as               cpu=12 start=11.94 finish=11.95
        877769) c++              cpu=2 start=11.95 finish=12.48
          877770) cc1plus          cpu=5 start=11.95 finish=12.46
          877771) as               cpu=11 start=12.47 finish=12.48
        877772) c++              cpu=11 start=12.48 finish=13.85
          877773) cc1plus          cpu=4 start=12.48 finish=13.77
          877774) as               cpu=13 start=13.78 finish=13.85
        877775) c++              cpu=8 start=13.85 finish=14.96
          877776) cc1plus          cpu=10 start=13.85 finish=14.89
          877777) as               cpu=3 start=14.91 finish=14.95
        877778) c++              cpu=10 start=14.96 finish=16.21
          877779) cc1plus          cpu=11 start=14.96 finish=16.14
          877780) as               cpu=12 start=16.16 finish=16.21
        877781) c++              cpu=0 start=16.21 finish=19.44
          877782) cc1plus          cpu=3 start=16.21 finish=19.25
          877783) as               cpu=2 start=19.27 finish=19.43
        877784) c++              cpu=3 start=19.44 finish=20.60
          877785) cc1plus          cpu=4 start=19.44 finish=20.53
          877786) as               cpu=5 start=20.55 finish=20.60
        877787) c++              cpu=0 start=20.60 finish=21.20
          877788) cc1plus          cpu=10 start=20.60 finish=21.18
          877789) as               cpu=3 start=21.19 finish=21.20
        877790) c++              cpu=10 start=21.20 finish=22.39
          877791) cc1plus          cpu=4 start=21.20 finish=22.32
          877792) as               cpu=11 start=22.34 finish=22.38
        877793) c++              cpu=8 start=22.39 finish=29.74
          877794) cc1plus          cpu=11 start=22.39 finish=29.28
          877795) as               cpu=1 start=29.32 finish=29.73
        877796) c++              cpu=15 start=29.74 finish=31.55
          877797) cc1plus          cpu=10 start=29.74 finish=31.44
          877798) as               cpu=3 start=31.46 finish=31.55
        877799) c++              cpu=8 start=31.55 finish=32.32
          877800) cc1plus          cpu=10 start=31.55 finish=32.29
          877801) as               cpu=10 start=32.30 finish=32.32
        877802) c++              cpu=3 start=32.32 finish=32.81
          877803) cc1plus          cpu=12 start=32.33 finish=32.80
          877804) as               cpu=5 start=32.81 finish=32.81
        877805) c++              cpu=8 start=32.81 finish=33.35
          877806) cc1plus          cpu=2 start=32.82 finish=33.33
          877807) as               cpu=3 start=33.34 finish=33.35