This benchmark measures the time to build the Eigen examples. It is a single-threaded workload.

The topdown profile is dominated by frontend stalls (42.2% of slots on AMD, 37.2% on Intel), followed by retiring (28.3% and 30.6%) and then backend stalls (22.8% and 18.6%).

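The AMD metrics confirm the single-threaded nature: on_cpu is 0.058, which is 0.93 of one core out of 16. There is little floating point (about 18.5 per 1000 instructions) and a high proportion of branches (about 212 per 1000 instructions). The page fault rate is high at roughly 61,600 faults per second, almost all of them minor.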
elapsed 192.066
on_cpu 0.058 # 0.93 / 16 cores
utime 155.424
stime 22.525
nvcsw 8485 # 81.00%
nivcsw 1990 # 19.00%
inblock 0 # 0.00/sec
onblock 1036120 # 5394.60/sec
cpu-clock 177932719483 # 177.933 seconds
task-clock 177939721084 # 177.940 seconds
page faults 10960537 # 61596.910/sec
context switches 10634 # 59.762/sec
cpu migrations 340 # 1.911/sec
major page faults 3 # 0.017/sec
minor page faults 10960534 # 61596.893/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 304119200242 # 212.340 branches per 1000 inst
branch misses 7766876223 # 2.55% branch miss
conditional 233491815928 # 163.027 conditional branches per 1000 inst
indirect 6247171032 # 4.362 indirect branches per 1000 inst
cpu-cycles 809822141694 # 0.27 GHz
instructions 1420863958147 # 1.75 IPC
slots 1632435207282 #
retiring 461919959768 # 28.3% (28.3%)
-- ucode 409300568 # 0.0%
-- fastpath 461510659200 # 28.3%
frontend 688181256897 # 42.2% (42.2%)
-- latency 489140540280 # 30.0%
-- bandwidth 199040716617 # 12.2%
backend 372531805791 # 22.8% (22.8%)
-- cpu 48735530739 # 3.0%
-- memory 323796275052 # 19.8%
speculation 109563846712 # 6.7% ( 6.7%)
-- branch mispredict 107887873935 # 6.6%
-- pipeline restart 1675972777 # 0.1%
smt-contention 237974821 # 0.0% ( 0.0%)
cpu-cycles 810688764115 # 0.27 GHz
instructions 1419010189378 # 1.75 IPC
instructions 475183462637 # 48.484 l2 access per 1000 inst
l2 hit from l1 20764952358 # 19.77% l2 miss
l2 miss from l1 3424321293 #
l2 hit from l2 pf 1144198001 #
l3 hit from l2 pf 771547943 #
l3 miss from l2 pf 358295900 #
instructions 477486479175 # 18.548 float per 1000 inst
float 512 172 # 0.000 AVX-512 per 1000 inst
float 256 257086 # 0.001 AVX-256 per 1000 inst
float 128 8856160036 # 18.547 AVX-128 per 1000 inst
float MMX 0 # 0.000 MMX per 1000 inst
float scalar 0 # 0.000 scalar per 1000 inst
instructions 2692214 #
opcache 995150 # 369.640 opcache per 1000 inst
opcache miss 535421 # 53.8% opcache miss rate
l1 dTLB miss 6949 # 2.581 L1 dTLB per 1000 inst
l2 dTLB miss 1265 # 0.470 L2 dTLB per 1000 inst
instructions 2738628 #
icache 1321546 # 482.558 icache per 1000 inst
icache miss 110332 # 8.3% icache miss rate
l1 iTLB miss 7 # 0.003 L1 iTLB per 1000 inst
l2 iTLB miss 0 # 0.000 L2 iTLB per 1000 inst
tlb flush 19 # 0.007 TLB flush per 1000 inst
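As a cross-check on the topdown percentages, the sketch below recomputes each top-level category as a share of total slots. The counts are copied from the AMD output above; the function name `topdown_percentages` is only illustrative and is not part of the profiling tool.

```python
# Slot counts copied from the AMD topdown output above.
AMD_SLOTS = 1632435207282
AMD_CATEGORIES = {
    "retiring": 461919959768,
    "frontend": 688181256897,
    "backend": 372531805791,
    "speculation": 109563846712,
    "smt-contention": 237974821,
}


def topdown_percentages(slots, categories):
    """Return each category as a percentage of the total pipeline slots."""
    return {name: 100.0 * count / slots for name, count in categories.items()}


if __name__ == "__main__":
    for name, pct in topdown_percentages(AMD_SLOTS, AMD_CATEGORIES).items():
        print(f"{name:15s} {pct:5.1f}%")
```

The five top-level categories should add up to very nearly 100% of the slots, which is a quick sanity check that no category was dropped when copying the numbers.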
Intel metrics
elapsed 208.928
on_cpu 0.058 # 0.93 / 16 cores
utime 178.327
stime 15.590
nvcsw 11389 # 84.90%
nivcsw 2026 # 15.10%
inblock 47560 # 227.64/sec
onblock 1024896 # 4905.50/sec
cpu-clock 193849295915 # 193.849 seconds
task-clock 193858763133 # 193.859 seconds
page faults 10950372 # 56486.340/sec
context switches 13663 # 70.479/sec
cpu migrations 422 # 2.177/sec
major page faults 267 # 1.377/sec
minor page faults 10950105 # 56484.963/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 300412569450 # 210.542 branches per 1000 inst
branch misses 5055599168 # 1.68% branch miss
conditional 300412601450 # 210.542 conditional branches per 1000 inst
indirect 6253736930 # 4.383 indirect branches per 1000 inst
slots 4349057855702 #
retiring 1330460791228 # 30.6% (30.6%)
-- ucode 96253339275 # 2.2%
-- fastpath 1234207451953 # 28.4%
frontend 1618202986437 # 37.2% (37.2%)
-- latency 857127981988 # 19.7%
-- bandwidth 761075004449 # 17.5%
backend 810368367002 # 18.6% (18.6%)
-- cpu 282251961137 # 6.5%
-- memory 528116405865 # 12.1%
speculation 598085359038 # 13.8% (13.8%) high
-- branch mispredict 570655432259 # 13.1%
-- pipeline restart 27429926779 # 0.6%
smt-contention 0 # 0.0% ( 0.0%)
cpu-cycles 726240052738 # 0.22 GHz
instructions 1422860814808 # 1.96 IPC
l2 access 62747872551 # 44.119 l2 access per 1000 inst
l2 miss 16238642601 # 25.88% l2 miss
cpu-cycles 725580737432 # 22.6% memory latency
load stalls 158209076663 # 1.9% l1 bound
l1 miss 144170544495 # 10.3% l2 bound
l2 miss 69226808624 # 4.1% l3 bound
l3 miss 39673165073 # 5.5% dram bound
store_stalls 5452926964 # 0.8% store bound
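To compare the two runs side by side, the following sketch derives IPC and branch miss rate from the raw counters listed above, plus the L2 miss rate for the Intel run. The dictionary layout and the helper name `derived_ratios` are assumptions made for illustration; only the counter values come from the output.

```python
# Raw counters copied from the AMD and Intel outputs above.
AMD = {
    "cycles": 809822141694,
    "instructions": 1420863958147,
    "branches": 304119200242,
    "branch_misses": 7766876223,
}
INTEL = {
    "cycles": 726240052738,
    "instructions": 1422860814808,
    "branches": 300412569450,
    "branch_misses": 5055599168,
    "l2_access": 62747872551,
    "l2_miss": 16238642601,
}


def derived_ratios(counters):
    """Compute IPC and miss rates from whichever raw counters are present."""
    ratios = {
        "ipc": counters["instructions"] / counters["cycles"],
        "branch_miss_pct": 100.0 * counters["branch_misses"] / counters["branches"],
    }
    if "l2_access" in counters:
        ratios["l2_miss_pct"] = 100.0 * counters["l2_miss"] / counters["l2_access"]
    return ratios


if __name__ == "__main__":
    for label, counters in (("AMD", AMD), ("Intel", INTEL)):
        print(label, {k: round(v, 2) for k, v in derived_ratios(counters).items()})
```

The per-1000-instruction figures in the output are not recomputed here, because several of them appear to use instruction counts sampled in a different counter group.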
The process summary points to C++ compilation, with nearly all of the time spent in the compiler front end (the 204 cc1plus processes).
981 processes
204 cc1plus 147.93 16.46
68 clinfo 18.51 7.32
204 as 3.55 0.81
38 vulkaninfo 1.43 1.33
3 bzip2 0.51 0.01
4 vulkani:disk$0 0.16 0.14
6 glxinfo:gdrv0 0.13 0.07
6 glxinfo:gl0 0.13 0.07
2 llvmpipe-0 0.09 0.07
2 llvmpipe-1 0.09 0.07
2 llvmpipe-10 0.09 0.07
2 llvmpipe-11 0.09 0.07
2 llvmpipe-12 0.09 0.07
2 llvmpipe-13 0.09 0.07
2 llvmpipe-14 0.09 0.07
2 llvmpipe-15 0.09 0.07
2 llvmpipe-2 0.09 0.07
2 llvmpipe-3 0.09 0.07
2 llvmpipe-4 0.09 0.07
2 llvmpipe-5 0.09 0.07
2 llvmpipe-6 0.09 0.07
2 llvmpipe-7 0.09 0.07
2 llvmpipe-8 0.09 0.07
2 llvmpipe-9 0.09 0.07
6 php 0.08 0.07
2 glxinfo 0.07 0.03
2 glxinfo:cs0 0.07 0.03
2 glxinfo:disk$0 0.07 0.03
2 glxinfo:sh0 0.07 0.03
2 glxinfo:shlo0 0.07 0.03
6 clang 0.05 0.07
3 rocminfo 0.03 0.00
1 lspci 0.01 0.02
3 tar 0.00 0.16
4 rm 0.00 0.09
3 build-eigen 0.00 0.03
1 ps 0.00 0.01
204 c++ 0.00 0.00
86 sh 0.00 0.00
13 gcc 0.00 0.00
11 gsettings 0.00 0.00
8 stat 0.00 0.00
8 systemd-detect- 0.00 0.00
6 llvm-link 0.00 0.00
5 phoronix-test-s 0.00 0.00
4 bash 0.00 0.00
4 gmain 0.00 0.00
2 cc 0.00 0.00
2 lscpu 0.00 0.00
2 uname 0.00 0.00
2 which 0.00 0.00
2 xset 0.00 0.00
1 date 0.00 0.00
1 dconf worker 0.00 0.00
1 dirname 0.00 0.00
1 dmesg 0.00 0.00
1 dmidecode 0.00 0.00
1 grep 0.00 0.00
1 ifconfig 0.00 0.00
1 ip 0.00 0.00
1 lsmod 0.00 0.00
1 mktemp 0.00 0.00
1 qdbus 0.00 0.00
1 readlink 0.00 0.00
1 realpath 0.00 0.00
1 sed 0.00 0.00
1 sort 0.00 0.00
1 stty 0.00 0.00
1 systemctl 0.00 0.00
1 template.sh 0.00 0.00
1 wc 0.00 0.00
1 xrandr 0.00 0.00
0 processes running
47 maximum processes
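A small parser makes it easy to rank the rows of a summary like this one. The sketch below assumes the four columns are process count, command name, user seconds and system seconds; that reading is inferred from the numbers and is not stated by the tool. Only the first five rows are included as sample input.

```python
# First five rows of the process summary above, used as sample input.
SAMPLE = """\
204 cc1plus 147.93 16.46
68 clinfo 18.51 7.32
204 as 3.55 0.81
38 vulkaninfo 1.43 1.33
3 bzip2 0.51 0.01
"""


def parse_row(line):
    """Split a row into (name, count, user, system).

    Assumed column meaning: count, command, user seconds, system seconds.
    Command names may contain spaces (e.g. "dconf worker"), so the two
    timing fields are taken from the right-hand side.
    """
    count, rest = line.split(None, 1)
    name, user, system = rest.rsplit(None, 2)
    return name, int(count), float(user), float(system)


def user_time_share(rows, name):
    """Fraction of the summed user time that belongs to one command."""
    total = sum(row[2] for row in rows)
    return sum(row[2] for row in rows if row[0] == name) / total


if __name__ == "__main__":
    rows = [parse_row(line) for line in SAMPLE.splitlines() if line.strip()]
    share = user_time_share(rows, "cc1plus")
    print(f"cc1plus share of user time in the sample rows: {share:.1%}")
```

The share is relative to the sample rows only; feeding in the full table would change the denominator slightly, since the remaining rows contribute little time.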
The computation is straightforward and strictly serial: each c++ driver runs cc1plus and then as to completion before the next file starts. I suspect these compiles could be run in parallel, since there is no link step tying them together.
877731) build-eigen cpu=15 start=5.76 finish=64.25
877732) c++ cpu=13 start=5.76 finish=6.44
877733) cc1plus cpu=8 start=5.76 finish=6.41
877734) as cpu=6 start=6.42 finish=6.44
877735) c++ cpu=8 start=6.44 finish=6.95
877736) cc1plus cpu=2 start=6.44 finish=6.93
877737) as cpu=1 start=6.94 finish=6.95
877738) c++ cpu=2 start=6.95 finish=7.38
877739) cc1plus cpu=3 start=6.95 finish=7.36
877740) as cpu=4 start=7.37 finish=7.38
877741) c++ cpu=0 start=7.38 finish=7.80
877742) cc1plus cpu=9 start=7.38 finish=7.79
877744) as cpu=10 start=7.80 finish=7.80
877745) c++ cpu=9 start=7.80 finish=8.25
877746) cc1plus cpu=4 start=7.81 finish=8.24
877747) as cpu=10 start=8.24 finish=8.25
877748) c++ cpu=0 start=8.25 finish=8.71
877749) cc1plus cpu=3 start=8.26 finish=8.69
877750) as cpu=9 start=8.70 finish=8.71
877751) c++ cpu=10 start=8.71 finish=9.18
877752) cc1plus cpu=11 start=8.71 finish=9.16
877753) as cpu=4 start=9.16 finish=9.17
877754) c++ cpu=0 start=9.18 finish=9.64
877755) cc1plus cpu=3 start=9.18 finish=9.62
877756) as cpu=10 start=9.63 finish=9.64
877757) c++ cpu=3 start=9.64 finish=10.12
877758) cc1plus cpu=4 start=9.64 finish=10.10
877759) as cpu=5 start=10.11 finish=10.12
877760) c++ cpu=0 start=10.12 finish=10.76
877761) cc1plus cpu=10 start=10.12 finish=10.74
877762) as cpu=3 start=10.75 finish=10.76
877763) c++ cpu=4 start=10.77 finish=11.41
877764) cc1plus cpu=5 start=10.77 finish=11.38
877765) as cpu=14 start=11.39 finish=11.41
877766) c++ cpu=8 start=11.41 finish=11.95
877767) cc1plus cpu=2 start=11.41 finish=11.93
877768) as cpu=12 start=11.94 finish=11.95
877769) c++ cpu=2 start=11.95 finish=12.48
877770) cc1plus cpu=5 start=11.95 finish=12.46
877771) as cpu=11 start=12.47 finish=12.48
877772) c++ cpu=11 start=12.48 finish=13.85
877773) cc1plus cpu=4 start=12.48 finish=13.77
877774) as cpu=13 start=13.78 finish=13.85
877775) c++ cpu=8 start=13.85 finish=14.96
877776) cc1plus cpu=10 start=13.85 finish=14.89
877777) as cpu=3 start=14.91 finish=14.95
877778) c++ cpu=10 start=14.96 finish=16.21
877779) cc1plus cpu=11 start=14.96 finish=16.14
877780) as cpu=12 start=16.16 finish=16.21
877781) c++ cpu=0 start=16.21 finish=19.44
877782) cc1plus cpu=3 start=16.21 finish=19.25
877783) as cpu=2 start=19.27 finish=19.43
877784) c++ cpu=3 start=19.44 finish=20.60
877785) cc1plus cpu=4 start=19.44 finish=20.53
877786) as cpu=5 start=20.55 finish=20.60
877787) c++ cpu=0 start=20.60 finish=21.20
877788) cc1plus cpu=10 start=20.60 finish=21.18
877789) as cpu=3 start=21.19 finish=21.20
877790) c++ cpu=10 start=21.20 finish=22.39
877791) cc1plus cpu=4 start=21.20 finish=22.32
877792) as cpu=11 start=22.34 finish=22.38
877793) c++ cpu=8 start=22.39 finish=29.74
877794) cc1plus cpu=11 start=22.39 finish=29.28
877795) as cpu=1 start=29.32 finish=29.73
877796) c++ cpu=15 start=29.74 finish=31.55
877797) cc1plus cpu=10 start=29.74 finish=31.44
877798) as cpu=3 start=31.46 finish=31.55
877799) c++ cpu=8 start=31.55 finish=32.32
877800) cc1plus cpu=10 start=31.55 finish=32.29
877801) as cpu=10 start=32.30 finish=32.32
877802) c++ cpu=3 start=32.32 finish=32.81
877803) cc1plus cpu=12 start=32.33 finish=32.80
877804) as cpu=5 start=32.81 finish=32.81
877805) c++ cpu=8 start=32.81 finish=33.35
877806) cc1plus cpu=2 start=32.82 finish=33.33
877807) as cpu=3 start=33.34 finish=33.35
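The serial pattern and the room for parallelism can be checked directly from the trace. The sketch below parses a few lines copied from the listing above, confirms that the c++ driver invocations never overlap, and computes a rough lower bound on wall time if the same jobs were spread over 16 workers. The regular expression and the helper names are assumptions for illustration, and the bound ignores memory and I/O contention.

```python
import re

# A few lines copied from the trace above, used as sample input.
SAMPLE = """\
877732) c++ cpu=13 start=5.76 finish=6.44
877733) cc1plus cpu=8 start=5.76 finish=6.41
877734) as cpu=6 start=6.42 finish=6.44
877735) c++ cpu=8 start=6.44 finish=6.95
877736) cc1plus cpu=2 start=6.44 finish=6.93
877737) as cpu=1 start=6.94 finish=6.95
877793) c++ cpu=8 start=22.39 finish=29.74
877794) cc1plus cpu=11 start=22.39 finish=29.28
877795) as cpu=1 start=29.32 finish=29.73
"""

LINE = re.compile(
    r"(?P<pid>\d+)\)\s+(?P<name>.+?)\s+cpu=(?P<cpu>\d+)"
    r"\s+start=(?P<start>[\d.]+)\s+finish=(?P<finish>[\d.]+)"
)


def parse_trace(text):
    """Return (name, start, finish) tuples for every line that matches."""
    jobs = []
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            jobs.append((m["name"], float(m["start"]), float(m["finish"])))
    return jobs


def is_serial(jobs, name):
    """True if no two jobs with this name overlap in time."""
    spans = sorted((s, f) for n, s, f in jobs if n == name)
    return all(nxt[0] >= cur[1] for cur, nxt in zip(spans, spans[1:]))


def parallel_lower_bound(jobs, name, workers):
    """Best-case wall time: limited by the longest job or by total work / workers."""
    durations = [f - s for n, s, f in jobs if n == name]
    return max(max(durations), sum(durations) / workers)


if __name__ == "__main__":
    jobs = parse_trace(SAMPLE)
    print("c++ invocations serial:", is_serial(jobs, "c++"))
    print("lower bound on 16 workers:", round(parallel_lower_bound(jobs, "c++", 16), 2))
```

With enough workers the bound is set by the single longest compile, so a few slow files would limit how much a parallel build could gain.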
