MiniFE is a finite element mini-application that stands in for unstructured finite element codes. There are small, medium, and large models. Only the small model runs on my 8 GB Intel system or my 16 GB AMD system. The large model fails more quickly than the medium one.

The topdown profile shows that backend stalls dominate. Frontend stalls are also high, which leaves a low retirement rate.

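AMD metrics confirm a low retirement rate: 7.6% of all slots, or 9.0% once the slots lost to SMT contention are excluded. This is floating point code (about 186 float operations per 1000 instructions, essentially all 128-bit) with an L2 miss rate of about 36%.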
elapsed 243.690
on_cpu 0.524 # 8.39 / 16 cores
utime 1993.160
stime 51.697
nvcsw 13870 # 46.17%
nivcsw 16171 # 53.83%
inblock 3344 # 13.72/sec
onblock 12608 # 51.74/sec
cpu-clock 2047157103380 # 2047.157 seconds
task-clock 2047257770017 # 2047.258 seconds
page faults 31046999 # 15165.164/sec
context switches 33542 # 16.384/sec
cpu migrations 971 # 0.474/sec
major page faults 720 # 0.352/sec
minor page faults 31046279 # 15164.812/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 706236278470 # 161.538 branches per 1000 inst
branch misses 5014293222 # 0.71% branch miss
conditional 641878405657 # 146.817 conditional branches per 1000 inst
indirect 2013712249 # 0.461 indirect branches per 1000 inst
cpu-cycles 9210992376881 # 2.38 GHz
instructions 4361725796880 # 0.47 IPC low
slots 18424742893032 #
retiring 1408820057528 # 7.6% ( 9.0%) low
-- ucode 2257470384 # 0.0%
-- fastpath 1406562587144 # 7.6%
frontend 5043746195655 # 27.4% (32.3%)
-- latency 4347570093138 # 23.6%
-- bandwidth 696176102517 # 3.8%
backend 9173912664192 # 49.8% (58.7%)
-- cpu 827799698650 # 4.5%
-- memory 8346112965542 # 45.3%
speculation 7079727654 # 0.0% ( 0.0%) low
-- branch mispredict 6974699002 # 0.0%
-- pipeline restart 105028652 # 0.0%
smt-contention 2791179374818 # 15.1% ( 0.0%)
cpu-cycles 9202270231305 # 2.38 GHz
instructions 4356300172549 # 0.47 IPC low
instructions 1452484540467 # 40.542 l2 access per 1000 inst
l2 hit from l1 36762046483 # 36.18% l2 miss
l2 miss from l1 2251102278 #
l2 hit from l2 pf 3068590825 #
l3 hit from l2 pf 61142096 #
l3 miss from l2 pf 18994133948 #
instructions 1454513550889 # 186.109 float per 1000 inst
float 512 91 # 0.000 AVX-512 per 1000 inst
float 256 624 # 0.000 AVX-256 per 1000 inst
float 128 270698676105 # 186.109 AVX-128 per 1000 inst
float MMX 0 # 0.000 MMX per 1000 inst
float scalar 2 # 0.000 scalar per 1000 inst
instructions 4359084815682 #
opcache 874086295911 # 200.521 opcache per 1000 inst
opcache miss 30119422719 # 3.4% opcache miss rate
l1 dTLB miss 1340796010 # 0.308 L1 dTLB per 1000 inst
l2 dTLB miss 1081626104 # 0.248 L2 dTLB per 1000 inst
instructions 4357779438705 #
icache 71313624570 # 16.365 icache per 1000 inst
icache miss 1458400180 # 2.0% icache miss rate
l1 iTLB miss 9490258 # 0.002 L1 iTLB per 1000 inst
l2 iTLB miss 0 # 0.000 L2 iTLB per 1000 inst
tlb flush 44877 # 0.000 TLB flush per 1000 inst
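
The derived columns in the AMD output above can be reproduced from the raw counters. The following is a minimal sketch, not the profiling tool's own code: the formulas are assumptions that happen to match the printed values, and the function names are made up for illustration.

```python
# Counter values copied from the AMD run above.
elapsed = 243.690            # wall-clock seconds
utime, stime = 1993.160, 51.697
cores = 16

cpu_cycles = 9210992376881
instructions = 4361725796880
slots = 18424742893032
retiring = 1408820057528
smt_contention = 2791179374818


def on_cpu_fraction(utime, stime, elapsed, cores):
    """Assumed formula: CPU seconds per wall-clock second, divided by core count."""
    return (utime + stime) / elapsed / cores


def ipc(instructions, cycles):
    """Instructions retired per CPU cycle."""
    return instructions / cycles


def slot_share(count, slots):
    """Fraction of all pipeline slots in one topdown category."""
    return count / slots


def slot_share_excluding_smt(count, slots, smt):
    """Assumed meaning of the parenthesized column: share of non-SMT-contention slots."""
    return count / (slots - smt)


on_cpu = on_cpu_fraction(utime, stime, elapsed, cores)
amd_ipc = ipc(instructions, cpu_cycles)
retire_all = slot_share(retiring, slots)
retire_no_smt = slot_share_excluding_smt(retiring, slots, smt_contention)

print(f"on_cpu   {on_cpu:.3f}")
print(f"IPC      {amd_ipc:.2f}")
print(f"retiring {retire_all:.1%} ({retire_no_smt:.1%})")
```

Under these assumptions, the two retirement figures differ only in the denominator: the second one drops the 15.1% of slots attributed to SMT contention.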

Intel metrics show that 39.7% of the measured cycles are DRAM bound, which is about two thirds of the 60.2% of cycles attributed to memory latency.
elapsed 247.687
on_cpu 0.533 # 8.52 / 16 cores
utime 2075.936
stime 34.629
nvcsw 23274 # 59.55%
nivcsw 15812 # 40.45%
inblock 673936 # 2720.92/sec
onblock 1336 # 5.39/sec
cpu-clock 2111985298342 # 2111.985 seconds
task-clock 2112024687329 # 2112.025 seconds
page faults 23467689 # 11111.465/sec
context switches 48757 # 23.085/sec
cpu migrations 3111 # 1.473/sec
major page faults 12398 # 5.870/sec
minor page faults 23455288 # 11105.594/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 659038623656 # 157.161 branches per 1000 inst
branch misses 524366491 # 0.08% branch miss
conditional 659038641448 # 157.161 conditional branches per 1000 inst
indirect 208339155122 # 49.683 indirect branches per 1000 inst
slots 10368305007182 #
retiring 2669877911887 # 25.8% (25.8%)
-- ucode 222134393351 # 2.1%
-- fastpath 2447743518536 # 23.6%
frontend 1210078632156 # 11.7% (11.7%)
-- latency 859054106519 # 8.3%
-- bandwidth 351024525637 # 3.4%
backend 6892978942426 # 66.5% (66.5%)
-- cpu 2269699099109 # 21.9%
-- memory 4623279843317 # 44.6%
speculation 129944168269 # 1.3% ( 1.3%)
-- branch mispredict 104901134836 # 1.0%
-- pipeline restart 25043033433 # 0.2%
smt-contention 0 # 0.0% ( 0.0%)
cpu-cycles 14924883336483 # 1.80 GHz
instructions 10110279002150 # 0.68 IPC low
l2 access 178924470824 # 31.017 l2 access per 1000 inst
l2 miss 106730592575 # 59.65% l2 miss
cpu-cycles 3247227300486 # 60.2% memory latency
load stalls 1921815818188 # 0.0% l1 bound
l1 miss 1950070359204 # 8.5% l2 bound
l2 miss 1673267884336 # 11.9% l3 bound
l3 miss 1288195279271 # 39.7% dram bound
store_stalls 31750377590 # 1.0% store bound
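
The memory-bound percentages at the end of the Intel output above can be recomputed from the stall counters. This is a sketch under an assumption: each cache level's share is taken as the difference between consecutive stall counters, divided by the cycle count of that measurement group. These formulas reproduce the printed numbers, but they are not confirmed to be the tool's own.

```python
# Counter values copied from the last group of the Intel run above.
cycles = 3247227300486
load_stalls = 1921815818188
l1_miss_stalls = 1950070359204
l2_miss_stalls = 1673267884336
l3_miss_stalls = 1288195279271
store_stalls = 31750377590


def pct(count):
    """Express a stall-cycle count as a percentage of the group's cycles."""
    return 100.0 * count / cycles


# Assumed breakdown: stalls with an outstanding miss at level N, minus those
# that also miss level N+1, are charged to level N+1's latency.
breakdown = {
    "memory latency": pct(load_stalls + store_stalls),
    "l1 bound": pct(max(0, load_stalls - l1_miss_stalls)),
    "l2 bound": pct(l1_miss_stalls - l2_miss_stalls),
    "l3 bound": pct(l2_miss_stalls - l3_miss_stalls),
    "dram bound": pct(l3_miss_stalls),
    "store bound": pct(store_stalls),
}

# Share of the memory-latency cycles that are waiting on DRAM.
dram_share_of_memory = breakdown["dram bound"] / breakdown["memory latency"]

for name, value in breakdown.items():
    print(f"{name:15s} {value:5.1f}%")
print(f"dram share of memory latency: {dram_share_of_memory:.0%}")
```

Note that the percentages are relative to cycles, not to memory stalls, so the DRAM share of the memory stalls themselves is higher than the 39.7% printed in the output.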

The process overview shows miniFE.x as the dominant process.
591 processes
180 miniFE.x 37870.42 922.58
68 clinfo 16.87 5.89
54 orted 2.29 5.33
38 vulkaninfo 1.71 0.95
4 vulkani:disk$0 0.18 0.10
6 php 0.13 0.13
2 llvmpipe-0 0.09 0.05
2 llvmpipe-1 0.09 0.05
2 llvmpipe-10 0.09 0.05
2 llvmpipe-11 0.09 0.05
2 llvmpipe-12 0.09 0.05
2 llvmpipe-13 0.09 0.05
2 llvmpipe-14 0.09 0.05
2 llvmpipe-15 0.09 0.05
2 llvmpipe-2 0.09 0.05
2 llvmpipe-3 0.09 0.05
2 llvmpipe-4 0.09 0.05
2 llvmpipe-5 0.09 0.05
2 llvmpipe-6 0.09 0.05
2 llvmpipe-7 0.09 0.05
2 llvmpipe-8 0.09 0.05
2 llvmpipe-9 0.09 0.05
6 clang 0.08 0.04
3 rocminfo 0.03 0.00
1 lspci 0.01 0.01
1 ps 0.00 0.01
83 sh 0.00 0.00
13 gcc 0.00 0.00
11 gsettings 0.00 0.00
9 cat 0.00 0.00
9 minife 0.00 0.00
9 rm 0.00 0.00
8 stat 0.00 0.00
8 systemd-detect- 0.00 0.00
6 llvm-link 0.00 0.00
5 phoronix-test-s 0.00 0.00
4 glxinfo 0.00 0.00
3 gmain 0.00 0.00
2 cc 0.00 0.00
2 dconf worker 0.00 0.00
2 lscpu 0.00 0.00
2 setterm 0.00 0.00
2 uname 0.00 0.00
2 which 0.00 0.00
1 date 0.00 0.00
1 dirname 0.00 0.00
1 dmesg 0.00 0.00
1 dmidecode 0.00 0.00
1 grep 0.00 0.00
1 ifconfig 0.00 0.00
1 ip 0.00 0.00
1 lsmod 0.00 0.00
1 mktemp 0.00 0.00
1 qdbus 0.00 0.00
1 readlink 0.00 0.00
1 realpath 0.00 0.00
1 sed 0.00 0.00
1 sort 0.00 0.00
1 stty 0.00 0.00
1 systemctl 0.00 0.00
1 template.sh 0.00 0.00
1 wc 0.00 0.00
0 processes running
47 maximum processes
Computation blocks
164783) minife cpu=7 start=5.62 finish=6.45
164784) rm cpu=6 start=5.63 finish=5.63
164785) miniFE.x cpu=1 start=5.63 finish=6.43
164786) orted cpu=0 start=5.63 finish=6.46
164789) orted cpu=3 start=5.83 finish=6.46
164790) orted cpu=6 start=5.84 finish=5.84
164791) orted cpu=6 start=5.86 finish=6.45
164792) orted cpu=1 start=5.96 finish=6.45
164793) orted cpu=5 start=5.96 finish=6.46
164794) miniFE.x cpu=4 start=5.97 finish=6.43
164795) miniFE.x cpu=0 start=5.97 finish=6.43
164798) miniFE.x cpu=10 start=6.12 finish=6.43
164799) miniFE.x cpu=14 start=6.12 finish=6.12
164800) miniFE.x cpu=9 start=6.35 finish=6.42
164801) miniFE.x cpu=11 start=6.35 finish=6.42
164802) miniFE.x cpu=4 start=6.35 finish=6.42
164803) miniFE.x cpu=8 start=6.35 finish=6.42
164804) miniFE.x cpu=7 start=6.35 finish=6.42
164805) miniFE.x cpu=6 start=6.35 finish=6.42
164806) miniFE.x cpu=2 start=6.35 finish=6.42
164807) miniFE.x cpu=8 start=6.35 finish=6.42
164808) miniFE.x cpu=9 start=6.35 finish=6.42
164809) miniFE.x cpu=3 start=6.35 finish=6.42
164810) miniFE.x cpu=12 start=6.35 finish=6.42
164811) miniFE.x cpu=13 start=6.35 finish=6.42
164812) miniFE.x cpu=15 start=6.35 finish=6.42
164813) miniFE.x cpu=11 start=6.35 finish=6.42
164814) miniFE.x cpu=14 start=6.35 finish=6.42
164816) cat cpu=2 start=6.45 finish=6.45
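
The computation-block lines above are regular enough to parse for per-process lifetimes. The sketch below assumes the fields are process ID, command name, CPU number, and start and finish times in seconds. That reading is inferred from the output and is not documented here. The `parse_blocks` helper is hypothetical, and only a few of the lines are repeated as sample input.

```python
import re

# pid) command cpu=N start=S finish=F
LINE = re.compile(
    r"(\d+)\)\s+(\S+)\s+cpu=(\d+)\s+start=([\d.]+)\s+finish=([\d.]+)"
)

sample = """\
164783) minife cpu=7 start=5.62 finish=6.45
164785) miniFE.x cpu=1 start=5.63 finish=6.43
164786) orted cpu=0 start=5.63 finish=6.46
164800) miniFE.x cpu=9 start=6.35 finish=6.42
"""


def parse_blocks(text):
    """Return one dict per computation-block line, skipping lines that do not match."""
    rows = []
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            pid, comm, cpu, start, finish = m.groups()
            rows.append({
                "pid": int(pid),
                "comm": comm,
                "cpu": int(cpu),
                "start": float(start),
                "finish": float(finish),
            })
    return rows


rows = parse_blocks(sample)

# Lifetime of each process, rounded to the two decimals the output provides.
lifetimes = {r["pid"]: round(r["finish"] - r["start"], 2) for r in rows}

for r in rows:
    print(f'{r["pid"]} {r["comm"]:10s} cpu={r["cpu"]:2d} {lifetimes[r["pid"]]:.2f}s')
```

In this excerpt the later miniFE.x entries live for well under a tenth of a second, so the listing appears to capture a short startup phase and not the long solver run.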
