Code for modeling electronic structures and materials. One benchmark with a single result is reported. The code is multi-threaded, with a mix of one thread per hyperthreaded core and one thread per physical core.

The topdown profile is a blur of backend stalls mixed with occasional periods of higher retirement. Frontend stalls are low.

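AMD metrics confirm that backend stalls split almost evenly between CPU (20.9%) and memory (19.9%). The code has little floating point (about 44 operations per 1000 instructions, essentially all 128-bit) and few branches (about 63 per 1000 instructions).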
elapsed 2009.644
on_cpu 0.892 # 14.27 / 16 cores
utime 27794.710
stime 888.971
nvcsw 66275 # 16.07%
nivcsw 346047 # 83.93%
inblock 0 # 0.00/sec
onblock 705448 # 351.03/sec
cpu-clock 28689747750498 # 28689.748 seconds
task-clock 28690022795733 # 28690.023 seconds
page faults 81383819 # 2836.659/sec
context switches 422153 # 14.714/sec
cpu migrations 16702 # 0.582/sec
major page faults 458 # 0.016/sec
minor page faults 81383361 # 2836.643/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 11096495451105 # 63.130 branches per 1000 inst
branch misses 81279292087 # 0.73% branch miss
conditional 8071827699704 # 45.922 conditional branches per 1000 inst
indirect 385031131344 # 2.191 indirect branches per 1000 inst
cpu-cycles 108628710804268 # 3.38 GHz
instructions 175396936466455 # 1.61 IPC
slots 217224690829842 #
retiring 58560058869307 # 27.0% (36.9%)
-- ucode 59432676601 # 0.0%
-- fastpath 58500626192706 # 26.9%
frontend 11170785657522 # 5.1% ( 7.0%)
-- latency 6588015021012 # 3.0%
-- bandwidth 4582770636510 # 2.1%
backend 88710120335677 # 40.8% (55.9%)
-- cpu 45427624864433 # 20.9%
-- memory 43282495471244 # 19.9%
speculation 253387692929 # 0.1% ( 0.2%) low
-- branch mispredict 240323593293 # 0.1%
-- pipeline restart 13064099636 # 0.0%
smt-contention 58530240425718 # 26.9% ( 0.0%)
cpu-cycles 108712417550875 # 3.37 GHz
instructions 175866105021602 # 1.62 IPC
instructions 58623162544056 # 66.044 l2 access per 1000 inst
l2 hit from l1 2318656460357 # 17.18% l2 miss
l2 miss from l1 74091602930 #
l2 hit from l2 pf 961995367095 #
l3 hit from l2 pf 499044508637 #
l3 miss from l2 pf 92004877695 #
instructions 58611513941603 # 43.639 float per 1000 inst
float 512 73 # 0.000 AVX-512 per 1000 inst
float 256 666 # 0.000 AVX-256 per 1000 inst
float 128 2557768514987 # 43.639 AVX-128 per 1000 inst
float MMX 0 # 0.000 MMX per 1000 inst
float scalar 0 # 0.000 scalar per 1000 inst
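The two percentages printed for each topdown category in the AMD listing above can be reproduced from the raw slot counts. The sketch below assumes the first figure is the share of all slots and the parenthesized figure is the share after removing the smt-contention slots. That reading is inferred from the numbers, not from the tool's documentation, and the function name `topdown_shares` is made up for this example.

```python
# Slot counts copied from the AMD topdown listing above.
slots = 217224690829842
counts = {
    "retiring": 58560058869307,
    "frontend": 11170785657522,
    "backend": 88710120335677,
    "speculation": 253387692929,
    "smt-contention": 58530240425718,
}


def topdown_shares(slots, counts, exclude="smt-contention"):
    """Return {category: (pct_of_all_slots, pct_without_excluded_slots)}.

    Assumption: the parenthesized column divides by the slots that remain
    once the excluded category has been subtracted.
    """
    remaining = slots - counts.get(exclude, 0)
    shares = {}
    for name, value in counts.items():
        raw = 100.0 * value / slots
        adjusted = 0.0 if name == exclude else 100.0 * value / remaining
        shares[name] = (raw, adjusted)
    return shares


shares = topdown_shares(slots, counts)
for name, (raw, adjusted) in shares.items():
    print(f"{name:15s} {raw:5.1f}% ({adjusted:5.1f}%)")
```

Under this assumption the backend share grows from 40.8% of all slots to 55.9% of the slots not lost to SMT contention, which matches the listing.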
Intel metrics
elapsed 6333.440
on_cpu 0.742 # 11.88 / 16 cores
utime 74750.330
stime 473.297
nvcsw 1313743 # 89.74%
nivcsw 150279 # 10.26%
inblock 30890000 # 4877.29/sec
onblock 694880 # 109.72/sec
cpu-clock 75222931189368 # 75222.931 seconds
task-clock 75223322994820 # 75223.323 seconds
page faults 85985637 # 1143.072/sec
context switches 1495425 # 19.880/sec
cpu migrations 54829 # 0.729/sec
major page faults 1233961 # 16.404/sec
minor page faults 84751671 # 1126.667/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 45503197495161 # 71.793 branches per 1000 inst
branch misses 90982505997 # 0.20% branch miss
conditional 45503197511129 # 71.793 conditional branches per 1000 inst
indirect 11993405533405 # 18.923 indirect branches per 1000 inst
slots 433838595012596 #
retiring 283286737116435 # 65.3% (65.3%) high
-- ucode 13222986405835 # 3.0%
-- fastpath 270063750710600 # 62.2%
frontend 19439778785847 # 4.5% ( 4.5%) low
-- latency 9361508360355 # 2.2%
-- bandwidth 10078270425492 # 2.3%
backend 112743375337721 # 26.0% (26.0%)
-- cpu 63018362871765 # 14.5%
-- memory 49725012465956 # 11.5%
speculation 12881927224817 # 3.0% ( 3.0%)
-- branch mispredict 12368006069230 # 2.9%
-- pipeline restart 513921155587 # 0.1%
smt-contention 0 # 0.0% ( 0.0%)
cpu-cycles 216187026514697 # 2.16 GHz
instructions 883987677201512 # 4.09 IPC high
l2 access 3937606297777 # 13.285 l2 access per 1000 inst
l2 miss 358227811293 # 9.10% l2 miss
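The `on_cpu` and IPC figures in both listings are simple ratios of the counters printed next to them. The sketch below is one way to recompute them. It assumes `on_cpu` is (utime + stime) / elapsed divided by the 16 cores named in the comment column, which is inferred from the numbers agreeing and not taken from the tool's source. The helper names are hypothetical.

```python
def on_cpu(utime, stime, elapsed, cores=16):
    """Return (fraction of the machine kept busy, equivalent busy cores).

    Assumption: busy cores = (user + system CPU seconds) / wall-clock seconds.
    """
    busy_cores = (utime + stime) / elapsed
    return busy_cores / cores, busy_cores


def ipc(instructions, cycles):
    """Instructions retired per CPU cycle."""
    return instructions / cycles


# Values copied from the AMD and Intel listings above.
amd_frac, amd_busy = on_cpu(27794.710, 888.971, 2009.644)
intel_frac, intel_busy = on_cpu(74750.330, 473.297, 6333.440)
amd_ipc = ipc(175396936466455, 108628710804268)
intel_ipc = ipc(883987677201512, 216187026514697)

print(f"AMD   on_cpu {amd_frac:.3f} ({amd_busy:.2f} / 16 cores), IPC {amd_ipc:.2f}")
print(f"Intel on_cpu {intel_frac:.3f} ({intel_busy:.2f} / 16 cores), IPC {intel_ipc:.2f}")
```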
Process overview shows pw.x as the primary process.
498 processes
120 pw.x 138331.72 4422.93
68 clinfo 16.54 5.99
18 mpirun 1.04 2.27
38 vulkaninfo 0.95 1.33
6 php 0.22 0.38
4 vulkani:disk$0 0.10 0.14
6 glxinfo:gdrv0 0.10 0.08
6 glxinfo:gl0 0.10 0.08
6 clang 0.05 0.07
2 llvmpipe-0 0.05 0.07
2 llvmpipe-1 0.05 0.07
2 llvmpipe-10 0.05 0.07
2 llvmpipe-11 0.05 0.07
2 llvmpipe-12 0.05 0.07
2 llvmpipe-13 0.05 0.07
2 llvmpipe-14 0.05 0.07
2 llvmpipe-15 0.05 0.07
2 llvmpipe-2 0.05 0.07
2 llvmpipe-3 0.05 0.07
2 llvmpipe-4 0.05 0.07
2 llvmpipe-5 0.05 0.07
2 llvmpipe-6 0.05 0.07
2 llvmpipe-7 0.05 0.07
2 llvmpipe-8 0.05 0.07
2 llvmpipe-9 0.05 0.07
2 glxinfo 0.04 0.04
2 glxinfo:cs0 0.04 0.04
2 glxinfo:disk$0 0.04 0.04
2 glxinfo:sh0 0.04 0.04
2 glxinfo:shlo0 0.04 0.04
3 rocminfo 0.03 0.00
1 lspci 0.00 0.02
1 ps 0.00 0.01
82 sh 0.00 0.00
14 gsettings 0.00 0.00
13 gcc 0.00 0.00
10 sed 0.00 0.00
8 stat 0.00 0.00
8 systemd-detect- 0.00 0.00
6 llvm-link 0.00 0.00
5 phoronix-test-s 0.00 0.00
3 qe 0.00 0.00
2 cc 0.00 0.00
2 lscpu 0.00 0.00
2 uname 0.00 0.00
2 which 0.00 0.00
2 xset 0.00 0.00
1 date 0.00 0.00
1 dconf worker 0.00 0.00
1 dirname 0.00 0.00
1 dmesg 0.00 0.00
1 dmidecode 0.00 0.00
1 gmain 0.00 0.00
1 grep 0.00 0.00
1 ifconfig 0.00 0.00
1 ip 0.00 0.00
1 lsmod 0.00 0.00
1 mktemp 0.00 0.00
1 qdbus 0.00 0.00
1 readlink 0.00 0.00
1 realpath 0.00 0.00
1 sort 0.00 0.00
1 stty 0.00 0.00
1 systemctl 0.00 0.00
1 template.sh 0.00 0.00
1 wc 0.00 0.00
1 xrandr 0.00 0.00
0 processes running
55 maximum processes
A sample computation shows MPI in use: the mpirun entries are followed by 40 pw.x tasks.
44262) qe cpu=15 start=5.68 finish=665.97
44263) mpirun cpu=0 start=5.69 finish=665.94
44268) mpirun cpu=10 start=6.27 finish=665.94
44269) mpirun cpu=3 start=6.27 finish=6.27
44270) mpirun cpu=2 start=6.29 finish=665.93
44271) mpirun cpu=15 start=6.77 finish=665.93
44272) mpirun cpu=13 start=6.77 finish=665.94
44273) pw.x cpu=9 start=6.80 finish=665.92
44275) pw.x cpu=4 start=6.81 finish=665.92
44277) pw.x cpu=0 start=6.81 finish=665.92
44282) pw.x cpu=6 start=6.82 finish=665.92
44307) pw.x cpu=8 start=8.44 finish=665.92
44274) pw.x cpu=6 start=6.81 finish=665.92
44278) pw.x cpu=12 start=6.81 finish=665.92
44280) pw.x cpu=10 start=6.82 finish=665.92
44286) pw.x cpu=15 start=6.82 finish=665.92
44308) pw.x cpu=15 start=8.45 finish=665.92
44276) pw.x cpu=11 start=6.81 finish=665.92
44281) pw.x cpu=5 start=6.82 finish=665.92
44284) pw.x cpu=15 start=6.82 finish=665.92
44290) pw.x cpu=5 start=6.83 finish=665.92
44310) pw.x cpu=5 start=8.47 finish=665.92
44279) pw.x cpu=13 start=6.82 finish=665.92
44285) pw.x cpu=2 start=6.82 finish=665.92
44288) pw.x cpu=10 start=6.83 finish=665.92
44294) pw.x cpu=0 start=6.83 finish=665.92
44305) pw.x cpu=5 start=8.41 finish=665.92
44283) pw.x cpu=1 start=6.82 finish=665.92
44289) pw.x cpu=15 start=6.83 finish=665.92
44292) pw.x cpu=4 start=6.83 finish=665.92
44298) pw.x cpu=3 start=6.84 finish=665.92
44312) pw.x cpu=11 start=8.59 finish=665.92
44287) pw.x cpu=10 start=6.83 finish=665.92
44293) pw.x cpu=2 start=6.83 finish=665.92
44297) pw.x cpu=10 start=6.84 finish=665.92
44301) pw.x cpu=10 start=6.84 finish=665.92
44309) pw.x cpu=8 start=8.46 finish=665.92
44291) pw.x cpu=14 start=6.83 finish=665.92
44296) pw.x cpu=8 start=6.84 finish=665.92
44299) pw.x cpu=5 start=6.84 finish=665.92
44303) pw.x cpu=13 start=6.85 finish=665.92
44306) pw.x cpu=7 start=8.42 finish=665.92
44295) pw.x cpu=7 start=6.83 finish=665.92
44300) pw.x cpu=10 start=6.84 finish=665.92
44302) pw.x cpu=5 start=6.85 finish=665.92
44304) pw.x cpu=4 start=6.85 finish=665.92
44311) pw.x cpu=1 start=8.48 finish=665.92
44359) sed cpu=6 start=665.97 finish=665.97
44360) sed cpu=12 start=665.97 finish=665.97
44361) sed cpu=9 start=665.97 finish=665.97
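A listing in this format is easy to tally by command name, which is how the number of pw.x tasks per run can be checked. The following is a minimal sketch that assumes each line has the shape `pid) command cpu=N start=S finish=F` seen above. The regular expression and the `parse_tasks` helper are illustrative and not part of the profiling tool, and the sample text is only a short excerpt of the listing.

```python
import re
from collections import Counter

# One task per line: "44273) pw.x cpu=9 start=6.80 finish=665.92"
TASK_LINE = re.compile(
    r"^\s*(\d+)\)\s+(.+?)\s+cpu=(\d+)\s+start=([\d.]+)\s+finish=([\d.]+)\s*$"
)


def parse_tasks(text):
    """Return a list of (pid, command, cpu, start, finish) tuples."""
    tasks = []
    for line in text.splitlines():
        match = TASK_LINE.match(line)
        if match:
            pid, command, cpu, start, finish = match.groups()
            tasks.append((int(pid), command, int(cpu), float(start), float(finish)))
    return tasks


# A short excerpt of the listing above, not the full set of tasks.
excerpt = """\
44262) qe cpu=15 start=5.68 finish=665.97
44263) mpirun cpu=0 start=5.69 finish=665.94
44273) pw.x cpu=9 start=6.80 finish=665.92
44275) pw.x cpu=4 start=6.81 finish=665.92
44307) pw.x cpu=8 start=8.44 finish=665.92
44359) sed cpu=6 start=665.97 finish=665.97
"""

tasks = parse_tasks(excerpt)
by_command = Counter(command for _, command, _, _, _ in tasks)
print(by_command)
```

Feeding the whole listing to `parse_tasks` instead of the excerpt gives the per-command counts for the full run.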
