LULESH stands for Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics. The benchmark runs very quickly. MPI appears to place ranks on physical cores only.

The topdown profile is sparse because the workload runs quickly. In aggregate, however, backend stalls predominate.

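The AMD metrics make the summary easier to see. On-CPU utilization is about 30% (4.73 of 16 cores busy on average). Backend stalls dominate at 68.9% of slots: memory stalls account for 47.1% and CPU stalls for 21.9%. Approximately 40% of the instructions are floating point (406 per 1000), nearly all counted in the 128-bit category.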
elapsed 48.970
on_cpu 0.296 # 4.73 / 16 cores
utime 191.535
stime 40.245
nvcsw 45784 # 96.73%
nivcsw 1548 # 3.27%
inblock 8 # 0.16/sec
onblock 62080 # 1267.73/sec
cpu-clock 231739627464 # 231.740 seconds
task-clock 231757527279 # 231.758 seconds
page faults 19718776 # 85083.649/sec
context switches 47385 # 204.459/sec
cpu migrations 1131 # 4.880/sec
major page faults 234 # 1.010/sec
minor page faults 19718542 # 85082.639/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 82537114616 # 75.499 branches per 1000 inst
branch misses 3072310653 # 3.72% branch miss
conditional 55678608704 # 50.931 conditional branches per 1000 inst
indirect 2686831025 # 2.458 indirect branches per 1000 inst
cpu-cycles 986477655776 # 1.27 GHz
instructions 1082464092791 # 1.10 IPC
slots 1974576003252 #
retiring 377954134774 # 19.1% (19.2%)
-- ucode 509902685 # 0.0%
-- fastpath 377444232089 # 19.1%
frontend 228199081339 # 11.6% (11.6%)
-- latency 170613588120 # 8.6%
-- bandwidth 57585493219 # 2.9%
backend 1361308062049 # 68.9% (69.0%)
-- cpu 432111990127 # 21.9%
-- memory 929196071922 # 47.1%
speculation 5124623997 # 0.3% ( 0.3%) low
-- branch mispredict 5037122371 # 0.3%
-- pipeline restart 87501626 # 0.0%
smt-contention 1988503188 # 0.1% ( 0.0%)
cpu-cycles 986280789225 # 1.27 GHz
instructions 1079037081656 # 1.09 IPC
instructions 360898263957 # 40.265 l2 access per 1000 inst
l2 hit from l1 9682858043 # 24.84% l2 miss
l2 miss from l1 712645490 #
l2 hit from l2 pf 1951552116 #
l3 hit from l2 pf 139447103 #
l3 miss from l2 pf 2757536174 #
instructions 361496500819 # 406.170 float per 1000 inst
float 512 76 # 0.000 AVX-512 per 1000 inst
float 256 690 # 0.000 AVX-256 per 1000 inst
float 128 146829108908 # 406.170 AVX-128 per 1000 inst
float MMX 0 # 0.000 MMX per 1000 inst
float scalar 0 # 0.000 scalar per 1000 inst
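The annotations after each `#` in the AMD listing are ratios of the raw counters. The sketch below recomputes a few of them in Python as a cross-check. The values are copied from the listing; the formulas (for example, on_cpu as user plus system time divided by elapsed time and by core count) are inferred from the numbers and are an assumption about how the tool derives them, not taken from its source.

```python
# Raw values copied from the AMD listing; formulas are inferred, not confirmed.
elapsed = 48.970   # seconds
utime = 191.535    # seconds of user time
stime = 40.245     # seconds of system time
cores = 16         # from the "/ 16 cores" annotation

busy_cores = (utime + stime) / elapsed   # average number of busy cores
on_cpu = busy_cores / cores              # fraction of the machine in use

cycles = 986477655776
instructions = 1082464092791
ipc = instructions / cycles

slots = 1974576003252
topdown = {
    "retiring": 377954134774,
    "frontend": 228199081339,
    "backend": 1361308062049,
    "speculation": 5124623997,
}
# Each topdown category as a percentage of all issue slots.
shares = {name: 100.0 * count / slots for name, count in topdown.items()}

# Floating-point operations per 1000 instructions (same counter group).
float_inst = 361496500819
float_128 = 146829108908
float_per_k = 1000.0 * float_128 / float_inst

print(f"busy cores {busy_cores:.2f}, on_cpu {on_cpu:.3f}, IPC {ipc:.2f}")
for name, pct in shares.items():
    print(f"{name:12s} {pct:5.1f}%")
print(f"float per 1000 inst {float_per_k:.1f}")
```

Under these assumptions the results agree with the annotated values for on_cpu, IPC, the topdown shares, and the floating-point rate.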
Intel metrics
elapsed 55.049
on_cpu 0.317 # 5.07 / 16 cores
utime 242.439
stime 36.925
nvcsw 83411 # 98.53%
nivcsw 1248 # 1.47%
inblock 519472 # 9436.55/sec
onblock 50664 # 920.34/sec
cpu-clock 279287193118 # 279.287 seconds
task-clock 279309002068 # 279.309 seconds
page faults 19700096 # 70531.547/sec
context switches 84722 # 303.327/sec
cpu migrations 1460 # 5.227/sec
major page faults 3526 # 12.624/sec
minor page faults 19696570 # 70518.923/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 268786231444 # 136.459 branches per 1000 inst
branch misses 58265385 # 0.02% branch miss
conditional 268786245332 # 136.459 conditional branches per 1000 inst
indirect 41617151216 # 21.128 indirect branches per 1000 inst
slots 15445604506322 #
retiring 7770478502776 # 50.3% (50.3%)
-- ucode 790445170397 # 5.1%
-- fastpath 6980033332379 # 45.2%
frontend 748657265289 # 4.8% ( 4.8%) low
-- latency 355686368439 # 2.3%
-- bandwidth 392970896850 # 2.5%
backend 6870999071370 # 44.5% (44.5%)
-- cpu 2408157938972 # 15.6%
-- memory 4462841132398 # 28.9%
speculation 137401946204 # 0.9% ( 0.9%) low
-- branch mispredict 71814841286 # 0.5%
-- pipeline restart 65587104918 # 0.4%
smt-contention 0 # 0.0% ( 0.0%)
cpu-cycles 5203449155222 # 1.14 GHz
instructions 15428229523592 # 2.97 IPC
l2 access 75190794668 # 9.551 l2 access per 1000 inst
l2 miss 44769402597 # 59.54% l2 miss
The process overview shows the lulesh2.0 invocations under MPI.
441 processes
72 lulesh2.0 570.95 112.10
68 clinfo 15.88 6.32
38 vulkaninfo 0.94 1.33
18 mpirun 0.77 2.15
6 glxinfo:gdrv0 0.12 0.04
6 glxinfo:gl0 0.12 0.04
4 vulkani:disk$0 0.10 0.14
6 clang 0.08 0.03
6 php 0.07 0.07
2 glxinfo 0.06 0.03
2 glxinfo:cs0 0.06 0.02
2 glxinfo:disk$0 0.06 0.02
2 glxinfo:sh0 0.06 0.02
2 glxinfo:shlo0 0.06 0.02
2 llvmpipe-0 0.05 0.07
2 llvmpipe-1 0.05 0.07
2 llvmpipe-10 0.05 0.07
2 llvmpipe-11 0.05 0.07
2 llvmpipe-12 0.05 0.07
2 llvmpipe-13 0.05 0.07
2 llvmpipe-14 0.05 0.07
2 llvmpipe-15 0.05 0.07
2 llvmpipe-2 0.05 0.07
2 llvmpipe-3 0.05 0.07
2 llvmpipe-4 0.05 0.07
2 llvmpipe-5 0.05 0.07
2 llvmpipe-6 0.05 0.07
2 llvmpipe-7 0.05 0.07
2 llvmpipe-8 0.05 0.07
2 llvmpipe-9 0.05 0.07
3 rocminfo 0.03 0.00
1 lspci 0.00 0.02
1 ps 0.00 0.01
82 sh 0.00 0.00
13 gcc 0.00 0.00
13 gsettings 0.00 0.00
8 stat 0.00 0.00
8 systemd-detect- 0.00 0.00
6 llvm-link 0.00 0.00
5 phoronix-test-s 0.00 0.00
3 lulesh 0.00 0.00
2 cc 0.00 0.00
2 gmain 0.00 0.00
2 lscpu 0.00 0.00
2 uname 0.00 0.00
2 which 0.00 0.00
2 xset 0.00 0.00
1 date 0.00 0.00
1 dconf worker 0.00 0.00
1 dirname 0.00 0.00
1 dmesg 0.00 0.00
1 dmidecode 0.00 0.00
1 grep 0.00 0.00
1 ifconfig 0.00 0.00
1 ip 0.00 0.00
1 lsmod 0.00 0.00
1 mktemp 0.00 0.00
1 qdbus 0.00 0.00
1 readlink 0.00 0.00
1 realpath 0.00 0.00
1 sed 0.00 0.00
1 sort 0.00 0.00
1 stty 0.00 0.00
1 systemctl 0.00 0.00
1 template.sh 0.00 0.00
1 wc 0.00 0.00
1 xrandr 0.00 0.00
0 processes running
47 maximum processes
Computation blocks
7923) lulesh cpu=1 start=5.85 finish=16.90
7924) mpirun cpu=0 start=5.85 finish=16.88
7927) mpirun cpu=4 start=6.46 finish=16.88
7928) mpirun cpu=7 start=6.46 finish=6.46
7929) mpirun cpu=9 start=6.48 finish=16.87
7930) mpirun cpu=15 start=6.97 finish=16.87
7931) mpirun cpu=10 start=6.97 finish=16.88
7932) lulesh2.0 cpu=10 start=6.98 finish=16.82
7934) lulesh2.0 cpu=15 start=6.98 finish=16.81
7938) lulesh2.0 cpu=15 start=6.99 finish=16.81
7933) lulesh2.0 cpu=12 start=6.98 finish=16.82
7936) lulesh2.0 cpu=0 start=6.99 finish=16.81
7940) lulesh2.0 cpu=14 start=7.00 finish=16.81
7935) lulesh2.0 cpu=3 start=6.99 finish=16.82
7939) lulesh2.0 cpu=14 start=6.99 finish=16.81
7943) lulesh2.0 cpu=5 start=7.00 finish=16.81
7937) lulesh2.0 cpu=4 start=6.99 finish=16.77
7942) lulesh2.0 cpu=1 start=7.00 finish=16.77
7947) lulesh2.0 cpu=11 start=7.00 finish=16.77
7941) lulesh2.0 cpu=11 start=7.00 finish=16.77
7945) lulesh2.0 cpu=10 start=7.00 finish=16.77
7950) lulesh2.0 cpu=3 start=7.01 finish=16.77
7944) lulesh2.0 cpu=8 start=7.00 finish=16.77
7948) lulesh2.0 cpu=5 start=7.01 finish=16.77
7952) lulesh2.0 cpu=4 start=7.01 finish=16.77
7946) lulesh2.0 cpu=6 start=7.00 finish=16.77
7951) lulesh2.0 cpu=13 start=7.01 finish=16.77
7954) lulesh2.0 cpu=5 start=7.02 finish=16.77
7949) lulesh2.0 cpu=7 start=7.01 finish=16.77
7953) lulesh2.0 cpu=15 start=7.01 finish=16.77
7955) lulesh2.0 cpu=2 start=7.02 finish=16.77
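The computation-block listing is regular enough to parse mechanically, which helps when checking how the lulesh2.0 processes are spread across CPUs and how long each one runs. The following is a minimal sketch in Python, assuming every line follows the `pid) name cpu=N start=S finish=F` pattern seen above. The helper name `parse_blocks` and the reading of `start` and `finish` as seconds are illustrative assumptions, and the sample holds only a handful of lines copied from the listing.

```python
import re
from collections import Counter

# Pattern inferred from the listing: "7932) lulesh2.0 cpu=10 start=6.98 finish=16.82"
LINE = re.compile(
    r"(\d+)\)\s+(\S+)\s+cpu=(\d+)\s+start=([\d.]+)\s+finish=([\d.]+)"
)

sample = """\
7923) lulesh cpu=1 start=5.85 finish=16.90
7924) mpirun cpu=0 start=5.85 finish=16.88
7932) lulesh2.0 cpu=10 start=6.98 finish=16.82
7934) lulesh2.0 cpu=15 start=6.98 finish=16.81
7938) lulesh2.0 cpu=15 start=6.99 finish=16.81
"""

def parse_blocks(text):
    """Return one dict per line that matches the inferred pattern."""
    rows = []
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            pid, name, cpu, start, finish = m.groups()
            rows.append({
                "pid": int(pid),
                "name": name,
                "cpu": int(cpu),
                "start": float(start),
                "finish": float(finish),
            })
    return rows

rows = parse_blocks(sample)
workers = [r for r in rows if r["name"] == "lulesh2.0"]

# How many lulesh2.0 entries report each CPU number.
cpu_counts = Counter(r["cpu"] for r in workers)

# Longest lifetime among the workers (finish minus start).
longest = max(r["finish"] - r["start"] for r in workers)

print(dict(cpu_counts))
print(f"longest worker lifetime: {longest:.2f}")
```

Feeding the full listing to `parse_blocks` instead of the sample would give the per-CPU distribution for all lulesh2.0 entries.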
