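askap is a set of benchmarks from the Australian SKA Pathfinder (ASKAP) radio telescope project. The first four workloads run tConvolve using MT (multi-threaded), MPI, OpenCL and OpenMP implementations. The OpenCL version fails; the other three run. The last workload, tHogbomCleanOMP, is different: it runs a Hogbom clean rather than tConvolve.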

The topdown profile shows that the MT and MPI workloads are mostly backend bound; the others are more variable.

The AMD metrics show that on average we use just over half the cores (9.01 of 16). This is heavy floating-point code with very few frontend stalls and a low IPC. Backend stalls, particularly memory stalls, dominate.
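The headline AMD figures can be recomputed from the raw counters in the block below. This is a minimal sketch, assuming on_cpu is (utime + stime) / elapsed / cores and that each topdown category is its slot count divided by total slots; both formulas are inferred from the printed values rather than taken from the profiler's documentation.

```python
# Raw values copied from the AMD run below.
ELAPSED = 692.645   # wall-clock seconds
UTIME = 6197.316    # user CPU seconds
STIME = 43.226      # system CPU seconds
CORES = 16

SLOTS = 60204722223794
TOPDOWN = {
    "retiring": 20347657209861,
    "frontend": 2781813182443,
    "backend": 36406367639532,
    "speculation": 1190714838886,
}
BACKEND_MEMORY = 29413411669300
CYCLES = 26919240110996
INSTRUCTIONS = 38106253618914


def summarize():
    """Return busy cores, on_cpu fraction, topdown shares (%), memory share (%), IPC."""
    busy_cores = (UTIME + STIME) / ELAPSED      # assumed definition of busy cores
    on_cpu = busy_cores / CORES                  # fraction of the machine in use
    shares = {name: 100.0 * count / SLOTS for name, count in TOPDOWN.items()}
    memory_share = 100.0 * BACKEND_MEMORY / SLOTS
    ipc = INSTRUCTIONS / CYCLES
    return busy_cores, on_cpu, shares, memory_share, ipc


if __name__ == "__main__":
    busy, frac, shares, mem, ipc = summarize()
    print(f"busy cores {busy:.2f} / {CORES} (on_cpu {frac:.3f})")
    for name, pct in shares.items():
        print(f"{name:12s} {pct:5.1f}%")
    print(f"backend memory {mem:.1f}%  IPC {ipc:.2f}")
```

The recomputed values match the printed ones (on_cpu 0.563, backend 60.5% of which memory is 48.9%, IPC 1.42), which supports the assumed formulas.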
The Intel metrics also show heavy backend stalls. The memory latency breakdown suggests most of these stalls are at the DRAM level of the hierarchy (49.0% DRAM bound).
elapsed 692.645
on_cpu 0.563 # 9.01 / 16 cores
utime 6197.316
stime 43.226
nvcsw 52242 # 59.50%
nivcsw 35559 # 40.50%
inblock 484744 # 699.84/sec
onblock 838424 # 1210.47/sec
cpu-clock 6241158137516 # 6241.158 seconds
task-clock 6241202869038 # 6241.203 seconds
page faults 18726253 # 3000.424/sec
context switches 90981 # 14.577/sec
cpu migrations 7128 # 1.142/sec
major page faults 1834 # 0.294/sec
minor page faults 18724419 # 3000.130/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 3273043868037 # 168.725 branches per 1000 inst
branch misses 3267599213 # 0.10% branch miss
conditional 3273043892613 # 168.725 conditional branches per 1000 inst
indirect 546410675684 # 28.167 indirect branches per 1000 inst
slots 60204722223794 #
retiring 20347657209861 # 33.8% (33.8%)
-- ucode 1484017206434 # 2.5%
-- fastpath 18863640003427 # 31.3%
frontend 2781813182443 # 4.6% ( 4.6%) low
-- latency 1956290061026 # 3.2%
-- bandwidth 825523121417 # 1.4%
backend 36406367639532 # 60.5% (60.5%)
-- cpu 6992955970232 # 11.6%
-- memory 29413411669300 # 48.9%
speculation 1190714838886 # 2.0% ( 2.0%)
-- branch mispredict 814314213017 # 1.4%
-- pipeline restart 376400625869 # 0.6%
smt-contention 0 # 0.0% ( 0.0%)
cpu-cycles 26919240110996 # 1.68 GHz
instructions 38106253618914 # 1.42 IPC
l2 access 272117741070 # 14.360 l2 access per 1000 inst
l2 miss 202448005121 # 74.40% l2 miss
cpu-cycles 12117126869957 # 57.8% memory latency
load stalls 6988606434582 # 0.0% l1 bound
l1 miss 7393019851320 # 7.4% l2 bound
l2 miss 6498182419310 # 4.6% l3 bound
l3 miss 5937412868393 # 49.0% dram bound
store_stalls 19618397714 # 0.2% store bound
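The Intel memory-latency breakdown appears to be computed by differencing successive stall counters: stalls that missed L1 but were not waiting on an L2 miss count as L2 bound, and so on, with stalls still waiting on an L3 miss counted as DRAM bound. The sketch below is a reconstruction inferred from the printed numbers, not the tool's documented formula. The clamp at zero is an assumption that explains why l1 bound shows 0.0% even though the l1 miss counter is larger than load stalls.

```python
# Intel counters copied from the block above. The event meanings in the
# comments are assumptions about what the tool measures, not documented facts.
CYCLES = 12117126869957
LOAD_STALLS = 6988606434582   # assumed: stall cycles with any load pending
L1_MISS = 7393019851320       # assumed: stall cycles with an L1 miss pending
L2_MISS = 6498182419310       # assumed: stall cycles with an L2 miss pending
L3_MISS = 5937412868393       # assumed: stall cycles with an L3 miss pending
STORE_STALLS = 19618397714    # stall cycles attributed to stores


def pct(cycles: float) -> float:
    """Cycle count as a percentage of all cycles, clamped at zero."""
    return max(0.0, 100.0 * cycles / CYCLES)


def memory_breakdown() -> dict:
    """A level is charged the stalls that missed the level closer to the core but not this one."""
    return {
        "memory latency": pct(LOAD_STALLS + STORE_STALLS),
        "l1 bound": pct(LOAD_STALLS - L1_MISS),  # about -3.3% here, clamped to 0.0
        "l2 bound": pct(L1_MISS - L2_MISS),
        "l3 bound": pct(L2_MISS - L3_MISS),
        "dram bound": pct(L3_MISS),
        "store bound": pct(STORE_STALLS),
    }


if __name__ == "__main__":
    for name, value in memory_breakdown().items():
        print(f"{name:15s} {value:5.1f}%")
```

With the unclamped l1 term (about -3.3%) included, the parts sum to roughly the 57.8% memory latency total, which is what suggests the clamp.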
The process summary shows that most of the time is spent in tConvolveMT, the multi-threaded tConvolve, with much less in the other workloads.
724 processes
99 tConvolveMT 67707.10 193.13
48 tHogbomCleanOMP 3498.04 18.24
48 tConvolveOMP 2308.80 14.08
72 tConvolveMPI 2105.73 46.25
133 clinfo 35.82 12.09
38 vulkaninfo 0.95 1.33
18 mpirun 0.84 2.07
6 glxinfo:gdrv0 0.11 0.05
6 glxinfo:gl0 0.11 0.05
4 vulkani:disk$0 0.10 0.14
6 php 0.07 0.21
2 glxinfo 0.06 0.02
2 llvmpipe-0 0.05 0.07
2 llvmpipe-1 0.05 0.07
2 llvmpipe-10 0.05 0.07
2 llvmpipe-11 0.05 0.07
2 llvmpipe-12 0.05 0.07
2 llvmpipe-13 0.05 0.07
2 llvmpipe-14 0.05 0.07
2 llvmpipe-15 0.05 0.07
2 llvmpipe-2 0.05 0.07
2 llvmpipe-3 0.05 0.07
2 llvmpipe-4 0.05 0.07
2 llvmpipe-5 0.05 0.07
2 llvmpipe-6 0.05 0.07
2 llvmpipe-7 0.05 0.07
2 llvmpipe-8 0.05 0.07
2 llvmpipe-9 0.05 0.07
2 glxinfo:cs0 0.05 0.02
2 glxinfo:disk$0 0.05 0.02
2 glxinfo:sh0 0.05 0.02
2 glxinfo:shlo0 0.05 0.02
6 clang 0.04 0.06
3 rocminfo 0.00 0.03
1 lspci 0.00 0.02
90 sh 0.00 0.00
15 askap 0.00 0.00
13 gcc 0.00 0.00
11 gsettings 0.00 0.00
8 stat 0.00 0.00
8 systemd-detect- 0.00 0.00
6 llvm-link 0.00 0.00
5 phoronix-test-s 0.00 0.00
4 gmain 0.00 0.00
2 cc 0.00 0.00
2 lscpu 0.00 0.00
2 uname 0.00 0.00
2 which 0.00 0.00
2 xset 0.00 0.00
1 date 0.00 0.00
1 dconf worker 0.00 0.00
1 dirname 0.00 0.00
1 dmesg 0.00 0.00
1 dmidecode 0.00 0.00
1 grep 0.00 0.00
1 ifconfig 0.00 0.00
1 ip 0.00 0.00
1 lsmod 0.00 0.00
1 mktemp 0.00 0.00
1 ps 0.00 0.00
1 qdbus 0.00 0.00
1 readlink 0.00 0.00
1 realpath 0.00 0.00
1 sed 0.00 0.00
1 sort 0.00 0.00
1 stty 0.00 0.00
1 systemctl 0.00 0.00
1 template.sh 0.00 0.00
1 wc 0.00 0.00
1 xrandr 0.00 0.00
0 processes running
47 maximum processes
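To put a number on "most of the time", the summary rows can be parsed and each workload's share of the total computed. This is a minimal sketch, assuming each row is process count, command name, then two accumulated times, and ranking by the first time column; the output does not label its columns, so that reading is an assumption. Only the larger rows are copied in; the remaining rows add well under 2 units to the total.

```python
SUMMARY = """\
724 processes
99 tConvolveMT 67707.10 193.13
48 tHogbomCleanOMP 3498.04 18.24
48 tConvolveOMP 2308.80 14.08
72 tConvolveMPI 2105.73 46.25
133 clinfo 35.82 12.09
38 vulkaninfo 0.95 1.33
18 mpirun 0.84 2.07
"""


def parse(text):
    """Yield (count, name, time_a, time_b); names may contain spaces."""
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 4:  # skip header/footer lines such as "724 processes"
            continue
        yield int(parts[0]), " ".join(parts[1:-2]), float(parts[-2]), float(parts[-1])


def shares(text):
    """Percentage of the summed first time column taken by each command."""
    rows = list(parse(text))
    total = sum(row[2] for row in rows)
    return {name: 100.0 * t / total for _, name, t, _ in rows}


if __name__ == "__main__":
    for name, pct in sorted(shares(SUMMARY).items(), key=lambda kv: -kv[1]):
        print(f"{name:16s} {pct:5.1f}%")
```

Under this reading tConvolveMT accounts for roughly 89% of the time, with tHogbomCleanOMP, tConvolveOMP and tConvolveMPI making up most of the rest.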
The computation structure varies by workload. Here is the MT workload:
570869) askap cpu=1 start=6.06 finish=77.94
570870) tConvolveMT cpu=0 start=6.07 finish=77.94
570871) tConvolveMT cpu=1 start=22.47 finish=49.58
570872) tConvolveMT cpu=5 start=22.47 finish=49.63
570873) tConvolveMT cpu=7 start=22.47 finish=49.61
570874) tConvolveMT cpu=0 start=22.47 finish=49.42
570875) tConvolveMT cpu=1 start=22.47 finish=49.43
570876) tConvolveMT cpu=14 start=22.47 finish=49.61
570877) tConvolveMT cpu=2 start=22.47 finish=49.48
570878) tConvolveMT cpu=3 start=22.47 finish=49.62
570879) tConvolveMT cpu=12 start=22.47 finish=49.60
570880) tConvolveMT cpu=13 start=22.47 finish=49.64
570881) tConvolveMT cpu=15 start=22.47 finish=49.64
570882) tConvolveMT cpu=0 start=22.47 finish=49.59
570883) tConvolveMT cpu=9 start=22.47 finish=49.50
570884) tConvolveMT cpu=6 start=22.47 finish=49.63
570885) tConvolveMT cpu=10 start=22.47 finish=49.58
570886) tConvolveMT cpu=11 start=22.47 finish=49.57
570887) tConvolveMT cpu=12 start=49.64 finish=77.40
570888) tConvolveMT cpu=13 start=49.64 finish=77.52
570889) tConvolveMT cpu=6 start=49.64 finish=77.54
570890) tConvolveMT cpu=0 start=49.64 finish=77.44
570891) tConvolveMT cpu=1 start=49.64 finish=77.46
570892) tConvolveMT cpu=2 start=49.64 finish=77.59
570893) tConvolveMT cpu=15 start=49.64 finish=77.53
570894) tConvolveMT cpu=3 start=49.64 finish=77.58
570895) tConvolveMT cpu=7 start=49.64 finish=77.60
570896) tConvolveMT cpu=10 start=49.65 finish=77.52
570897) tConvolveMT cpu=5 start=49.65 finish=77.60
570898) tConvolveMT cpu=14 start=49.65 finish=77.59
570899) tConvolveMT cpu=8 start=49.65 finish=77.52
570900) tConvolveMT cpu=9 start=49.65 finish=77.50
570901) tConvolveMT cpu=4 start=49.65 finish=77.59
570902) tConvolveMT cpu=11 start=49.65 finish=77.53
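The MT trace has a simple shape: the askap driver and one tConvolveMT thread span the whole run, while worker threads start in two waves. A minimal sketch that parses the trace lines and groups threads by start time makes this explicit. The regular expression assumes exactly the "pid) name cpu=N start=S finish=F" layout shown here, and the 1-second grouping tolerance is an arbitrary choice.

```python
import re

# Matches lines such as "570871) tConvolveMT cpu=1 start=22.47 finish=49.58".
LINE = re.compile(r"(\d+)\)\s+(\S+)\s+cpu=(\d+)\s+start=([\d.]+)\s+finish=([\d.]+)")


def parse_trace(text):
    """Return (pid, name, cpu, start, finish) tuples for each trace line."""
    rows = []
    for m in LINE.finditer(text):
        pid, name, cpu, start, finish = m.groups()
        rows.append((int(pid), name, int(cpu), float(start), float(finish)))
    return rows


def waves(rows, name, tolerance=1.0):
    """Group threads of `name` whose start is within `tolerance` s of the wave's first start.

    Returns (thread_count, first_start, last_finish) per wave, in start order.
    """
    groups = []
    for row in sorted((r for r in rows if r[1] == name), key=lambda r: r[3]):
        if groups and row[3] - groups[-1][0][3] <= tolerance:
            groups[-1].append(row)
        else:
            groups.append([row])
    return [(len(g), g[0][3], max(r[4] for r in g)) for g in groups]
```

Run over the MT lines above (for example saved to a hypothetical file mt_trace.txt), waves(parse_trace(text), "tConvolveMT") should report one thread starting at 6.07 s followed by two waves of 16 threads starting at 22.47 s and 49.64 s. The MPI lines use the same format, so the parser applies there as well.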
Here is the MPI workload:
571050) askap cpu=7 start=277.73 finish=309.15
571051) mpirun cpu=1 start=277.73 finish=309.12
571054) mpirun cpu=8 start=278.35 finish=309.12
571055) mpirun cpu=1 start=278.35 finish=278.35
571056) mpirun cpu=10 start=278.37 finish=309.11
571057) mpirun cpu=11 start=278.85 finish=309.11
571058) mpirun cpu=7 start=278.85 finish=309.12
571059) tConvolveMPI cpu=3 start=278.90 finish=309.11
571061) tConvolveMPI cpu=11 start=278.90 finish=309.00
571064) tConvolveMPI cpu=11 start=278.91 finish=309.00
571060) tConvolveMPI cpu=11 start=278.90 finish=309.11
571063) tConvolveMPI cpu=2 start=278.91 finish=309.00
571067) tConvolveMPI cpu=8 start=278.91 finish=309.00
571062) tConvolveMPI cpu=4 start=278.90 finish=309.10
571066) tConvolveMPI cpu=12 start=278.91 finish=309.00
571069) tConvolveMPI cpu=5 start=278.92 finish=309.00
571065) tConvolveMPI cpu=1 start=278.91 finish=309.10
571070) tConvolveMPI cpu=12 start=278.92 finish=309.00
571074) tConvolveMPI cpu=0 start=278.92 finish=309.00
571068) tConvolveMPI cpu=6 start=278.91 finish=309.10
571072) tConvolveMPI cpu=15 start=278.92 finish=309.00
571077) tConvolveMPI cpu=15 start=278.93 finish=309.00
571071) tConvolveMPI cpu=7 start=278.92 finish=309.10
571075) tConvolveMPI cpu=12 start=278.93 finish=309.00
571079) tConvolveMPI cpu=14 start=278.93 finish=309.00
571073) tConvolveMPI cpu=13 start=278.92 finish=309.10
571078) tConvolveMPI cpu=0 start=278.93 finish=309.00
571081) tConvolveMPI cpu=9 start=278.94 finish=309.00
571076) tConvolveMPI cpu=8 start=278.93 finish=309.10
571080) tConvolveMPI cpu=8 start=278.93 finish=309.00
571082) tConvolveMPI cpu=2 start=278.94 finish=309.00
