askap is a set of benchmarks from the Australian SKA Pathfinder (ASKAP). The first four run tConvolve using MT (multi-threaded), MPI, OpenCL and OpenMP. The OpenCL version fails and the other three run. The last workload is different: it runs tHogbomCleanOMP rather than tConvolve.

The topdown profile shows that the MT and MPI workloads are mostly backend bound. The others are more variable.

AMD metrics show we run on about half the cores on average. This is heavy floating-point code with very few frontend stalls and a low IPC. Backend stalls, particularly memory stalls, dominate.

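Intel metrics also show heavy backend stalls (60.5% of slots, of which 48.9% is memory). The memory latency breakdown suggests this is mostly at the DRAM level of the hierarchy (49.0% dram bound, against 7.4% L2 bound and 4.6% L3 bound).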

elapsed              692.645
on_cpu               0.563          # 9.01 / 16 cores
utime                6197.316
stime                43.226
nvcsw                52242          # 59.50%
nivcsw               35559          # 40.50%
inblock              484744         # 699.84/sec
onblock              838424         # 1210.47/sec
cpu-clock            6241158137516  # 6241.158 seconds
task-clock           6241202869038  # 6241.203 seconds
page faults          18726253       # 3000.424/sec
context switches     90981          # 14.577/sec
cpu migrations       7128           # 1.142/sec
major page faults    1834           # 0.294/sec
minor page faults    18724419       # 3000.130/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             3273043868037  # 168.725 branches per 1000 inst
branch misses        3267599213     # 0.10% branch miss
conditional          3273043892613  # 168.725 conditional branches per 1000 inst
indirect             546410675684   # 28.167 indirect branches per 1000 inst
slots                60204722223794 #
retiring             20347657209861 # 33.8% (33.8%)
-- ucode             1484017206434  #     2.5%
-- fastpath          18863640003427 #    31.3%
frontend             2781813182443  #  4.6% ( 4.6%) low
-- latency           1956290061026  #     3.2%
-- bandwidth         825523121417   #     1.4%
backend              36406367639532 # 60.5% (60.5%)
-- cpu               6992955970232  #    11.6%
-- memory            29413411669300 #    48.9%
speculation          1190714838886  #  2.0% ( 2.0%)
-- branch mispredict 814314213017   #     1.4%
-- pipeline restart  376400625869   #     0.6%
smt-contention       0              #  0.0% ( 0.0%)
cpu-cycles           26919240110996 # 1.68 GHz
instructions         38106253618914 # 1.42 IPC
l2 access            272117741070   # 14.360 l2 access per 1000 inst
l2 miss              202448005121   # 74.40% l2 miss
cpu-cycles           12117126869957 # 57.8% memory latency
load stalls          6988606434582  #  0.0% l1 bound
l1 miss              7393019851320  #  7.4% l2 bound
l2 miss              6498182419310  #  4.6% l3 bound
l3 miss              5937412868393  # 49.0% dram bound
store_stalls         19618397714    #  0.2% store bound
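
The derived figures in the comment column can be reproduced from the raw counters. The sketch below is a reading of the numbers, not the profiling tool's own code: the formulas (busy cores as CPU seconds per elapsed second, each cache level's share as the difference between successive miss-stall counters, l1 bound clamped at zero) are inferred because they match the listed percentages, and all variable names are made up for illustration.

```python
# Raw values copied from the listing above.
elapsed, utime, stime = 692.645, 6197.316, 43.226
cores = 16

slots = 60204722223794
cycles = 26919240110996
instructions = 38106253618914


def pct(part, whole):
    """Percentage of `whole`, rounded to one decimal as in the listing."""
    return round(100.0 * part / whole, 1)


# Average number of busy cores: CPU seconds consumed per wall-clock second.
busy_cores = (utime + stime) / elapsed
on_cpu = busy_cores / cores
ipc = instructions / cycles

# Top-level topdown categories as a share of all issue slots.
topdown = {
    "retiring": pct(20347657209861, slots),
    "frontend": pct(2781813182443, slots),
    "backend": pct(36406367639532, slots),
    "backend memory": pct(29413411669300, slots),
    "speculation": pct(1190714838886, slots),
}

# Memory latency breakdown, relative to the second cpu-cycles counter.
mem_cycles = 12117126869957
load_stalls = 6988606434582
l1_miss = 7393019851320
l2_miss = 6498182419310
l3_miss = 5937412868393
store_stalls = 19618397714


def bound(outer, inner):
    """Stall cycles resolved at one level: outer minus inner, clamped at 0."""
    return pct(max(outer - inner, 0), mem_cycles)


breakdown = {
    "l1 bound": bound(load_stalls, l1_miss),
    "l2 bound": bound(l1_miss, l2_miss),
    "l3 bound": bound(l2_miss, l3_miss),
    "dram bound": pct(l3_miss, mem_cycles),
    "store bound": pct(store_stalls, mem_cycles),
}
memory_latency = pct(load_stalls + store_stalls, mem_cycles)

print(f"busy cores {busy_cores:.2f} of {cores}, on_cpu {on_cpu:.3f}, IPC {ipc:.2f}")
print(topdown)
print(breakdown, memory_latency)
```

Under these assumptions almost all memory stall cycles survive past the L3 miss counter, which is what the dram bound figure expresses.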

The process summary shows most of the time in tConvolveMT, the multi-threaded tConvolve routine, and much less in the other routines.

724 processes
	 99 tConvolveMT          67707.10   193.13
	 48 tHogbomCleanOMP       3498.04    18.24
	 48 tConvolveOMP          2308.80    14.08
	 72 tConvolveMPI          2105.73    46.25
	133 clinfo                  35.82    12.09
	 38 vulkaninfo               0.95     1.33
	 18 mpirun                   0.84     2.07
	  6 glxinfo:gdrv0            0.11     0.05
	  6 glxinfo:gl0              0.11     0.05
	  4 vulkani:disk$0           0.10     0.14
	  6 php                      0.07     0.21
	  2 glxinfo                  0.06     0.02
	  2 llvmpipe-0               0.05     0.07
	  2 llvmpipe-1               0.05     0.07
	  2 llvmpipe-10              0.05     0.07
	  2 llvmpipe-11              0.05     0.07
	  2 llvmpipe-12              0.05     0.07
	  2 llvmpipe-13              0.05     0.07
	  2 llvmpipe-14              0.05     0.07
	  2 llvmpipe-15              0.05     0.07
	  2 llvmpipe-2               0.05     0.07
	  2 llvmpipe-3               0.05     0.07
	  2 llvmpipe-4               0.05     0.07
	  2 llvmpipe-5               0.05     0.07
	  2 llvmpipe-6               0.05     0.07
	  2 llvmpipe-7               0.05     0.07
	  2 llvmpipe-8               0.05     0.07
	  2 llvmpipe-9               0.05     0.07
	  2 glxinfo:cs0              0.05     0.02
	  2 glxinfo:disk$0           0.05     0.02
	  2 glxinfo:sh0              0.05     0.02
	  2 glxinfo:shlo0            0.05     0.02
	  6 clang                    0.04     0.06
	  3 rocminfo                 0.00     0.03
	  1 lspci                    0.00     0.02
	 90 sh                       0.00     0.00
	 15 askap                    0.00     0.00
	 13 gcc                      0.00     0.00
	 11 gsettings                0.00     0.00
	  8 stat                     0.00     0.00
	  8 systemd-detect-          0.00     0.00
	  6 llvm-link                0.00     0.00
	  5 phoronix-test-s          0.00     0.00
	  4 gmain                    0.00     0.00
	  2 cc                       0.00     0.00
	  2 lscpu                    0.00     0.00
	  2 uname                    0.00     0.00
	  2 which                    0.00     0.00
	  2 xset                     0.00     0.00
	  1 date                     0.00     0.00
	  1 dconf worker             0.00     0.00
	  1 dirname                  0.00     0.00
	  1 dmesg                    0.00     0.00
	  1 dmidecode                0.00     0.00
	  1 grep                     0.00     0.00
	  1 ifconfig                 0.00     0.00
	  1 ip                       0.00     0.00
	  1 lsmod                    0.00     0.00
	  1 mktemp                   0.00     0.00
	  1 ps                       0.00     0.00
	  1 qdbus                    0.00     0.00
	  1 readlink                 0.00     0.00
	  1 realpath                 0.00     0.00
	  1 sed                      0.00     0.00
	  1 sort                     0.00     0.00
	  1 stty                     0.00     0.00
	  1 systemctl                0.00     0.00
	  1 template.sh              0.00     0.00
	  1 wc                       0.00     0.00
	  1 xrandr                   0.00     0.00
0 processes running
47 maximum processes
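
To put a number on "most of the time", the summary rows can be parsed and compared. This is a minimal sketch: the two numeric columns are unlabeled in the listing, so treating the first one as a time-like total is an assumption, and `parse_summary` is a hypothetical helper. Only a few rows are copied in; the omitted rows are all below 1.0 in that column.

```python
import re

# count, process name (may contain spaces), then two decimal columns.
ROW = re.compile(r"^\s*(\d+)\s+(.+?)\s+(\d+\.\d+)\s+(\d+\.\d+)\s*$")

SAMPLE = """\
 99 tConvolveMT          67707.10   193.13
 48 tHogbomCleanOMP       3498.04    18.24
 48 tConvolveOMP          2308.80    14.08
 72 tConvolveMPI          2105.73    46.25
133 clinfo                  35.82    12.09
  1 dconf worker             0.00     0.00
"""


def parse_summary(text):
    """Return (count, name, first_column, second_column) per matching row."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            count, name, first, second = m.groups()
            rows.append((int(count), name, float(first), float(second)))
    return rows


rows = parse_summary(SAMPLE)
total = sum(r[2] for r in rows)
shares = {name: 100.0 * first / total for _, name, first, _ in rows}
for name, share in shares.items():
    print(f"{name:<16} {share:5.1f}%")
```

On these rows tConvolveMT accounts for a little under nine tenths of the first column.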

Computation structure varies by workload. Here is the MT workload, which runs two back-to-back batches of 16 tConvolveMT workers:

      570869) askap            cpu=1 start=6.06  finish=77.94
        570870) tConvolveMT      cpu=0 start=6.07  finish=77.94
          570871) tConvolveMT      cpu=1 start=22.47 finish=49.58
          570872) tConvolveMT      cpu=5 start=22.47 finish=49.63
          570873) tConvolveMT      cpu=7 start=22.47 finish=49.61
          570874) tConvolveMT      cpu=0 start=22.47 finish=49.42
          570875) tConvolveMT      cpu=1 start=22.47 finish=49.43
          570876) tConvolveMT      cpu=14 start=22.47 finish=49.61
          570877) tConvolveMT      cpu=2 start=22.47 finish=49.48
          570878) tConvolveMT      cpu=3 start=22.47 finish=49.62
          570879) tConvolveMT      cpu=12 start=22.47 finish=49.60
          570880) tConvolveMT      cpu=13 start=22.47 finish=49.64
          570881) tConvolveMT      cpu=15 start=22.47 finish=49.64
          570882) tConvolveMT      cpu=0 start=22.47 finish=49.59
          570883) tConvolveMT      cpu=9 start=22.47 finish=49.50
          570884) tConvolveMT      cpu=6 start=22.47 finish=49.63
          570885) tConvolveMT      cpu=10 start=22.47 finish=49.58
          570886) tConvolveMT      cpu=11 start=22.47 finish=49.57
          570887) tConvolveMT      cpu=12 start=49.64 finish=77.40
          570888) tConvolveMT      cpu=13 start=49.64 finish=77.52
          570889) tConvolveMT      cpu=6 start=49.64 finish=77.54
          570890) tConvolveMT      cpu=0 start=49.64 finish=77.44
          570891) tConvolveMT      cpu=1 start=49.64 finish=77.46
          570892) tConvolveMT      cpu=2 start=49.64 finish=77.59
          570893) tConvolveMT      cpu=15 start=49.64 finish=77.53
          570894) tConvolveMT      cpu=3 start=49.64 finish=77.58
          570895) tConvolveMT      cpu=7 start=49.64 finish=77.60
          570896) tConvolveMT      cpu=10 start=49.65 finish=77.52
          570897) tConvolveMT      cpu=5 start=49.65 finish=77.60
          570898) tConvolveMT      cpu=14 start=49.65 finish=77.59
          570899) tConvolveMT      cpu=8 start=49.65 finish=77.52
          570900) tConvolveMT      cpu=9 start=49.65 finish=77.50
          570901) tConvolveMT      cpu=4 start=49.65 finish=77.59
          570902) tConvolveMT      cpu=11 start=49.65 finish=77.53
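
The batching in this listing can be recovered mechanically by grouping workers whose start times are close together. A rough sketch, using a handful of lines copied from the listing; the regular expression, the one-second gap threshold and the function names are illustrative choices, not part of the profiling tool.

```python
import re

LINE = re.compile(
    r"(\d+)\)\s+(\S+)\s+cpu=(\d+)\s+start=([\d.]+)\s+finish=([\d.]+)"
)

SAMPLE = """\
          570871) tConvolveMT      cpu=1 start=22.47 finish=49.58
          570872) tConvolveMT      cpu=5 start=22.47 finish=49.63
          570886) tConvolveMT      cpu=11 start=22.47 finish=49.57
          570887) tConvolveMT      cpu=12 start=49.64 finish=77.40
          570896) tConvolveMT      cpu=10 start=49.65 finish=77.52
          570902) tConvolveMT      cpu=11 start=49.65 finish=77.53
"""


def parse(text):
    """Return (pid, name, cpu, start, finish) for every matching line."""
    rows = []
    for line in text.splitlines():
        m = LINE.search(line)
        if m:
            pid, name, cpu, start, finish = m.groups()
            rows.append((int(pid), name, int(cpu), float(start), float(finish)))
    return rows


def batches(rows, gap=1.0):
    """Group rows into batches; a start more than `gap` seconds after the
    previous row's start opens a new batch."""
    groups = []
    for row in sorted(rows, key=lambda r: r[3]):
        if groups and row[3] - groups[-1][-1][3] <= gap:
            groups[-1].append(row)
        else:
            groups.append([row])
    return groups


for group in batches(parse(SAMPLE)):
    first = min(r[3] for r in group)
    last = max(r[4] for r in group)
    print(f"{len(group)} workers, {first:.2f}s to {last:.2f}s")
```

Fed the full listing instead of the excerpt, the same grouping should separate the workers that start at 22.47 from those that start at 49.64 and 49.65.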

Here is the MPI section, where each tConvolveMPI process launched under mpirun has two children:

      571050) askap            cpu=7 start=277.73 finish=309.15
        571051) mpirun           cpu=1 start=277.73 finish=309.12
          571054) mpirun           cpu=8 start=278.35 finish=309.12
          571055) mpirun           cpu=1 start=278.35 finish=278.35
          571056) mpirun           cpu=10 start=278.37 finish=309.11
          571057) mpirun           cpu=11 start=278.85 finish=309.11
          571058) mpirun           cpu=7 start=278.85 finish=309.12
          571059) tConvolveMPI     cpu=3 start=278.90 finish=309.11
            571061) tConvolveMPI     cpu=11 start=278.90 finish=309.00
            571064) tConvolveMPI     cpu=11 start=278.91 finish=309.00
          571060) tConvolveMPI     cpu=11 start=278.90 finish=309.11
            571063) tConvolveMPI     cpu=2 start=278.91 finish=309.00
            571067) tConvolveMPI     cpu=8 start=278.91 finish=309.00
          571062) tConvolveMPI     cpu=4 start=278.90 finish=309.10
            571066) tConvolveMPI     cpu=12 start=278.91 finish=309.00
            571069) tConvolveMPI     cpu=5 start=278.92 finish=309.00
          571065) tConvolveMPI     cpu=1 start=278.91 finish=309.10
            571070) tConvolveMPI     cpu=12 start=278.92 finish=309.00
            571074) tConvolveMPI     cpu=0 start=278.92 finish=309.00
          571068) tConvolveMPI     cpu=6 start=278.91 finish=309.10
            571072) tConvolveMPI     cpu=15 start=278.92 finish=309.00
            571077) tConvolveMPI     cpu=15 start=278.93 finish=309.00
          571071) tConvolveMPI     cpu=7 start=278.92 finish=309.10
            571075) tConvolveMPI     cpu=12 start=278.93 finish=309.00
            571079) tConvolveMPI     cpu=14 start=278.93 finish=309.00
          571073) tConvolveMPI     cpu=13 start=278.92 finish=309.10
            571078) tConvolveMPI     cpu=0 start=278.93 finish=309.00
            571081) tConvolveMPI     cpu=9 start=278.94 finish=309.00
          571076) tConvolveMPI     cpu=8 start=278.93 finish=309.10
            571080) tConvolveMPI     cpu=8 start=278.93 finish=309.00
            571082) tConvolveMPI     cpu=2 start=278.94 finish=309.00
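
The parent and child relationships in this listing are encoded only by indentation. The sketch below rebuilds them with a stack of ancestors, assuming deeper indentation always means "child of the nearest shallower line above"; `children_by_parent` is a hypothetical helper and the sample is an excerpt of the listing.

```python
import re
from collections import defaultdict

# Leading whitespace, then the process id followed by ")".
LINE = re.compile(r"^(\s*)(\d+)\)\s+(\S+)")

SAMPLE = """\
          571059) tConvolveMPI     cpu=3 start=278.90 finish=309.11
            571061) tConvolveMPI     cpu=11 start=278.90 finish=309.00
            571064) tConvolveMPI     cpu=11 start=278.91 finish=309.00
          571060) tConvolveMPI     cpu=11 start=278.90 finish=309.11
            571063) tConvolveMPI     cpu=2 start=278.91 finish=309.00
            571067) tConvolveMPI     cpu=8 start=278.91 finish=309.00
"""


def children_by_parent(text):
    """Map each parent id to the ids listed directly beneath it."""
    stack = []  # (indent, pid) pairs for the current chain of ancestors
    children = defaultdict(list)
    for line in text.splitlines():
        m = LINE.match(line)
        if not m:
            continue
        indent, pid = len(m.group(1)), int(m.group(2))
        # Drop entries that are siblings or deeper than the current line.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        if stack:
            children[stack[-1][1]].append(pid)
        stack.append((indent, pid))
    return dict(children)


tree = children_by_parent(SAMPLE)
for parent, kids in tree.items():
    print(parent, "->", kids)
```

On the excerpt this yields two children for each of the two parents shown.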