Software to simulate tsunami generation and propagation in the context of early warning systems. The benchmark runs three problem sizes, each taking progressively longer. In the chart below, most of the time is spent on the third and very little on the first. The workload keeps nearly all cores busy.

The topdown profile shows almost all time spent in backend stalls, with a low retirement rate and few frontend stalls.

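AMD metrics confirm that the run uses nearly all cores (14.55 of 16), that this is a floating-point application (about 357 float per 1000 instructions, all in the 128-bit category), and that 77.9% of slots are backend memory stalls. The branch rate is moderate, about 198 branches per 1000 instructions, with a 0.19% miss rate.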

elapsed              1922.175
on_cpu               0.909          # 14.55 / 16 cores
utime                27915.570
stime                46.870
nvcsw                39963          # 12.49%
nivcsw               279964         # 87.51%
inblock              0              # 0.00/sec
onblock              69134864       # 35967.00/sec
cpu-clock            27976304339567 # 27976.304 seconds
task-clock           27977334574561 # 27977.335 seconds
page faults          696190         # 24.884/sec
context switches     329321         # 11.771/sec
cpu migrations       2685           # 0.096/sec
major page faults    2              # 0.000/sec
minor page faults    696188         # 24.884/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             6576286403261  # 197.858 branches per 1000 inst
branch misses        12811375603    # 0.19% branch miss
conditional          6436339187318  # 193.648 conditional branches per 1000 inst
indirect             45656474482    # 1.374 indirect branches per 1000 inst
cpu-cycles           128164885936695 # 4.13 GHz
instructions         33244978034460 # 0.26 IPC
slots                256292148576894 #
retiring             11974794179054 #  4.7% ( 5.2%)
-- ucode             55136358437    #     0.0%
-- fastpath          11919657820617 #     4.7%
frontend             6620260644584  #  2.6% ( 2.9%)
-- latency           2623166581578  #     1.0%
-- bandwidth         3997094063006  #     1.6%
backend              212819428524789 # 83.0% (91.9%)
-- cpu               13141066975582 #     5.1%
-- memory            199678361549207 #    77.9%
speculation          220483707629   #  0.1% ( 0.1%)
-- branch mispredict 213211749567   #     0.1%
-- pipeline restart  7271958062     #     0.0%
smt-contention       24657088493168 #  9.6% ( 0.0%)
cpu-cycles           128093523958535 # 4.11 GHz
instructions         33242235307082 # 0.26 IPC
instructions         11083055466426 # 83.023 l2 access per 1000 inst
l2 hit from l1       525246389780   # 16.29% l2 miss
l2 miss from l1      15185401164    #
l2 hit from l2 pf    260187242099   #
l3 hit from l2 pf    5910338961     #
l3 miss from l2 pf   128808134580   #
instructions         11078878763592 # 357.411 float per 1000 inst
float 512            88             # 0.000 AVX-512 per 1000 inst
float 256            466            # 0.000 AVX-256 per 1000 inst
float 128            3959710616508  # 357.411 AVX-128 per 1000 inst
float MMX            0              # 0.000 MMX per 1000 inst
float scalar         0              # 0.000 scalar per 1000 inst
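
The headline figures in the AMD listing can be re-derived from the raw counters. A minimal sketch in Python, assuming on_cpu is (utime + stime) / elapsed / cores; that formula is an inference that happens to reproduce the printed 0.909, not something the listing states. Variable names are illustrative.

```python
# Raw values copied from the AMD listing above.
elapsed = 1922.175            # wall-clock seconds
utime = 27915.570             # user CPU seconds
stime = 46.870                # system CPU seconds
cores = 16
cpu_cycles = 128164885936695
instructions = 33244978034460
slots = 256292148576894
retiring = 11974794179054
backend_memory = 199678361549207


def pct(part, whole):
    """Share of `whole` taken by `part`, in percent."""
    return 100.0 * part / whole


# Assumption: on_cpu = CPU seconds / wall seconds / core count.
busy_cores = (utime + stime) / elapsed
on_cpu = busy_cores / cores
ipc = instructions / cpu_cycles

print(f"busy cores     {busy_cores:.2f} / {cores}")
print(f"on_cpu         {on_cpu:.3f}")
print(f"IPC            {ipc:.2f}")
print(f"retiring       {pct(retiring, slots):.1f}% of slots")
print(f"backend memory {pct(backend_memory, slots):.1f}% of slots")
```

The results agree with the listing (14.55 cores, 0.909, 0.26 IPC, 4.7% retiring, 77.9% backend memory), which suggests the topdown percentages are plain ratios against the slots line.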

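Intel metrics show the same 83.0% backend-stall share, though less of it is memory (54.9%) and more is cpu (28.1%). The L2 miss rate is 45.75%.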

elapsed              4510.805
on_cpu               0.894          # 14.31 / 16 cores
utime                64492.705
stime                48.516
nvcsw                64519          # 10.62%
nivcsw               542930         # 89.38%
inblock              16             # 0.00/sec
onblock              108247000      # 23997.27/sec
cpu-clock            64550853397335 # 64550.853 seconds
task-clock           64551668228515 # 64551.668 seconds
page faults          1051104        # 16.283/sec
context switches     629765         # 9.756/sec
cpu migrations       25633          # 0.397/sec
major page faults    0              # 0.000/sec
minor page faults    1051104        # 16.283/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             10206746066589 # 196.935 branches per 1000 inst
branch misses        14336969272    # 0.14% branch miss
conditional          10206746087517 # 196.935 conditional branches per 1000 inst
indirect             1479536149642  # 28.547 indirect branches per 1000 inst
slots                179247222475460 #
retiring             21364128798497 # 11.9% (11.9%)
-- ucode             3236812364270  #     1.8%
-- fastpath          18127316434227 #    10.1%
frontend             9382535179469  #  5.2% ( 5.2%)
-- latency           7107848453687  #     4.0%
-- bandwidth         2274686725782  #     1.3%
backend              148857931511379 # 83.0% (83.0%)
-- cpu               50435694393639 #    28.1%
-- memory            98422237117740 #    54.9%
speculation          1685655017579  #  0.9% ( 0.9%)
-- branch mispredict 1663186096045  #     0.9%
-- pipeline restart  22468921534    #     0.0%
smt-contention       0              #  0.0% ( 0.0%)
cpu-cycles           106965986680387 # 2.32 GHz
instructions         32118256495506 # 0.30 IPC
l2 access            980416843402   # 55.491 l2 access per 1000 inst
l2 miss              448501745047   # 45.75% l2 miss
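
The two runs can be put side by side with a few ratios. The sketch below, in Python, uses only values from the two listings; the dictionary layout and names are illustrative. Slot counts depend on each core's pipeline width, so only shares are compared, not absolute slot numbers.

```python
# Values copied from the AMD and Intel listings above.
runs = {
    "amd": {
        "elapsed": 1922.175,
        "slots": 256292148576894,
        "backend_memory": 199678361549207,
    },
    "intel": {
        "elapsed": 4510.805,
        "slots": 179247222475460,
        "backend_memory": 98422237117740,
    },
}
intel_l2_access = 980416843402
intel_l2_miss = 448501745047

# Share of pipeline slots lost to backend memory stalls, per run.
memory_pct = {
    name: 100.0 * r["backend_memory"] / r["slots"] for name, r in runs.items()
}

# Fraction of Intel L2 accesses that miss.
l2_miss_pct = 100.0 * intel_l2_miss / intel_l2_access

# How much longer the Intel run takes in wall-clock time.
elapsed_ratio = runs["intel"]["elapsed"] / runs["amd"]["elapsed"]

for name, value in memory_pct.items():
    print(f"{name:6s} backend memory {value:.1f}% of slots")
print(f"intel  l2 miss        {l2_miss_pct:.2f}%")
print(f"elapsed intel/amd     {elapsed_ratio:.2f}x")
```

Both runs are memory bound, but the Intel run takes about 2.35 times as long in wall-clock time while losing a smaller share of slots to memory stalls (54.9% against 77.9%).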

The process overview shows nearly all of the time in the easywave processes.

523 processes
	153 easywave             447987.04   640.32
	 68 clinfo                  17.44     6.27
	 38 vulkaninfo               1.14     1.14
	  6 php                      0.32     1.20
	  4 vulkani:disk$0           0.12     0.12
	  6 glxinfo:gdrv0            0.12     0.05
	  6 glxinfo:gl0              0.12     0.05
	  2 llvmpipe-0               0.06     0.06
	  2 llvmpipe-1               0.06     0.06
	  2 llvmpipe-10              0.06     0.06
	  2 llvmpipe-11              0.06     0.06
	  2 llvmpipe-12              0.06     0.06
	  2 llvmpipe-13              0.06     0.06
	  2 llvmpipe-14              0.06     0.06
	  2 llvmpipe-15              0.06     0.06
	  2 llvmpipe-2               0.06     0.06
	  2 llvmpipe-3               0.06     0.06
	  2 llvmpipe-4               0.06     0.06
	  2 llvmpipe-5               0.06     0.06
	  2 llvmpipe-6               0.06     0.06
	  2 llvmpipe-7               0.06     0.06
	  2 llvmpipe-8               0.06     0.06
	  2 llvmpipe-9               0.06     0.06
	  2 glxinfo                  0.06     0.02
	  2 glxinfo:cs0              0.06     0.02
	  2 glxinfo:disk$0           0.06     0.02
	  2 glxinfo:sh0              0.06     0.02
	  2 glxinfo:shlo0            0.06     0.02
	  6 clang                    0.03     0.09
	 18 rm                       0.00     4.59
	  3 rocminfo                 0.00     0.03
	  1 lspci                    0.00     0.02
	  1 ps                       0.00     0.01
	 86 sh                       0.00     0.00
	 13 gcc                      0.00     0.00
	 11 gsettings                0.00     0.00
	  8 stat                     0.00     0.00
	  8 systemd-detect-          0.00     0.00
	  6 llvm-link                0.00     0.00
	  5 phoronix-test-s          0.00     0.00
	  4 gmain                    0.00     0.00
	  2 cc                       0.00     0.00
	  2 lscpu                    0.00     0.00
	  2 uname                    0.00     0.00
	  2 which                    0.00     0.00
	  2 xset                     0.00     0.00
	  1 date                     0.00     0.00
	  1 dconf worker             0.00     0.00
	  1 dirname                  0.00     0.00
	  1 dmesg                    0.00     0.00
	  1 dmidecode                0.00     0.00
	  1 grep                     0.00     0.00
	  1 ifconfig                 0.00     0.00
	  1 ip                       0.00     0.00
	  1 lsmod                    0.00     0.00
	  1 mktemp                   0.00     0.00
	  1 qdbus                    0.00     0.00
	  1 readlink                 0.00     0.00
	  1 realpath                 0.00     0.00
	  1 sed                      0.00     0.00
	  1 sort                     0.00     0.00
	  1 stty                     0.00     0.00
	  1 systemctl                0.00     0.00
	  1 template.sh              0.00     0.00
	  1 wc                       0.00     0.00
	  1 xrandr                   0.00     0.00
0 processes running
47 maximum processes
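
The claim that easywave dominates can be checked by parsing the table. A minimal sketch in Python over a few rows copied from the overview; the two numeric columns are unlabeled in the listing, so the code calls them col_a and col_b and makes no assumption about their units. The parser and its names are illustrative, not part of the profiling tool.

```python
# A few rows copied from the process overview above.
OVERVIEW = """
    153 easywave             447987.04   640.32
     68 clinfo                  17.44     6.27
     38 vulkaninfo               1.14     1.14
      6 php                      0.32     1.20
      1 dconf worker             0.00     0.00
"""


def parse_overview(text):
    """Return (count, name, col_a, col_b) tuples, one per table row."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 4:
            continue  # skip blank lines
        count = int(parts[0])
        col_a, col_b = float(parts[-2]), float(parts[-1])
        name = " ".join(parts[1:-2])  # names may contain spaces
        rows.append((count, name, col_a, col_b))
    return rows


rows = parse_overview(OVERVIEW)
total_a = sum(r[2] for r in rows)
easywave_a = sum(r[2] for r in rows if r[1] == "easywave")
share = 100.0 * easywave_a / total_a

print(f"easywave share of first column: {share:.3f}%")
```

For these rows easywave accounts for more than 99.99% of the first numeric column; the remaining processes in the full table are too small to change that picture.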

The process structure is simple, with one easywave process on each of the 16 cores.

      931467) easywave         cpu=3 start=5.58  finish=15.07
        931468) easywave         cpu=0 start=5.58  finish=15.04
          931469) easywave         cpu=12 start=6.48  finish=15.04
          931470) easywave         cpu=14 start=6.48  finish=15.04
          931471) easywave         cpu=7 start=6.48  finish=15.04
          931472) easywave         cpu=8 start=6.48  finish=15.04
          931473) easywave         cpu=9 start=6.48  finish=15.04
          931474) easywave         cpu=2 start=6.48  finish=15.04
          931475) easywave         cpu=11 start=6.48  finish=15.04
          931476) easywave         cpu=4 start=6.48  finish=15.04
          931477) easywave         cpu=13 start=6.48  finish=15.04
          931478) easywave         cpu=6 start=6.48  finish=15.04
          931479) easywave         cpu=15 start=6.48  finish=15.04
          931480) easywave         cpu=1 start=6.48  finish=15.04
          931481) easywave         cpu=10 start=6.48  finish=15.04
          931482) easywave         cpu=3 start=6.48  finish=15.04
          931483) easywave         cpu=5 start=6.48  finish=15.04
        931484) rm               cpu=0 start=15.04 finish=15.07
        931485) rm               cpu=6 start=15.07 finish=15.07
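
The one-process-per-core layout can be verified from the tree itself. A minimal sketch in Python, assuming two spaces of indentation per tree level and treating every easywave entry below the top-level one as a worker; the units of start and finish are not stated in the listing, so the code only takes differences. The regular expression and field names are illustrative.

```python
import re

# Process tree copied from the listing above.
TREE = """\
      931467) easywave         cpu=3 start=5.58  finish=15.07
        931468) easywave         cpu=0 start=5.58  finish=15.04
          931469) easywave         cpu=12 start=6.48  finish=15.04
          931470) easywave         cpu=14 start=6.48  finish=15.04
          931471) easywave         cpu=7 start=6.48  finish=15.04
          931472) easywave         cpu=8 start=6.48  finish=15.04
          931473) easywave         cpu=9 start=6.48  finish=15.04
          931474) easywave         cpu=2 start=6.48  finish=15.04
          931475) easywave         cpu=11 start=6.48  finish=15.04
          931476) easywave         cpu=4 start=6.48  finish=15.04
          931477) easywave         cpu=13 start=6.48  finish=15.04
          931478) easywave         cpu=6 start=6.48  finish=15.04
          931479) easywave         cpu=15 start=6.48  finish=15.04
          931480) easywave         cpu=1 start=6.48  finish=15.04
          931481) easywave         cpu=10 start=6.48  finish=15.04
          931482) easywave         cpu=3 start=6.48  finish=15.04
          931483) easywave         cpu=5 start=6.48  finish=15.04
        931484) rm               cpu=0 start=15.04 finish=15.07
        931485) rm               cpu=6 start=15.07 finish=15.07
"""

LINE = re.compile(
    r"^(\s*)(\d+)\)\s+(\S+)\s+cpu=(\d+)\s+start=([\d.]+)\s+finish=([\d.]+)"
)


def parse_tree(text):
    """Parse tree lines into dicts; depth is derived from indentation."""
    rows = []
    for line in text.splitlines():
        m = LINE.match(line)
        if not m:
            continue
        indent, pid, name, cpu, start, finish = m.groups()
        rows.append({
            "indent": len(indent), "pid": int(pid), "name": name,
            "cpu": int(cpu), "start": float(start), "finish": float(finish),
        })
    base = min(r["indent"] for r in rows)
    for r in rows:
        r["depth"] = (r["indent"] - base) // 2  # assumed 2 spaces per level
    return rows


rows = parse_tree(TREE)
workers = [r for r in rows if r["name"] == "easywave" and r["depth"] >= 1]
worker_cpus = sorted({r["cpu"] for r in workers})
leaf_span = max(r["finish"] - r["start"] for r in workers if r["depth"] == 2)

print(f"{len(workers)} workers on {len(worker_cpus)} distinct cpus")
print(f"longest depth-2 worker span: {leaf_span:.2f}")
```

The 16 easywave entries below the top-level process land on 16 distinct CPUs (0 through 15), and the 15 entries at the deepest level all start and finish together.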