Hydrodynamics on unstructured meshes. The test runs two workloads. Overall, about half the threads are kept busy (on_cpu is 0.440, or 7.03 of 16 cores, in the AMD run below).

Topdown profile shows backend stalls dominating, with low frontend stalls.

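AMD metrics show floating-point-heavy code (about 332 float per 1000 instructions, essentially all 128-bit). Backend stalls (61.4%) are balanced between CPU (31.5%) and memory (29.8%). Frontend stalls are low (3.7%).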

elapsed              348.985
on_cpu               0.440          # 7.03 / 16 cores
utime                2447.057
stime                7.693
nvcsw                64303          # 90.24%
nivcsw               6956           # 9.76%
inblock              0              # 0.00/sec
onblock              100816         # 288.88/sec
cpu-clock            2454839428334  # 2454.839 seconds
task-clock           2454864590772  # 2454.865 seconds
page faults          585693         # 238.585/sec
context switches     72808          # 29.659/sec
cpu migrations       4729           # 1.926/sec
major page faults    510            # 0.208/sec
minor page faults    585183         # 238.377/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             1405170392315  # 66.567 branches per 1000 inst
branch misses        2889441026     # 0.21% branch miss
conditional          1268663493424  # 60.100 conditional branches per 1000 inst
indirect             23984196334    # 1.136 indirect branches per 1000 inst
cpu-cycles           10932622319963 # 1.92 GHz
instructions         21196256443065 # 1.94 IPC
slots                21869456447430 #
retiring             7385737678440  # 33.8% (33.8%)
-- ucode             1290957661     #     0.0%
-- fastpath          7384446720779  #    33.8%
frontend             805383992505   #  3.7% ( 3.7%) low
-- latency           491365910592   #     2.2%
-- bandwidth         314018081913   #     1.4%
backend              13420593631108 # 61.4% (61.4%)
-- cpu               6892842816665  #    31.5%
-- memory            6527750814443  #    29.8%
speculation          240024275753   #  1.1% ( 1.1%)
-- branch mispredict 109870243527   #     0.5%
-- pipeline restart  130154032226   #     0.6%
smt-contention       17709448967    #  0.1% ( 0.0%)
cpu-cycles           10923049238973 # 1.96 GHz
instructions         21070010698041 # 1.93 IPC
instructions         7025290484683  # 27.155 l2 access per 1000 inst
l2 hit from l1       152693226788   # 17.98% l2 miss
l2 miss from l1      8245427283     #
l2 hit from l2 pf    12025097090    #
l3 hit from l2 pf    1003945621     #
l3 miss from l2 pf   25051373796    #
instructions         7021052031386  # 331.768 float per 1000 inst
float 512            93             # 0.000 AVX-512 per 1000 inst
float 256            888            # 0.000 AVX-256 per 1000 inst
float 128            2329362650245  # 331.768 AVX-128 per 1000 inst
float MMX            0              # 0.000 MMX per 1000 inst
float scalar         0              # 0.000 scalar per 1000 inst
instructions         2384561        #
opcache              896903         # 376.129 opcache per 1000 inst
opcache miss         477070         # 53.2% opcache miss rate
l1 dTLB miss         4104           # 1.721 L1 dTLB per 1000 inst
l2 dTLB miss         973            # 0.408 L2 dTLB per 1000 inst
instructions         2422756        #
icache               1198318        # 494.609 icache per 1000 inst
icache miss          112292         #  9.4% icache miss rate
l1 iTLB miss         8              # 0.003 L1 iTLB per 1000 inst
l2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst
tlb flush            19             # 0.008 TLB flush per 1000 inst
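
The derived values in the comment column can be cross-checked from the raw counters. The Python sketch below recomputes on_cpu, IPC, and the top-level topdown shares for the AMD run above. The formulas (on_cpu as (utime + stime) / elapsed / cores, and each topdown share as category slots / total slots) are assumptions inferred from the printed numbers, not taken from the profiler itself, and the helper names are hypothetical.

```python
# Raw values copied from the AMD run above.
elapsed = 348.985
utime = 2447.057
stime = 7.693
cores = 16

cpu_cycles = 10932622319963
instructions = 21196256443065
slots = 21869456447430
topdown = {
    "retiring": 7385737678440,
    "frontend": 805383992505,
    "backend": 13420593631108,
    "speculation": 240024275753,
}


def on_cpu(utime, stime, elapsed, cores):
    """Return (average busy cores, fraction of all cores busy).

    Assumed formula: CPU seconds consumed divided by wall-clock seconds.
    """
    busy_cores = (utime + stime) / elapsed
    return busy_cores, busy_cores / cores


def topdown_shares(categories, slots):
    """Return each category's share of the total slots, in percent."""
    return {name: 100.0 * count / slots for name, count in categories.items()}


busy, fraction = on_cpu(utime, stime, elapsed, cores)
ipc = instructions / cpu_cycles
shares = topdown_shares(topdown, slots)

print(f"on_cpu {fraction:.3f}  # {busy:.2f} / {cores} cores")
print(f"IPC    {ipc:.2f}")
for name, pct in shares.items():
    print(f"{name:<12}{pct:5.1f}%")
```

Under these assumptions the recomputed values round to the figures printed in the report, which suggests the comment column is derived in roughly this way.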

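Intel metrics show a high retiring rate (64.1%) and high IPC (4.03). Backend stalls account for 22.2% and frontend stalls for 8.6%.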

elapsed              2939.717
on_cpu               0.733          # 11.73 / 16 cores
utime                34465.043
stime                14.981
nvcsw                179750         # 80.58%
nivcsw               43316          # 19.42%
inblock              22208          # 7.55/sec
onblock              185160         # 62.99/sec
cpu-clock            34480504457611 # 34480.504 seconds
task-clock           34480558245085 # 34480.558 seconds
page faults          1314803        # 38.132/sec
context switches     237469         # 6.887/sec
cpu migrations       34346          # 0.996/sec
major page faults    1445           # 0.042/sec
minor page faults    1313358        # 38.090/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             44244560364890 # 165.908 branches per 1000 inst
branch misses        113417834935   # 0.26% branch miss
conditional          44244560391034 # 165.908 conditional branches per 1000 inst
indirect             14350734555496 # 53.812 indirect branches per 1000 inst
slots                258818070782372 #
retiring             165943745858655 # 64.1% (64.1%) high
-- ucode             11698884463456 #     4.5%
-- fastpath          154244861395199 #    59.6%
frontend             22228896025978 #  8.6% ( 8.6%)
-- latency           8941122571912  #     3.5%
-- bandwidth         13287773454066 #     5.1%
backend              57414286431989 # 22.2% (22.2%)
-- cpu               40082922132210 #    15.5%
-- memory            17331364299779 #     6.7%
speculation          11415921070534 #  4.4% ( 4.4%)
-- branch mispredict 7032646758570  #     2.7%
-- pipeline restart  4383274311964  #     1.7%
smt-contention       0              #  0.0% ( 0.0%)
cpu-cycles           96461605636760 # 1.97 GHz
instructions         388850975926936 # 4.03 IPC high
l2 access            351070041147   # 2.360 l2 access per 1000 inst
l2 miss              138857980889   # 39.55% l2 miss
cpu-cycles           37072612143813 #  7.2% memory latency
load stalls          2501412244162  #  2.1% l1 bound
l1 miss              1724915419158  #  1.9% l2 bound
l2 miss              1037045631873  #  0.9% l3 bound
l3 miss              705376648870   #  1.9% dram bound
store_stalls         158962328723   #  0.4% store bound
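
The ratio-style figures in the Intel run can be checked the same way. The sketch below recomputes the voluntary context-switch share, the branch miss rate, and the L2 miss rate from the raw counts. The `percent` helper is hypothetical, and the choice of denominators is an assumption inferred from the printed percentages. The per-1000-instruction figures are left out because it is not clear from the output which instruction count each counter group was normalized against.

```python
def percent(part, whole):
    """Return part / whole as a percentage, or 0.0 when whole is zero."""
    return 100.0 * part / whole if whole else 0.0


# Raw values copied from the Intel run above.
nvcsw, nivcsw = 179750, 43316
branches, branch_misses = 44244560364890, 113417834935
l2_access, l2_miss = 351070041147, 138857980889

# Voluntary switches as a share of all context switches (assumed denominator).
voluntary_pct = percent(nvcsw, nvcsw + nivcsw)
# Mispredicted branches as a share of all branches.
branch_miss_pct = percent(branch_misses, branches)
# L2 misses as a share of L2 accesses.
l2_miss_pct = percent(l2_miss, l2_access)

print(f"nvcsw         {voluntary_pct:.2f}%")
print(f"branch misses {branch_miss_pct:.2f}%")
print(f"l2 miss       {l2_miss_pct:.2f}%")
```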

Process summary shows the pennant process driving the majority of the time.

428 processes
	150 pennant               7307.63    17.49
	 38 vulkaninfo               1.33     1.14
	 24 mpirun                   0.52     2.76
	  4 vulkani:disk$0           0.14     0.12
	  6 glxinfo:gdrv0            0.14     0.09
	  6 glxinfo:gl0              0.13     0.09
	  6 php                      0.07     0.15
	  2 llvmpipe-0               0.07     0.06
	  2 llvmpipe-1               0.07     0.06
	  2 llvmpipe-10              0.07     0.06
	  2 llvmpipe-11              0.07     0.06
	  2 llvmpipe-12              0.07     0.06
	  2 llvmpipe-13              0.07     0.06
	  2 llvmpipe-14              0.07     0.06
	  2 llvmpipe-15              0.07     0.06
	  2 llvmpipe-2               0.07     0.06
	  2 llvmpipe-3               0.07     0.06
	  2 llvmpipe-4               0.07     0.06
	  2 llvmpipe-5               0.07     0.06
	  2 llvmpipe-6               0.07     0.06
	  2 llvmpipe-7               0.07     0.06
	  2 llvmpipe-8               0.07     0.06
	  2 llvmpipe-9               0.07     0.06
	  2 glxinfo                  0.07     0.03
	  2 glxinfo:cs0              0.07     0.03
	  2 glxinfo:disk$0           0.07     0.03
	  2 glxinfo:sh0              0.07     0.03
	  2 glxinfo:shlo0            0.07     0.03
	  1 lspci                    0.02     0.01
	 70 sh                       0.00     0.00
	 13 gcc                      0.00     0.00
	  8 gsettings                0.00     0.00
	  8 stat                     0.00     0.00
	  8 systemd-detect-          0.00     0.00
	  5 gmain                    0.00     0.00
	  5 phoronix-test-s          0.00     0.00
	  3 dconf worker             0.00     0.00
	  2 cc                       0.00     0.00
	  2 clinfo                   0.00     0.00
	  2 lscpu                    0.00     0.00
	  2 uname                    0.00     0.00
	  2 which                    0.00     0.00
	  2 xset                     0.00     0.00
	  1 date                     0.00     0.00
	  1 dirname                  0.00     0.00
	  1 dmesg                    0.00     0.00
	  1 dmidecode                0.00     0.00
	  1 grep                     0.00     0.00
	  1 ifconfig                 0.00     0.00
	  1 ip                       0.00     0.00
	  1 lsmod                    0.00     0.00
	  1 mktemp                   0.00     0.00
	  1 ps                       0.00     0.00
	  1 readlink                 0.00     0.00
	  1 realpath                 0.00     0.00
	  1 sed                      0.00     0.00
	  1 sort                     0.00     0.00
	  1 stty                     0.00     0.00
	  1 systemctl                0.00     0.00
	  1 template.sh              0.00     0.00
	  1 wc                       0.00     0.00
	  1 xrandr                   0.00     0.00
0 processes running
47 maximum processes
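
To quantify how dominant pennant is, the summary table can be parsed and the first numeric column totalled. The sketch below is a minimal example on a few rows copied from the table. The report does not label the two numeric columns, so the code only refers to them by position, and the function names are hypothetical.

```python
import re

# Row layout: count, process name (may contain spaces), two numeric columns.
ROW = re.compile(r"^\s*(\d+)\s+(.+?)\s+(\d+\.\d+)\s+(\d+\.\d+)\s*$")


def parse_summary(text):
    """Return (count, name, first_value, second_value) for each table row."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:  # header and footer lines have no decimal columns and are skipped
            rows.append((int(m.group(1)), m.group(2),
                         float(m.group(3)), float(m.group(4))))
    return rows


def share_of_first_column(rows, name):
    """Percentage of the first numeric column attributed to `name`."""
    total = sum(r[2] for r in rows)
    mine = sum(r[2] for r in rows if r[1] == name)
    return 100.0 * mine / total if total else 0.0


sample = """\
428 processes
    150 pennant               7307.63    17.49
     38 vulkaninfo               1.33     1.14
     24 mpirun                   0.52     2.76
      3 dconf worker             0.00     0.00
0 processes running
"""

rows = parse_summary(sample)
pennant_pct = share_of_first_column(rows, "pennant")
print(f"pennant: {pennant_pct:.2f}% of the first numeric column")
```

The lazy name match keeps multi-word names such as "dconf worker" intact, at the cost of assuming that both numeric columns always contain a decimal point.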

Process structure shows MPI being used to distribute pennant processes.

      124349) pennant          cpu=0 start=4.89  finish=65.66
        124350) mpirun           cpu=3 start=4.89  finish=65.66
          124352) mpirun           cpu=15 start=5.43  finish=65.65
          124353) mpirun           cpu=7 start=5.92  finish=65.65
          124354) mpirun           cpu=11 start=5.92  finish=65.66
          124355) pennant          cpu=11 start=5.95  finish=65.65
            124357) pennant          cpu=15 start=5.96  finish=65.64
            124361) pennant          cpu=1 start=5.96  finish=65.64
          124356) pennant          cpu=2 start=5.96  finish=65.65
            124360) pennant          cpu=11 start=5.96  finish=65.64
            124364) pennant          cpu=6 start=5.97  finish=65.64
          124358) pennant          cpu=5 start=5.96  finish=65.65
            124362) pennant          cpu=15 start=5.97  finish=65.64
            124367) pennant          cpu=8 start=5.97  finish=65.64
          124359) pennant          cpu=12 start=5.96  finish=65.65
            124366) pennant          cpu=15 start=5.97  finish=65.64
            124370) pennant          cpu=4 start=5.97  finish=65.64
          124363) pennant          cpu=14 start=5.97  finish=65.64
            124369) pennant          cpu=4 start=5.97  finish=65.64
            124373) pennant          cpu=3 start=5.98  finish=65.64
          124365) pennant          cpu=10 start=5.97  finish=65.65
            124372) pennant          cpu=7 start=5.98  finish=65.64
            124375) pennant          cpu=11 start=5.98  finish=65.64
          124368) pennant          cpu=9 start=5.97  finish=65.64
            124374) pennant          cpu=15 start=5.98  finish=65.64
            124377) pennant          cpu=15 start=5.99  finish=65.64
          124371) pennant          cpu=0 start=5.98  finish=65.64
            124376) pennant          cpu=1 start=5.98  finish=65.64
            124378) pennant          cpu=13 start=5.99  finish=65.64
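
The tree listing can also be processed mechanically, for example to count processes per name or to compute lifetimes. The sketch below parses a few lines copied from the listing above. It assumes that indentation encodes the parent/child depth and that start and finish are timestamps in the same unit, so that their difference is a lifetime. Neither assumption is stated in the report, and the function and field names are hypothetical.

```python
import re
from collections import Counter

# Line layout: indentation, pid, ")", name, cpu=N, start=T, finish=T.
NODE = re.compile(
    r"^(\s*)(\d+)\)\s+(\S+)\s+cpu=(\d+)\s+start=([\d.]+)\s+finish=([\d.]+)"
)


def parse_tree(text):
    """Return one dict per process line, keeping indentation as a depth hint."""
    nodes = []
    for line in text.splitlines():
        m = NODE.match(line)
        if m:
            indent, pid, name, cpu, start, finish = m.groups()
            nodes.append({
                "indent": len(indent),
                "pid": int(pid),
                "name": name,
                "cpu": int(cpu),
                "duration": float(finish) - float(start),
            })
    return nodes


sample = """\
      124349) pennant          cpu=0 start=4.89  finish=65.66
        124350) mpirun           cpu=3 start=4.89  finish=65.66
          124355) pennant          cpu=11 start=5.95  finish=65.65
            124357) pennant          cpu=15 start=5.96  finish=65.64
"""

nodes = parse_tree(sample)
by_name = Counter(n["name"] for n in nodes)
for n in nodes:
    print(f'{n["pid"]} {n["name"]:<8} depth_hint={n["indent"]} '
          f'duration={n["duration"]:.2f}')
print(dict(by_name))
```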