A framework for general-purpose Computational Fluid Dynamics (CFD). This has five workloads of the Cavity3d benchmark with different sizes. The first three work on the AMD system and the first two work on the Intel system; the larger ones fail. These appear to run on the physical (not hyperthreaded) cores, perhaps with MPI.

Topdown profile shows a workload dominated by backend stalls (76.1% of slots on AMD, 54.2% on Intel), mostly memory, with a correspondingly lower retirement rate (20.7% and 40.0%).

AMD metrics show this as floating-point code (about 358 float per 1000 instructions, essentially all 128-bit) with low levels of frontend stalls (3.0%) and speculation stalls (0.2%).

elapsed              836.698
on_cpu               0.435          # 6.96 / 16 cores
utime                5702.901
stime                123.549
nvcsw                123143         # 87.25%
nivcsw               17988          # 12.75%
inblock              2196096        # 2624.72/sec
onblock              265480         # 317.29/sec
cpu-clock            5887597402841  # 5887.597 seconds
task-clock           5887670199615  # 5887.670 seconds
page faults          53493667       # 9085.711/sec
context switches     226140         # 38.409/sec
cpu migrations       12308          # 2.090/sec
major page faults    17874          # 3.036/sec
minor page faults    53475762       # 9082.669/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             1869589369583  # 59.480 branches per 1000 inst
branch misses        10705318795    # 0.57% branch miss
conditional          1042825476361  # 33.177 conditional branches per 1000 inst
indirect             129681340683   # 4.126 indirect branches per 1000 inst
cpu-cycles           26305189422647 # 1.94 GHz
instructions         31441737916631 # 1.20 IPC
slots                52622345068602 #
retiring             10889011000500 # 20.7% (20.7%)
-- ucode             30457782863    #     0.1%
-- fastpath          10858553217637 #    20.6%
frontend             1557951220896  #  3.0% ( 3.0%) low
-- latency           979774237896   #     1.9%
-- bandwidth         578176983000   #     1.1%
backend              40040356750516 # 76.1% (76.1%) high
-- cpu               6363537379963  #    12.1%
-- memory            33676819370553 #    64.0%
speculation          100408613268   #  0.2% ( 0.2%) low
-- branch mispredict 91032876463    #     0.2%
-- pipeline restart  9375736805     #     0.0%
smt-contention       34598398476    #  0.1% ( 0.0%)
cpu-cycles           26161990460125 # 1.94 GHz
instructions         31564444349673 # 1.21 IPC
instructions         10526717004116 # 17.343 l2 access per 1000 inst
l2 hit from l1       128701953755   # 39.78% l2 miss
l2 miss from l1      37747882576    #
l2 hit from l2 pf    18980760670    #
l3 hit from l2 pf    4459996432     #
l3 miss from l2 pf   30423851490    #
instructions         10525823050089 # 357.908 float per 1000 inst
float 512            126            # 0.000 AVX-512 per 1000 inst
float 256            1080           # 0.000 AVX-256 per 1000 inst
float 128            3767274378333  # 357.908 AVX-128 per 1000 inst
float MMX            0              # 0.000 MMX per 1000 inst
float scalar         0              # 0.000 scalar per 1000 inst
instructions         31499731796006 #
opcache              3728490971511  # 118.366 opcache per 1000 inst
opcache miss         92748095595    #  2.5% opcache miss rate
l1 dTLB miss         12301153494    # 0.391 L1 dTLB per 1000 inst
l2 dTLB miss         4858292548     # 0.154 L2 dTLB per 1000 inst
instructions         31580053403705 #
icache               191823644658   # 6.074 icache per 1000 inst
icache miss          6594453889     #  3.4% icache miss rate
l1 iTLB miss         36650924       # 0.001 L1 iTLB per 1000 inst
l2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst
tlb flush            510515         # 0.000 TLB flush per 1000 inst
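
The derived columns in the AMD dump can be checked by hand. The sketch below recomputes the `on_cpu` figure and the top-level topdown percentages from the raw values above. The formulas are an assumption inferred from the numbers (they reproduce the annotations), not taken from the profiling tool's source, and the function names are hypothetical.

```python
# Raw values copied from the AMD run above.
ELAPSED = 836.698   # seconds
UTIME = 5702.901    # seconds
STIME = 123.549     # seconds
NCORES = 16         # from the "/ 16 cores" annotation

SLOTS = 52622345068602
TOPDOWN = {
    "retiring": 10889011000500,
    "frontend": 1557951220896,
    "backend": 40040356750516,
    "speculation": 100408613268,
    "smt-contention": 34598398476,
}


def busy_cores(utime, stime, elapsed):
    """Average number of cores kept busy (assumed: CPU time / wall time)."""
    return (utime + stime) / elapsed


def on_cpu(utime, stime, elapsed, ncores):
    """Fraction of the machine's cores kept busy over the run."""
    return busy_cores(utime, stime, elapsed) / ncores


def topdown_percent(counts, slots):
    """Share of pipeline slots per topdown category, in percent."""
    return {name: 100.0 * value / slots for name, value in counts.items()}


if __name__ == "__main__":
    print(f"busy cores: {busy_cores(UTIME, STIME, ELAPSED):.2f}")
    print(f"on_cpu:     {on_cpu(UTIME, STIME, ELAPSED, NCORES):.3f}")
    for name, pct in topdown_percent(TOPDOWN, SLOTS).items():
        print(f"{name:<15} {pct:5.1f}%")
```

Rounded as in the dump, this gives 6.96 busy cores, an `on_cpu` of 0.435, and a backend share of 76.1%, matching the annotations above.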

Intel metrics show the backend stalls are largely DRAM bound: about 30.7%, the bulk of the 41.5% memory-latency figure.

elapsed              1048.526
on_cpu               0.668          # 10.69 / 16 cores
utime                10973.031
stime                231.736
nvcsw                139902         # 82.38%
nivcsw               29919          # 17.62%
inblock              719856         # 686.54/sec
onblock              254096         # 242.34/sec
cpu-clock            11324229583345 # 11324.230 seconds
task-clock           11324321973368 # 11324.322 seconds
page faults          38708439       # 3418.168/sec
context switches     339854         # 30.011/sec
cpu migrations       45994          # 4.062/sec
major page faults    19561          # 1.727/sec
minor page faults    38688804       # 3416.434/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             5415329661932  # 102.842 branches per 1000 inst
branch misses        13712134509    # 0.25% branch miss
conditional          5415329691340  # 102.842 conditional branches per 1000 inst
indirect             1101290966057  # 20.914 indirect branches per 1000 inst
slots                75572942608994 #
retiring             30196634336577 # 40.0% (40.0%)
-- ucode             2181580200259  #     2.9%
-- fastpath          28015054136318 #    37.1%
frontend             2905024145432  #  3.8% ( 3.8%) low
-- latency           1591010362308  #     2.1%
-- bandwidth         1314013783124  #     1.7%
backend              40975150606718 # 54.2% (54.2%)
-- cpu               10926288753421 #    14.5%
-- memory            30048861853297 #    39.8%
speculation          2197782267550  #  2.9% ( 2.9%)
-- branch mispredict 1871529610415  #     2.5%
-- pipeline restart  326252657135   #     0.4%
smt-contention       0              #  0.0% ( 0.0%)
cpu-cycles           33605195706463 # 2.00 GHz
instructions         80442309203356 # 2.39 IPC
l2 access            208504898413   # 6.909 l2 access per 1000 inst
l2 miss              113006068651   # 54.20% l2 miss
cpu-cycles           12599040970847 # 41.5% memory latency
load stalls          5100446679052  #  0.6% l1 bound
l1 miss              5028041939034  #  4.0% l2 bound
l2 miss              4528516333239  #  5.3% l3 bound
l3 miss              3866423296808  # 30.7% dram bound
store_stalls         123166211851   #  1.0% store bound
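
The memory-latency breakdown at the end of the Intel dump can be reproduced from the stall counters. The sketch below assumes each cache level's "bound" share is the difference between successive stall counters divided by the `cpu-cycles` value in the same counter group. That rule is inferred from the numbers (it matches the reported percentages), not confirmed from the tool, and `memory_breakdown` is a hypothetical helper.

```python
# Raw values copied from the last counter group of the Intel run above.
CYCLES = 12599040970847
LOAD_STALLS = 5100446679052
L1_MISS = 5028041939034
L2_MISS = 4528516333239
L3_MISS = 3866423296808
STORE_STALLS = 123166211851


def memory_breakdown(cycles, load_stalls, l1_miss, l2_miss, l3_miss,
                     store_stalls):
    """Split memory stall cycles by the level that served the load.

    Assumption: each counter holds the stall cycles with at least that
    level of miss outstanding, so successive differences isolate a level.
    """
    def pct(value):
        return 100.0 * value / cycles

    return {
        "memory latency": pct(load_stalls + store_stalls),
        "l1 bound": pct(load_stalls - l1_miss),
        "l2 bound": pct(l1_miss - l2_miss),
        "l3 bound": pct(l2_miss - l3_miss),
        "dram bound": pct(l3_miss),
        "store bound": pct(store_stalls),
    }


if __name__ == "__main__":
    result = memory_breakdown(CYCLES, LOAD_STALLS, L1_MISS, L2_MISS,
                              L3_MISS, STORE_STALLS)
    for name, value in result.items():
        print(f"{name:<15} {value:5.1f}%")
```

Rounded to one decimal, the result matches the dump: 0.6% L1, 4.0% L2, 5.3% L3, 30.7% DRAM, and 1.0% store bound, summing to the 41.5% memory-latency figure.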