Cloverleaf is a hydrodynamics benchmark with three workloads. Almost all the time is spent in the second workload. The overall profile suggests a runnable process on nearly every core (on_cpu is 0.972 on AMD and 0.893 on Intel, out of 16 cores).

Topdown metrics show a very memory-bound application with little time spent retiring instructions.

AMD metrics show floating-point code (about 271 float instructions per 1000) without many branches (about 53 per 1000). There is a reasonably high L2 miss rate (34.64%).

elapsed              5873.670
on_cpu               0.972          # 15.56 / 16 cores
utime                91203.508
stime                189.167
nvcsw                3950849        # 83.79%
nivcsw               764581         # 16.21%
inblock              8              # 0.00/sec
onblock              47808          # 8.14/sec
cpu-clock            91496573378200 # 91496.573 seconds
task-clock           91501724220020 # 91501.724 seconds
page faults          11117501       # 121.500/sec
context switches     4744569        # 51.852/sec
cpu migrations       100280         # 1.096/sec
major page faults    55             # 0.001/sec
minor page faults    11117446       # 121.500/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             5634504972305  # 52.699 branches per 1000 inst
branch misses        34518004796    # 0.61% branch miss
conditional          4723812723676  # 44.181 conditional branches per 1000 inst
indirect             450208732      # 0.004 indirect branches per 1000 inst
cpu-cycles           416297076847393 # 4.42 GHz
instructions         106923594819988 # 0.26 IPC
slots                832408287294990 #
retiring             36729774536367 #  4.4% ( 4.7%)
-- ucode             16033010471    #     0.0%
-- fastpath          36713741525896 #     4.4%
frontend             19057802930313 #  2.3% ( 2.4%)
-- latency           13207251742008 #     1.6%
-- bandwidth         5850551188305  #     0.7%
backend              733264248976694 # 88.1% (92.9%)
-- cpu               125417997459295 #    15.1%
-- memory            607846251517399 #    73.0%
speculation          665948773067   #  0.1% ( 0.1%)
-- branch mispredict 591782708845   #     0.1%
-- pipeline restart  74166064222    #     0.0%
smt-contention       42689832224123 #  5.1% ( 0.0%)
cpu-cycles           416281856017187 # 4.42 GHz
instructions         106924788382785 # 0.26 IPC
instructions         35640174860193 # 85.328 l2 access per 1000 inst
l2 hit from l1       1723378795510  # 34.64% l2 miss
l2 miss from l1      239934204030   #
l2 hit from l2 pf    504278478852   #
l3 hit from l2 pf    37875169844    #
l3 miss from l2 pf   775570101725   #
instructions         35622630123443 # 271.330 float per 1000 inst
float 512            60             # 0.000 AVX-512 per 1000 inst
float 256            672            # 0.000 AVX-256 per 1000 inst
float 128            9665485999618  # 271.330 AVX-128 per 1000 inst
float MMX            0              # 0.000 MMX per 1000 inst
float scalar         1              # 0.000 scalar per 1000 inst
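
As a rough cross-check of the topdown percentages in the AMD listing, the sketch below divides each category's slot count by the total `slots` value and computes IPC as instructions per cycle. The counter values are copied from the listing. The helper names (`share`, `topdown_shares`) are hypothetical, and the assumption that each percentage is simply the category count over total slots is inferred from the numbers, not taken from the tool's source.

```python
# Raw counter values copied from the AMD listing.
SLOTS = 832408287294990
TOPDOWN = {
    "retiring": 36729774536367,
    "frontend": 19057802930313,
    "backend": 733264248976694,
    "speculation": 665948773067,
    "smt-contention": 42689832224123,
}
BACKEND_MEMORY = 607846251517399
CYCLES = 416297076847393
INSTRUCTIONS = 106923594819988


def share(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


def topdown_shares(counts: dict, slots: int) -> dict:
    """Assumed formula: each category's share is its slot count over all slots."""
    return {name: share(value, slots) for name, value in counts.items()}


shares = topdown_shares(TOPDOWN, SLOTS)
memory_share = share(BACKEND_MEMORY, SLOTS)
ipc = round(INSTRUCTIONS / CYCLES, 2)

for name, pct in shares.items():
    print(f"{name:15s} {pct:5.1f}%")
print(f"backend memory  {memory_share:5.1f}%")
print(f"IPC             {ipc:.2f}")
```

Under these assumptions the first percentage column of the listing is reproduced, with backend memory stalls taking roughly three quarters of all slots.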

Intel metrics similarly show a high L2 miss rate (51.47%).

elapsed              9195.794
on_cpu               0.893          # 14.28 / 16 cores
utime                131158.728
stime                180.440
nvcsw                4459732        # 79.97%
nivcsw               1117243        # 20.03%
inblock              7232           # 0.79/sec
onblock              53072          # 5.77/sec
cpu-clock            131303755757174 # 131303.756 seconds
task-clock           131309362105649 # 131309.362 seconds
page faults          11289647       # 85.977/sec
context switches     5622700        # 42.820/sec
cpu migrations       690226         # 5.256/sec
major page faults    162            # 0.001/sec
minor page faults    11289485       # 85.976/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             6119594162100  # 52.521 branches per 1000 inst
branch misses        32286755507    # 0.53% branch miss
conditional          6119594182260  # 52.521 conditional branches per 1000 inst
indirect             2090955801758  # 17.945 indirect branches per 1000 inst
slots                474390606585344 #
retiring             71789897080078 # 15.1% (15.1%)
-- ucode             13179968617235 #     2.8%
-- fastpath          58609928462843 #    12.4%
frontend             29418725753857 #  6.2% ( 6.2%)
-- latency           18241180851442 #     3.8%
-- bandwidth         11177544902415 #     2.4%
backend              378719773656819 # 79.8% (79.8%)
-- cpu               62995505319387 #    13.3%
-- memory            315724268337432 #    66.6%
speculation          4567039699865  #  1.0% ( 1.0%)
-- branch mispredict 3065558847064  #     0.6%
-- pipeline restart  1501480852801  #     0.3%
smt-contention       0              #  0.0% ( 0.0%)
cpu-cycles           345428388454584 # 2.51 GHz
instructions         122694459382065 # 0.36 IPC
l2 access            3480548081213  # 62.611 l2 access per 1000 inst
l2 miss              1791504627857  # 51.47% l2 miss
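
The `on_cpu` and L2 miss figures can be reproduced from the raw values in the two listings. The sketch below assumes that `on_cpu` is user plus system time divided by elapsed time times the core count; this formula is inferred from the numbers and not confirmed by the tool's documentation. The `Run` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """Timing values copied from one listing (all in seconds)."""
    name: str
    elapsed: float
    utime: float
    stime: float
    cores: int = 16

    def on_cpu(self) -> float:
        # Assumed formula: CPU time consumed over CPU time available.
        return (self.utime + self.stime) / (self.elapsed * self.cores)

    def busy_cores(self) -> float:
        # Average number of cores kept busy over the run.
        return self.on_cpu() * self.cores


AMD = Run("AMD", elapsed=5873.670, utime=91203.508, stime=189.167)
INTEL = Run("Intel", elapsed=9195.794, utime=131158.728, stime=180.440)

# L2 counters copied from the Intel listing.
INTEL_L2_ACCESS = 3480548081213
INTEL_L2_MISS = 1791504627857
intel_l2_miss_pct = round(100.0 * INTEL_L2_MISS / INTEL_L2_ACCESS, 2)

for run in (AMD, INTEL):
    print(f"{run.name:6s} on_cpu {run.on_cpu():.3f} "
          f"({run.busy_cores():.2f} / {run.cores} cores)")
print(f"Intel L2 miss rate {intel_l2_miss_pct:.2f}%")
```

Both runs keep most of the 16 cores busy, which supports reading the low IPC as time spent stalled on memory and not as idle time.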

The process overview crashed partway through the second workload, so we don’t have a full account.
