cam4 is a SPEC CPU(R) benchmark (described here) written in Fortran and C. The workload runs one copy on each of the 16 logical cores.

The topdown profile is dominated by backend stalls (59.5% of slots), although the mix varies over time.

AMD metrics confirm this: ~40% of slots are memory bound and ~20% are CPU bound. There are only ~63 L2 accesses per 1000 instructions, with a ~21% miss rate.

elapsed              1312.508
on_cpu               0.988          # 15.81 / 16 cores
utime                20528.728
stime                225.953
nvcsw                29409          # 12.94%
nivcsw               197920         # 87.06%
inblock              9840           # 7.50/sec
onblock              1228368        # 935.89/sec
cpu-clock            20757774966166 # 20757.775 seconds
task-clock           20757987236593 # 20757.987 seconds
page faults          75899772       # 3656.413/sec
context switches     226656         # 10.919/sec
cpu migrations       203            # 0.010/sec
major page faults    1395           # 0.067/sec
minor page faults    75898377       # 3656.346/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             9015331689309  # 124.383 branches per 1000 inst
branch misses        103625466297   # 1.15% branch miss
conditional          6474016811013  # 89.321 conditional branches per 1000 inst
indirect             635818303619   # 8.772 indirect branches per 1000 inst
cpu-cycles           85797916011836 # 4.07 GHz
instructions         72472521907425 # 0.84 IPC
slots                171567233970366 #
retiring             24958749784066 # 14.5% (16.4%)
-- ucode             5475009509     #     0.0%
-- fastpath          24953274774557 #    14.5%
frontend             24472498750726 # 14.3% (16.0%)
-- latency           15983869130694 #     9.3%
-- bandwidth         8488629620032  #     4.9%
backend              102011177142788 # 59.5% (66.9%)
-- cpu               33453535278844 #    19.5%
-- memory            68557641863944 #    40.0%
speculation          1135201453063  #  0.7% ( 0.7%) low
-- branch mispredict 1086579188867  #     0.6%
-- pipeline restart  48622264196    #     0.0%
smt-contention       18989527666963 # 11.1% ( 0.0%)
cpu-cycles           86324947438235 # 4.07 GHz
instructions         72455714789886 # 0.84 IPC
instructions         24154762289200 # 62.695 l2 access per 1000 inst
l2 hit from l1       1211423015562  # 20.78% l2 miss
l2 miss from l1      176222621478   #
l2 hit from l2 pf    164446062366   #
l3 hit from l2 pf    60313265001    #
l3 miss from l2 pf   78199571898    #
instructions         24154893211351 # 189.588 float per 1000 inst
float 512            299            # 0.000 AVX-512 per 1000 inst
float 256            23880669948    # 0.989 AVX-256 per 1000 inst
float 128            4555597500473  # 188.599 AVX-128 per 1000 inst
float MMX            0              # 0.000 MMX per 1000 inst
float scalar         0              # 0.000 scalar per 1000 inst
instructions         72455795340647 #
opcache              11296118454852 # 155.904 opcache per 1000 inst
opcache miss         898107672809   #  8.0% opcache miss rate
l1 dTLB miss         224971601225   # 3.105 L1 dTLB per 1000 inst
l2 dTLB miss         9665338857     # 0.133 L2 dTLB per 1000 inst
instructions         72441840668917 #
icache               1522351345670  # 21.015 icache per 1000 inst
icache miss          394640134744   # 25.9% icache miss rate
l1 iTLB miss         13022382440    # 0.180 L1 iTLB per 1000 inst
l2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst
tlb flush            260298         # 0.000 TLB flush per 1000 inst
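
The derived values in the comment column can be reproduced from the raw counters. The following is a minimal Python sketch, not the profiling tool's own code: the variable names and the `pct` helper are made up for illustration, and the reading of the parenthesised topdown percentages as "share of slots after removing smt-contention" is an assumption that happens to match the numbers printed above.

```python
# Raw counters copied from the listing above.
elapsed = 1312.508          # wall-clock seconds
utime = 20528.728           # user CPU seconds
stime = 225.953             # system CPU seconds
logical_cores = 16          # from the "15.81 / 16 cores" annotation

cycles = 85797916011836
instructions = 72472521907425

slots = 171567233970366
smt_contention = 18989527666963
topdown = {
    "retiring": 24958749784066,
    "frontend": 24472498750726,
    "backend": 102011177142788,
    "speculation": 1135201453063,
}


def pct(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


busy_cores = (utime + stime) / elapsed   # average number of busy cores
on_cpu = busy_cores / logical_cores      # fraction of the machine in use
ipc = instructions / cycles

print(f"busy cores {busy_cores:.2f}, on_cpu {on_cpu:.3f}, IPC {ipc:.2f}")
for name, count in topdown.items():
    share_all = pct(count, slots)
    # Assumption: the value in parentheses excludes SMT-contention slots.
    share_no_smt = pct(count, slots - smt_contention)
    print(f"{name:12s} {share_all:5.1f}% ({share_no_smt:5.1f}%)")
```

With these inputs the backend share comes out at 59.5% of all slots and 66.9% once smt-contention slots are excluded, which agrees with the `backend` line in the listing.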

The process overview shows that almost all CPU time is spent in cam4_r_base.mev.

693 processes
	 48 cam4_r_base.mev      20627.37   219.18
	 71 specperl                13.77     2.95
	 48 cam4_validate_5          1.01     0.41
	  2 clang                    0.01     0.01
	  2 flang                    0.01     0.01
	  1 lsb_release              0.01     0.00
	 11 ps                       0.00     0.01
	226 sh                       0.00     0.00
	 54 specrxp                  0.00     0.00
	 48 bash                     0.00     0.00
	 41 specinvoke               0.00     0.00
	 22 cat                      0.00     0.00
	 21 grep                     0.00     0.00
	 12 uniq                     0.00     0.00
	 11 sort                     0.00     0.00
	 10 expand                   0.00     0.00
	  7 specmake                 0.00     0.00
	  6 pwd                      0.00     0.00
	  5 basename                 0.00     0.00
	  5 systemctl                0.00     0.00
	  4 rm                       0.00     0.00
	  4 specpp                   0.00     0.00
	  4 uname                    0.00     0.00
	  3 dirname                  0.00     0.00
	  3 dmidecode                0.00     0.00
	  3 lscpu                    0.00     0.00
	  2 df                       0.00     0.00
	  2 dpkg                     0.00     0.00
	  2 runcpu                   0.00     0.00
	  2 specsha512sum            0.00     0.00
	  2 specxz                   0.00     0.00
	  2 who                      0.00     0.00
	  1 cpupower                 0.00     0.00
	  1 head                     0.00     0.00
	  1 logname                  0.00     0.00
	  1 ls                       0.00     0.00
	  1 numactl                  0.00     0.00
	  1 sysctl                   0.00     0.00
	  1 w                        0.00     0.00
	  1 wc                       0.00     0.00
	  1 which                    0.00     0.00
0 processes running
53 maximum processes
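
To put a number on how dominant cam4_r_base.mev is, the overview can be parsed and summed. This is a small sketch under two assumptions: that the two numeric columns are user and system CPU seconds, and that only the rows with non-zero time matter (the rest are omitted here). `parse_overview` is a hypothetical helper, not part of any tool.

```python
# Rows with non-zero time, copied from the process overview above.
# Assumed columns: process count, name, user seconds, system seconds.
overview = """\
48 cam4_r_base.mev      20627.37   219.18
71 specperl                13.77     2.95
48 cam4_validate_5          1.01     0.41
 2 clang                    0.01     0.01
 2 flang                    0.01     0.01
"""


def parse_overview(text):
    """Map process name -> (count, user seconds, system seconds)."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or summary lines
        count, name, user, system = fields
        rows[name] = (int(count), float(user), float(system))
    return rows


rows = parse_overview(overview)
total = sum(user + system for _, user, system in rows.values())
_, cam4_user, cam4_sys = rows["cam4_r_base.mev"]
cam4_share = 100.0 * (cam4_user + cam4_sys) / total
print(f"cam4_r_base.mev: {cam4_share:.2f}% of listed CPU time")
```

On these rows the benchmark binary accounts for more than 99.9% of the listed CPU time; the SPEC harness (specperl, validation, compilers) is negligible.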

specinvoke runs a separate copy on each core. All 16 copies start together and finish within about 6 seconds of one another.

440422) specinvoke       cpu=14 start=4.62  finish=438.07
  440424) sh               cpu=13 start=4.62  finish=435.22
    440430) bash             cpu=0 start=4.63  finish=435.22
      440455) cam4_r_base.mev  cpu=0 start=4.63  finish=435.11
  440425) sh               cpu=8 start=4.63  finish=434.75
    440431) bash             cpu=1 start=4.63  finish=434.75
      440456) cam4_r_base.mev  cpu=1 start=4.63  finish=434.62
  440426) sh               cpu=10 start=4.63  finish=436.92
    440433) bash             cpu=2 start=4.63  finish=436.91
      440457) cam4_r_base.mev  cpu=2 start=4.63  finish=436.84
  440427) sh               cpu=9 start=4.63  finish=437.71
    440437) bash             cpu=3 start=4.63  finish=437.71
      440458) cam4_r_base.mev  cpu=3 start=4.63  finish=437.64
  440428) sh               cpu=4 start=4.63  finish=438.07
    440438) bash             cpu=4 start=4.63  finish=438.07
      440462) cam4_r_base.mev  cpu=4 start=4.63  finish=437.99
  440429) sh               cpu=13 start=4.63  finish=436.44
    440440) bash             cpu=5 start=4.63  finish=436.44
      440459) cam4_r_base.mev  cpu=5 start=4.63  finish=436.37
  440432) sh               cpu=14 start=4.63  finish=435.07
    440439) bash             cpu=6 start=4.63  finish=435.07
      440460) cam4_r_base.mev  cpu=6 start=4.63  finish=434.98
  440434) sh               cpu=7 start=4.63  finish=437.53
    440443) bash             cpu=7 start=4.63  finish=437.53
      440464) cam4_r_base.mev  cpu=7 start=4.63  finish=437.43
  440435) sh               cpu=8 start=4.63  finish=432.12
    440446) bash             cpu=8 start=4.63  finish=432.12
      440463) cam4_r_base.mev  cpu=8 start=4.63  finish=431.97
  440436) sh               cpu=1 start=4.63  finish=435.17
    440448) bash             cpu=9 start=4.63  finish=435.17
      440466) cam4_r_base.mev  cpu=9 start=4.63  finish=435.07
  440441) sh               cpu=9 start=4.63  finish=436.55
    440450) bash             cpu=10 start=4.63  finish=436.55
      440465) cam4_r_base.mev  cpu=10 start=4.63  finish=436.46
  440442) sh               cpu=5 start=4.63  finish=437.18
    440451) bash             cpu=11 start=4.63  finish=437.18
      440469) cam4_r_base.mev  cpu=11 start=4.63  finish=437.08
  440444) sh               cpu=12 start=4.63  finish=438.04
    440452) bash             cpu=12 start=4.63  finish=438.04
      440468) cam4_r_base.mev  cpu=12 start=4.63  finish=437.94
  440445) sh               cpu=14 start=4.63  finish=435.20
    440453) bash             cpu=13 start=4.63  finish=435.20
      440467) cam4_r_base.mev  cpu=13 start=4.63  finish=435.07
  440447) sh               cpu=8 start=4.63  finish=432.74
    440454) bash             cpu=14 start=4.63  finish=432.74
      440470) cam4_r_base.mev  cpu=14 start=4.63  finish=432.62
  440449) sh               cpu=15 start=4.63  finish=437.93
    440461) bash             cpu=15 start=4.63  finish=437.92
      440471) cam4_r_base.mev  cpu=15 start=4.63  finish=437.86