wrf is a SPEC CPU(R) benchmark described here and written in C and Fortran. The workload runs on all logical cores.

The topdown profile shows this is a backend-bound workload (78.9% of slots).

AMD metrics on a 7840 processor show memory stalls as the largest issue: backend memory accounts for 63.5% of slots.
elapsed 1953.561
on_cpu 0.987 # 15.79 / 16 cores
utime 30800.669
stime 53.643
nvcsw 41538 # 11.04%
nivcsw 334706 # 88.96%
inblock 0 # 0.00/sec
onblock 2841304 # 1454.42/sec
cpu-clock 30864242737605 # 30864.243 seconds
task-clock 30864977164171 # 30864.977 seconds
page faults 6126037 # 198.479/sec
context switches 375594 # 12.169/sec
cpu migrations 215 # 0.007/sec
major page faults 1090 # 0.035/sec
minor page faults 6124947 # 198.443/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 6636680282416 # 113.379 branches per 1000 inst
branch misses 59682724436 # 0.90% branch miss
conditional 4546842780846 # 77.677 conditional branches per 1000 inst
indirect 816253071028 # 13.945 indirect branches per 1000 inst
cpu-cycles 136220022944715 # 4.35 GHz
instructions 58535248389856 # 0.43 IPC low
slots 272403659326590 #
retiring 20112540460746 # 7.4% ( 7.8%) low
-- ucode 2521696306 # 0.0%
-- fastpath 20110018764440 # 7.4%
frontend 20130144250554 # 7.4% ( 7.9%)
-- latency 16833110598090 # 6.2%
-- bandwidth 3297033652464 # 1.2%
backend 214918740039030 # 78.9% (83.8%) high
-- cpu 42070191486014 # 15.4%
-- memory 172848548553016 # 63.5%
speculation 1185885235261 # 0.4% ( 0.5%) low
-- branch mispredict 1078145850915 # 0.4%
-- pipeline restart 107739384346 # 0.0%
smt-contention 16056201938033 # 5.9% ( 0.0%)
cpu-cycles 136075231851736 # 4.34 GHz
instructions 58546847147625 # 0.43 IPC low
instructions 19516296439585 # 77.326 l2 access per 1000 inst
l2 hit from l1 1160015123136 # 26.15% l2 miss
l2 miss from l1 126776457921 #
l2 hit from l2 pf 81216020790 #
l3 hit from l2 pf 60953233587 #
l3 miss from l2 pf 206927874633 #
instructions 19507693906773 # 278.072 float per 1000 inst
float 512 220 # 0.000 AVX-512 per 1000 inst
float 256 42784064091 # 2.193 AVX-256 per 1000 inst
float 128 5381755505920 # 275.879 AVX-128 per 1000 inst
float MMX 0 # 0.000 MMX per 1000 inst
float scalar 3 # 0.000 scalar per 1000 inst
instructions 58532587204494 #
opcache 9384365925865 # 160.327 opcache per 1000 inst
opcache miss 287236589767 # 3.1% opcache miss rate
l1 dTLB miss 186729836520 # 3.190 L1 dTLB per 1000 inst
l2 dTLB miss 13005504047 # 0.222 L2 dTLB per 1000 inst
instructions 58532845110136 #
icache 380662582609 # 6.503 icache per 1000 inst
icache miss 103844206789 # 27.3% icache miss rate
l1 iTLB miss 1719496360 # 0.029 L1 iTLB per 1000 inst
l2 iTLB miss 0 # 0.000 L2 iTLB per 1000 inst
tlb flush 613828 # 0.000 TLB flush per 1000 inst
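The annotations after each `#` can be cross-checked against the raw counters. The following is a minimal sketch, not the profiling tool's own code: the formulas are inferred because they reproduce the printed values, and the function and variable names are illustrative. In particular, the L2 access and miss sums are only one combination of the listed counters that matches the "77.326 l2 access per 1000 inst" and "26.15% l2 miss" annotations.

```python
# Raw counters copied from the report above.
ELAPSED = 1953.561                 # wall-clock seconds
UTIME, STIME = 30800.669, 53.643   # user and system CPU seconds
LOGICAL_CORES = 16                 # from the "15.79 / 16 cores" annotation

CYCLES = 136220022944715
INSTRUCTIONS = 58535248389856
SLOTS = 272403659326590
BACKEND = 214918740039030
BACKEND_MEMORY = 172848548553016
BRANCHES = 6636680282416
BRANCH_MISSES = 59682724436

# L2 counter group (it has its own instruction count).
L2_INSTRUCTIONS = 19516296439585
L2_HIT_FROM_L1 = 1160015123136
L2_MISS_FROM_L1 = 126776457921
L2_HIT_FROM_L2PF = 81216020790
L3_HIT_FROM_L2PF = 60953233587
L3_MISS_FROM_L2PF = 206927874633


def derived_metrics():
    """Recompute the annotated values (formulas are inferred assumptions)."""
    busy_cores = (UTIME + STIME) / ELAPSED
    return {
        "busy_cores": busy_cores,
        "on_cpu": busy_cores / LOGICAL_CORES,
        "ipc": INSTRUCTIONS / CYCLES,
        "backend_pct": 100.0 * BACKEND / SLOTS,
        "backend_memory_pct": 100.0 * BACKEND_MEMORY / SLOTS,
        "branch_miss_pct": 100.0 * BRANCH_MISSES / BRANCHES,
    }


def l2_metrics():
    """Assumed L2 formulas: they match the printed annotations."""
    access = (L2_HIT_FROM_L1 + L2_HIT_FROM_L2PF
              + L3_HIT_FROM_L2PF + L3_MISS_FROM_L2PF)
    miss = L2_MISS_FROM_L1 + L3_HIT_FROM_L2PF + L3_MISS_FROM_L2PF
    return {
        "l2_access_per_1000_inst": 1000.0 * access / L2_INSTRUCTIONS,
        "l2_miss_pct": 100.0 * miss / access,
    }


if __name__ == "__main__":
    for name, value in {**derived_metrics(), **l2_metrics()}.items():
        print(f"{name:26s} {value:10.3f}")
```

The low IPC and the backend-memory share of slots both come straight from these ratios, which is why the report flags memory stalls rather than branch or frontend behavior.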
The process overview shows that almost all CPU time is spent in wrf_r_base.mev.
691 processes
48 wrf_r_base.mev- 30711.47 40.93
71 specperl 25.07 6.50
48 diffwrf_521_bas 4.92 0.33
2 clang 0.02 0.00
2 flang 0.01 0.02
1 lsb_release 0.01 0.00
10 ps 0.00 0.01
224 sh 0.00 0.00
54 specrxp 0.00 0.00
48 bash 0.00 0.00
41 specinvoke 0.00 0.00
22 cat 0.00 0.00
21 grep 0.00 0.00
12 uniq 0.00 0.00
11 sort 0.00 0.00
10 expand 0.00 0.00
7 specmake 0.00 0.00
6 pwd 0.00 0.00
5 basename 0.00 0.00
5 systemctl 0.00 0.00
4 rm 0.00 0.00
4 specpp 0.00 0.00
4 uname 0.00 0.00
3 dirname 0.00 0.00
3 dmidecode 0.00 0.00
3 lscpu 0.00 0.00
2 df 0.00 0.00
2 dpkg 0.00 0.00
2 runcpu 0.00 0.00
2 specsha512sum 0.00 0.00
2 specxz 0.00 0.00
2 who 0.00 0.00
1 cpupower 0.00 0.00
1 head 0.00 0.00
1 logname 0.00 0.00
1 ls 0.00 0.00
1 numactl 0.00 0.00
1 sysctl 0.00 0.00
1 w 0.00 0.00
1 wc 0.00 0.00
1 which 0.00 0.00
1 processes running
54 maximum processes
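To put a number on how dominant the benchmark binary is, the rows of the overview can be summed. This sketch assumes the columns are process count, command name, user CPU seconds and system CPU seconds. The report does not label them, so that layout is a guess based on the totals being close to the utime and stime values. Only the rows with non-zero time in the first numeric column are included, and `cpu_share` is an illustrative name.

```python
# Top rows of the process overview, copied from the report.
# Assumed layout: count, command, user CPU seconds, system CPU seconds.
ROWS = """\
48 wrf_r_base.mev- 30711.47 40.93
71 specperl 25.07 6.50
48 diffwrf_521_bas 4.92 0.33
2 clang 0.02 0.00
2 flang 0.01 0.02
1 lsb_release 0.01 0.00
"""


def cpu_share(text):
    """Return each command's fraction of the summed user + system time."""
    totals = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        _count, name, user, system = line.split()
        totals[name] = totals.get(name, 0.0) + float(user) + float(system)
    grand_total = sum(totals.values())
    return {name: t / grand_total for name, t in totals.items()}


if __name__ == "__main__":
    for name, share in cpu_share(ROWS).items():
        print(f"{name:18s} {100.0 * share:7.3f}%")
```

Under that column assumption, wrf_r_base.mev- accounts for about 99.9% of the measured CPU time, so the harness processes (specperl, diffwrf and the shell utilities) do not affect the profile in any meaningful way.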
specinvoke launches a separate copy of the benchmark on each of the 16 logical cores.
420317) specinvoke cpu=1 start=4.39 finish=652.66
420319) sh cpu=7 start=4.40 finish=648.02
420330) bash cpu=0 start=4.40 finish=648.02
420353) wrf_r_base.mev- cpu=0 start=4.40 finish=648.00
420320) ?? cpu=0 start=4.40 finish=0.00
420329) bash cpu=1 start=4.40 finish=643.64
420355) wrf_r_base.mev- cpu=1 start=4.40 finish=643.60
420321) sh cpu=12 start=4.40 finish=646.09
420331) bash cpu=2 start=4.40 finish=646.08
420352) wrf_r_base.mev- cpu=2 start=4.40 finish=646.06
420322) sh cpu=12 start=4.40 finish=652.66
420333) bash cpu=3 start=4.40 finish=652.66
420351) wrf_r_base.mev- cpu=3 start=4.40 finish=652.65
420323) sh cpu=8 start=4.40 finish=644.40
420334) bash cpu=4 start=4.40 finish=644.40
420354) wrf_r_base.mev- cpu=4 start=4.40 finish=644.38
420324) sh cpu=8 start=4.40 finish=650.22
420343) bash cpu=5 start=4.40 finish=650.22
420357) wrf_r_base.mev- cpu=5 start=4.40 finish=650.21
420325) sh cpu=1 start=4.40 finish=646.64
420344) bash cpu=6 start=4.40 finish=646.64
420358) wrf_r_base.mev- cpu=6 start=4.40 finish=646.62
420326) sh cpu=8 start=4.40 finish=646.89
420336) bash cpu=7 start=4.40 finish=646.89
420359) wrf_r_base.mev- cpu=7 start=4.40 finish=646.88
420327) sh cpu=14 start=4.40 finish=643.50
420338) bash cpu=8 start=4.40 finish=643.50
420363) wrf_r_base.mev- cpu=8 start=4.40 finish=643.46
420328) sh cpu=1 start=4.40 finish=643.75
420340) bash cpu=9 start=4.40 finish=643.75
420356) wrf_r_base.mev- cpu=9 start=4.40 finish=643.72
420332) sh cpu=1 start=4.40 finish=646.08
420341) bash cpu=10 start=4.40 finish=646.08
420360) wrf_r_base.mev- cpu=10 start=4.40 finish=646.06
420335) sh cpu=12 start=4.40 finish=652.19
420346) bash cpu=11 start=4.40 finish=652.19
420361) wrf_r_base.mev- cpu=11 start=4.40 finish=652.17
420337) sh cpu=12 start=4.40 finish=643.75
420347) bash cpu=12 start=4.40 finish=643.75
420362) wrf_r_base.mev- cpu=12 start=4.40 finish=643.72
420339) sh cpu=0 start=4.40 finish=649.31
420348) bash cpu=13 start=4.40 finish=649.31
420364) wrf_r_base.mev- cpu=13 start=4.40 finish=649.29
420342) sh cpu=14 start=4.40 finish=643.44
420349) bash cpu=14 start=4.40 finish=643.44
420365) wrf_r_base.mev- cpu=14 start=4.40 finish=643.39
420345) sh cpu=15 start=4.40 finish=646.88
420350) bash cpu=15 start=4.40 finish=646.88
420366) wrf_r_base.mev- cpu=15 start=4.40 finish=646.85
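The listing above can be reduced to one run time per copy to check how evenly the copies finish. The following is a small sketch that assumes the line layout shown (`pid) name cpu=N start=S finish=F`, with times in seconds). The regular expression and function names are illustrative, and only a few lines are embedded as sample input.

```python
import re

# A few lines copied verbatim from the listing above.
SAMPLE = """\
420353) wrf_r_base.mev- cpu=0 start=4.40 finish=648.00
420355) wrf_r_base.mev- cpu=1 start=4.40 finish=643.60
420351) wrf_r_base.mev- cpu=3 start=4.40 finish=652.65
420365) wrf_r_base.mev- cpu=14 start=4.40 finish=643.39
420320) ?? cpu=0 start=4.40 finish=0.00
"""

LINE_RE = re.compile(
    r"^\s*(?P<pid>\d+)\)\s+(?P<name>\S+)\s+cpu=(?P<cpu>\d+)"
    r"\s+start=(?P<start>[\d.]+)\s+finish=(?P<finish>[\d.]+)"
)


def parse_listing(text):
    """Parse process lines, skipping entries without a usable finish time."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue
        start, finish = float(match["start"]), float(match["finish"])
        if finish < start:
            continue  # e.g. the "??" entry, which reports finish=0.00
        rows.append({"pid": int(match["pid"]), "name": match["name"],
                     "cpu": int(match["cpu"]), "start": start,
                     "finish": finish})
    return rows


def copy_durations(rows, name_prefix="wrf_r_base"):
    """Map logical CPU to run time in seconds for the benchmark copies."""
    return {r["cpu"]: r["finish"] - r["start"]
            for r in rows if r["name"].startswith(name_prefix)}


if __name__ == "__main__":
    durations = copy_durations(parse_listing(SAMPLE))
    spread = max(durations.values()) - min(durations.values())
    print(f"copies={len(durations)} spread={spread:.2f}s")
```

In the full listing the earliest copy finishes at 643.39 (cpu=14) and the latest at 652.65 (cpu=3), a spread of about 9 seconds over a run of roughly 640 seconds, so the 16 copies are well balanced.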
