bwaves is a SPEC CPU(R) benchmark described here. This Fortran workload keeps all 16 logical cores busy for the whole run (on_cpu 0.990, about 15.85 of 16 cores).

The topdown profile shows a backend-bound workload: 93.3% of slots are backend-bound and only 2.2% retire.

AMD metrics for a 7840 processor confirm that the workload is backend-bound and, in particular, memory-bound. The L2 access rate is about 133 per 1000 instructions, and roughly half of these accesses (49.2%) miss. There are very few branches (about 19 per 1000 instructions). Approximately 1/4 of the instructions are floating point (about 260 per 1000), almost all reported as AVX-128.
elapsed 4632.332
on_cpu 0.990 # 15.85 / 16 cores
utime 73220.919
stime 181.889
nvcsw 96986 # 11.93%
nivcsw 715815 # 88.07%
inblock 0 # 0.00/sec
onblock 21864 # 4.72/sec
cpu-clock 73420734604801 # 73420.735 seconds
task-clock 73422148131842 # 73422.148 seconds
page faults 41516197 # 565.445/sec
context switches 811557 # 11.053/sec
cpu migrations 521 # 0.007/sec
major page faults 1855 # 0.025/sec
minor page faults 41514342 # 565.420/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 833760239357 # 19.172 branches per 1000 inst
branch misses 11492503371 # 1.38% branch miss
conditional 651248492065 # 14.975 conditional branches per 1000 inst
indirect 60811077930 # 1.398 indirect branches per 1000 inst
cpu-cycles 332144753197546 # 4.47 GHz
instructions 43479218645688 # 0.13 IPC low
slots 664161383718216 #
retiring 14604192045965 # 2.2% ( 2.3%) low
-- ucode 3068808550 # 0.0%
-- fastpath 14601123237415 # 2.2%
frontend 8982526992280 # 1.4% ( 1.4%) low
-- latency 7953765511854 # 1.2%
-- bandwidth 1028761480426 # 0.2%
backend 619986995494093 # 93.3% (96.3%) high
-- cpu 45180566558648 # 6.8%
-- memory 574806428935445 # 86.5%
speculation 473785697223 # 0.1% ( 0.1%) low
-- branch mispredict 231332716080 # 0.0%
-- pipeline restart 242452981143 # 0.0%
smt-contention 20113692014147 # 3.0% ( 0.0%)
cpu-cycles 332893296051422 # 4.47 GHz
instructions 43483758640393 # 0.13 IPC low
instructions 14498863832586 # 132.988 l2 access per 1000 inst
l2 hit from l1 1503844106419 # 49.20% l2 miss
l2 miss from l1 707659574769 #
l2 hit from l2 pf 183385007792 #
l3 hit from l2 pf 9300133264 #
l3 miss from l2 pf 231652380877 #
instructions 14491614030746 # 260.237 float per 1000 inst
float 512 580 # 0.000 AVX-512 per 1000 inst
float 256 2082 # 0.000 AVX-256 per 1000 inst
float 128 3771247626425 # 260.237 AVX-128 per 1000 inst
float MMX 0 # 0.000 MMX per 1000 inst
float scalar 2 # 0.000 scalar per 1000 inst
instructions 43471275755268 #
opcache 5086289745292 # 117.003 opcache per 1000 inst
opcache miss 495051701654 # 9.7% opcache miss rate
l1 dTLB miss 79670864160 # 1.833 L1 dTLB per 1000 inst
l2 dTLB miss 48345344515 # 1.112 L2 dTLB per 1000 inst
instructions 43470965049493 #
icache 600576925212 # 13.816 icache per 1000 inst
icache miss 39215141772 # 6.5% icache miss rate
l1 iTLB miss 578741019 # 0.013 L1 iTLB per 1000 inst
l2 iTLB miss 0 # 0.000 L2 iTLB per 1000 inst
tlb flush 272150 # 0.000 TLB flush per 1000 inst
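The derived figures in the output above can be cross-checked from the raw counters. The following is a minimal Python sketch, not the profiler's own code. The variable names are hypothetical, and the formulas are assumptions chosen because they reproduce the printed values. In particular, the L2 access and miss sums are inferred from the arithmetic, not from any documented definition.

```python
# Raw values copied from the profile output above.
elapsed = 4632.332                      # wall-clock seconds
utime, stime = 73220.919, 181.889       # user and system CPU seconds
logical_cores = 16

cycles = 332144753197546
instructions = 43479218645688
slots = 664161383718216
topdown = {
    "retiring": 14604192045965,
    "frontend": 8982526992280,
    "backend": 619986995494093,
    "speculation": 473785697223,
}
backend_memory = 574806428935445

# L2 counters and the instruction count printed with them.
l2_instructions = 14498863832586
l2_hit_from_l1 = 1503844106419
l2_miss_from_l1 = 707659574769
l2_hit_from_l2_pf = 183385007792
l3_hit_from_l2_pf = 9300133264
l3_miss_from_l2_pf = 231652380877

# Assumption: on_cpu is (utime + stime) / elapsed, normalised by core count.
busy_cores = (utime + stime) / elapsed
on_cpu = busy_cores / logical_cores

ipc = instructions / cycles

# Each topdown category as a percentage of all slots.
topdown_pct = {name: 100.0 * count / slots for name, count in topdown.items()}
memory_pct = 100.0 * backend_memory / slots

# Assumption: these sums are the combination that matches the printed
# "132.988 l2 access per 1000 inst" and "49.20% l2 miss" figures.
l2_accesses = (l2_hit_from_l1 + l2_hit_from_l2_pf
               + l3_hit_from_l2_pf + l3_miss_from_l2_pf)
l2_misses = l2_miss_from_l1 + l3_hit_from_l2_pf + l3_miss_from_l2_pf
l2_access_per_kinst = 1000.0 * l2_accesses / l2_instructions
l2_miss_pct = 100.0 * l2_misses / l2_accesses

if __name__ == "__main__":
    print(f"on_cpu {on_cpu:.3f} ({busy_cores:.2f} / {logical_cores} cores)")
    print(f"IPC {ipc:.2f}")
    for name, pct in topdown_pct.items():
        print(f"{name} {pct:.1f}%")
    print(f"backend memory {memory_pct:.1f}%")
    print(f"L2 access per 1000 inst {l2_access_per_kinst:.3f}")
    print(f"L2 miss {l2_miss_pct:.2f}%")
```

With an IPC near 0.13 and about 86.5% of slots stalled on memory, the counters agree with the memory-bound reading given in the text.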
The process overview shows bwaves_r as the primary process, with the small remainder being the SPEC harness.
1016 processes
144 bwaves_r_base.m 53541.39 89.25
142 specperl 23.53 4.73
1 lsb_release 0.01 0.00
33 specinvoke 0.00 0.09
144 bash 0.00 0.03
10 ps 0.00 0.02
1 flang 0.00 0.02
348 sh 0.00 0.00
21 grep 0.00 0.00
20 cat 0.00 0.00
12 uniq 0.00 0.00
11 sort 0.00 0.00
10 expand 0.00 0.00
6 pwd 0.00 0.00
5 basename 0.00 0.00
5 specmake 0.00 0.00
5 specrxp 0.00 0.00
5 systemctl 0.00 0.00
4 specpp 0.00 0.00
4 uname 0.00 0.00
3 dirname 0.00 0.00
3 dmidecode 0.00 0.00
3 lscpu 0.00 0.00
2 df 0.00 0.00
2 dpkg 0.00 0.00
2 rm 0.00 0.00
2 runcpu 0.00 0.00
2 specsha512sum 0.00 0.00
2 specxz 0.00 0.00
2 who 0.00 0.00
1 cpupower 0.00 0.00
1 head 0.00 0.00
1 logname 0.00 0.00
1 ls 0.00 0.00
1 numactl 0.00 0.00
1 sysctl 0.00 0.00
1 w 0.00 0.00
1 wc 0.00 0.00
1 which 0.00 0.00
53 processes running
53 maximum processes
specinvoke starts a separate process on each logical core, each through its own sh and bash wrapper.
356101) specinvoke cpu=7 start=3.24 finish=1546.35
356103) sh cpu=0 start=3.24 finish=295.94
356109) bash cpu=0 start=3.24 finish=295.94
356135) bwaves_r_base.m cpu=0 start=3.24 finish=295.69
356104) sh cpu=1 start=3.24 finish=289.88
356110) bash cpu=1 start=3.24 finish=289.88
356133) bwaves_r_base.m cpu=1 start=3.24 finish=289.62
356105) sh cpu=2 start=3.24 finish=294.24
356114) bash cpu=2 start=3.24 finish=294.24
356137) bwaves_r_base.m cpu=2 start=3.24 finish=294.02
356106) sh cpu=3 start=3.24 finish=294.03
356117) bash cpu=3 start=3.24 finish=294.03
356140) bwaves_r_base.m cpu=3 start=3.24 finish=293.80
356107) sh cpu=4 start=3.24 finish=287.40
356115) bash cpu=4 start=3.24 finish=287.39
356138) bwaves_r_base.m cpu=4 start=3.24 finish=287.17
356108) sh cpu=5 start=3.24 finish=293.24
356113) bash cpu=5 start=3.24 finish=293.23
356139) bwaves_r_base.m cpu=5 start=3.24 finish=292.99
356111) sh cpu=6 start=3.24 finish=303.56
356120) bash cpu=6 start=3.24 finish=303.56
356141) bwaves_r_base.m cpu=6 start=3.24 finish=303.33
356112) sh cpu=7 start=3.24 finish=295.06
356122) bash cpu=7 start=3.24 finish=295.06
356142) bwaves_r_base.m cpu=7 start=3.24 finish=294.80
356116) sh cpu=8 start=3.24 finish=291.15
356125) bash cpu=8 start=3.24 finish=291.15
356143) bwaves_r_base.m cpu=8 start=3.25 finish=290.90
356118) sh cpu=9 start=3.24 finish=291.24
356127) bash cpu=9 start=3.24 finish=291.24
356147) bwaves_r_base.m cpu=9 start=3.25 finish=291.03
356119) sh cpu=10 start=3.24 finish=296.70
356128) bash cpu=10 start=3.24 finish=296.70
356149) bwaves_r_base.m cpu=10 start=3.25 finish=296.45
356121) sh cpu=11 start=3.24 finish=294.39
356130) bash cpu=11 start=3.24 finish=294.39
356148) bwaves_r_base.m cpu=11 start=3.25 finish=294.19
356123) sh cpu=12 start=3.24 finish=288.30
356131) bash cpu=12 start=3.24 finish=288.30
356146) bwaves_r_base.m cpu=12 start=3.25 finish=288.11
356124) sh cpu=13 start=3.24 finish=293.74
356132) bash cpu=13 start=3.24 finish=293.74
356145) bwaves_r_base.m cpu=13 start=3.25 finish=293.56
356126) sh cpu=14 start=3.24 finish=298.69
356134) bash cpu=14 start=3.24 finish=298.69
356144) bwaves_r_base.m cpu=14 start=3.25 finish=298.44
356129) sh cpu=15 start=3.24 finish=302.56
356136) bash cpu=15 start=3.24 finish=302.56
356150) bwaves_r_base.m cpu=15 start=3.25 finish=302.30
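To see how evenly the copies finish, the listing above can be parsed and the per-copy runtimes compared. This is a minimal Python sketch: the regular expression and the helper name `parse_process_lines` are hypothetical, and it assumes each line has the `pid) name cpu=N start=S finish=F` shape shown above. The sample holds four lines copied from the listing, including the fastest (cpu=4) and slowest (cpu=6) copies in this excerpt.

```python
import re

# Four lines copied verbatim from the process listing above.
SAMPLE = """\
356135) bwaves_r_base.m cpu=0 start=3.24 finish=295.69
356138) bwaves_r_base.m cpu=4 start=3.24 finish=287.17
356141) bwaves_r_base.m cpu=6 start=3.24 finish=303.33
356150) bwaves_r_base.m cpu=15 start=3.25 finish=302.30
"""

# Assumed line shape: "<pid>) <name> cpu=<n> start=<sec> finish=<sec>".
LINE = re.compile(
    r"^\s*(\d+)\)\s+(\S+)\s+cpu=(\d+)\s+start=([\d.]+)\s+finish=([\d.]+)"
)


def parse_process_lines(text):
    """Return one dict per recognised line, with the runtime in seconds."""
    rows = []
    for line in text.splitlines():
        match = LINE.match(line)
        if not match:
            continue  # skip anything that is not a process line
        pid, name, cpu, start, finish = match.groups()
        rows.append({
            "pid": int(pid),
            "name": name,
            "cpu": int(cpu),
            "runtime": float(finish) - float(start),
        })
    return rows


# Keep only the benchmark copies, ignoring the sh and bash wrappers.
copies = [r for r in parse_process_lines(SAMPLE) if r["name"].startswith("bwaves_r")]
fastest = min(copies, key=lambda r: r["runtime"])
slowest = max(copies, key=lambda r: r["runtime"])
spread_pct = 100.0 * (slowest["runtime"] - fastest["runtime"]) / fastest["runtime"]

if __name__ == "__main__":
    print(f"fastest: cpu={fastest['cpu']} {fastest['runtime']:.2f}s")
    print(f"slowest: cpu={slowest['cpu']} {slowest['runtime']:.2f}s")
    print(f"spread: {spread_pct:.1f}%")
```

Feeding the full listing to `parse_process_lines` instead of the sample gives the same fastest and slowest copies, so the 16 copies in this excerpt finish within roughly 6% of each other.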
