xalancbmk is a SPEC CPU(R) benchmark written in C++ and described here. The workload runs on all logical cores.

The topdown profile shows two distinct workload regions; the first has higher backend stalls.

AMD metrics on the 7840 show that over a quarter of instructions are branches (about 268 per 1000 instructions) and that backend memory stalls account for about 43% of pipeline slots.
elapsed 716.422
on_cpu 0.983 # 15.72 / 16 cores
utime 11200.976
stime 62.076
nvcsw 16868 # 14.21%
nivcsw 101844 # 85.79%
inblock 0 # 0.00/sec
onblock 5941288 # 8293.00/sec
cpu-clock 11264034913550 # 11264.035 seconds
task-clock 11264141447655 # 11264.141 seconds
page faults 10479547 # 930.346/sec
context switches 118151 # 10.489/sec
cpu migrations 156 # 0.014/sec
major page faults 1111 # 0.099/sec
minor page faults 10478436 # 930.247/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 10322703995232 # 267.659 branches per 1000 inst
branch misses 34592086095 # 0.34% branch miss
conditional 9047084912417 # 234.583 conditional branches per 1000 inst
indirect 287264629665 # 7.449 indirect branches per 1000 inst
cpu-cycles 49888709082342 # 4.32 GHz
instructions 38564460869272 # 0.77 IPC
slots 99786843927960 #
retiring 12243245308066 # 12.3% (19.0%)
-- ucode 71983786453 # 0.1%
-- fastpath 12171261521613 # 12.2%
frontend 5763112454235 # 5.8% ( 8.9%)
-- latency 3553925492466 # 3.6%
-- bandwidth 2209186961769 # 2.2%
backend 45856010835439 # 46.0% (71.1%) high
-- cpu 2998476011384 # 3.0%
-- memory 42857534824055 # 42.9%
speculation 590816854271 # 0.6% ( 0.9%) low
-- branch mispredict 530059406767 # 0.5%
-- pipeline restart 60757447504 # 0.1%
smt-contention 35333602096762 # 35.4% ( 0.0%)
cpu-cycles 49821898939519 # 4.31 GHz
instructions 38564401740635 # 0.77 IPC
instructions 12856131091504 # 75.229 l2 access per 1000 inst
l2 hit from l1 817320093125 # 15.96% l2 miss
l2 miss from l1 49304847483 #
l2 hit from l2 pf 44749019911 #
l3 hit from l2 pf 33478037235 #
l3 miss from l2 pf 71608977929 #
instructions 12853296360811 # 34.341 float per 1000 inst
float 512 183 # 0.000 AVX-512 per 1000 inst
float 256 334329 # 0.000 AVX-256 per 1000 inst
float 128 441397062278 # 34.341 AVX-128 per 1000 inst
float MMX 0 # 0.000 MMX per 1000 inst
float scalar 7 # 0.000 scalar per 1000 inst
instructions 38562362441417 #
opcache 5403832937953 # 140.132 opcache per 1000 inst
opcache miss 214599945034 # 4.0% opcache miss rate
l1 dTLB miss 245187730687 # 6.358 L1 dTLB per 1000 inst
l2 dTLB miss 7784051070 # 0.202 L2 dTLB per 1000 inst
instructions 38561867880756 #
icache 295547014102 # 7.664 icache per 1000 inst
icache miss 61133657855 # 20.7% icache miss rate
l1 iTLB miss 61677389585 # 1.599 L1 iTLB per 1000 inst
l2 iTLB miss 0 # 0.000 L2 iTLB per 1000 inst
tlb flush 87745 # 0.000 TLB flush per 1000 inst
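The derived columns in the listing above can be reproduced from the raw counters. The following is a minimal Python sketch, not the profiling tool's own code: the function names are hypothetical, and reading the parenthesized topdown percentages as shares of the slots that remain after subtracting smt-contention is an assumption, although it does match the printed values.

```python
# Raw counter values copied from the listing above.
SLOTS = 99786843927960
SMT_CONTENTION = 35333602096762
TOPDOWN = {
    "retiring": 12243245308066,
    "frontend": 5763112454235,
    "backend": 45856010835439,
    "speculation": 590816854271,
}
INSTRUCTIONS = 38564460869272
CPU_CYCLES = 49888709082342
BRANCHES = 10322703995232
ELAPSED, UTIME, STIME = 716.422, 11200.976, 62.076


def per_1000_inst(count, instructions=INSTRUCTIONS):
    """Events per 1000 retired instructions."""
    return 1000.0 * count / instructions


def topdown_shares(counts, slots, smt_contention):
    """Return {name: (pct_of_all_slots, pct_of_slots_without_smt)}.

    Assumption: the parenthesized column excludes smt-contention slots.
    """
    usable = slots - smt_contention
    return {
        name: (100.0 * n / slots, 100.0 * n / usable)
        for name, n in counts.items()
    }


if __name__ == "__main__":
    # (utime + stime) / elapsed gives the average number of busy cores.
    print(f"cores busy: {(UTIME + STIME) / ELAPSED:.2f}")
    print(f"IPC: {INSTRUCTIONS / CPU_CYCLES:.2f}")
    print(f"branches per 1000 inst: {per_1000_inst(BRANCHES):.3f}")
    shares = topdown_shares(TOPDOWN, SLOTS, SMT_CONTENTION)
    for name, (all_pct, usable_pct) in shares.items():
        print(f"{name:12s} {all_pct:5.1f}% ({usable_pct:5.1f}%)")
```

The branch rate computed this way differs from the printed 267.659 in the second decimal place. The listing reports several slightly different instruction counts, so the tool presumably divides each counter by the instruction count from its own measurement group.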
The process overview shows that computation is primarily in cpuxalan_r_base.
581 processes
48 cpuxalan_r_base 11171.98 53.53
69 specperl 10.35 3.47
1 clang++ 0.01 0.00
1 lsb_release 0.01 0.00
11 ps 0.00 0.01
173 sh 0.00 0.00
54 specrxp 0.00 0.00
48 bash 0.00 0.00
41 specinvoke 0.00 0.00
21 grep 0.00 0.00
20 cat 0.00 0.00
12 uniq 0.00 0.00
11 sort 0.00 0.00
10 expand 0.00 0.00
6 pwd 0.00 0.00
5 basename 0.00 0.00
5 specmake 0.00 0.00
5 systemctl 0.00 0.00
4 specpp 0.00 0.00
4 uname 0.00 0.00
3 dirname 0.00 0.00
3 dmidecode 0.00 0.00
3 lscpu 0.00 0.00
2 df 0.00 0.00
2 dpkg 0.00 0.00
2 rm 0.00 0.00
2 runcpu 0.00 0.00
2 specsha512sum 0.00 0.00
2 specxz 0.00 0.00
2 who 0.00 0.00
1 cpupower 0.00 0.00
1 head 0.00 0.00
1 logname 0.00 0.00
1 ls 0.00 0.00
1 numactl 0.00 0.00
1 sysctl 0.00 0.00
1 w 0.00 0.00
1 wc 0.00 0.00
1 which 0.00 0.00
0 processes running
53 maximum processes
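To check how strongly one binary dominates, the overview rows can be parsed and totalled. This is a small sketch under stated assumptions: the listing does not label its columns, so treating the two numeric columns as user and system CPU seconds is a guess, and `parse_overview` and `user_share` are hypothetical helper names.

```python
def parse_overview(text):
    """Parse rows of the form '<count> <name> <col3> <col4>'.

    Assumption: col3 and col4 are user and system CPU seconds.
    Lines with any other shape (e.g. '581 processes') are skipped.
    """
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue
        try:
            rows.append((int(parts[0]), parts[1],
                         float(parts[2]), float(parts[3])))
        except ValueError:
            continue  # not a data row
    return rows


def user_share(rows, name):
    """Fraction of the third column attributed to the named process."""
    total = sum(r[2] for r in rows)
    return sum(r[2] for r in rows if r[1] == name) / total


if __name__ == "__main__":
    sample = """581 processes
48 cpuxalan_r_base 11171.98 53.53
69 specperl 10.35 3.47
1 clang++ 0.01 0.00
0 processes running
"""
    rows = parse_overview(sample)
    print(f"{user_share(rows, 'cpuxalan_r_base'):.4f}")
```

Run over the full table, the per-row process counts should add up to the 581 reported on the summary line.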
specinvoke launches a separate copy on each logical core; the process tree below shows 16 cpuxalan_r_base copies, one each on cpu 0 through 15.
47048) specinvoke cpu=8 start=3.51 finish=239.04
47050) sh cpu=2 start=3.51 finish=235.55
47056) bash cpu=0 start=3.51 finish=235.54
47082) cpuxalan_r_base cpu=0 start=3.51 finish=235.48
47051) sh cpu=3 start=3.51 finish=237.87
47058) bash cpu=1 start=3.51 finish=237.87
47083) cpuxalan_r_base cpu=1 start=3.51 finish=237.83
47052) sh cpu=2 start=3.51 finish=235.00
47059) bash cpu=2 start=3.51 finish=235.00
47080) cpuxalan_r_base cpu=2 start=3.51 finish=234.93
47053) sh cpu=0 start=3.51 finish=236.37
47060) bash cpu=3 start=3.51 finish=236.37
47081) cpuxalan_r_base cpu=3 start=3.51 finish=236.30
47054) sh cpu=8 start=3.51 finish=236.94
47068) bash cpu=4 start=3.51 finish=236.94
47090) cpuxalan_r_base cpu=4 start=3.52 finish=236.88
47055) sh cpu=4 start=3.51 finish=238.19
47066) bash cpu=5 start=3.51 finish=238.19
47088) cpuxalan_r_base cpu=5 start=3.52 finish=238.15
47057) sh cpu=8 start=3.51 finish=236.78
47063) bash cpu=6 start=3.51 finish=236.77
47086) cpuxalan_r_base cpu=6 start=3.52 finish=236.71
47061) sh cpu=0 start=3.51 finish=239.04
47067) bash cpu=7 start=3.51 finish=239.04
47087) cpuxalan_r_base cpu=7 start=3.52 finish=239.01
47062) sh cpu=8 start=3.51 finish=235.89
47072) bash cpu=8 start=3.51 finish=235.88
47093) cpuxalan_r_base cpu=8 start=3.52 finish=235.83
47064) sh cpu=8 start=3.51 finish=236.63
47071) bash cpu=9 start=3.51 finish=236.63
47089) cpuxalan_r_base cpu=9 start=3.52 finish=236.56
47065) sh cpu=12 start=3.51 finish=237.21
47075) bash cpu=10 start=3.51 finish=237.21
47091) cpuxalan_r_base cpu=10 start=3.52 finish=237.16
47069) sh cpu=15 start=3.51 finish=237.53
47077) bash cpu=11 start=3.51 finish=237.53
47092) cpuxalan_r_base cpu=11 start=3.52 finish=237.49
47070) sh cpu=0 start=3.51 finish=236.97
47078) bash cpu=12 start=3.51 finish=236.97
47094) cpuxalan_r_base cpu=12 start=3.52 finish=236.92
47073) sh cpu=15 start=3.51 finish=237.90
47079) bash cpu=13 start=3.51 finish=237.90
47096) cpuxalan_r_base cpu=13 start=3.52 finish=237.84
47074) sh cpu=12 start=3.51 finish=237.30
47085) bash cpu=14 start=3.51 finish=237.30
47097) cpuxalan_r_base cpu=14 start=3.52 finish=237.26
47076) sh cpu=0 start=3.51 finish=237.23
47084) bash cpu=15 start=3.51 finish=237.23
47095) cpuxalan_r_base cpu=15 start=3.52 finish=237.17
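The per-copy run time follows from the `start` and `finish` fields of each `cpuxalan_r_base` line. The sketch below is illustrative only: the regular expression is inferred from the lines shown above, and `copy_durations` is a hypothetical helper, not part of any SPEC or profiling tool.

```python
import re

# Matches lines such as:
#   47082) cpuxalan_r_base cpu=0 start=3.51 finish=235.48
LINE_RE = re.compile(
    r"(?P<pid>\d+)\)\s+(?P<name>\S+)\s+cpu=(?P<cpu>\d+)"
    r"\s+start=(?P<start>[\d.]+)\s+finish=(?P<finish>[\d.]+)"
)


def copy_durations(text, name="cpuxalan_r_base"):
    """Map cpu number -> (finish - start) seconds for the named process."""
    durations = {}
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m and m.group("name") == name:
            cpu = int(m.group("cpu"))
            durations[cpu] = float(m.group("finish")) - float(m.group("start"))
    return durations


if __name__ == "__main__":
    sample = """47056) bash cpu=0 start=3.51 finish=235.54
47082) cpuxalan_r_base cpu=0 start=3.51 finish=235.48
47080) cpuxalan_r_base cpu=2 start=3.51 finish=234.93
47087) cpuxalan_r_base cpu=7 start=3.52 finish=239.01
"""
    d = copy_durations(sample)
    for cpu in sorted(d):
        print(f"cpu {cpu}: {d[cpu]:.2f} s")
    print(f"spread: {max(d.values()) - min(d.values()):.2f} s")
```

For the 16 copies listed above, the shortest runs about 231.4 seconds (cpu 2) and the longest about 235.5 seconds (cpu 7), a spread of under 2%.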
