povray is a SPEC CPU(R) benchmark written in C and C++ and described here. The workload runs on all logical cores.

The topdown profile shows a high retirement rate (31.7% of slots, or 52.4% once SMT contention is excluded) with some backend stalls (23.6% of slots).

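AMD metrics confirm that the workload runs on all cores (on_cpu is 0.975, about 15.6 of 16). The backend stalls are split almost evenly between memory (12.4%) and CPU (11.2%). L2 access is moderate (about 63 accesses per 1000 instructions) and there are almost no L2 misses (0.06%).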
elapsed 1207.797
on_cpu 0.975 # 15.60 / 16 cores
utime 18835.978
stime 7.810
nvcsw 26880 # 13.67%
nivcsw 169808 # 86.33%
inblock 0 # 0.00/sec
onblock 1750920 # 1449.68/sec
cpu-clock 18843959086829 # 18843.959 seconds
task-clock 18844041233899 # 18844.041 seconds
page faults 1200044 # 63.683/sec
context switches 196017 # 10.402/sec
cpu migrations 187 # 0.010/sec
major page faults 848 # 0.045/sec
minor page faults 1199196 # 63.638/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 20597032052711 # 157.689 branches per 1000 inst
branch misses 52441407427 # 0.25% branch miss
conditional 14324263755382 # 109.665 conditional branches per 1000 inst
indirect 1423684510013 # 10.900 indirect branches per 1000 inst
cpu-cycles 72261524343074 # 3.78 GHz
instructions 130633370200321 # 1.81 IPC
slots 144529562538048 #
retiring 45790610884003 # 31.7% (52.4%)
-- ucode 452654600211 # 0.3%
-- fastpath 45337956283792 # 31.4%
frontend 5742560763746 # 4.0% ( 6.6%)
-- latency 3599231892750 # 2.5%
-- bandwidth 2143328870996 # 1.5%
backend 34057388037444 # 23.6% (38.9%)
-- cpu 16179137901330 # 11.2%
-- memory 17878250136114 # 12.4%
speculation 1877350050617 # 1.3% ( 2.1%)
-- branch mispredict 1551312091401 # 1.1%
-- pipeline restart 326037959216 # 0.2%
smt-contention 57061496732727 # 39.5% ( 0.0%)
cpu-cycles 72310084126317 # 3.76 GHz
instructions 130637928317245 # 1.81 IPC
instructions 43547668810124 # 63.187 l2 access per 1000 inst
l2 hit from l1 2423517159190 # 0.06% l2 miss
l2 miss from l1 873514352 #
l2 hit from l2 pf 327458307998 #
l3 hit from l2 pf 606029657 #
l3 miss from l2 pf 59682300 #
instructions 43529508664435 # 244.832 float per 1000 inst
float 512 289 # 0.000 AVX-512 per 1000 inst
float 256 2219972971 # 0.051 AVX-256 per 1000 inst
float 128 10655211619067 # 244.781 AVX-128 per 1000 inst
float MMX 0 # 0.000 MMX per 1000 inst
float scalar 0 # 0.000 scalar per 1000 inst
instructions 130617322132691 #
opcache 20744462326405 # 158.819 opcache per 1000 inst
opcache miss 771836069050 # 3.7% opcache miss rate
l1 dTLB miss 658886665708 # 5.044 L1 dTLB per 1000 inst
l2 dTLB miss 8891149600 # 0.068 L2 dTLB per 1000 inst
instructions 130617394601927 #
icache 1101558221297 # 8.433 icache per 1000 inst
icache miss 230432987096 # 20.9% icache miss rate
l1 iTLB miss 73327277129 # 0.561 L1 iTLB per 1000 inst
l2 iTLB miss 0 # 0.000 L2 iTLB per 1000 inst
tlb flush 105996 # 0.000 TLB flush per 1000 inst
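The derived figures in the listing above can be rechecked from the raw counters. The following is a minimal sketch in Python, not the profiling tool's own code: the variable names are made up for illustration, and the assumption that the parenthesised topdown column is the share of slots left after removing smt-contention is inferred from the numbers rather than stated in the report.

```python
# Counter values copied from the report above.
elapsed = 1207.797        # seconds of wall-clock time
utime = 18835.978         # seconds of user CPU time
stime = 7.810             # seconds of system CPU time
logical_cores = 16        # from the "15.60 / 16 cores" annotation

cycles = 72261524343074
instructions = 130633370200321
branches = 20597032052711
branch_misses = 52441407427

slots = 144529562538048
topdown = {
    "retiring": 45790610884003,
    "frontend": 5742560763746,
    "backend": 34057388037444,
    "speculation": 1877350050617,
    "smt-contention": 57061496732727,
}

# Fraction of the available core time that was spent on CPU.
on_cpu = (utime + stime) / (elapsed * logical_cores)
ipc = instructions / cycles
branch_miss_pct = 100.0 * branch_misses / branches

# Share of all slots per category.
share = {name: 100.0 * count / slots for name, count in topdown.items()}

# Assumption: the parenthesised column renormalises after dropping
# the slots attributed to SMT contention.
non_smt_slots = slots - topdown["smt-contention"]
share_excl_smt = {
    name: 100.0 * count / non_smt_slots
    for name, count in topdown.items()
    if name != "smt-contention"
}

print(f"on_cpu {on_cpu:.3f}  IPC {ipc:.2f}  branch miss {branch_miss_pct:.2f}%")
for name, pct in share.items():
    extra = share_excl_smt.get(name)
    suffix = f" ({extra:.1f}%)" if extra is not None else ""
    print(f"{name:15s} {pct:5.1f}%{suffix}")
```

Under that assumption the recomputed values agree with the report to the printed precision, which suggests that roughly half of the non-contended slots retire useful work.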
The process overview shows that almost all of the time is spent in povray_r_base.m.
691 processes
48 povray_r_base.m 18713.39 2.26
71 specperl 41.33 1.60
48 imagevalidate_5 8.77 1.21
2 clang++ 0.02 0.01
2 clang 0.01 0.01
10 ps 0.00 0.01
225 sh 0.00 0.00
54 specrxp 0.00 0.00
48 bash 0.00 0.00
41 specinvoke 0.00 0.00
22 cat 0.00 0.00
21 grep 0.00 0.00
12 uniq 0.00 0.00
11 sort 0.00 0.00
10 expand 0.00 0.00
7 specmake 0.00 0.00
6 pwd 0.00 0.00
5 basename 0.00 0.00
5 systemctl 0.00 0.00
4 rm 0.00 0.00
4 specpp 0.00 0.00
4 uname 0.00 0.00
3 dirname 0.00 0.00
3 dmidecode 0.00 0.00
3 lscpu 0.00 0.00
2 df 0.00 0.00
2 dpkg 0.00 0.00
2 runcpu 0.00 0.00
2 specsha512sum 0.00 0.00
2 specxz 0.00 0.00
2 who 0.00 0.00
1 cpupower 0.00 0.00
1 head 0.00 0.00
1 logname 0.00 0.00
1 ls 0.00 0.00
1 lsb_release 0.00 0.00
1 numactl 0.00 0.00
1 sysctl 0.00 0.00
1 w 0.00 0.00
1 wc 0.00 0.00
1 which 0.00 0.00
0 processes running
53 maximum processes
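To put a number on "almost all", the share of time attributed to povray_r_base.m can be computed from the rows with a non-zero value. This is a small sketch that assumes the third column of the overview is accumulated CPU time in seconds; the report does not label its columns, so that reading is a guess.

```python
# Rows copied from the process overview: (process count, name, third column).
# Assumption: the third column is accumulated CPU time in seconds.
rows = [
    (48, "povray_r_base.m", 18713.39),
    (71, "specperl", 41.33),
    (48, "imagevalidate_5", 8.77),
    (2, "clang++", 0.02),
    (2, "clang", 0.01),
]

total = sum(seconds for _, _, seconds in rows)
by_name = {name: seconds for _, name, seconds in rows}
povray_share = 100.0 * by_name["povray_r_base.m"] / total

print(f"povray_r_base.m: {povray_share:.2f}% of the listed time")
```

The remaining processes in the overview report 0.00 in that column, so leaving them out does not change the result.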
specinvoke starts a separate povray_r_base.m process for each logical core, each launched through its own sh and bash wrapper.
400740) specinvoke cpu=2 start=3.81 finish=395.96
400742) sh cpu=13 start=3.81 finish=393.80
400749) bash cpu=0 start=3.81 finish=393.80
400775) povray_r_base.m cpu=0 start=3.81 finish=393.79
400743) sh cpu=2 start=3.81 finish=393.56
400752) bash cpu=1 start=3.81 finish=393.56
400776) povray_r_base.m cpu=1 start=3.81 finish=393.56
400744) sh cpu=2 start=3.81 finish=391.13
400751) bash cpu=2 start=3.81 finish=391.13
400777) povray_r_base.m cpu=2 start=3.81 finish=391.13
400745) sh cpu=15 start=3.81 finish=393.58
400753) bash cpu=3 start=3.81 finish=393.58
400774) povray_r_base.m cpu=3 start=3.81 finish=393.58
400746) sh cpu=2 start=3.81 finish=392.22
400759) bash cpu=4 start=3.81 finish=392.22
400779) povray_r_base.m cpu=4 start=3.81 finish=392.22
400747) sh cpu=8 start=3.81 finish=395.02
400754) bash cpu=5 start=3.81 finish=395.02
400778) povray_r_base.m cpu=5 start=3.81 finish=395.02
400748) sh cpu=9 start=3.81 finish=395.96
400757) bash cpu=6 start=3.81 finish=395.96
400780) povray_r_base.m cpu=6 start=3.82 finish=395.96
400750) sh cpu=7 start=3.81 finish=393.05
400764) bash cpu=7 start=3.81 finish=393.05
400782) povray_r_base.m cpu=7 start=3.82 finish=393.05
400755) sh cpu=9 start=3.81 finish=394.66
400765) bash cpu=8 start=3.81 finish=394.65
400781) povray_r_base.m cpu=8 start=3.82 finish=394.65
400756) sh cpu=2 start=3.81 finish=393.30
400763) bash cpu=9 start=3.81 finish=393.30
400783) povray_r_base.m cpu=9 start=3.82 finish=393.30
400758) sh cpu=12 start=3.81 finish=394.22
400768) bash cpu=10 start=3.81 finish=394.22
400784) povray_r_base.m cpu=10 start=3.82 finish=394.22
400760) sh cpu=2 start=3.81 finish=392.57
400769) bash cpu=11 start=3.81 finish=392.57
400787) povray_r_base.m cpu=11 start=3.82 finish=392.57
400761) sh cpu=2 start=3.81 finish=393.60
400770) bash cpu=12 start=3.81 finish=393.60
400785) povray_r_base.m cpu=12 start=3.82 finish=393.60
400762) sh cpu=15 start=3.81 finish=393.13
400771) bash cpu=13 start=3.81 finish=393.13
400789) povray_r_base.m cpu=13 start=3.82 finish=393.13
400766) sh cpu=10 start=3.81 finish=395.03
400772) bash cpu=14 start=3.81 finish=395.03
400786) povray_r_base.m cpu=14 start=3.82 finish=395.03
400767) sh cpu=2 start=3.81 finish=393.01
400773) bash cpu=15 start=3.81 finish=393.01
400788) povray_r_base.m cpu=15 start=3.82 finish=393.01
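The listing above can be parsed to compare how long each copy ran. The sketch below uses only three of the povray_r_base.m lines plus one wrapper line as sample input. It assumes that start and finish are seconds since the start of the trace, and the regular expression and helper name are illustrative rather than part of any tool.

```python
import re

# A few lines copied from the listing above; the full listing has one
# povray_r_base.m entry per logical CPU.
listing = """\
400775) povray_r_base.m cpu=0 start=3.81 finish=393.79
400777) povray_r_base.m cpu=2 start=3.81 finish=391.13
400780) povray_r_base.m cpu=6 start=3.82 finish=395.96
400749) bash cpu=0 start=3.81 finish=393.80
"""

# pid) name cpu=N start=S finish=F
LINE = re.compile(
    r"(\d+)\)\s+(\S+)\s+cpu=(\d+)\s+start=([\d.]+)\s+finish=([\d.]+)"
)

def parse(text):
    """Return one dict per recognised process line."""
    records = []
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            pid, name, cpu, start, finish = match.groups()
            records.append({
                "pid": int(pid),
                "name": name,
                "cpu": int(cpu),
                "start": float(start),
                "finish": float(finish),
            })
    return records

povray = [p for p in parse(listing) if p["name"] == "povray_r_base.m"]
# Runtime per CPU, assuming start/finish are in seconds.
runtimes = {p["cpu"]: round(p["finish"] - p["start"], 2) for p in povray}
fastest = min(runtimes, key=runtimes.get)
slowest = max(runtimes, key=runtimes.get)

print(f"fastest: cpu {fastest} ({runtimes[fastest]} s)")
print(f"slowest: cpu {slowest} ({runtimes[slowest]} s)")
```

In the full listing the copies on cpu 2 and cpu 6 are the first and last to finish, about five seconds apart, so the copies run for very similar lengths of time.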
