The Darmstadt Automotive Parallel Heterogeneous (DAPHNE) Benchmark Suite runs automotive workloads implemented with both OpenCL and OpenMP. The OpenCL variants do not run. Most of the OpenMP variants appear to be single-threaded despite the OpenMP label.
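
The on_cpu values in the metric dumps match average CPU utilization computed from the rusage times. The sketch below assumes on_cpu = (utime + stime) / (elapsed × 16 cores). That formula is inferred because it reproduces the reported numbers. It is not taken from the tool's documentation.

```python
# ASSUMPTION: on_cpu = (utime + stime) / (elapsed * cores), inferred from the dumps.
# Values are copied from the AMD and Intel metric dumps (seconds).
RUNS = {
    "AMD": {"elapsed": 647.523, "utime": 1195.537, "stime": 284.194},
    "Intel": {"elapsed": 1436.480, "utime": 2636.012, "stime": 300.586},
}
NUM_CORES = 16  # both dumps report utilization out of 16 cores


def effective_cores(elapsed: float, utime: float, stime: float) -> float:
    """Average number of cores kept busy over the whole run."""
    return (utime + stime) / elapsed


for name, run in RUNS.items():
    cores = effective_cores(**run)
    print(f"{name}: {cores:.2f} cores busy, on_cpu = {cores / NUM_CORES:.3f}")
```

Both machines average only about two busy cores out of 16. That low average is what points toward mostly single-threaded execution.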

The topdown profile shows a mix of retiring slots and backend stalls (AMD: 26.5% retiring, 36.0% backend; Intel: 20.3% retiring, 63.8% backend).
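
Each topdown percentage is a category's slot count divided by total slots. The parenthesized AMD values can be reproduced by removing smt-contention slots from the denominator. That interpretation is an assumption inferred from the numbers. The sketch below uses the AMD counts:

```python
# Topdown shares from the AMD dump.
# ASSUMPTION: the parenthesized percentages exclude smt-contention slots from the
# denominator; this reproduces the reported numbers but is inferred, not documented.
SLOTS = 12168710168130
SMT_CONTENTION = 2307718181346
CATEGORIES = {
    "retiring": 3221769683765,
    "frontend": 2229448759160,
    "backend": 4376772220333,
    "speculation": 32943244673,
}


def topdown_shares(categories, slots, smt_contention):
    """Return {name: (percent of all slots, percent of non-SMT-contention slots)}."""
    usable = slots - smt_contention
    return {
        name: (100.0 * count / slots, 100.0 * count / usable)
        for name, count in categories.items()
    }


for name, (raw, adjusted) in topdown_shares(CATEGORIES, SLOTS, SMT_CONTENTION).items():
    print(f"{name:12s} {raw:5.1f}% ({adjusted:5.1f}%)")
```

On Intel, smt-contention is 0, so the two columns coincide. The Intel dump shows the same thing.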

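AMD metrics are a composite of the above. They show little floating-point work (58.6 AVX-128 per 1000 instructions, essentially no AVX-256 or AVX-512) and low L2 access (9.4 per 1000 instructions).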
elapsed 647.523
on_cpu 0.143 # 2.29 / 16 cores
utime 1195.537
stime 284.194
nvcsw 1342214 # 9.73%
nivcsw 12450912 # 90.27%
inblock 277614544 # 428733.21/sec
onblock 35936 # 55.50/sec
cpu-clock 1480594180040 # 1480.594 seconds
task-clock 1481141930675 # 1481.142 seconds
page faults 84864057 # 57296.371/sec
context switches 13796149 # 9314.535/sec
cpu migrations 326886 # 220.699/sec
major page faults 150 # 0.101/sec
minor page faults 84863907 # 57296.269/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 1589971314596 # 177.095 branches per 1000 inst
branch misses 22901058854 # 1.44% branch miss
conditional 1003628468174 # 111.787 conditional branches per 1000 inst
indirect 161161234244 # 17.951 indirect branches per 1000 inst
cpu-cycles 6094697416325 # 0.59 GHz
instructions 8898725499100 # 1.46 IPC
slots 12168710168130 #
retiring 3221769683765 # 26.5% (32.7%)
-- ucode 25931388043 # 0.2%
-- fastpath 3195838295722 # 26.3%
frontend 2229448759160 # 18.3% (22.6%)
-- latency 1339052845638 # 11.0%
-- bandwidth 890395913522 # 7.3%
backend 4376772220333 # 36.0% (44.4%)
-- cpu 1570743126155 # 12.9%
-- memory 2806029094178 # 23.1%
speculation 32943244673 # 0.3% ( 0.3%) low
-- branch mispredict 32596017992 # 0.3%
-- pipeline restart 347226681 # 0.0%
smt-contention 2307718181346 # 19.0% ( 0.0%)
cpu-cycles 6142429904000 # 0.59 GHz
instructions 8913235202843 # 1.45 IPC
instructions 2969698292677 # 9.421 l2 access per 1000 inst
l2 hit from l1 19591167482 # 25.90% l2 miss
l2 miss from l1 1643216687 #
l2 hit from l2 pf 2783547056 #
l3 hit from l2 pf 1420250892 #
l3 miss from l2 pf 4182540997 #
instructions 2964104532232 # 58.608 float per 1000 inst
float 512 73 # 0.000 AVX-512 per 1000 inst
float 256 844 # 0.000 AVX-256 per 1000 inst
float 128 173721135233 # 58.608 AVX-128 per 1000 inst
float MMX 0 # 0.000 MMX per 1000 inst
float scalar 0 # 0.000 scalar per 1000 inst
instructions 8915527065582 #
opcache 1346487074649 # 151.027 opcache per 1000 inst
opcache miss 134602372282 # 10.0% opcache miss rate
l1 dTLB miss 2340482149 # 0.263 L1 dTLB per 1000 inst
l2 dTLB miss 614271217 # 0.069 L2 dTLB per 1000 inst
instructions 8904278828800 #
icache 310867012549 # 34.912 icache per 1000 inst
icache miss 9574737682 # 3.1% icache miss rate
l1 iTLB miss 60611547 # 0.007 L1 iTLB per 1000 inst
l2 iTLB miss 0 # 0.000 L2 iTLB per 1000 inst
tlb flush 239333 # 0.000 TLB flush per 1000 inst
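
Several per-1000-instruction rates and miss percentages in the AMD metrics above can be reproduced from the raw counts. The sketch below assumes each instructions line is the denominator for the counters listed directly after it. The dump reports several different instruction totals, so the counter groups appear to have been measured separately.

```python
# ASSUMPTION: each "instructions" line in the AMD dump is the denominator for the
# counters listed right after it (the totals differ, so groups look separately measured).


def per_kilo_inst(events: int, instructions: int) -> float:
    """Events per 1000 instructions."""
    return 1000.0 * events / instructions


def miss_rate(misses: int, accesses: int) -> float:
    """Misses as a percentage of accesses."""
    return 100.0 * misses / accesses


# Raw counts copied from the AMD dump.
float_128 = per_kilo_inst(173721135233, 2964104532232)  # float group
opcache = per_kilo_inst(1346487074649, 8915527065582)   # opcache/dTLB group
dtlb_l1 = per_kilo_inst(2340482149, 8915527065582)      # same group as opcache
opcache_miss = miss_rate(134602372282, 1346487074649)
icache_miss = miss_rate(9574737682, 310867012549)
branch_miss = miss_rate(22901058854, 1589971314596)

print(f"AVX-128 per 1000 inst: {float_128:.3f}")
print(f"opcache per 1000 inst: {opcache:.3f}, miss rate {opcache_miss:.1f}%")
print(f"L1 dTLB miss per 1000 inst: {dtlb_l1:.3f}")
print(f"icache miss rate: {icache_miss:.1f}%, branch miss rate: {branch_miss:.2f}%")
```
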
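Intel metrics are even more backend-bound (63.8% of slots). Most of that is core-bound (54.3%) rather than memory-bound (9.5%).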
elapsed 1436.480
on_cpu 0.128 # 2.04 / 16 cores
utime 2636.012
stime 300.586
nvcsw 2152216 # 98.57%
nivcsw 31297 # 1.43%
inblock 554279288 # 385859.26/sec
onblock 2368 # 1.65/sec
cpu-clock 2930803533535 # 2930.804 seconds
task-clock 2931453232627 # 2931.453 seconds
page faults 120740962 # 41188.091/sec
context switches 2190497 # 747.239/sec
cpu migrations 12032 # 4.104/sec
major page faults 76 # 0.026/sec
minor page faults 120740886 # 41188.065/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 2498902996927 # 176.736 branches per 1000 inst
branch misses 1064616276 # 0.04% branch miss
conditional 2498903022559 # 176.736 conditional branches per 1000 inst
indirect 399913304330 # 28.284 indirect branches per 1000 inst
slots 41640024301754 #
retiring 8457987971296 # 20.3% (20.3%)
-- ucode 1333390304549 # 3.2%
-- fastpath 7124597666747 # 17.1%
frontend 5693102775992 # 13.7% (13.7%)
-- latency 3332728754447 # 8.0%
-- bandwidth 2360374021545 # 5.7%
backend 26553295817784 # 63.8% (63.8%)
-- cpu 22615742223392 # 54.3%
-- memory 3937553594392 # 9.5%
speculation 617223955381 # 1.5% ( 1.5%)
-- branch mispredict 342222801500 # 0.8%
-- pipeline restart 275001153881 # 0.7%
smt-contention 0 # 0.0% ( 0.0%)
cpu-cycles 17570217701955 # 0.46 GHz
instructions 31250072553799 # 1.78 IPC
l2 access 148699094152 # 7.007 l2 access per 1000 inst
l2 miss 96697170846 # 65.03% l2 miss
cpu-cycles 7587612284914 # 10.3% memory latency
load stalls 566094003027 # 4.0% l1 bound
l1 miss 261284310669 # 1.3% l2 bound
l2 miss 164160337975 # 0.9% l3 bound
l3 miss 93582661571 # 1.2% dram bound
store_stalls 217319060957 # 2.9% store bound
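
The two topdown sections differ most in how backend stalls divide between core-bound ("cpu") and memory-bound ("memory") slots. In both dumps, cpu plus memory sums exactly to the backend count. The sketch below normalizes the two sub-categories to the backend total:

```python
# Split of backend-bound slots into core-bound ("cpu") and memory-bound ("memory")
# shares, using the slot counts from the AMD and Intel topdown sections.
BACKEND = {
    "AMD": {"cpu": 1570743126155, "memory": 2806029094178},
    "Intel": {"cpu": 22615742223392, "memory": 3937553594392},
}


def backend_split(cpu: int, memory: int) -> tuple[float, float]:
    """Return (core-bound %, memory-bound %) of all backend-bound slots."""
    total = cpu + memory
    return 100.0 * cpu / total, 100.0 * memory / total


for name, counts in BACKEND.items():
    core, mem = backend_split(**counts)
    print(f"{name}: {core:.1f}% core-bound, {mem:.1f}% memory-bound")
```

Memory stalls dominate the backend on AMD (about 64%), and core stalls dominate on Intel (about 85%).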
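The process overview shows that processes named kernel are the primary driver of CPU time. These are the benchmark's own executables, not the operating-system kernel.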
764 processes
246 kernel 23220.31 7341.70
204 clinfo 50.68 21.18
38 vulkaninfo 1.31 1.33
4 vulkani:disk$0 0.14 0.14
6 php 0.09 0.20
2 llvmpipe-0 0.07 0.07
2 llvmpipe-1 0.07 0.07
2 llvmpipe-10 0.07 0.07
2 llvmpipe-11 0.07 0.07
2 llvmpipe-12 0.07 0.07
2 llvmpipe-13 0.07 0.07
2 llvmpipe-14 0.07 0.07
2 llvmpipe-15 0.07 0.07
2 llvmpipe-2 0.07 0.07
2 llvmpipe-3 0.07 0.07
2 llvmpipe-4 0.07 0.07
2 llvmpipe-5 0.07 0.07
2 llvmpipe-6 0.07 0.07
2 llvmpipe-7 0.07 0.07
2 llvmpipe-8 0.07 0.07
2 llvmpipe-9 0.07 0.07
6 glxinfo:gdrv0 0.06 0.12
6 glxinfo:gl0 0.06 0.12
6 clang 0.06 0.05
2 glxinfo 0.04 0.04
2 glxinfo:cs0 0.04 0.04
2 glxinfo:disk$0 0.04 0.04
2 glxinfo:sh0 0.04 0.04
2 glxinfo:shlo0 0.04 0.04
3 rocminfo 0.00 0.03
1 lspci 0.00 0.02
92 sh 0.00 0.00
24 daphne 0.00 0.00
13 gcc 0.00 0.00
9 gsettings 0.00 0.00
8 stat 0.00 0.00
8 systemd-detect- 0.00 0.00
6 llvm-link 0.00 0.00
5 phoronix-test-s 0.00 0.00
4 gmain 0.00 0.00
3 dconf worker 0.00 0.00
2 cc 0.00 0.00
2 lscpu 0.00 0.00
2 uname 0.00 0.00
2 which 0.00 0.00
2 xset 0.00 0.00
1 date 0.00 0.00
1 dirname 0.00 0.00
1 dmesg 0.00 0.00
1 dmidecode 0.00 0.00
1 grep 0.00 0.00
1 ifconfig 0.00 0.00
1 ip 0.00 0.00
1 lsmod 0.00 0.00
1 mktemp 0.00 0.00
1 ps 0.00 0.00
1 qdbus 0.00 0.00
1 readlink 0.00 0.00
1 realpath 0.00 0.00
1 sed 0.00 0.00
1 sort 0.00 0.00
1 stty 0.00 0.00
1 systemctl 0.00 0.00
1 template.sh 0.00 0.00
1 wc 0.00 0.00
1 xrandr 0.00 0.00
0 processes running
47 maximum processes
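
A small parser can rank the process overview by name. The output does not state what the two numeric columns mean, so this hypothetical sketch simply reports each name's share of the first numeric column. It uses a few rows copied from the list. Process names may contain spaces (dconf worker), so the parser splits the numeric fields from the right:

```python
# Hypothetical parser for overview rows of the form "<count> <name> <col1> <col2>".
# Names may contain spaces, so the two numeric columns are split off from the right.
SAMPLE = """\
246 kernel 23220.31 7341.70
204 clinfo 50.68 21.18
38 vulkaninfo 1.31 1.33
4 vulkani:disk$0 0.14 0.14
3 dconf worker 0.00 0.00
"""


def parse_overview(text: str) -> list[tuple[str, int, float, float]]:
    """Return (name, count, col1, col2) for each non-empty row."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        head, col1, col2 = line.rsplit(maxsplit=2)
        count, name = head.split(maxsplit=1)
        rows.append((name, int(count), float(col1), float(col2)))
    return rows


def share_of_col1(rows, name: str) -> float:
    """Percentage of the first numeric column attributed to one process name."""
    total = sum(row[2] for row in rows)
    return 100.0 * sum(row[2] for row in rows if row[0] == name) / total


rows = parse_overview(SAMPLE)
print(f"kernel: {share_of_col1(rows, 'kernel'):.1f}% of column 1")
```

Across the full list, kernel accounts for roughly 99.8% of the first numeric column.
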
Computation block
882112) daphne cpu=4 start=24.47 finish=46.59
882113) kernel cpu=5 start=24.47 finish=46.52
882114) kernel cpu=14 start=35.86 finish=46.52
882115) kernel cpu=0 start=35.86 finish=46.52
882116) kernel cpu=8 start=35.86 finish=46.52
882117) kernel cpu=1 start=35.86 finish=46.52
882118) kernel cpu=2 start=35.86 finish=46.52
882119) kernel cpu=15 start=35.86 finish=46.52
882120) kernel cpu=10 start=35.86 finish=46.52
882121) kernel cpu=12 start=35.86 finish=46.52
882122) kernel cpu=6 start=35.86 finish=46.52
882123) kernel cpu=7 start=35.86 finish=46.52
882124) kernel cpu=11 start=35.86 finish=46.52
882125) kernel cpu=9 start=35.86 finish=46.52
882126) kernel cpu=4 start=35.86 finish=46.52
882127) kernel cpu=3 start=35.86 finish=46.52
882128) kernel cpu=13 start=35.86 finish=46.52
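
The computation block can be read as a timeline. One kernel entry runs alone from 24.47 s. At 35.86 s, 15 more kernel entries start on other CPUs, and all of them finish at 46.52 s. The minimal sweep-line sketch below measures how long each concurrency level lasts. Its regular expression assumes exactly the line format shown in the block.

```python
import re

TIMELINE = """\
882112) daphne cpu=4 start=24.47 finish=46.59
882113) kernel cpu=5 start=24.47 finish=46.52
882114) kernel cpu=14 start=35.86 finish=46.52
882115) kernel cpu=0 start=35.86 finish=46.52
882116) kernel cpu=8 start=35.86 finish=46.52
882117) kernel cpu=1 start=35.86 finish=46.52
882118) kernel cpu=2 start=35.86 finish=46.52
882119) kernel cpu=15 start=35.86 finish=46.52
882120) kernel cpu=10 start=35.86 finish=46.52
882121) kernel cpu=12 start=35.86 finish=46.52
882122) kernel cpu=6 start=35.86 finish=46.52
882123) kernel cpu=7 start=35.86 finish=46.52
882124) kernel cpu=11 start=35.86 finish=46.52
882125) kernel cpu=9 start=35.86 finish=46.52
882126) kernel cpu=4 start=35.86 finish=46.52
882127) kernel cpu=3 start=35.86 finish=46.52
882128) kernel cpu=13 start=35.86 finish=46.52
"""

LINE_RE = re.compile(r"(\d+)\) (\S+) cpu=(\d+) start=([\d.]+) finish=([\d.]+)")


def parse_timeline(text):
    """Return (pid, name, cpu, start, finish) tuples for lines matching LINE_RE."""
    return [
        (int(m[1]), m[2], int(m[3]), float(m[4]), float(m[5]))
        for m in map(LINE_RE.match, text.splitlines())
        if m
    ]


def concurrency_profile(entries, name):
    """Sweep start/finish events; return (time, active count) after each event."""
    events = []
    for _, entry_name, _, start, finish in entries:
        if entry_name == name:
            events.append((start, 1))
            events.append((finish, -1))
    events.sort()  # at equal times, -1 sorts before +1
    active, profile = 0, []
    for t, delta in events:
        active += delta
        profile.append((t, active))
    return profile


def seconds_per_level(profile):
    """Wall-clock seconds spent at each concurrency level (zero-length spans dropped)."""
    levels = {}
    for (t0, active), (t1, _) in zip(profile, profile[1:]):
        if t1 > t0:
            levels[active] = levels.get(active, 0.0) + (t1 - t0)
    return levels


profile = concurrency_profile(parse_timeline(TIMELINE), "kernel")
for level, seconds in sorted(seconds_per_level(profile).items()):
    print(f"{level:2d} kernel entries active for {seconds:.2f} s")
```

For this block the sketch reports 11.39 s with one kernel entry active and 10.66 s with 16 active.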
