An accelerator test suite covering OpenMP, CUDA and OpenCL. Five tests are OpenMP and two are OpenCL. The OpenCL tests fail, so this is really five subtests. These tests take some time to settle down, and at least one is single-threaded.

The topdown profile differs across the tests: a few have higher retirement rates and others have more backend stalls.

AMD metrics show floating-point code with little L2 access; the backend stalls are more CPU-bound than memory-bound. The number of branches is moderate.
elapsed 2493.966
on_cpu 0.540 # 8.64 / 16 cores
utime 21518.722
stime 21.889
nvcsw 30584 # 13.26%
nivcsw 200012 # 86.74%
inblock 0 # 0.00/sec
onblock 1021736 # 409.68/sec
cpu-clock 21542010047079 # 21542.010 seconds
task-clock 21542202986873 # 21542.203 seconds
page faults 8858867 # 411.233/sec
context switches 242783 # 11.270/sec
cpu migrations 7688 # 0.357/sec
major page faults 47 # 0.002/sec
minor page faults 8858820 # 411.231/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 16114747147990 # 101.889 branches per 1000 inst
branch misses 19153578824 # 0.12% branch miss
conditional 10609186092769 # 67.079 conditional branches per 1000 inst
indirect 1695742623714 # 10.722 indirect branches per 1000 inst
cpu-cycles 87215208815204 # 2.22 GHz
instructions 178678953072578 # 2.05 IPC
slots 174423368262540 #
retiring 62401214102905 # 35.8% (56.0%) high
-- ucode 131586618110 # 0.1%
-- fastpath 62269627484795 # 35.7%
frontend 2321571740608 # 1.3% ( 2.1%) low
-- latency 706428625452 # 0.4%
-- bandwidth 1615143115156 # 0.9%
backend 46480039019914 # 26.6% (41.7%)
-- cpu 37908047673805 # 21.7%
-- memory 8571991346109 # 4.9%
speculation 299016986173 # 0.2% ( 0.3%) low
-- branch mispredict 281309161062 # 0.2%
-- pipeline restart 17707825111 # 0.0%
smt-contention 62921325104833 # 36.1% ( 0.0%)
cpu-cycles 74673715961135 # 2.10 GHz
instructions 152456531522489 # 2.04 IPC
instructions 50818766143828 # 7.328 l2 access per 1000 inst
l2 hit from l1 196442306356 # 23.41% l2 miss
l2 miss from l1 13403485931 #
l2 hit from l2 pf 102187178346 #
l3 hit from l2 pf 45428238002 #
l3 miss from l2 pf 28327299005 #
instructions 50821318031341 # 323.549 float per 1000 inst
float 512 97 # 0.000 AVX-512 per 1000 inst
float 256 630 # 0.000 AVX-256 per 1000 inst
float 128 16443198436737 # 323.549 AVX-128 per 1000 inst
float MMX 0 # 0.000 MMX per 1000 inst
float scalar 0 # 0.000 scalar per 1000 inst
instructions 2670454 #
opcache 988041 # 369.990 opcache per 1000 inst
opcache miss 532227 # 53.9% opcache miss rate
l1 dTLB miss 5238 # 1.961 L1 dTLB per 1000 inst
l2 dTLB miss 1129 # 0.423 L2 dTLB per 1000 inst
instructions 2699471 #
icache 1306392 # 483.944 icache per 1000 inst
icache miss 110562 # 8.5% icache miss rate
l1 iTLB miss 13 # 0.005 L1 iTLB per 1000 inst
l2 iTLB miss 0 # 0.000 L2 iTLB per 1000 inst
tlb flush 22 # 0.008 TLB flush per 1000 inst
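The parenthesized topdown figures in the AMD listing look like the raw slot shares rescaled after removing the smt-contention slots. That reading is inferred from the numbers in the listing, not taken from the tool's documentation. A minimal Python sketch using the AMD slot counts (the function name `topdown_shares` is made up for illustration):

```python
# Slot counts copied from the AMD topdown listing.
SLOTS = 174423368262540
COUNTS = {
    "retiring": 62401214102905,
    "frontend": 2321571740608,
    "backend": 46480039019914,
    "speculation": 299016986173,
    "smt-contention": 62921325104833,
}

def topdown_shares(slots, counts):
    """Return {category: (raw %, % with smt-contention slots excluded)}.

    Assumption: the parenthesized value is count / (slots - smt-contention).
    """
    usable = slots - counts["smt-contention"]
    shares = {}
    for name, count in counts.items():
        if name == "smt-contention":
            continue
        shares[name] = (100.0 * count / slots, 100.0 * count / usable)
    return shares

if __name__ == "__main__":
    for name, (raw, rescaled) in topdown_shares(SLOTS, COUNTS).items():
        print(f"{name:12s} {raw:5.1f}% ({rescaled:5.1f}%)")
```

Under that assumption, about 56% of the non-contended slots retire instructions, which is consistent with the "high" label on the retiring line.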
Intel metrics
elapsed 1645.268
on_cpu 0.715 # 11.44 / 16 cores
utime 18811.581
stime 15.025
nvcsw 28926 # 16.89%
nivcsw 142292 # 83.11%
inblock 559936 # 340.33/sec
onblock 230440 # 140.06/sec
cpu-clock 18826064492000 # 18826.064 seconds
task-clock 18826151799361 # 18826.152 seconds
page faults 8281354 # 439.886/sec
context switches 179211 # 9.519/sec
cpu migrations 19260 # 1.023/sec
major page faults 85 # 0.005/sec
minor page faults 8281269 # 439.881/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 13193076197482 # 112.251 branches per 1000 inst
branch misses 16326408207 # 0.12% branch miss
conditional 13193076222218 # 112.251 conditional branches per 1000 inst
indirect 4434825753165 # 37.733 indirect branches per 1000 inst
slots 97563657619520 #
retiring 59925790784692 # 61.4% (61.4%) high
-- ucode 3166614337055 # 3.2%
-- fastpath 56759176447637 # 58.2%
frontend 10212256156300 # 10.5% (10.5%)
-- latency 8543550138460 # 8.8%
-- bandwidth 1668706017840 # 1.7%
backend 26403767081345 # 27.1% (27.1%)
-- cpu 14690616659153 # 15.1%
-- memory 11713150422192 # 12.0%
speculation 815731012482 # 0.8% ( 0.8%) low
-- branch mispredict 760996322204 # 0.8%
-- pipeline restart 54734690278 # 0.1%
smt-contention 0 # 0.0% ( 0.0%)
cpu-cycles 50707273870129 # 1.92 GHz
instructions 105254985989712 # 2.08 IPC
l2 access 223186481571 # 3.799 l2 access per 1000 inst
l2 miss 76102210380 # 34.10% l2 miss
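The on_cpu lines in both listings are consistent with (utime + stime) / elapsed, divided by the 16 cores. This formula is inferred from the reported values; the helper name `on_cpu` below is only illustrative.

```python
def on_cpu(utime, stime, elapsed, cores=16):
    """Return (busy cores, fraction of all cores busy).

    Assumption: on_cpu = (utime + stime) / elapsed / cores.
    """
    busy = (utime + stime) / elapsed
    return busy, busy / cores

# Values copied from the AMD and Intel listings.
amd_busy, amd_frac = on_cpu(21518.722, 21.889, 2493.966)
intel_busy, intel_frac = on_cpu(18811.581, 15.025, 1645.268)

if __name__ == "__main__":
    print(f"AMD   on_cpu {amd_frac:.3f} # {amd_busy:.2f} / 16 cores")
    print(f"Intel on_cpu {intel_frac:.3f} # {intel_busy:.2f} / 16 cores")
```

On average only about 8.64 (AMD) and 11.44 (Intel) of the 16 cores were busy, which fits the observation that at least one subtest is single-threaded.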
The process overview lists each process by name. It looks like lavaMD took a while to settle and consumes the largest share.
979 processes
48 lavaMD 152420.00 109.60
240 3D 67068.45 21.12
48 leukocyte 48273.12 105.60
48 euler3d_cpu_dou 17002.74 51.68
48 sc_omp 12816.32 3.52
204 clinfo 58.31 18.29
38 vulkaninfo 1.33 1.33
9 OCL_particlefil 0.26 0.27
9 myocyte.out 0.20 0.26
4 vulkani:disk$0 0.14 0.14
6 glxinfo:gdrv0 0.14 0.07
6 glxinfo:gl0 0.14 0.07
6 php 0.13 0.37
2 llvmpipe-0 0.07 0.07
2 llvmpipe-1 0.07 0.07
2 llvmpipe-10 0.07 0.07
2 llvmpipe-11 0.07 0.07
2 llvmpipe-12 0.07 0.07
2 llvmpipe-13 0.07 0.07
2 llvmpipe-14 0.07 0.07
2 llvmpipe-15 0.07 0.07
2 llvmpipe-2 0.07 0.07
2 llvmpipe-3 0.07 0.07
2 llvmpipe-4 0.07 0.07
2 llvmpipe-5 0.07 0.07
2 llvmpipe-6 0.07 0.07
2 llvmpipe-7 0.07 0.07
2 llvmpipe-8 0.07 0.07
2 llvmpipe-9 0.07 0.07
2 glxinfo 0.07 0.04
2 glxinfo:cs0 0.06 0.03
2 glxinfo:disk$0 0.06 0.03
2 glxinfo:sh0 0.06 0.03
2 glxinfo:shlo0 0.06 0.03
6 clang 0.04 0.04
3 rocminfo 0.03 0.00
1 lspci 0.01 0.02
94 sh 0.00 0.00
33 rodinia 0.00 0.00
13 gcc 0.00 0.00
9 gsettings 0.00 0.00
8 stat 0.00 0.00
8 systemd-detect- 0.00 0.00
6 llvm-link 0.00 0.00
5 phoronix-test-s 0.00 0.00
4 gmain 0.00 0.00
3 dconf worker 0.00 0.00
2 cc 0.00 0.00
2 lscpu 0.00 0.00
2 uname 0.00 0.00
2 which 0.00 0.00
2 xset 0.00 0.00
1 date 0.00 0.00
1 dirname 0.00 0.00
1 dmesg 0.00 0.00
1 dmidecode 0.00 0.00
1 grep 0.00 0.00
1 ifconfig 0.00 0.00
1 ip 0.00 0.00
1 lsmod 0.00 0.00
1 mktemp 0.00 0.00
1 ps 0.00 0.00
1 qdbus 0.00 0.00
1 readlink 0.00 0.00
1 realpath 0.00 0.00
1 sed 0.00 0.00
1 sort 0.00 0.00
1 stty 0.00 0.00
1 systemctl 0.00 0.00
1 template.sh 0.00 0.00
1 wc 0.00 0.00
1 xrandr 0.00 0.00
0 processes running
47 maximum processes
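To put a rough number on "the largest share", the sketch below sums the third column for the five benchmark processes and reports each one's percentage. The meaning of that column is not stated here; treating it as a cumulative CPU-time-like total is an assumption.

```python
# (name, third-column value) for the five benchmark processes in the overview.
ROWS = [
    ("lavaMD", 152420.00),
    ("3D", 67068.45),
    ("leukocyte", 48273.12),
    ("euler3d_cpu_dou", 17002.74),
    ("sc_omp", 12816.32),
]

def shares(rows):
    """Return {name: percent of the summed third column}."""
    total = sum(value for _, value in rows)
    return {name: 100.0 * value / total for name, value in rows}

if __name__ == "__main__":
    for name, pct in shares(ROWS).items():
        print(f"{name:16s} {pct:5.1f}%")
```

Under that assumption lavaMD accounts for roughly half of the total across these five processes.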
Example computation blocks
222891) rodinia cpu=2 start=6.65 finish=211.78
222892) lavaMD cpu=1 start=6.65 finish=211.78
222895) lavaMD cpu=13 start=11.00 finish=211.78
222896) lavaMD cpu=8 start=11.00 finish=211.78
222897) lavaMD cpu=15 start=11.00 finish=211.78
222898) lavaMD cpu=0 start=11.00 finish=211.78
222899) lavaMD cpu=9 start=11.00 finish=211.78
222900) lavaMD cpu=10 start=11.00 finish=211.78
222901) lavaMD cpu=12 start=11.00 finish=211.78
222902) lavaMD cpu=11 start=11.00 finish=211.78
222903) lavaMD cpu=5 start=11.00 finish=211.78
222904) lavaMD cpu=6 start=11.00 finish=211.78
222905) lavaMD cpu=3 start=11.00 finish=211.78
222906) lavaMD cpu=2 start=11.00 finish=211.78
222907) lavaMD cpu=14 start=11.00 finish=211.78
222908) lavaMD cpu=7 start=11.00 finish=211.78
222909) lavaMD cpu=4 start=11.00 finish=211.78
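Each computation-block line has the form `pid) name cpu=N start=S finish=F`. A small Python sketch for parsing such lines and computing how long each block ran is given here; the regular expression and the `Block` type are illustrative, and the time unit is assumed to be seconds.

```python
import re
from typing import NamedTuple, Optional

class Block(NamedTuple):
    pid: int
    name: str
    cpu: int
    start: float
    finish: float

    @property
    def duration(self) -> float:
        return self.finish - self.start

# Matches lines such as "222892) lavaMD cpu=1 start=6.65 finish=211.78".
LINE_RE = re.compile(
    r"^(\d+)\)\s+(\S+)\s+cpu=(\d+)\s+start=([\d.]+)\s+finish=([\d.]+)$"
)

def parse_block(line: str) -> Optional[Block]:
    """Parse one line; return None if it does not match the expected shape."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    pid, name, cpu, start, finish = m.groups()
    return Block(int(pid), name, int(cpu), float(start), float(finish))

if __name__ == "__main__":
    b = parse_block("222895) lavaMD cpu=13 start=11.00 finish=211.78")
    print(b.name, b.cpu, round(b.duration, 2))
```

In the listing, the 15 later lavaMD entries start at 11.00, which is 4.35 after the first lavaMD entry at 6.65, and together the 16 entries cover CPUs 0 through 15.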
