ncnn is a neural network inference framework from Tencent. This test works through a dozen different neural networks, first on the CPU and then on a Vulkan GPU. Each individual case runs quickly, so within each ~60 second period a network runs for only about five seconds on average before the next one starts. It looks like we are taking advantage of all cores.

The topdown profile shows that on average we are dominated by backend stalls, with the last few networks slightly less so. The CPU and GPU profiles are similar, so perhaps my “Vulkan GPU” tests really are CPU tests as well.

The AMD metrics show a workload high in backend stalls and low in both retiring rate and frontend stalls. There are few floating point instructions (perhaps the networks are using int8?).
elapsed 1946.111
on_cpu 0.924 # 14.78 / 16 cores
utime 28665.305
stime 97.446
nvcsw 1458053 # 82.31%
nivcsw 313322 # 17.69%
inblock 0 # 0.00/sec
onblock 17848 # 9.17/sec
cpu-clock 28772710292348 # 28772.710 seconds
task-clock 28773501421019 # 28773.501 seconds
page faults 17841468 # 620.066/sec
context switches 1780869 # 61.893/sec
cpu migrations 715 # 0.025/sec
major page faults 2079 # 0.072/sec
minor page faults 17839389 # 619.994/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 7400706707040 # 129.148 branches per 1000 inst
branch misses 22112539866 # 0.30% branch miss
conditional 7306720045682 # 127.508 conditional branches per 1000 inst
indirect 9788450464 # 0.171 indirect branches per 1000 inst
cpu-cycles 24435176519386 # 3.85 GHz
instructions 11449051374995 # 0.47 IPC low
slots 48868759359042 #
retiring 4017392153178 # 8.2% (11.1%) low
-- ucode 52206341384 # 0.1%
-- fastpath 3965185811794 # 8.1%
frontend 1521688467294 # 3.1% ( 4.2%) low
-- latency 584270471004 # 1.2%
-- bandwidth 937417996290 # 1.9%
backend 30426520761785 # 62.3% (84.4%) high
-- cpu 15311691089077 # 31.3%
-- memory 15114829672708 # 30.9%
speculation 83894665674 # 0.2% ( 0.2%) low
-- branch mispredict 64506051329 # 0.1%
-- pipeline restart 19388614345 # 0.0%
smt-contention 12819183408457 # 26.2% ( 0.0%)
cpu-cycles 24319189058245 # 3.85 GHz
instructions 11439573868940 # 0.47 IPC low
instructions 3817148970093 # 129.638 l2 access per 1000 inst
l2 hit from l1 313108322322 # 28.43% l2 miss
l2 miss from l1 15187002493 #
l2 hit from l2 pf 56261875456 #
l3 hit from l2 pf 109448230684 #
l3 miss from l2 pf 16030874460 #
instructions 3814302904658 # 15.070 float per 1000 inst
float 512 62 # 0.000 AVX-512 per 1000 inst
float 256 8826 # 0.000 AVX-256 per 1000 inst
float 128 57480143661 # 15.070 AVX-128 per 1000 inst
float MMX 0 # 0.000 MMX per 1000 inst
float scalar 0 # 0.000 scalar per 1000 inst
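The derived AMD figures can be re-checked from the raw counters. The sketch below is only a sanity check, not the tool that produced the dump: the variable names are mine, and the reading of the parenthesized percentages as "share of slots once SMT contention is excluded" is an assumption that happens to reproduce the printed values.

```python
# Raw AMD counters copied from the dump above.
slots = 48_868_759_359_042
smt_contention = 12_819_183_408_457
topdown = {
    "retiring": 4_017_392_153_178,
    "frontend": 1_521_688_467_294,
    "backend": 30_426_520_761_785,
    "speculation": 83_894_665_674,
}
cycles = 24_435_176_519_386
instructions = 11_449_051_374_995
# The float counters were collected in their own group, with their own
# instruction count, so that count is used as the denominator here.
float_128 = 57_480_143_661
float_group_instructions = 3_814_302_904_658


def pct(part, whole):
    """Percentage of `whole`, rounded to one decimal like the dump."""
    return round(100.0 * part / whole, 1)


ipc = round(instructions / cycles, 2)

# Plain share of all issue slots (first percentage on each line).
share_of_slots = {name: pct(count, slots) for name, count in topdown.items()}

# Assumption: the parenthesized percentage excludes SMT-contention slots.
share_excl_smt = {
    name: pct(count, slots - smt_contention) for name, count in topdown.items()
}

float_per_kinst = round(1000.0 * float_128 / float_group_instructions, 3)

print("IPC:", ipc)
print("share of slots:", share_of_slots)
print("share excluding SMT contention:", share_excl_smt)
print("128-bit float ops per 1000 inst:", float_per_kinst)
```

Under that assumption, backend stalls take roughly five of every six slots the thread could actually use, which is why the line is flagged as high even though its plain share of slots is lower.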
Intel metrics
elapsed 2809.052
on_cpu 0.949 # 15.18 / 16 cores
utime 42567.071
stime 76.882
nvcsw 2200133 # 85.22%
nivcsw 381476 # 14.78%
inblock 8432 # 3.00/sec
onblock 4912 # 1.75/sec
cpu-clock 42647720312779 # 42647.720 seconds
task-clock 42648149083383 # 42648.149 seconds
page faults 18818807 # 441.257/sec
context switches 2595429 # 60.857/sec
cpu migrations 61584 # 1.444/sec
major page faults 1942 # 0.046/sec
minor page faults 18816865 # 441.212/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 4804748523665 # 49.759 branches per 1000 inst
branch misses 22716229759 # 0.47% branch miss
conditional 4804748550129 # 49.759 conditional branches per 1000 inst
indirect 2009633101318 # 20.812 indirect branches per 1000 inst
slots 235806293789726 #
retiring 48622809170505 # 20.6% (20.6%)
-- ucode 3412287001594 # 1.4%
-- fastpath 45210522168911 # 19.2%
frontend 17158999336433 # 7.3% ( 7.3%)
-- latency 13837393028606 # 5.9%
-- bandwidth 3321606307827 # 1.4%
backend 168582773381581 # 71.5% (71.5%) high
-- cpu 124294100887241 # 52.7%
-- memory 44288672494340 # 18.8%
speculation 1609099917873 # 0.7% ( 0.7%) low
-- branch mispredict 1470207468290 # 0.6%
-- pipeline restart 138892449583 # 0.1%
smt-contention 0 # 0.0% ( 0.0%)
cpu-cycles 90957267456764 # 2.96 GHz
instructions 63982684727203 # 0.70 IPC
l2 access 2072680943817 # 59.690 l2 access per 1000 inst
l2 miss 855274244675 # 41.26% l2 miss
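The on_cpu lines in both dumps follow from elapsed, utime and stime, which is a quick way to confirm that both machines keep close to all 16 logical cores busy. The sketch below uses a small hypothetical `Run` class of my own; only the numbers come from the dumps.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """Wall-clock and CPU times (seconds) copied from one metrics dump."""
    elapsed: float
    utime: float
    stime: float
    cores: int = 16

    def busy_cores(self) -> float:
        # Average number of cores kept busy over the whole run.
        return (self.utime + self.stime) / self.elapsed

    def on_cpu(self) -> float:
        # Fraction of the machine's total CPU capacity that was used.
        return self.busy_cores() / self.cores


amd = Run(elapsed=1946.111, utime=28665.305, stime=97.446)
intel = Run(elapsed=2809.052, utime=42567.071, stime=76.882)

for name, run in (("AMD", amd), ("Intel", intel)):
    print(f"{name}: on_cpu {run.on_cpu():.3f} "
          f"# {run.busy_cores():.2f} / {run.cores} cores")
```

System time is well under 1% of the total on both machines, so almost all of that utilization is user-space compute.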
The process overview is simple, with 96 benchncnn processes (consistent with 16 threads for each of the 6 ncnn launches in the list).
452 processes
96 benchncnn 91381.12 284.48
68 clinfo 16.28 6.65
38 vulkaninfo 0.95 1.34
6 glxinfo:gdrv0 0.15 0.00
6 glxinfo:gl0 0.15 0.00
6 php 0.14 0.08
4 vulkani:disk$0 0.10 0.15
2 glxinfo 0.08 0.00
2 glxinfo:cs0 0.08 0.00
2 glxinfo:disk$0 0.08 0.00
2 glxinfo:sh0 0.08 0.00
2 glxinfo:shlo0 0.08 0.00
6 clang 0.06 0.06
2 llvmpipe-0 0.05 0.08
2 llvmpipe-10 0.05 0.08
2 llvmpipe-11 0.05 0.08
2 llvmpipe-12 0.05 0.08
2 llvmpipe-13 0.05 0.08
2 llvmpipe-14 0.05 0.08
2 llvmpipe-15 0.05 0.08
2 llvmpipe-2 0.05 0.08
2 llvmpipe-3 0.05 0.08
2 llvmpipe-4 0.05 0.08
2 llvmpipe-5 0.05 0.08
2 llvmpipe-6 0.05 0.08
2 llvmpipe-7 0.05 0.08
2 llvmpipe-8 0.05 0.08
2 llvmpipe-9 0.05 0.08
2 llvmpipe-1 0.05 0.07
3 rocminfo 0.03 0.00
1 lspci 0.00 0.02
1 ps 0.00 0.01
84 sh 0.00 0.00
13 gcc 0.00 0.00
9 gsettings 0.00 0.00
8 stat 0.00 0.00
8 systemd-detect- 0.00 0.00
6 llvm-link 0.00 0.00
6 ncnn 0.00 0.00
5 phoronix-test-s 0.00 0.00
4 gmain 0.00 0.00
3 dconf worker 0.00 0.00
2 cc 0.00 0.00
2 lscpu 0.00 0.00
2 uname 0.00 0.00
2 which 0.00 0.00
2 xset 0.00 0.00
1 date 0.00 0.00
1 dirname 0.00 0.00
1 dmesg 0.00 0.00
1 dmidecode 0.00 0.00
1 grep 0.00 0.00
1 ifconfig 0.00 0.00
1 ip 0.00 0.00
1 lsmod 0.00 0.00
1 mktemp 0.00 0.00
1 qdbus 0.00 0.00
1 readlink 0.00 0.00
1 realpath 0.00 0.00
1 sed 0.00 0.00
1 sort 0.00 0.00
1 stty 0.00 0.00
1 systemctl 0.00 0.00
1 template.sh 0.00 0.00
1 wc 0.00 0.00
1 xrandr 0.00 0.00
0 processes running
47 maximum processes
The core computation blocks
1129043) ncnn cpu=12 start=5.50 finish=66.00
1129044) benchncnn cpu=6 start=5.50 finish=66.00
1129045) benchncnn cpu=0 start=5.51 finish=66.00
1129046) benchncnn cpu=11 start=5.51 finish=66.00
1129047) benchncnn cpu=9 start=5.51 finish=66.00
1129048) benchncnn cpu=10 start=5.51 finish=66.00
1129049) benchncnn cpu=15 start=5.51 finish=66.00
1129050) benchncnn cpu=14 start=5.51 finish=66.00
1129051) benchncnn cpu=4 start=5.51 finish=66.00
1129052) benchncnn cpu=13 start=5.51 finish=66.00
1129053) benchncnn cpu=7 start=5.51 finish=66.00
1129054) benchncnn cpu=8 start=5.51 finish=66.00
1129055) benchncnn cpu=3 start=5.51 finish=66.00
1129056) benchncnn cpu=1 start=5.51 finish=66.00
1129057) benchncnn cpu=2 start=5.51 finish=66.00
1129058) benchncnn cpu=12 start=5.51 finish=66.00
1129059) benchncnn cpu=5 start=5.51 finish=66.00
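The block list above backs the two claims made at the start: every core is used, and a dozen networks share a span of about 60 seconds. A minimal sketch that parses those lines follows; the regular expression and variable names are mine, and the count of 12 networks is taken from the "a dozen" in the prose rather than measured.

```python
import re

BLOCKS = """\
1129043) ncnn cpu=12 start=5.50 finish=66.00
1129044) benchncnn cpu=6 start=5.50 finish=66.00
1129045) benchncnn cpu=0 start=5.51 finish=66.00
1129046) benchncnn cpu=11 start=5.51 finish=66.00
1129047) benchncnn cpu=9 start=5.51 finish=66.00
1129048) benchncnn cpu=10 start=5.51 finish=66.00
1129049) benchncnn cpu=15 start=5.51 finish=66.00
1129050) benchncnn cpu=14 start=5.51 finish=66.00
1129051) benchncnn cpu=4 start=5.51 finish=66.00
1129052) benchncnn cpu=13 start=5.51 finish=66.00
1129053) benchncnn cpu=7 start=5.51 finish=66.00
1129054) benchncnn cpu=8 start=5.51 finish=66.00
1129055) benchncnn cpu=3 start=5.51 finish=66.00
1129056) benchncnn cpu=1 start=5.51 finish=66.00
1129057) benchncnn cpu=2 start=5.51 finish=66.00
1129058) benchncnn cpu=12 start=5.51 finish=66.00
1129059) benchncnn cpu=5 start=5.51 finish=66.00
"""

LINE_RE = re.compile(
    r"(?P<pid>\d+)\) (?P<name>\S+) cpu=(?P<cpu>\d+) "
    r"start=(?P<start>[\d.]+) finish=(?P<finish>[\d.]+)"
)

rows = [m.groupdict() for m in map(LINE_RE.match, BLOCKS.splitlines()) if m]

# The benchncnn entries are the workers; the single ncnn entry is the parent.
workers = [r for r in rows if r["name"] == "benchncnn"]
worker_cpus = sorted(int(r["cpu"]) for r in workers)

span = max(float(r["finish"]) for r in rows) - min(float(r["start"]) for r in rows)
networks = 12  # assumption: "a dozen" networks per run
seconds_per_network = span / networks

print(f"{len(workers)} workers on {len(set(worker_cpus))} distinct CPUs")
print(f"span {span:.1f} s, about {seconds_per_network:.1f} s per network")
```

Each of the 16 workers lands on a different CPU, and the parent shares cpu 12 with one of them.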
