Test of OpenVINO with Intel internal tests. There is a sequence of 18 different tests with different profiles, as shown below, but overall there is a high amount of backend memory waiting.

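AMD metrics. One thing that surprises me is that there is not as much floating-point code as I might expect (about 36.6 float operations per 1000 instructions). There are also not many branches (about 35 per 1000 instructions).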
elapsed 3666.089
on_cpu 0.887 # 14.20 / 16 cores
utime 50536.802
stime 1508.811
nvcsw 25564303 # 63.91%
nivcsw 14438697 # 36.09%
inblock 647168 # 176.53/sec
onblock 450208 # 122.80/sec
cpu-clock 52059503721702 # 52059.504 seconds
task-clock 52065965279085 # 52065.965 seconds
page faults 4182456 # 80.330/sec
context switches 40020993 # 768.659/sec
cpu migrations 3345690 # 64.259/sec
major page faults 3308 # 0.064/sec
minor page faults 4179147 # 80.266/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 6104425916302 # 35.074 branches per 1000 inst
branch misses 146139238875 # 2.39% branch miss
conditional 4259728336297 # 24.475 conditional branches per 1000 inst
indirect 335057737990 # 1.925 indirect branches per 1000 inst
cpu-cycles 210012994377891 # 3.64 GHz
instructions 170500080670606 # 0.81 IPC
slots 419981201012916 #
retiring 59508125166069 # 14.2% (18.2%)
-- ucode 162907627794 # 0.0%
-- fastpath 59345217538275 # 14.1%
frontend 20743298000928 # 4.9% ( 6.3%)
-- latency 16183935494766 # 3.9%
-- bandwidth 4559362506162 # 1.1%
backend 246776556780422 # 58.8% (75.3%)
-- cpu 167073230539072 # 39.8%
-- memory 79703326241350 # 19.0%
speculation 774774235160 # 0.2% ( 0.2%)
-- branch mispredict 732884814485 # 0.2%
-- pipeline restart 41889420675 # 0.0%
smt-contention 92175643332376 # 21.9% ( 0.0%)
cpu-cycles 209906778717120 # 3.64 GHz
instructions 170496426937503 # 0.81 IPC
instructions 56819543515048 # 105.906 l2 access per 1000 inst
l2 hit from l1 4611962526922 # 11.12% l2 miss
l2 miss from l1 247542205697 #
l2 hit from l2 pf 984103822417 #
l3 hit from l2 pf 329461554612 #
l3 miss from l2 pf 91984854375 #
instructions 56801900395044 # 36.633 float per 1000 inst
float 512 139 # 0.000 AVX-512 per 1000 inst
float 256 388543342299 # 6.840 AVX-256 per 1000 inst
float 128 1692275011173 # 29.793 AVX-128 per 1000 inst
float MMX 0 # 0.000 MMX per 1000 inst
float scalar 0 # 0.000 scalar per 1000 inst
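The two percentages printed for each top-down category in the AMD listing can be reproduced from the raw slot counts. The sketch below assumes (this is my reading of the numbers, not something the tool documents here) that the first percentage is the share of all slots and the parenthesized one is the share after removing the smt-contention slots. The function and variable names are made up for illustration.

```python
# Raw slot counts copied from the AMD listing above.
SLOTS = 419981201012916
SMT_CONTENTION = 92175643332376
CATEGORIES = {
    "retiring": 59508125166069,
    "frontend": 20743298000928,
    "backend": 246776556780422,
    "speculation": 774774235160,
}


def topdown_shares(slots, smt_contention, categories):
    """Return {name: (percent of all slots, percent of slots not lost to SMT contention)}.

    Assumption: the parenthesized value in the listing excludes SMT-contended slots.
    """
    usable = slots - smt_contention
    return {
        name: (100.0 * count / slots, 100.0 * count / usable)
        for name, count in categories.items()
    }


shares = topdown_shares(SLOTS, SMT_CONTENTION, CATEGORIES)
for name, (raw, adjusted) in shares.items():
    print(f"{name:12s} {raw:5.1f}% ({adjusted:5.1f}%)")
```

Under that assumption the output matches the listing, which suggests the parenthesized backend figure describes only the slots this thread could actually use.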
Corresponding Intel metrics
elapsed 4364.999
on_cpu 0.888 # 14.20 / 16 cores
utime 61143.155
stime 844.689
nvcsw 15956416 # 71.35%
nivcsw 6406167 # 28.65%
inblock 1066104 # 244.24/sec
onblock 450688 # 103.25/sec
cpu-clock 61996971778983 # 61996.972 seconds
task-clock 62000399512424 # 62000.400 seconds
page faults 7556280 # 121.875/sec
context switches 22384091 # 361.031/sec
cpu migrations 5449898 # 87.901/sec
major page faults 5233 # 0.084/sec
minor page faults 7551047 # 121.790/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 5124546089326 # 22.836 branches per 1000 inst
branch misses 19662667907 # 0.38% branch miss
conditional 5124546172878 # 22.836 conditional branches per 1000 inst
indirect 1443260482309 # 6.432 indirect branches per 1000 inst
slots 238516915623770 #
retiring 117670618634802 # 49.3% (49.3%)
-- ucode 5510128010357 # 2.3%
-- fastpath 112160490624445 # 47.0%
frontend 53552026560725 # 22.5% (22.5%)
-- latency 46173376494573 # 19.4%
-- bandwidth 7378650066152 # 3.1%
backend 65661215027393 # 27.5% (27.5%)
-- cpu 31552095091589 # 13.2%
-- memory 34109119935804 # 14.3%
speculation 3334009366321 # 1.4% ( 1.4%)
-- branch mispredict 2300226126067 # 1.0%
-- pipeline restart 1033783240254 # 0.4%
smt-contention 0 # 0.0% ( 0.0%)
cpu-cycles 154893255896222 # 2.54 GHz
instructions 231596341850754 # 1.50 IPC
l2 access 6182143166384 # 51.532 l2 access per 1000 inst
l2 miss 1297190616085 # 20.98% l2 miss
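As a cross-check on the two listings, the headline derived values (on_cpu, IPC, branch miss rate) can be recomputed from the raw counters. This is a minimal sketch: the `Run` class and its method names are hypothetical, the 16-core count is taken from the `on_cpu` lines, and for AMD it uses the first cpu-cycles/instructions pair in the listing.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """Raw counters for one benchmark run (hypothetical container for illustration)."""
    elapsed: float        # wall-clock seconds
    utime: float          # user CPU seconds
    stime: float          # system CPU seconds
    cycles: int
    instructions: int
    branches: int
    branch_misses: int
    cores: int = 16       # from the "/ 16 cores" note in the listings

    def on_cpu(self) -> float:
        # Fraction of available core time spent running.
        return (self.utime + self.stime) / self.elapsed / self.cores

    def ipc(self) -> float:
        return self.instructions / self.cycles

    def branch_miss_pct(self) -> float:
        return 100.0 * self.branch_misses / self.branches


amd = Run(3666.089, 50536.802, 1508.811,
          210012994377891, 170500080670606,
          6104425916302, 146139238875)
intel = Run(4364.999, 61143.155, 844.689,
            154893255896222, 231596341850754,
            5124546089326, 19662667907)

for label, run in (("AMD", amd), ("Intel", intel)):
    print(f"{label:5s} on_cpu={run.on_cpu():.3f} "
          f"IPC={run.ipc():.2f} branch_miss={run.branch_miss_pct():.2f}%")
```

The per-1000-instruction figures are not recomputed here because the listings show several different instruction counts, and it is not clear from the output which one each ratio was measured against.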
Most of the time is spent in benchmark_app. There was still a mix of processes hanging, so not all of the processes are captured here, though the elapsed time is close.
2111 processes
1757 benchmark_app 1624694.47 39658.01
32 clinfo 5.12 1.92
19 vulkaninfo 0.38 0.39
3 glxinfo:gdrv0 0.05 0.03
2 vulkani:disk$0 0.04 0.05
6 clang 0.03 0.03
1 glxinfo 0.03 0.01
1 glxinfo:cs0 0.03 0.01
1 glxinfo:disk$0 0.03 0.01
1 glxinfo:sh0 0.03 0.01
1 glxinfo:shlo0 0.03 0.01
1 llvmpipe-0 0.02 0.02
1 llvmpipe-1 0.02 0.02
1 llvmpipe-10 0.02 0.02
1 llvmpipe-11 0.02 0.02
1 llvmpipe-12 0.02 0.02
1 llvmpipe-13 0.02 0.02
1 llvmpipe-14 0.02 0.02
1 llvmpipe-15 0.02 0.02
1 llvmpipe-2 0.02 0.02
1 llvmpipe-3 0.02 0.02
1 llvmpipe-4 0.02 0.02
1 llvmpipe-5 0.02 0.02
1 llvmpipe-6 0.02 0.02
1 llvmpipe-7 0.02 0.02
1 llvmpipe-8 0.02 0.02
1 llvmpipe-9 0.02 0.02
99 sh 0.00 0.00
54 openvino 0.00 0.00
13 gcc 0.00 0.00
9 stty 0.00 0.00
8 systemd-detect- 0.00 0.00
7 gsettings 0.00 0.00
7 stat 0.00 0.00
6 llvm-link 0.00 0.00
5 gmain 0.00 0.00
4 phoronix-test-s 0.00 0.00
3 dconf worker 0.00 0.00
2 which 0.00 0.00
1 cc 0.00 0.00
1 date 0.00 0.00
1 dirname 0.00 0.00
1 grep 0.00 0.00
1 ifconfig 0.00 0.00
1 ip 0.00 0.00
1 lscpu 0.00 0.00
1 mktemp 0.00 0.00
1 ps 0.00 0.00
1 python3 0.00 0.00
1 qdbus 0.00 0.00
1 readlink 0.00 0.00
1 realpath 0.00 0.00
1 sed 0.00 0.00
1 sort 0.00 0.00
1 systemctl 0.00 0.00
1 template.sh 0.00 0.00
1 wc 0.00 0.00
1 xrandr 0.00 0.00
1 xset 0.00 0.00
34 processes running
68 maximum processes
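The process layout is a relatively straightforward block while the benchmark runs: an openvino entry followed by benchmark_app entries that start within about 1.3 seconds of each other and finish together.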
59339) openvino cpu=3 start=5.96 finish=68.21
59340) benchmark_app cpu=2 start=5.96 finish=68.17
59341) benchmark_app cpu=12 start=5.99 finish=68.17
59342) benchmark_app cpu=13 start=5.99 finish=68.17
59343) benchmark_app cpu=14 start=5.99 finish=68.17
59344) benchmark_app cpu=10 start=5.99 finish=68.17
59345) benchmark_app cpu=9 start=5.99 finish=68.17
59346) benchmark_app cpu=8 start=5.99 finish=68.17
59347) benchmark_app cpu=11 start=5.99 finish=68.17
59348) benchmark_app cpu=7 start=5.99 finish=68.17
59349) benchmark_app cpu=0 start=5.99 finish=68.17
59350) benchmark_app cpu=4 start=5.99 finish=68.17
59351) benchmark_app cpu=5 start=5.99 finish=68.17
59352) benchmark_app cpu=6 start=5.99 finish=68.17
59353) benchmark_app cpu=2 start=5.99 finish=68.17
59354) benchmark_app cpu=3 start=5.99 finish=68.17
59355) benchmark_app cpu=15 start=5.99 finish=68.17
59356) benchmark_app cpu=12 start=6.38 finish=68.16
59369) benchmark_app cpu=6 start=7.21 finish=68.17
59372) benchmark_app cpu=10 start=7.21 finish=68.17
59357) benchmark_app cpu=8 start=6.38 finish=68.16
59364) benchmark_app cpu=7 start=6.79 finish=68.17
59365) benchmark_app cpu=9 start=6.79 finish=68.17
59368) benchmark_app cpu=15 start=7.21 finish=68.17
59358) benchmark_app cpu=0 start=6.38 finish=68.16
59361) benchmark_app cpu=5 start=6.78 finish=68.17
59363) benchmark_app cpu=0 start=6.78 finish=68.17
59366) benchmark_app cpu=11 start=6.79 finish=68.17
59367) benchmark_app cpu=3 start=7.21 finish=68.17
59371) benchmark_app cpu=14 start=7.21 finish=68.17
59362) benchmark_app cpu=13 start=6.78 finish=68.17
59359) benchmark_app cpu=4 start=6.38 finish=68.16
59370) benchmark_app cpu=4 start=7.21 finish=68.17
59360) benchmark_app cpu=0 start=6.39 finish=68.16
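Lines in the format above are easy to summarize with a small script. The following is a rough sketch, assuming every entry looks like `id) name cpu=N start=S finish=F`; the regular expression, the `parse_timeline` helper, and the field names are my own, and the leading number is treated as an opaque ID since the listing does not say whether it is a process or thread ID.

```python
import re

# Matches entries such as: "59340) benchmark_app cpu=2 start=5.96 finish=68.17"
LINE_RE = re.compile(
    r"^\s*(?P<id>\d+)\)\s+(?P<name>\S+)\s+cpu=(?P<cpu>\d+)"
    r"\s+start=(?P<start>[\d.]+)\s+finish=(?P<finish>[\d.]+)\s*$"
)


def parse_timeline(text):
    """Parse timeline entries into dicts; lines that do not match are skipped."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m is None:
            continue
        rows.append({
            "id": int(m["id"]),
            "name": m["name"],
            "cpu": int(m["cpu"]),
            "start": float(m["start"]),
            "finish": float(m["finish"]),
        })
    return rows


# Three entries copied from the listing above.
SAMPLE = """\
59339) openvino cpu=3 start=5.96 finish=68.21
59340) benchmark_app cpu=2 start=5.96 finish=68.17
59370) benchmark_app cpu=4 start=7.21 finish=68.17
"""

rows = parse_timeline(SAMPLE)
start_spread = max(r["start"] for r in rows) - min(r["start"] for r in rows)
for r in rows:
    print(f'{r["id"]} {r["name"]:14s} cpu={r["cpu"]:2d} '
          f'ran {r["finish"] - r["start"]:.2f}s')
print(f"start spread: {start_spread:.2f}s")
```

Feeding the full listing through the same function would give the start spread and per-entry run time for the whole block.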
