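A C++ implementation of the OpenAI Whisper model for audio transcription. Three different models are used to transcribe the same audio file. On AMD the workload runs in parallel on about half the cores (on_cpu 0.475, or 7.60 of 16); on Intel it keeps more of them busy (0.730, or 11.68 of 16). The AMD processor is about 2.6x faster overall on this workload (5530.7 vs. 14368.3 seconds elapsed).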
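
As a quick check, the speedup and the on_cpu fraction can be recomputed from the elapsed, utime and stime values in the metrics listed further down. This is a minimal sketch: the formula on_cpu = (utime + stime) / (elapsed * cores) is an assumption that happens to reproduce the reported 0.475 and 0.730, and the variable names are illustrative only.

```python
CORES = 16  # both profiles report "/ 16 cores"

# (elapsed, utime, stime) in seconds, copied from the metrics listings
runs = {
    "amd": (5530.727, 41948.223, 74.837),
    "intel": (14368.253, 167717.180, 147.268),
}

def on_cpu(elapsed, utime, stime, cores=CORES):
    """Fraction of total core-time spent running (assumed formula)."""
    return (utime + stime) / (elapsed * cores)

amd_on_cpu = on_cpu(*runs["amd"])
intel_on_cpu = on_cpu(*runs["intel"])

# Speedup is the ratio of wall-clock times, Intel over AMD.
speedup = runs["intel"][0] / runs["amd"][0]

print(f"AMD   on_cpu {amd_on_cpu:.3f} ({amd_on_cpu * CORES:.2f} cores busy)")
print(f"Intel on_cpu {intel_on_cpu:.3f} ({intel_on_cpu * CORES:.2f} cores busy)")
print(f"AMD speedup over Intel: {speedup:.2f}x")
```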

The topdown profile on AMD is dominated by backend stalls (64.2% of slots, mostly memory), and frontend stalls are low on both processors (3.0% on AMD, 1.4% on Intel). A very similar profile is found with llama.cpp, written by the same author.
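
The topdown percentages are each category's share of the total issue slots. A minimal sketch using the AMD slot counts from the listing further down; the dictionary layout and function name are illustrative, not part of the profiling tool.

```python
# AMD topdown slot counts, copied from the metrics listing
SLOTS = 353078813046786
categories = {
    "retiring": 114261747623360,
    "frontend": 10740260701037,
    "backend": 226502059685765,
    "speculation": 1399502085660,
}
backend_detail = {
    "cpu": 54930607366266,
    "memory": 171571452319499,
}

def share(count, total=SLOTS):
    """Percentage of all issue slots attributed to one category."""
    return 100.0 * count / total

top_level = {name: share(count) for name, count in categories.items()}
backend_split = {name: share(count) for name, count in backend_detail.items()}

for name, pct in top_level.items():
    print(f"{name:12s} {pct:5.1f}%")
for name, pct in backend_split.items():
    print(f"  backend/{name:8s} {pct:5.1f}%")
```

Run against the AMD counts, the memory part of the backend category alone accounts for nearly half of all slots.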

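The AMD profile shows about half the cores busy. There is some floating point (about 67 float instructions per 1000, essentially all 128-bit), though not as much as in other floating-point codes. There is a reasonable number of L2 misses (39.15% of L2 accesses).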
AMD metrics
elapsed 5530.727
on_cpu 0.475 # 7.60 / 16 cores
utime 41948.223
stime 74.837
nvcsw 88182 # 21.69%
nivcsw 318356 # 78.31%
inblock 121448 # 21.96/sec
onblock 39528 # 7.15/sec
cpu-clock 43185206747693 # 43185.207 seconds
task-clock 43185307800638 # 43185.308 seconds
page faults 4596579 # 106.438/sec
context switches 433591 # 10.040/sec
cpu migrations 66000 # 1.528/sec
major page faults 3 # 0.000/sec
minor page faults 4596576 # 106.438/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 43085660001008 # 117.442 branches per 1000 inst
branch misses 37668494486 # 0.09% branch miss
conditional 42866388416712 # 116.844 conditional branches per 1000 inst
indirect 45204891247 # 0.123 indirect branches per 1000 inst
cpu-cycles 170523008924934 # 1.93 GHz
instructions 360130923953209 # 2.11 IPC
slots 353078813046786 #
retiring 114261747623360 # 32.4% (32.4%)
-- ucode 687310766168 # 0.2%
-- fastpath 113574436857192 # 32.2%
frontend 10740260701037 # 3.0% ( 3.0%) low
-- latency 5933745440010 # 1.7%
-- bandwidth 4806515261027 # 1.4%
backend 226502059685765 # 64.2% (64.2%)
-- cpu 54930607366266 # 15.6%
-- memory 171571452319499 # 48.6%
speculation 1399502085660 # 0.4% ( 0.4%) low
-- branch mispredict 911735370202 # 0.3%
-- pipeline restart 487766715458 # 0.1%
smt-contention 175114447106 # 0.0% ( 0.0%)
cpu-cycles 170419137834019 # 1.93 GHz
instructions 360038148851712 # 2.11 IPC
instructions 122240539336491 # 78.098 l2 access per 1000 inst
l2 hit from l1 5310074428252 # 39.15% l2 miss
l2 miss from l1 207073886950 #
l2 hit from l2 pf 705880443601 #
l3 hit from l2 pf 3106777843787 #
l3 miss from l2 pf 424027854896 #
instructions 122178608709053 # 66.991 float per 1000 inst
float 512 75 # 0.000 AVX-512 per 1000 inst
float 256 672 # 0.000 AVX-256 per 1000 inst
float 128 8184827126616 # 66.991 AVX-128 per 1000 inst
float MMX 0 # 0.000 MMX per 1000 inst
float scalar 0 # 0.000 scalar per 1000 inst
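The derived values in the AMD listing can be reproduced from the raw counts. A minimal sketch, assuming each ratio uses the instruction count reported alongside that counter (the float rows are paired with their own, smaller instruction total); the variable names are illustrative.

```python
# Raw AMD counts, copied from the listing above
cycles = 170523008924934
instructions = 360130923953209
branches = 43085660001008
branch_misses = 37668494486

# The float rows come with their own instruction total.
float_group_instructions = 122178608709053
float_128 = 8184827126616

ipc = instructions / cycles
branch_miss_pct = 100.0 * branch_misses / branches
float_per_1000 = 1000.0 * float_128 / float_group_instructions

print(f"IPC                 {ipc:.2f}")
print(f"branch miss         {branch_miss_pct:.2f}%")
print(f"float per 1000 inst {float_per_1000:.3f}")
```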
Intel metrics
elapsed 14368.253
on_cpu 0.730 # 11.68 / 16 cores
utime 167717.180
stime 147.268
nvcsw 97481 # 10.82%
nivcsw 803547 # 89.18%
inblock 6863912 # 477.71/sec
onblock 25736 # 1.79/sec
cpu-clock 169666851070848 # 169666.851 seconds
task-clock 169668782732666 # 169668.783 seconds
page faults 5269925 # 31.060/sec
context switches 970642 # 5.721/sec
cpu migrations 262405 # 1.547/sec
major page faults 33 # 0.000/sec
minor page faults 5269892 # 31.060/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 227128281463420 # 208.515 branches per 1000 inst
branch misses 29121652723 # 0.01% branch miss
conditional 227128311572956 # 208.515 conditional branches per 1000 inst
indirect 22920041459410 # 21.042 indirect branches per 1000 inst
slots 1150135161525986 #
retiring 651783482407376 # 56.7% (56.7%) high
-- ucode 4936987792977 # 0.4%
-- fastpath 646846494614399 # 56.2%
frontend 16180417417749 # 1.4% ( 1.4%) low
-- latency 8941995996169 # 0.8%
-- bandwidth 7238421421580 # 0.6%
backend 479717990945457 # 41.7% (41.7%)
-- cpu 383188217770082 # 33.3%
-- memory 96529773175375 # 8.4%
speculation 2980965978192 # 0.3% ( 0.3%) low
-- branch mispredict 657433004912 # 0.1%
-- pipeline restart 2323532973280 # 0.2%
smt-contention 0 # 0.0% ( 0.0%)
cpu-cycles 245659047612452 # 1.06 GHz
instructions 1011662256267344 # 4.12 IPC high
l2 access 8328352125465 # 10.041 l2 access per 1000 inst
l2 miss 4426583244304 # 53.15% l2 miss
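Intel's higher IPC (4.12 vs. 2.11) does not translate into a shorter run, and comparing the raw totals suggests why. A minimal sketch, assuming the cpu-cycles and instructions totals of each profile cover the whole run and are comparable across the two machines (an assumption, since the counters differ between the processors).

```python
# Totals copied from the AMD and Intel listings
amd = {"instructions": 360130923953209, "cycles": 170523008924934}
intel = {"instructions": 1011662256267344, "cycles": 245659047612452}

def ipc(counts):
    """Instructions retired per cycle."""
    return counts["instructions"] / counts["cycles"]

# How much more work (in instructions and cycles) the Intel run reports.
instruction_ratio = intel["instructions"] / amd["instructions"]
cycle_ratio = intel["cycles"] / amd["cycles"]
ipc_ratio = ipc(intel) / ipc(amd)

print(f"Intel/AMD instructions {instruction_ratio:.2f}x")
print(f"Intel/AMD cycles       {cycle_ratio:.2f}x")
print(f"Intel/AMD IPC          {ipc_ratio:.2f}x")
```

With these totals the Intel run retires about 2.8x as many instructions as the AMD run, so roughly double the IPC still leaves it spending about 1.44x the cycles.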
The process profile includes almost 500,000 processes, nearly all of them named main.
496963 processes
496713 main 1507285.76 2012754.91
34 clinfo 9.26 3.99
19 vulkaninfo 0.76 0.57
2 vulkani:disk$0 0.08 0.06
6 clang 0.05 0.07
3 glxinfo:gdrv0 0.05 0.06
3 glxinfo:gl0 0.05 0.06
1 llvmpipe-0 0.04 0.03
1 llvmpipe-1 0.04 0.03
1 llvmpipe-10 0.04 0.03
1 llvmpipe-11 0.04 0.03
1 llvmpipe-12 0.04 0.03
1 llvmpipe-13 0.04 0.03
1 llvmpipe-14 0.04 0.03
1 llvmpipe-15 0.04 0.03
1 llvmpipe-2 0.04 0.03
1 llvmpipe-3 0.04 0.03
1 llvmpipe-4 0.04 0.03
1 llvmpipe-5 0.04 0.03
1 llvmpipe-6 0.04 0.03
1 llvmpipe-7 0.04 0.03
1 llvmpipe-8 0.04 0.03
1 llvmpipe-9 0.04 0.03
1 glxinfo 0.03 0.02
1 glxinfo:cs0 0.03 0.02
1 glxinfo:disk$0 0.03 0.02
1 glxinfo:sh0 0.03 0.02
1 glxinfo:shlo0 0.03 0.02
1 ps 0.00 0.01
62 sh 0.00 0.00
13 gcc 0.00 0.00
11 gsettings 0.00 0.00
8 systemd-detect- 0.00 0.00
7 stat 0.00 0.00
7 whisper-cpp 0.00 0.00
6 llvm-link 0.00 0.00
4 phoronix-test-s 0.00 0.00
3 gmain 0.00 0.00
2 which 0.00 0.00
1 cc 0.00 0.00
1 date 0.00 0.00
1 dconf worker 0.00 0.00
1 dirname 0.00 0.00
1 grep 0.00 0.00
1 ifconfig 0.00 0.00
1 ip 0.00 0.00
1 lscpu 0.00 0.00
1 mktemp 0.00 0.00
1 qdbus 0.00 0.00
1 readlink 0.00 0.00
1 realpath 0.00 0.00
1 sed 0.00 0.00
1 sort 0.00 0.00
1 stty 0.00 0.00
1 systemctl 0.00 0.00
1 template.sh 0.00 0.00
1 wc 0.00 0.00
1 xrandr 0.00 0.00
1 xset 0.00 0.00
18 processes running
47 maximum processes
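
The listing above can be parsed to see how much of the process count comes from main. A minimal sketch on a three-line excerpt; it assumes each row is a count, a name that may contain spaces, and two trailing numeric columns whose meaning is not interpreted here. The function name is illustrative.

```python
def parse_process_line(line):
    """Split 'count name value value' into (count, name, values)."""
    parts = line.split()
    count = int(parts[0])
    values = (float(parts[-2]), float(parts[-1]))
    name = " ".join(parts[1:-2])  # names such as "dconf worker" contain spaces
    return count, name, values

TOTAL_PROCESSES = 496963  # from the "496963 processes" line

excerpt = """\
496713 main 1507285.76 2012754.91
34 clinfo 9.26 3.99
1 dconf worker 0.00 0.00
"""

rows = [parse_process_line(line) for line in excerpt.splitlines()]
counts = {name: count for count, name, _ in rows}
main_share = 100.0 * counts["main"] / TOTAL_PROCESSES

print(f"main accounts for {main_share:.2f}% of all processes")
```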
