This benchmark uses a recurrent neural network (RNN) to remove noise from an audio file. It is a single-threaded test that completes in about a minute.

The topdown profile shows a high retirement rate with some backend stalls.

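The AMD metrics show that this is floating-point code, almost entirely 128-bit operations (about 287 per 1000 instructions), with roughly 5.2 L2 accesses per 1000 instructions. The backend stalls are attributed to the CPU (34.7%) rather than to memory (2.0%). IPC is also high at 3.08, and on_cpu shows that only one thread is active (0.78 of 16 cores).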
elapsed 63.856
on_cpu 0.049 # 0.78 / 16 cores
utime 48.308
stime 1.258
nvcsw 2249 # 78.14%
nivcsw 629 # 21.86%
inblock 0 # 0.00/sec
onblock 830800 # 13010.62/sec
cpu-clock 49580676026 # 49.581 seconds
task-clock 49583458884 # 49.583 seconds
page faults 153052 # 3086.755/sec
context switches 2981 # 60.121/sec
cpu migrations 283 # 5.708/sec
major page faults 2 # 0.040/sec
minor page faults 153050 # 3086.715/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 59793040149 # 86.519 branches per 1000 inst
branch misses 530140420 # 0.89% branch miss
conditional 58646138320 # 84.860 conditional branches per 1000 inst
indirect 73098614 # 0.106 indirect branches per 1000 inst
cpu-cycles 223501986208 # 0.22 GHz
instructions 687431622162 # 3.08 IPC high
slots 450174913080 #
retiring 227142696458 # 50.5% (50.5%)
-- ucode 18076186 # 0.0%
-- fastpath 227124620272 # 50.5%
frontend 38449157594 # 8.5% ( 8.5%)
-- latency 21249828816 # 4.7%
-- bandwidth 17199328778 # 3.8%
backend 165199650470 # 36.7% (36.7%)
-- cpu 156259197366 # 34.7%
-- memory 8940453104 # 2.0%
speculation 19315582910 # 4.3% ( 4.3%)
-- branch mispredict 13384756102 # 3.0%
-- pipeline restart 5930826808 # 1.3%
smt-contention 67545062 # 0.0% ( 0.0%)
cpu-cycles 223849769349 # 0.22 GHz
instructions 688448773654 # 3.08 IPC high
instructions 230555151663 # 5.204 l2 access per 1000 inst
l2 hit from l1 910722479 # 2.37% l2 miss
l2 miss from l1 16628528 #
l2 hit from l2 pf 277196298 #
l3 hit from l2 pf 4811504 #
l3 miss from l2 pf 7019572 #
instructions 230161618213 # 287.307 float per 1000 inst
float 512 60 # 0.000 AVX-512 per 1000 inst
float 256 694 # 0.000 AVX-256 per 1000 inst
float 128 66126936685 # 287.307 AVX-128 per 1000 inst
float MMX 0 # 0.000 MMX per 1000 inst
float scalar 0 # 0.000 scalar per 1000 inst
instructions 2658571 #
opcache 977408 # 367.644 opcache per 1000 inst
opcache miss 526815 # 53.9% opcache miss rate
l1 dTLB miss 4675 # 1.758 L1 dTLB per 1000 inst
l2 dTLB miss 977 # 0.367 L2 dTLB per 1000 inst
instructions 2711875 #
icache 1319384 # 486.521 icache per 1000 inst
icache miss 110969 # 8.4% icache miss rate
l1 iTLB miss 13 # 0.005 L1 iTLB per 1000 inst
l2 iTLB miss 0 # 0.000 L2 iTLB per 1000 inst
tlb flush 19 # 0.007 TLB flush per 1000 inst
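As a cross-check on the AMD listing, the sketch below recomputes a few of the derived values from the raw counters. The dictionary keys and function names are illustrative and not part of the profiling tool; the counter values are copied from the listing. The tool measures different event groups separately, so each ratio here uses the instruction count from its own group.

```python
# Raw counters copied from the AMD listing; key names are illustrative.
core = {
    "cpu_cycles": 223_501_986_208,
    "instructions": 687_431_622_162,
    "branches": 59_793_040_149,
    "branch_misses": 530_140_420,
}
# The floating-point group reports its own instruction count.
fp_group = {
    "instructions": 230_161_618_213,
    "float_128": 66_126_936_685,
}


def ipc(instructions: int, cycles: int) -> float:
    """Instructions retired per CPU cycle."""
    return instructions / cycles


def percent(part: int, whole: int) -> float:
    """Share of `whole` taken by `part`, in percent."""
    return 100.0 * part / whole


def per_kilo_inst(events: int, instructions: int) -> float:
    """Event count normalised to 1000 instructions."""
    return 1000.0 * events / instructions


amd_ipc = ipc(core["instructions"], core["cpu_cycles"])
amd_branch_miss = percent(core["branch_misses"], core["branches"])
amd_float_pki = per_kilo_inst(fp_group["float_128"], fp_group["instructions"])

print(f"IPC: {amd_ipc:.2f}")
print(f"branch miss: {amd_branch_miss:.2f}%")
print(f"128-bit float per 1000 inst: {amd_float_pki:.1f}")
```

These reproduce the 3.08 IPC, 0.89% branch miss rate, and roughly 287 floating-point operations per 1000 instructions shown in the listing.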
Intel metrics
elapsed 70.816
on_cpu 0.050 # 0.80 / 16 cores
utime 55.772
stime 0.869
nvcsw 2614 # 87.22%
nivcsw 383 # 12.78%
inblock 912 # 12.88/sec
onblock 819560 # 11573.16/sec
cpu-clock 56644354100 # 56.644 seconds
task-clock 56647446066 # 56.647 seconds
page faults 142227 # 2510.740/sec
context switches 3141 # 55.448/sec
cpu migrations 317 # 5.596/sec
major page faults 0 # 0.000/sec
minor page faults 142227 # 2510.740/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 59376435112 # 86.107 branches per 1000 inst
branch misses 520249709 # 0.88% branch miss
conditional 59376448392 # 86.107 conditional branches per 1000 inst
indirect 86449665 # 0.125 indirect branches per 1000 inst
slots 1277820930584 #
retiring 697388803314 # 54.6% (54.6%) high
-- ucode 58189167796 # 4.6%
-- fastpath 639199635518 # 50.0%
frontend 277314084053 # 21.7% (21.7%)
-- latency 132884455940 # 10.4%
-- bandwidth 144429628113 # 11.3%
backend 164291231386 # 12.9% (12.9%) low
-- cpu 126996458023 # 9.9%
-- memory 37294773363 # 2.9%
speculation 140493052981 # 11.0% (11.0%) high
-- branch mispredict 61574068334 # 4.8%
-- pipeline restart 78918984647 # 6.2%
smt-contention 0 # 0.0% ( 0.0%)
cpu-cycles 213165050359 # 0.19 GHz
instructions 689591636128 # 3.24 IPC high
l2 access 2089755683 # 3.031 l2 access per 1000 inst
l2 miss 118515140 # 5.67% l2 miss
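To compare the two topdown profiles side by side, the level-1 percentages can be recomputed as each category's share of total slots. This is a minimal sketch with the slot counts copied from the two listings; the dictionary layout and the function name `topdown_pct` are assumptions made for illustration.

```python
# Level-1 topdown slot counts copied from the AMD and Intel listings.
amd = {
    "slots": 450_174_913_080,
    "retiring": 227_142_696_458,
    "frontend": 38_449_157_594,
    "backend": 165_199_650_470,
    "speculation": 19_315_582_910,
}
intel = {
    "slots": 1_277_820_930_584,
    "retiring": 697_388_803_314,
    "frontend": 277_314_084_053,
    "backend": 164_291_231_386,
    "speculation": 140_493_052_981,
}


def topdown_pct(sample: dict) -> dict:
    """Return each category as a percentage of total slots."""
    slots = sample["slots"]
    return {
        name: 100.0 * count / slots
        for name, count in sample.items()
        if name != "slots"
    }


amd_pct = topdown_pct(amd)
intel_pct = topdown_pct(intel)

for name in ("retiring", "frontend", "backend", "speculation"):
    print(f"{name:12s} AMD {amd_pct[name]:5.1f}%   Intel {intel_pct[name]:5.1f}%")
```

Both machines retire a little over half of their slots. The main difference is where the remainder goes: mostly backend stalls in the AMD listing, and mostly frontend and speculation in the Intel listing.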
The process overview shows that the benchmark invokes the rnnoise_demo program. It runs quickly enough that the test harness (clinfo, vulkaninfo and similar system probes) accounts for roughly a quarter of the total user time.
396 processes
30 rnnoise_demo 47.13 0.52
67 clinfo 16.34 5.90
38 vulkaninfo 0.95 1.33
4 vulkani:disk$0 0.10 0.14
6 glxinfo:gdrv0 0.08 0.10
6 glxinfo:gl0 0.08 0.10
6 clang 0.06 0.06
2 llvmpipe-0 0.05 0.07
2 llvmpipe-1 0.05 0.07
2 llvmpipe-10 0.05 0.07
2 llvmpipe-11 0.05 0.07
2 llvmpipe-12 0.05 0.07
2 llvmpipe-13 0.05 0.07
2 llvmpipe-14 0.05 0.07
2 llvmpipe-15 0.05 0.07
2 llvmpipe-2 0.05 0.07
2 llvmpipe-3 0.05 0.07
2 llvmpipe-4 0.05 0.07
2 llvmpipe-5 0.05 0.07
2 llvmpipe-6 0.05 0.07
2 llvmpipe-7 0.05 0.07
2 llvmpipe-8 0.05 0.07
2 llvmpipe-9 0.05 0.07
2 glxinfo 0.04 0.04
2 glxinfo:cs0 0.04 0.04
2 glxinfo:disk$0 0.04 0.04
2 glxinfo:sh0 0.04 0.04
2 glxinfo:shlo0 0.04 0.04
6 php 0.03 0.10
3 rocminfo 0.03 0.00
1 lspci 0.00 0.02
1 rm 0.00 0.01
83 sh 0.00 0.00
13 gcc 0.00 0.00
12 gsettings 0.00 0.00
10 sed 0.00 0.00
8 stat 0.00 0.00
8 systemd-detect- 0.00 0.00
6 llvm-link 0.00 0.00
5 phoronix-test-s 0.00 0.00
3 gmain 0.00 0.00
3 ls 0.00 0.00
3 rnnoise 0.00 0.00
2 cc 0.00 0.00
2 lscpu 0.00 0.00
2 uname 0.00 0.00
2 which 0.00 0.00
2 xset 0.00 0.00
1 bash 0.00 0.00
1 date 0.00 0.00
1 dconf worker 0.00 0.00
1 dirname 0.00 0.00
1 dmesg 0.00 0.00
1 dmidecode 0.00 0.00
1 grep 0.00 0.00
1 ifconfig 0.00 0.00
1 ip 0.00 0.00
1 lsmod 0.00 0.00
1 mktemp 0.00 0.00
1 ps 0.00 0.00
1 qdbus 0.00 0.00
1 readlink 0.00 0.00
1 realpath 0.00 0.00
1 sort 0.00 0.00
1 stty 0.00 0.00
1 systemctl 0.00 0.00
1 template.sh 0.00 0.00
1 wc 0.00 0.00
1 xrandr 0.00 0.00
0 processes running
47 maximum processes
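The harness share can be estimated directly from the process overview. The sketch below assumes the columns are process count, command name, user time, and system time; the listing does not label them, so that reading is an assumption. Only a few rows are included here, so the resulting share is for this subset, not the full table.

```python
# A subset of rows from the process overview.
# Assumed column order: count, command name, user seconds, system seconds.
ROWS = """\
30 rnnoise_demo 47.13 0.52
67 clinfo 16.34 5.90
38 vulkaninfo 0.95 1.33
1 dconf worker 0.00 0.00
"""


def parse_row(line: str) -> tuple:
    """Split one row; the name may contain spaces, so take it from the middle."""
    parts = line.split()
    count = int(parts[0])
    user = float(parts[-2])
    system = float(parts[-1])
    name = " ".join(parts[1:-2])
    return name, count, user, system


parsed = [parse_row(line) for line in ROWS.splitlines() if line.strip()]

total_user = sum(user for _, _, user, _ in parsed)
bench_user = sum(user for name, _, user, _ in parsed if name == "rnnoise_demo")
harness_share = 1.0 - bench_user / total_user

print(f"harness share of user time: {harness_share:.1%}")
```

Taking the name from the middle tokens keeps entries such as "dconf worker" intact instead of splitting them across columns.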
Computation structure
204018) rnnoise cpu=0 start=5.59 finish=21.43
204019) rnnoise_demo cpu=3 start=5.59 finish=21.43
204020) rnnoise_demo cpu=15 start=5.59 finish=5.59
204021) rnnoise_demo cpu=12 start=5.59 finish=5.59
204022) rnnoise_demo cpu=5 start=5.59 finish=5.59
204023) rnnoise_demo cpu=2 start=5.59 finish=5.60
204024) rnnoise_demo cpu=1 start=5.59 finish=5.59
204025) sed cpu=14 start=5.60 finish=5.60
204026) rnnoise_demo cpu=12 start=5.60 finish=5.60
204027) ls cpu=5 start=5.60 finish=5.60
204028) sed cpu=15 start=5.60 finish=5.60
204029) rnnoise_demo cpu=9 start=5.60 finish=5.60
204030) rnnoise_demo cpu=2 start=5.60 finish=5.61
204031) rnnoise_demo cpu=14 start=5.61 finish=5.61
204032) sed cpu=0 start=5.61 finish=5.61
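The lifetime of each process can be read off these lines by subtracting start from finish. The sketch below parses a few of the entries, assuming the fields are process id, command, CPU number, and start and finish times in seconds; the regular expression and function names are illustrative.

```python
import re

# A few entries copied from the computation structure listing.
LINES = """\
204019) rnnoise_demo cpu=3 start=5.59 finish=21.43
204020) rnnoise_demo cpu=15 start=5.59 finish=5.59
204023) rnnoise_demo cpu=2 start=5.59 finish=5.60
204025) sed cpu=14 start=5.60 finish=5.60
"""

# Assumed layout: "<pid>) <command> cpu=<n> start=<sec> finish=<sec>".
LINE_RE = re.compile(
    r"(?P<pid>\d+)\)\s+(?P<comm>.+?)\s+cpu=(?P<cpu>\d+)"
    r"\s+start=(?P<start>[\d.]+)\s+finish=(?P<finish>[\d.]+)"
)


def parse_entry(line: str) -> dict:
    """Parse one listing line into a dict with a computed duration."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised line: {line!r}")
    start = float(match["start"])
    finish = float(match["finish"])
    return {
        "pid": int(match["pid"]),
        "comm": match["comm"],
        "cpu": int(match["cpu"]),
        "duration": finish - start,
    }


entries = [parse_entry(line) for line in LINES.splitlines() if line.strip()]
longest = max(entries, key=lambda entry: entry["duration"])

print(f"{longest['pid']} {longest['comm']} ran for {longest['duration']:.2f} s")
```

In this subset only one rnnoise_demo process runs for any meaningful time (about 15.8 seconds); the others start and finish within a hundredth of a second.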
