This benchmark uses a recurrent neural network (RNN) to remove noise from an audio file. It is a single-threaded test that completes in about a minute.

The topdown profile shows a high retirement rate with some backend stalls.

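The AMD metrics show that this is floating-point code (the float 128 counter is about 287 per 1000 instructions) with a notable L2 access rate (about 5.2 accesses per 1000 instructions). The backend stalls are CPU-bound rather than memory-bound (34.7% versus 2.0% of slots). The IPC is also high at 3.08, and the on_cpu value of 0.78 cores indicates that only one thread is active.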

elapsed              63.856
on_cpu               0.049          # 0.78 / 16 cores
utime                48.308
stime                1.258
nvcsw                2249           # 78.14%
nivcsw               629            # 21.86%
inblock              0              # 0.00/sec
onblock              830800         # 13010.62/sec
cpu-clock            49580676026    # 49.581 seconds
task-clock           49583458884    # 49.583 seconds
page faults          153052         # 3086.755/sec
context switches     2981           # 60.121/sec
cpu migrations       283            # 5.708/sec
major page faults    2              # 0.040/sec
minor page faults    153050         # 3086.715/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             59793040149    # 86.519 branches per 1000 inst
branch misses        530140420      # 0.89% branch miss
conditional          58646138320    # 84.860 conditional branches per 1000 inst
indirect             73098614       # 0.106 indirect branches per 1000 inst
cpu-cycles           223501986208   # 0.22 GHz
instructions         687431622162   # 3.08 IPC high
slots                450174913080   #
retiring             227142696458   # 50.5% (50.5%)
-- ucode             18076186       #     0.0%
-- fastpath          227124620272   #    50.5%
frontend             38449157594    #  8.5% ( 8.5%)
-- latency           21249828816    #     4.7%
-- bandwidth         17199328778    #     3.8%
backend              165199650470   # 36.7% (36.7%)
-- cpu               156259197366   #    34.7%
-- memory            8940453104     #     2.0%
speculation          19315582910    #  4.3% ( 4.3%)
-- branch mispredict 13384756102    #     3.0%
-- pipeline restart  5930826808     #     1.3%
smt-contention       67545062       #  0.0% ( 0.0%)
cpu-cycles           223849769349   # 0.22 GHz
instructions         688448773654   # 3.08 IPC high
instructions         230555151663   # 5.204 l2 access per 1000 inst
l2 hit from l1       910722479      # 2.37% l2 miss
l2 miss from l1      16628528       #
l2 hit from l2 pf    277196298      #
l3 hit from l2 pf    4811504        #
l3 miss from l2 pf   7019572        #
instructions         230161618213   # 287.307 float per 1000 inst
float 512            60             # 0.000 AVX-512 per 1000 inst
float 256            694            # 0.000 AVX-256 per 1000 inst
float 128            66126936685    # 287.307 AVX-128 per 1000 inst
float MMX            0              # 0.000 MMX per 1000 inst
float scalar         0              # 0.000 scalar per 1000 inst
instructions         2658571        #
opcache              977408         # 367.644 opcache per 1000 inst
opcache miss         526815         # 53.9% opcache miss rate
l1 dTLB miss         4675           # 1.758 L1 dTLB per 1000 inst
l2 dTLB miss         977            # 0.367 L2 dTLB per 1000 inst
instructions         2711875        #
icache               1319384        # 486.521 icache per 1000 inst
icache miss          110969         #  8.4% icache miss rate
l1 iTLB miss         13             # 0.005 L1 iTLB per 1000 inst
l2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst
tlb flush            19             # 0.007 TLB flush per 1000 inst
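
The IPC and the top-level topdown percentages in the AMD listing above can be re-derived from the raw counters. The following is a minimal Python sketch, assuming that IPC is instructions divided by cpu-cycles and that each topdown category is reported as a share of total slots. The variable names are illustrative, and the values are copied from the listing.

```python
# Raw counters copied from the AMD listing above.
cpu_cycles = 223_501_986_208
instructions = 687_431_622_162
slots = 450_174_913_080

topdown = {
    "retiring": 227_142_696_458,
    "frontend": 38_449_157_594,
    "backend": 165_199_650_470,
    "speculation": 19_315_582_910,
    "smt-contention": 67_545_062,
}
backend_detail = {"cpu": 156_259_197_366, "memory": 8_940_453_104}

# Assumption: IPC = instructions / cycles, taken from the same counter group.
ipc = instructions / cpu_cycles

# Assumption: each category is expressed as a percentage of total slots.
topdown_pct = {name: 100.0 * count / slots for name, count in topdown.items()}
backend_pct = {name: 100.0 * count / slots for name, count in backend_detail.items()}

print(f"IPC {ipc:.2f}")
for name, pct in topdown_pct.items():
    print(f"{name:<15} {pct:5.1f}%")
for name, pct in backend_pct.items():
    print(f"-- {name:<12} {pct:5.1f}%")
```

Under these assumptions the script reproduces the 3.08 IPC and the 50.5% / 8.5% / 36.7% / 4.3% split, with the backend share dominated by the cpu subcategory.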

Intel metrics

elapsed              70.816
on_cpu               0.050          # 0.80 / 16 cores
utime                55.772
stime                0.869
nvcsw                2614           # 87.22%
nivcsw               383            # 12.78%
inblock              912            # 12.88/sec
onblock              819560         # 11573.16/sec
cpu-clock            56644354100    # 56.644 seconds
task-clock           56647446066    # 56.647 seconds
page faults          142227         # 2510.740/sec
context switches     3141           # 55.448/sec
cpu migrations       317            # 5.596/sec
major page faults    0              # 0.000/sec
minor page faults    142227         # 2510.740/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             59376435112    # 86.107 branches per 1000 inst
branch misses        520249709      # 0.88% branch miss
conditional          59376448392    # 86.107 conditional branches per 1000 inst
indirect             86449665       # 0.125 indirect branches per 1000 inst
slots                1277820930584  #
retiring             697388803314   # 54.6% (54.6%) high
-- ucode             58189167796    #     4.6%
-- fastpath          639199635518   #    50.0%
frontend             277314084053   # 21.7% (21.7%)
-- latency           132884455940   #    10.4%
-- bandwidth         144429628113   #    11.3%
backend              164291231386   # 12.9% (12.9%) low
-- cpu               126996458023   #     9.9%
-- memory            37294773363    #     2.9%
speculation          140493052981   # 11.0% (11.0%) high
-- branch mispredict 61574068334    #     4.8%
-- pipeline restart  78918984647    #     6.2%
smt-contention       0              #  0.0% ( 0.0%)
cpu-cycles           213165050359   # 0.19 GHz
instructions         689591636128   # 3.24 IPC high
l2 access            2089755683     # 3.031 l2 access per 1000 inst
l2 miss              118515140      # 5.67% l2 miss
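
The on_cpu lines in both listings above appear consistent with (utime + stime) / elapsed, further divided by the 16 cores. This formula is inferred from the numbers rather than taken from the tool's documentation, so the Python sketch below should be read as a plausibility check. The function name `on_cpu` and its parameters are hypothetical.

```python
def on_cpu(utime, stime, elapsed, cores=16):
    """Return (busy cores, fraction of the machine in use).

    Assumption: on_cpu is derived as CPU time over wall-clock time;
    this matches the listings but is not a documented formula.
    """
    busy_cores = (utime + stime) / elapsed
    return busy_cores, busy_cores / cores

# Values copied from the AMD and Intel listings above.
amd_cores, amd_fraction = on_cpu(utime=48.308, stime=1.258, elapsed=63.856)
intel_cores, intel_fraction = on_cpu(utime=55.772, stime=0.869, elapsed=70.816)

print(f"AMD   {amd_fraction:.3f}  # {amd_cores:.2f} / 16 cores")
print(f"Intel {intel_fraction:.3f}  # {intel_cores:.2f} / 16 cores")
```

Both results stay below one busy core, which is what a single-threaded workload should look like.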

The process overview shows the benchmark invoking the rnnoise_demo program. The benchmark runs quickly enough that the test harness (mostly clinfo in the listing below) accounts for roughly 1/4 of the total user time.

396 processes
	 30 rnnoise_demo            47.13     0.52
	 67 clinfo                  16.34     5.90
	 38 vulkaninfo               0.95     1.33
	  4 vulkani:disk$0           0.10     0.14
	  6 glxinfo:gdrv0            0.08     0.10
	  6 glxinfo:gl0              0.08     0.10
	  6 clang                    0.06     0.06
	  2 llvmpipe-0               0.05     0.07
	  2 llvmpipe-1               0.05     0.07
	  2 llvmpipe-10              0.05     0.07
	  2 llvmpipe-11              0.05     0.07
	  2 llvmpipe-12              0.05     0.07
	  2 llvmpipe-13              0.05     0.07
	  2 llvmpipe-14              0.05     0.07
	  2 llvmpipe-15              0.05     0.07
	  2 llvmpipe-2               0.05     0.07
	  2 llvmpipe-3               0.05     0.07
	  2 llvmpipe-4               0.05     0.07
	  2 llvmpipe-5               0.05     0.07
	  2 llvmpipe-6               0.05     0.07
	  2 llvmpipe-7               0.05     0.07
	  2 llvmpipe-8               0.05     0.07
	  2 llvmpipe-9               0.05     0.07
	  2 glxinfo                  0.04     0.04
	  2 glxinfo:cs0              0.04     0.04
	  2 glxinfo:disk$0           0.04     0.04
	  2 glxinfo:sh0              0.04     0.04
	  2 glxinfo:shlo0            0.04     0.04
	  6 php                      0.03     0.10
	  3 rocminfo                 0.03     0.00
	  1 lspci                    0.00     0.02
	  1 rm                       0.00     0.01
	 83 sh                       0.00     0.00
	 13 gcc                      0.00     0.00
	 12 gsettings                0.00     0.00
	 10 sed                      0.00     0.00
	  8 stat                     0.00     0.00
	  8 systemd-detect-          0.00     0.00
	  6 llvm-link                0.00     0.00
	  5 phoronix-test-s          0.00     0.00
	  3 gmain                    0.00     0.00
	  3 ls                       0.00     0.00
	  3 rnnoise                  0.00     0.00
	  2 cc                       0.00     0.00
	  2 lscpu                    0.00     0.00
	  2 uname                    0.00     0.00
	  2 which                    0.00     0.00
	  2 xset                     0.00     0.00
	  1 bash                     0.00     0.00
	  1 date                     0.00     0.00
	  1 dconf worker             0.00     0.00
	  1 dirname                  0.00     0.00
	  1 dmesg                    0.00     0.00
	  1 dmidecode                0.00     0.00
	  1 grep                     0.00     0.00
	  1 ifconfig                 0.00     0.00
	  1 ip                       0.00     0.00
	  1 lsmod                    0.00     0.00
	  1 mktemp                   0.00     0.00
	  1 ps                       0.00     0.00
	  1 qdbus                    0.00     0.00
	  1 readlink                 0.00     0.00
	  1 realpath                 0.00     0.00
	  1 sort                     0.00     0.00
	  1 stty                     0.00     0.00
	  1 systemctl                0.00     0.00
	  1 template.sh              0.00     0.00
	  1 wc                       0.00     0.00
	  1 xrandr                   0.00     0.00
0 processes running
47 maximum processes
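
The harness share of user time can be estimated from the process listing above. The Python sketch below assumes the columns are process count, command name, user seconds, and system seconds; that column meaning is an assumption, as the listing has no header. Only the three largest rows are included, so the result slightly underestimates the harness share (the omitted rows are each 0.10 s or less). The helper `parse_row` is hypothetical.

```python
# Three largest rows copied from the process listing above.
SAMPLE = """
 30 rnnoise_demo            47.13     0.52
 67 clinfo                  16.34     5.90
 38 vulkaninfo               0.95     1.33
"""

def parse_row(line):
    """Split one row into (name, count, user, system).

    Assumption: the last two fields are user and system seconds.
    Names may contain spaces, so the numeric fields are split from the right.
    """
    count, rest = line.split(None, 1)
    name, user, system = rest.rsplit(None, 2)
    return name.strip(), int(count), float(user), float(system)

rows = [parse_row(line) for line in SAMPLE.strip().splitlines()]
user_time = {name: user for name, _count, user, _system in rows}

total_user = sum(user_time.values())
harness_user = total_user - user_time["rnnoise_demo"]
harness_share = harness_user / total_user

print(f"harness share of user time: {harness_share:.1%}")
```

With these three rows the harness share comes out a little above one quarter, in line with the estimate in the text.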

Computation structure

      204018) rnnoise          cpu=0 start=5.59  finish=21.43
        204019) rnnoise_demo     cpu=3 start=5.59  finish=21.43
          204020) rnnoise_demo     cpu=15 start=5.59  finish=5.59 
            204021) rnnoise_demo     cpu=12 start=5.59  finish=5.59 
          204022) rnnoise_demo     cpu=5 start=5.59  finish=5.59 
          204023) rnnoise_demo     cpu=2 start=5.59  finish=5.60 
            204024) rnnoise_demo     cpu=1 start=5.59  finish=5.59 
            204025) sed              cpu=14 start=5.60  finish=5.60 
          204026) rnnoise_demo     cpu=12 start=5.60  finish=5.60 
            204027) ls               cpu=5 start=5.60  finish=5.60 
            204028) sed              cpu=15 start=5.60  finish=5.60 
          204029) rnnoise_demo     cpu=9 start=5.60  finish=5.60 
          204030) rnnoise_demo     cpu=2 start=5.60  finish=5.61 
            204031) rnnoise_demo     cpu=14 start=5.61  finish=5.61 
            204032) sed              cpu=0 start=5.61  finish=5.61