This is a software-defined radio signal processing library. The benchmark runs 15 workloads that vary parameters such as the number of threads started, which is apparent in the progression of computation blocks below.

Topdown metrics show few frontend stalls overall (0.7% of slots on AMD), though they appear to spike periodically to higher levels. Otherwise, backend stalls are the largest source of lost slots.

AMD metrics show a moderate share of indirect branches (29.8 of 146.5 branches per 1000 instructions, about 20%) and a low L2 access rate (2.3 accesses per 1000 instructions).

elapsed              1697.119
on_cpu               0.326          # 5.22 / 16 cores
utime                8856.685
stime                1.784
nvcsw                16842          # 25.40%
nivcsw               49461          # 74.60%
inblock              8              # 0.00/sec
onblock              17368          # 10.23/sec
cpu-clock            8859043109277  # 8859.043 seconds
task-clock           8859086569027  # 8859.087 seconds
page faults          189820         # 21.427/sec
context switches     74498          # 8.409/sec
cpu migrations       2162           # 0.244/sec
major page faults    2              # 0.000/sec
minor page faults    189818         # 21.426/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             8939620743102  # 146.505 branches per 1000 inst
branch misses        4916139852     # 0.05% branch miss
conditional          4087186678285  # 66.982 conditional branches per 1000 inst
indirect             1816700409580  # 29.773 indirect branches per 1000 inst
cpu-cycles           43906190865863 # 1.12 GHz
instructions         77386138698388 # 1.76 IPC
slots                87820253971404 #
retiring             26966840935029 # 30.7% (34.3%)
-- ucode             60406107       #     0.0%
-- fastpath          26966780528922 #    30.7%
frontend             655509839567   #  0.7% ( 0.8%)
-- latency           442054892988   #     0.5%
-- bandwidth         213454946579   #     0.2%
backend              50285143305180 # 57.3% (64.0%)
-- cpu               24928897941717 #    28.4%
-- memory            25356245363463 #    28.9%
speculation          720398682627   #  0.8% ( 0.9%)
-- branch mispredict 395658212360   #     0.5%
-- pipeline restart  324740470267   #     0.4%
smt-contention       9192280268536  # 10.5% ( 0.0%)
cpu-cycles           35561747977725 # 1.37 GHz
instructions         56362778341513 # 1.58 IPC
instructions         18791619171629 # 2.312 l2 access per 1000 inst
l2 hit from l1       32872153754    # 6.47% l2 miss
l2 miss from l1      2333406782     #
l2 hit from l2 pf    10093658356    #
l3 hit from l2 pf    471559932      #
l3 miss from l2 pf   5473831        #
instructions         18779521772018 # 176.946 float per 1000 inst
float 512            95             # 0.000 AVX-512 per 1000 inst
float 256            184530991347   # 9.826 AVX-256 per 1000 inst
float 128            3138425566159  # 167.120 AVX-128 per 1000 inst
float MMX            0              # 0.000 MMX per 1000 inst
float scalar         0              # 0.000 scalar per 1000 inst
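
The topdown percentages in the AMD block can be reproduced from the raw slot counts. The sketch below is illustrative only: the function name topdown_shares is hypothetical, and reading the parenthesized values as shares of the slots left after removing smt-contention is an assumption, although it does match the reported figures.

```python
# Raw slot counts copied from the AMD topdown block above.
slots = 87820253971404
counts = {
    "retiring": 26966840935029,
    "frontend": 655509839567,
    "backend": 50285143305180,
    "speculation": 720398682627,
    "smt-contention": 9192280268536,
}


def topdown_shares(slots, counts, exclude="smt-contention"):
    """Return {category: (pct of all slots, pct of slots without `exclude`)}.

    Assumption: the parenthesized column in the report divides by the
    slots that remain after subtracting the excluded category.
    """
    adjusted = slots - counts.get(exclude, 0)
    shares = {}
    for name, value in counts.items():
        raw = 100.0 * value / slots
        adj = 0.0 if name == exclude else 100.0 * value / adjusted
        shares[name] = (raw, adj)
    return shares


for name, (raw, adj) in topdown_shares(slots, counts).items():
    print(f"{name:<16}{raw:5.1f}% ({adj:4.1f}%)")
```

With SMT contention taking about a tenth of the slots, the adjusted column is the fairer one to compare against the Intel run, where smt-contention is reported as zero.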

Intel metrics

elapsed              2101.806
on_cpu               0.449          # 7.18 / 16 cores
utime                15099.025
stime                1.385
nvcsw                20978          # 15.89%
nivcsw               111064         # 84.11%
inblock              19432          # 9.25/sec
onblock              6256           # 2.98/sec
cpu-clock            15101217180305 # 15101.217 seconds
task-clock           15101261875302 # 15101.262 seconds
page faults          180657         # 11.963/sec
context switches     142233         # 9.419/sec
cpu migrations       3874           # 0.257/sec
major page faults    116            # 0.008/sec
minor page faults    180541         # 11.955/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             14706391828425 # 142.843 branches per 1000 inst
branch misses        14208269354    # 0.10% branch miss
conditional          14706391860937 # 142.843 conditional branches per 1000 inst
indirect             6299830739447  # 61.190 indirect branches per 1000 inst
slots                115697040400052 #
retiring             69201998292968 # 59.8% (59.8%)
-- ucode             8257132014881  #     7.1%
-- fastpath          60944866278087 #    52.7%
frontend             18277382966992 # 15.8% (15.8%)
-- latency           9902669823664  #     8.6%
-- bandwidth         8374713143328  #     7.2%
backend              27497778361275 # 23.8% (23.8%)
-- cpu               19048599236733 #    16.5%
-- memory            8449179124542  #     7.3%
speculation          749205159374   #  0.6% ( 0.6%)
-- branch mispredict 571390197369   #     0.5%
-- pipeline restart  177814962005   #     0.2%
smt-contention       0              #  0.0% ( 0.0%)
cpu-cycles           27865924741031 # 0.84 GHz
instructions         65644794602005 # 2.36 IPC
l2 access            4235697429     # 0.065 l2 access per 1000 inst
l2 miss              3330682985     # 78.63% l2 miss
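
The on_cpu lines in both blocks appear to follow from the user and system times divided by elapsed time. The sketch below checks this; the formula is inferred from the numbers rather than taken from the tool that produced them, and cpu_utilization is a hypothetical helper name.

```python
def cpu_utilization(utime, stime, elapsed, cores):
    """Return (average busy cores, fraction of all cores in use).

    Assumption: on_cpu = (utime + stime) / elapsed / cores, which
    reproduces the values printed in the reports above.
    """
    busy = (utime + stime) / elapsed
    return busy, busy / cores


# Values copied from the AMD and Intel blocks (seconds, 16 cores each).
amd = cpu_utilization(8856.685, 1.784, 1697.119, 16)
intel = cpu_utilization(15099.025, 1.385, 2101.806, 16)

print(f"AMD   {amd[1]:.3f}  # {amd[0]:.2f} / 16 cores")
print(f"Intel {intel[1]:.3f}  # {intel[0]:.2f} / 16 cores")
```

Both runs keep well under half of the 16 cores busy on average, which is consistent with workloads that start only a few threads at a time.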

The process overview shows that benchmark_threa (a name cut off at the kernel's 15-character limit) is the primary resource user.

796 processes
	366 benchmark_threa      102837.37     3.75
	 68 clinfo                  16.20     6.66
	 38 vulkaninfo               1.14     1.14
	  6 php                      0.15     0.48
	  6 glxinfo:gdrv0            0.15     0.07
	  4 vulkani:disk$0           0.12     0.12
	  2 glxinfo                  0.07     0.03
	  2 glxinfo:cs0              0.07     0.03
	  2 glxinfo:disk$0           0.07     0.03
	  2 glxinfo:sh0              0.07     0.03
	  2 glxinfo:shlo0            0.07     0.03
	  2 llvmpipe-0               0.06     0.06
	  2 llvmpipe-1               0.06     0.06
	  2 llvmpipe-10              0.06     0.06
	  2 llvmpipe-11              0.06     0.06
	  2 llvmpipe-12              0.06     0.06
	  2 llvmpipe-13              0.06     0.06
	  2 llvmpipe-14              0.06     0.06
	  2 llvmpipe-15              0.06     0.06
	  2 llvmpipe-2               0.06     0.06
	  2 llvmpipe-3               0.06     0.06
	  2 llvmpipe-4               0.06     0.06
	  2 llvmpipe-5               0.06     0.06
	  2 llvmpipe-6               0.06     0.06
	  2 llvmpipe-7               0.06     0.06
	  2 llvmpipe-8               0.06     0.06
	  2 llvmpipe-9               0.06     0.06
	  6 clang                    0.03     0.09
	  3 rocminfo                 0.00     0.03
	  1 lspci                    0.00     0.02
	110 sh                       0.00     0.00
	 60 liquid-dsp               0.00     0.00
	 13 gcc                      0.00     0.00
	  8 gsettings                0.00     0.00
	  8 stat                     0.00     0.00
	  8 systemd-detect-          0.00     0.00
	  6 llvm-link                0.00     0.00
	  5 gmain                    0.00     0.00
	  5 phoronix-test-s          0.00     0.00
	  3 dconf worker             0.00     0.00
	  2 cc                       0.00     0.00
	  2 lscpu                    0.00     0.00
	  2 uname                    0.00     0.00
	  2 which                    0.00     0.00
	  2 xset                     0.00     0.00
	  1 date                     0.00     0.00
	  1 dirname                  0.00     0.00
	  1 dmesg                    0.00     0.00
	  1 dmidecode                0.00     0.00
	  1 grep                     0.00     0.00
	  1 ifconfig                 0.00     0.00
	  1 ip                       0.00     0.00
	  1 lsmod                    0.00     0.00
	  1 mktemp                   0.00     0.00
	  1 ps                       0.00     0.00
	  1 qdbus                    0.00     0.00
	  1 readlink                 0.00     0.00
	  1 realpath                 0.00     0.00
	  1 sed                      0.00     0.00
	  1 sort                     0.00     0.00
	  1 stty                     0.00     0.00
	  1 systemctl                0.00     0.00
	  1 template.sh              0.00     0.00
	  1 wc                       0.00     0.00
	  1 xrandr                   0.00     0.00
0 processes running
47 maximum processes

Computation blocks

      255011) liquid-dsp       cpu=5 start=5.85  finish=35.89
        255012) benchmark_threa  cpu=5 start=5.86  finish=35.89
          255013) benchmark_threa  cpu=6 start=5.86  finish=35.89
      255018) liquid-dsp       cpu=12 start=39.89 finish=69.93
        255019) benchmark_threa  cpu=5 start=39.90 finish=69.93
          255020) benchmark_threa  cpu=14 start=39.90 finish=69.93
      255022) liquid-dsp       cpu=4 start=73.94 finish=103.97
        255023) benchmark_threa  cpu=5 start=73.94 finish=103.97
          255024) benchmark_threa  cpu=14 start=73.94 finish=103.97
      255025) sh               cpu=11 start=103.97 finish=103.97
        255026) sh               cpu=5 start=103.97 finish=103.97
      255027) liquid-dsp       cpu=0 start=114.16 finish=144.20
        255028) benchmark_threa  cpu=1 start=114.16 finish=144.20
          255029) benchmark_threa  cpu=11 start=114.17 finish=144.20
      255030) liquid-dsp       cpu=8 start=148.20 finish=178.24
        255031) benchmark_threa  cpu=8 start=148.20 finish=178.24
          255032) benchmark_threa  cpu=3 start=148.21 finish=178.24
      255033) liquid-dsp       cpu=8 start=182.24 finish=212.28
        255034) benchmark_threa  cpu=1 start=182.25 finish=212.28
          255035) benchmark_threa  cpu=3 start=182.25 finish=212.28
      255036) liquid-dsp       cpu=8 start=216.28 finish=246.32
        255037) benchmark_threa  cpu=1 start=216.28 finish=246.32
          255038) benchmark_threa  cpu=3 start=216.29 finish=246.32
      255042) liquid-dsp       cpu=8 start=250.32 finish=280.36
        255043) benchmark_threa  cpu=1 start=250.32 finish=280.36
          255044) benchmark_threa  cpu=11 start=250.33 finish=280.36
      255075) liquid-dsp       cpu=8 start=284.36 finish=314.40
        255076) benchmark_threa  cpu=1 start=284.37 finish=314.40
          255077) benchmark_threa  cpu=3 start=284.37 finish=314.40
      255078) sh               cpu=5 start=314.40 finish=314.40
        255079) sh               cpu=14 start=314.40 finish=314.40
      255080) liquid-dsp       cpu=5 start=324.68 finish=354.79
        255081) benchmark_threa  cpu=14 start=324.68 finish=354.79
          255082) benchmark_threa  cpu=0 start=324.69 finish=354.79
          255083) benchmark_threa  cpu=1 start=324.69 finish=354.79
      255084) liquid-dsp       cpu=11 start=358.79 finish=388.86
        255085) benchmark_threa  cpu=5 start=358.80 finish=388.86
          255086) benchmark_threa  cpu=14 start=358.80 finish=388.86
          255087) benchmark_threa  cpu=15 start=358.80 finish=388.86
      255088) liquid-dsp       cpu=0 start=392.86 finish=422.93
        255089) benchmark_threa  cpu=9 start=392.86 finish=422.93
          255090) benchmark_threa  cpu=11 start=392.87 finish=422.93
          255091) benchmark_threa  cpu=4 start=392.87 finish=422.93
      255092) sh               cpu=0 start=422.93 finish=422.93
        255093) sh               cpu=9 start=422.93 finish=422.93
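
The progression in thread count can be pulled out of a listing like the one above with a small parser. This is a minimal sketch that assumes the line layout shown here (pid, name, cpu=, start=, finish=); summarize_runs and its parameters are hypothetical names, and it counts benchmark_threa entries without distinguishing processes from threads.

```python
import re

# One entry per line: "<pid>) <name> cpu=<n> start=<t> finish=<t>"
LINE = re.compile(
    r"\s*(\d+)\)\s+(\S+)\s+cpu=(\d+)\s+start=([\d.]+)\s+finish=([\d.]+)"
)


def summarize_runs(text, launcher="liquid-dsp", worker="benchmark_threa"):
    """Group worker entries under the launcher entry that precedes them."""
    runs = []
    for line in text.splitlines():
        m = LINE.match(line)
        if not m:
            continue
        pid, name, _cpu, start, finish = m.groups()
        if name == launcher:
            runs.append({"pid": int(pid), "start": float(start),
                         "finish": float(finish), "workers": 0})
        elif name == worker and runs:
            runs[-1]["workers"] += 1
    return runs


# A few lines copied from the listing above.
sample = """\
      255075) liquid-dsp       cpu=8 start=284.36 finish=314.40
        255076) benchmark_threa  cpu=1 start=284.37 finish=314.40
          255077) benchmark_threa  cpu=3 start=284.37 finish=314.40
      255078) sh               cpu=5 start=314.40 finish=314.40
        255079) sh               cpu=14 start=314.40 finish=314.40
      255080) liquid-dsp       cpu=5 start=324.68 finish=354.79
        255081) benchmark_threa  cpu=14 start=324.68 finish=354.79
          255082) benchmark_threa  cpu=0 start=324.69 finish=354.79
          255083) benchmark_threa  cpu=1 start=324.69 finish=354.79
"""

for run in summarize_runs(sample):
    duration = run["finish"] - run["start"]
    print(f'{run["pid"]}: {run["workers"]} benchmark_threa entries, '
          f'{duration:.2f}s')
```

On this sample each run lasts about 30 seconds, and the number of benchmark_threa entries steps from 2 to 3 after the sh separator.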