Test of a media encoder for the AV1 format, run across 16 different test cases. The cases appear to differ in the number of runnable processes and in how busy they keep the CPU cores.

Topdown profiles show occasional frontend stalls, but backend memory stalls dominate, with a mid-level retirement rate.

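AMD metrics show the workload running on about half the cores on average (7.88 of 16). There is some floating point (about 113 per 1000 instructions, essentially all of it 128-bit) and not very many branches (about 68 per 1000 instructions).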

elapsed              2276.295
on_cpu               0.493          # 7.88 / 16 cores
utime                17665.690
stime                280.753
nvcsw                14114020       # 98.83%
nivcsw               167320         # 1.17%
inblock              0              # 0.00/sec
onblock              201704         # 88.61/sec
cpu-clock            17919079774968 # 17919.080 seconds
task-clock           17926988199725 # 17926.988 seconds
page faults          43234643       # 2411.707/sec
context switches     14292008       # 797.234/sec
cpu migrations       34673          # 1.934/sec
major page faults    799            # 0.045/sec
minor page faults    43233844       # 2411.662/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             7098286347416  # 67.726 branches per 1000 inst
branch misses        104143888828   # 1.47% branch miss
conditional          5790064231135  # 55.244 conditional branches per 1000 inst
indirect             204125042808   # 1.948 indirect branches per 1000 inst
cpu-cycles           69273323037251 # 1.79 GHz
instructions         109171124055305 # 1.58 IPC
slots                138509791403472 #
retiring             36282593854356 # 26.2% (35.2%)
-- ucode             64945263255    #     0.0%
-- fastpath          36217648591101 #    26.1%
frontend             12898774713475 #  9.3% (12.5%)
-- latency           8858202001362  #     6.4%
-- bandwidth         4040572712113  #     2.9%
backend              52192703052128 # 37.7% (50.6%)
-- cpu               15916741610982 #    11.5%
-- memory            36275961441146 #    26.2%
speculation          1744351594826  #  1.3% ( 1.7%)
-- branch mispredict 1684874499188  #     1.2%
-- pipeline restart  59477095638    #     0.0%
smt-contention       35389948976400 # 25.6% ( 0.0%)
cpu-cycles           66976690835536 # 1.77 GHz
instructions         106537986397909 # 1.59 IPC
instructions         35497402711557 # 71.862 l2 access per 1000 inst
l2 hit from l1       1983397983497  # 9.58% l2 miss
l2 miss from l1      130076666250   #
l2 hit from l2 pf    453115676932   #
l3 hit from l2 pf    79419555362    #
l3 miss from l2 pf   34973391671    #
instructions         35484242372575 # 113.131 float per 1000 inst
float 512            180            # 0.000 AVX-512 per 1000 inst
float 256            588            # 0.000 AVX-256 per 1000 inst
float 128            4014364969945  # 113.131 AVX-128 per 1000 inst
float MMX            0              # 0.000 MMX per 1000 inst
float scalar         61848          # 0.000 scalar per 1000 inst
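
The AMD topdown lines above carry two percentages each. A minimal sketch of how they seem to be derived, assuming the first is the share of all slots and the parenthesised one is the share of slots left after smt-contention is subtracted. That reading is inferred from the numbers, not taken from the tool's documentation, and the function and variable names are made up for illustration.

```python
# Slot counts copied from the AMD metrics above.
SLOTS = 138509791403472
SMT_CONTENTION = 35389948976400
CATEGORIES = {
    "retiring": 36282593854356,
    "frontend": 12898774713475,
    "backend": 52192703052128,
    "speculation": 1744351594826,
}


def topdown_shares(slots, smt_contention, categories):
    """Return {name: (pct of all slots, pct of slots excluding SMT contention)}.

    Assumption: the parenthesised column in the report uses
    (slots - smt_contention) as its denominator.
    """
    usable = slots - smt_contention
    return {
        name: (100.0 * count / slots, 100.0 * count / usable)
        for name, count in categories.items()
    }


shares = topdown_shares(SLOTS, SMT_CONTENTION, CATEGORIES)
for name, (raw, adjusted) in shares.items():
    print(f"{name:12s} {raw:5.1f}% ({adjusted:5.1f}%)")
```

Under that assumption the sketch reproduces both columns, for example 37.7% (50.6%) for backend. With smt-contention factored out, roughly half of the usable slots are backend-bound.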

Intel metrics

elapsed              3545.058
on_cpu               0.597          # 9.54 / 16 cores
utime                33458.439
stime                378.988
nvcsw                29791383       # 98.64%
nivcsw               410241         # 1.36%
inblock              8928           # 2.52/sec
onblock              159936         # 45.12/sec
cpu-clock            33776356152864 # 33776.356 seconds
task-clock           33787174021112 # 33787.174 seconds
page faults          46536742       # 1377.349/sec
context switches     30218803       # 894.387/sec
cpu migrations       435720         # 12.896/sec
major page faults    308            # 0.009/sec
minor page faults    46536434       # 1377.340/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             9846368555244  # 61.950 branches per 1000 inst
branch misses        134683159152   # 1.37% branch miss
conditional          9846368644140  # 61.950 conditional branches per 1000 inst
indirect             2925897507424  # 18.409 indirect branches per 1000 inst
slots                198700353218444 #
retiring             111274731064663 # 56.0% (56.0%)
-- ucode             8502464763869  #     4.3%
-- fastpath          102772266300794 #    51.7%
frontend             31289436066663 # 15.7% (15.7%)
-- latency           19819385118484 #    10.0%
-- bandwidth         11470050948179 #     5.8%
backend              47506138189700 # 23.9% (23.9%)
-- cpu               23218262742886 #    11.7%
-- memory            24287875446814 #    12.2%
speculation          11676409983912 #  5.9% ( 5.9%)
-- branch mispredict 11341721304326 #     5.7%
-- pipeline restart  334688679586   #     0.2%
smt-contention       0              #  0.0% ( 0.0%)
cpu-cycles           118239467131762 # 1.75 GHz
instructions         222294493255214 # 1.88 IPC
l2 access            6755722248527  # 60.866 l2 access per 1000 inst
l2 miss              1024804494949  # 15.17% l2 miss
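
The on_cpu lines in both metric blocks appear to be user plus system time divided by elapsed time, then divided by the 16 cores. A small sketch of that arithmetic, using the utime, stime and elapsed values reported above. The function name is hypothetical and the formula is an inference that happens to match the reported figures.

```python
def cpu_utilization(utime, stime, elapsed, cores=16):
    """Return (fraction of all cores busy, average number of busy cores).

    Assumption: on_cpu = (utime + stime) / (elapsed * cores).
    """
    avg_cores = (utime + stime) / elapsed
    return avg_cores / cores, avg_cores


# Values copied from the AMD and Intel metrics above.
amd = cpu_utilization(utime=17665.690, stime=280.753, elapsed=2276.295)
intel = cpu_utilization(utime=33458.439, stime=378.988, elapsed=3545.058)

for label, (fraction, avg_cores) in (("AMD", amd), ("Intel", intel)):
    print(f"{label:5s} on_cpu={fraction:.3f}  # {avg_cores:.2f} / 16 cores")
```

This gives 0.493 (7.88 cores) for AMD and 0.597 (9.54 cores) for Intel, matching the reported on_cpu lines.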

Process overview

2409 processes
	 93 aomenc               16959.58   203.65
	1665 aom enc worker         117.00     0.00
	 68 clinfo                  16.53     6.08
	 38 vulkaninfo               0.95     1.33
	  6 php                      0.24     0.73
	  6 glxinfo:gdrv0            0.11     0.10
	  4 vulkani:disk$0           0.10     0.14
	  2 llvmpipe-0               0.05     0.07
	  2 llvmpipe-1               0.05     0.07
	  2 llvmpipe-10              0.05     0.07
	  2 llvmpipe-11              0.05     0.07
	  2 llvmpipe-12              0.05     0.07
	  2 llvmpipe-13              0.05     0.07
	  2 llvmpipe-14              0.05     0.07
	  2 llvmpipe-15              0.05     0.07
	  2 llvmpipe-2               0.05     0.07
	  2 llvmpipe-3               0.05     0.07
	  2 llvmpipe-4               0.05     0.07
	  2 llvmpipe-5               0.05     0.07
	  2 llvmpipe-6               0.05     0.07
	  2 llvmpipe-7               0.05     0.07
	  2 llvmpipe-8               0.05     0.07
	  2 llvmpipe-9               0.05     0.07
	  6 clang                    0.05     0.06
	  2 glxinfo                  0.05     0.04
	  2 glxinfo:cs0              0.05     0.04
	  2 glxinfo:disk$0           0.05     0.04
	  2 glxinfo:sh0              0.05     0.04
	  2 glxinfo:shlo0            0.05     0.04
	  3 rocminfo                 0.03     0.00
	  1 lspci                    0.00     0.02
	  1 ps                       0.00     0.01
	112 sh                       0.00     0.00
	 94 sed                      0.00     0.00
	 93 aom-av1                  0.00     0.00
	 93 rm                       0.00     0.00
	 13 gcc                      0.00     0.00
	 10 gsettings                0.00     0.00
	  8 stat                     0.00     0.00
	  8 systemd-detect-          0.00     0.00
	  6 llvm-link                0.00     0.00
	  5 gmain                    0.00     0.00
	  5 phoronix-test-s          0.00     0.00
	  2 cc                       0.00     0.00
	  2 lscpu                    0.00     0.00
	  2 uname                    0.00     0.00
	  2 which                    0.00     0.00
	  2 xset                     0.00     0.00
	  1 date                     0.00     0.00
	  1 dconf worker             0.00     0.00
	  1 dirname                  0.00     0.00
	  1 dmesg                    0.00     0.00
	  1 dmidecode                0.00     0.00
	  1 grep                     0.00     0.00
	  1 ifconfig                 0.00     0.00
	  1 ip                       0.00     0.00
	  1 lsmod                    0.00     0.00
	  1 mktemp                   0.00     0.00
	  1 qdbus                    0.00     0.00
	  1 readlink                 0.00     0.00
	  1 realpath                 0.00     0.00
	  1 sort                     0.00     0.00
	  1 stty                     0.00     0.00
	  1 systemctl                0.00     0.00
	  1 template.sh              0.00     0.00
	  1 wc                       0.00     0.00
	  1 xrandr                   0.00     0.00
0 processes running
47 maximum processes

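The computation threads seem to be placed on a single CPU, at least going by the last CPU each one ran on: every aom enc worker below reports cpu=0, while the parent aomenc and aom-av1 processes report other CPUs.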

      34201) aom-av1          cpu=3 start=5.69  finish=105.89
        34202) aomenc           cpu=5 start=5.69  finish=105.81
          34203) aom enc worker   cpu=0 start=5.96  finish=6.87 
          34204) aom enc worker   cpu=0 start=5.96  finish=6.87 
          34205) aom enc worker   cpu=0 start=5.96  finish=6.87 
          34206) aom enc worker   cpu=0 start=5.96  finish=6.87 
          34207) aom enc worker   cpu=0 start=5.96  finish=6.87 
          34208) aom enc worker   cpu=0 start=5.96  finish=6.87 
          34209) aom enc worker   cpu=0 start=5.96  finish=6.87 
          34210) aom enc worker   cpu=0 start=5.96  finish=6.87 
          34211) aom enc worker   cpu=0 start=5.96  finish=6.87 
          34212) aom enc worker   cpu=0 start=5.96  finish=6.87 
          34213) aom enc worker   cpu=0 start=5.96  finish=6.87 
          34214) aom enc worker   cpu=0 start=5.96  finish=6.87 
          34215) aom enc worker   cpu=0 start=5.96  finish=6.87 
          34216) aom enc worker   cpu=0 start=5.96  finish=6.87 
          34217) aom enc worker   cpu=0 start=5.96  finish=6.87 
          34218) aom enc worker   cpu=0 start=7.07  finish=105.80
          34219) aom enc worker   cpu=0 start=7.07  finish=105.80
          34220) aom enc worker   cpu=0 start=7.07  finish=105.80
          34221) aom enc worker   cpu=0 start=7.07  finish=105.80
          34222) aom enc worker   cpu=0 start=7.07  finish=105.80
          34223) aom enc worker   cpu=0 start=7.07  finish=105.81
          34224) aom enc worker   cpu=0 start=7.07  finish=105.81
          34225) aom enc worker   cpu=0 start=7.08  finish=105.81
          34226) aom enc worker   cpu=0 start=7.08  finish=105.81
          34227) aom enc worker   cpu=0 start=7.08  finish=105.81
          34228) aom enc worker   cpu=0 start=7.08  finish=105.81
          34229) aom enc worker   cpu=0 start=7.08  finish=105.81
          34230) aom enc worker   cpu=0 start=7.08  finish=105.81
          34231) aom enc worker   cpu=0 start=7.08  finish=105.81
          34232) aom enc worker   cpu=0 start=7.08  finish=105.81
        34236) sed              cpu=4 start=105.88 finish=105.89
        34237) rm               cpu=14 start=105.89 finish=105.89
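
One way to check the single-CPU observation across a whole trace is to parse the tree lines and collect the cpu= values per command name. The sketch below does that for a few lines copied from the tree above. The regular expression and helper names are illustrative assumptions about the line format, not part of the tool that produced it.

```python
import re
from collections import defaultdict

# A few lines copied from the process tree above (indentation shortened).
SAMPLE = """\
34201) aom-av1          cpu=3 start=5.69  finish=105.89
  34202) aomenc           cpu=5 start=5.69  finish=105.81
    34203) aom enc worker   cpu=0 start=5.96  finish=6.87
    34218) aom enc worker   cpu=0 start=7.07  finish=105.80
  34236) sed              cpu=4 start=105.88 finish=105.89
  34237) rm               cpu=14 start=105.89 finish=105.89
"""

# Assumed layout: "<pid>) <command> cpu=<n> start=<sec> finish=<sec>".
LINE = re.compile(
    r"^\s*(?P<pid>\d+)\)\s+(?P<comm>.+?)\s+cpu=(?P<cpu>\d+)"
    r"\s+start=(?P<start>[\d.]+)\s+finish=(?P<finish>[\d.]+)"
)


def parse_tree(text):
    """Parse tree lines into dicts; lines that do not match are skipped."""
    rows = []
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            rows.append({
                "pid": int(m["pid"]),
                "comm": m["comm"],
                "cpu": int(m["cpu"]),
                "lifetime": float(m["finish"]) - float(m["start"]),
            })
    return rows


def cpus_by_command(rows):
    """Map each command name to the set of cpu= values seen for it."""
    seen = defaultdict(set)
    for row in rows:
        seen[row["comm"]].add(row["cpu"])
    return dict(seen)


rows = parse_tree(SAMPLE)
for comm, cpus in cpus_by_command(rows).items():
    print(f"{comm:16s} last-run CPUs: {sorted(cpus)}")
```

A single last-run CPU per thread cannot by itself distinguish pinning from coincidence, so this only confirms what the field reports. It does not show where the threads spent their time.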