Video encoding using the Google libvpx library. There are four workloads: two speed levels, each run on 4K and 1080p input. The number of processes varies, though it looks like roughly one per physical core.

The topdown profile differs slightly between workloads, but all show a fairly high retirement rate, with backend stalls as the largest limiter.

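AMD metrics show floating-point code (about 243 AVX-128 operations per 1000 instructions) with some memory-bound stalls. Branches are relatively few (about 52 per 1000 instructions), though 1.56% of them are still mispredicted.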

elapsed              623.409
on_cpu               0.311          # 4.97 / 16 cores
utime                3048.837
stime                48.411
nvcsw                3637285        # 99.86%
nivcsw               5119           # 0.14%
inblock              0              # 0.00/sec
onblock              15512          # 24.88/sec
cpu-clock            3089203964534  # 3089.204 seconds
task-clock           3091077405978  # 3091.077 seconds
page faults          2123880        # 687.100/sec
context switches     3645325        # 1179.306/sec
cpu migrations       3420           # 1.106/sec
major page faults    2              # 0.001/sec
minor page faults    2123878        # 687.100/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             1721188727239  # 52.252 branches per 1000 inst
branch misses        26850172244    # 1.56% branch miss
conditional          1385965910144  # 42.075 conditional branches per 1000 inst
indirect             56654184709    # 1.720 indirect branches per 1000 inst
cpu-cycles           12148882895870 # 1.24 GHz
instructions         32948907414886 # 2.71 IPC
slots                24279285092832 #
retiring             11013916606615 # 45.4% (45.6%)
-- ucode             20225750600    #     0.1%
-- fastpath          10993690856015 #    45.3%
frontend             3545739782724  # 14.6% (14.7%)
-- latency           1451506674132  #     6.0%
-- bandwidth         2094233108592  #     8.6%
backend              8701349022251  # 35.8% (36.0%)
-- cpu               3205982477495  #    13.2%
-- memory            5495366544756  #    22.6%
speculation          891925444057   #  3.7% ( 3.7%)
-- branch mispredict 844250791145   #     3.5%
-- pipeline restart  47674652912    #     0.2%
smt-contention       126338239440   #  0.5% ( 0.0%)
cpu-cycles           12134150872890 # 1.24 GHz
instructions         32940470679251 # 2.71 IPC
instructions         10979290279131 # 40.403 l2 access per 1000 inst
l2 hit from l1       383961375679   # 16.06% l2 miss
l2 miss from l1      52653668633    #
l2 hit from l2 pf    41052024947    #
l3 hit from l2 pf    14381078758    #
l3 miss from l2 pf   4198549610     #
instructions         10981114496824 # 243.225 float per 1000 inst
float 512            58             # 0.000 AVX-512 per 1000 inst
float 256            514            # 0.000 AVX-256 per 1000 inst
float 128            2670880984964  # 243.225 AVX-128 per 1000 inst
float MMX            0              # 0.000 MMX per 1000 inst
float scalar         2469           # 0.000 scalar per 1000 inst
instructions         2686249        #
opcache              993457         # 369.831 opcache per 1000 inst
opcache miss         533742         # 53.7% opcache miss rate
l1 dTLB miss         6511           # 2.424 L1 dTLB per 1000 inst
l2 dTLB miss         1193           # 0.444 L2 dTLB per 1000 inst
instructions         2738555        #
icache               1323391        # 483.244 icache per 1000 inst
icache miss          110104         #  8.3% icache miss rate
l1 iTLB miss         6              # 0.002 L1 iTLB per 1000 inst
l2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst
tlb flush            19             # 0.007 TLB flush per 1000 inst
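
The derived figures in the right-hand column can be reproduced from the raw counters. The following is a minimal sketch, not the profiling tool's own code: the function names are hypothetical, and the formulas (on_cpu as CPU time over elapsed time per core, IPC as instructions per cycle, topdown shares as a fraction of slots) are assumptions that happen to match the values printed above.

```python
# Counter values copied from the AMD run above.
CORES = 16  # taken from the "/ 16 cores" annotation

def on_cpu(utime, stime, elapsed, cores):
    """Fraction of total core capacity spent on CPU (assumed formula)."""
    return (utime + stime) / elapsed / cores

def topdown_shares(slots, categories):
    """Return each level-1 category as a percentage of issue slots."""
    return {name: 100.0 * count / slots for name, count in categories.items()}

amd_on_cpu = on_cpu(utime=3048.837, stime=48.411, elapsed=623.409, cores=CORES)
amd_ipc = 32948907414886 / 12148882895870            # instructions / cpu-cycles
amd_branch_miss_pct = 100.0 * 26850172244 / 1721188727239

amd_topdown = topdown_shares(
    24279285092832,
    {
        "retiring": 11013916606615,
        "frontend": 3545739782724,
        "backend": 8701349022251,
        "speculation": 891925444057,
        "smt-contention": 126338239440,
    },
)

print(f"on_cpu      {amd_on_cpu:.3f}")
print(f"IPC         {amd_ipc:.2f}")
print(f"branch miss {amd_branch_miss_pct:.2f}%")
for name, pct in amd_topdown.items():
    print(f"{name:<14} {pct:5.1f}%")
```

The five level-1 categories add up to almost exactly the slot count, which is a quick sanity check that the counters were collected consistently.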

Intel metrics show L2 as the most active source of memory stalls (12.4% L2 bound, versus 4.3% L3 bound and 3.6% DRAM bound).

elapsed              1393.848
on_cpu               0.320          # 5.12 / 16 cores
utime                7051.575
stime                81.121
nvcsw                5955297        # 98.67%
nivcsw               80192          # 1.33%
inblock              20984032       # 15054.75/sec
onblock              5464           # 3.92/sec
cpu-clock            7093046110780  # 7093.046 seconds
task-clock           7096878999736  # 7096.879 seconds
page faults          2495483        # 351.631/sec
context switches     6042276        # 851.399/sec
cpu migrations       163180         # 22.993/sec
major page faults    2861           # 0.403/sec
minor page faults    2492622        # 351.228/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             2459209749628  # 50.743 branches per 1000 inst
branch misses        44820326405    # 1.82% branch miss
conditional          2459209765468  # 50.743 conditional branches per 1000 inst
indirect             620204202405   # 12.797 indirect branches per 1000 inst
slots                49219054467146 #
retiring             23760356174046 # 48.3% (48.3%)
-- ucode             1008550311918  #     2.0%
-- fastpath          22751805862128 #    46.2%
frontend             6378237707292  # 13.0% (13.0%)
-- latency           3046542845679  #     6.2%
-- bandwidth         3331694861613  #     6.8%
backend              16273030130206 # 33.1% (33.1%)
-- cpu               9088240401088  #    18.5%
-- memory            7184789729118  #    14.6%
speculation          3130243686700  #  6.4% ( 6.4%)
-- branch mispredict 2944870071324  #     6.0%
-- pipeline restart  185373615376   #     0.4%
smt-contention       0              #  0.0% ( 0.0%)
cpu-cycles           16282458738408 # 1.01 GHz
instructions         44439229330649 # 2.73 IPC
l2 access            902938123881   # 39.431 l2 access per 1000 inst
l2 miss              264018120167   # 29.24% l2 miss
cpu-cycles           8391353786130  # 21.0% memory latency
load stalls          1601500730217  #  0.0% l1 bound
l1 miss              1707146006108  # 12.4% l2 bound
l2 miss              668404285722   #  4.3% l3 bound
l3 miss              305217164369   #  3.6% dram bound
store_stalls         158271209456   #  1.9% store bound
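
For further analysis it can help to load these dumps into a script. The following is a rough sketch of a parser for the `name  value  # note` layout used above; `parse_counter_line` is a hypothetical helper, not part of the tool that produced the output. It splits from the right because counter names may contain spaces and are sometimes separated from the value by a single space.

```python
def parse_counter_line(line):
    """Split 'name  value  # note' into (name, value, note)."""
    left, _, note = line.partition("#")
    # The value is the last whitespace-separated token before '#';
    # everything in front of it is the (possibly multi-word) name.
    name, value = left.rstrip().rsplit(None, 1)
    return name.strip(), float(value), note.strip()

# A few lines copied from the Intel run above.
SAMPLE = """\
l2 access            902938123881   # 39.431 l2 access per 1000 inst
l2 miss              264018120167   # 29.24% l2 miss
-- branch mispredict 2944870071324  #     6.0%
"""

rows = [parse_counter_line(line) for line in SAMPLE.splitlines() if line.strip()]

# Names repeat across groups in the full dump (for example "l2 miss" and
# "cpu-cycles"), so a dict keyed by name is only safe within one group.
counters = {name: value for name, value, _ in rows}
l2_miss_pct = 100.0 * counters["l2 miss"] / counters["l2 access"]
print(f"l2 miss rate: {l2_miss_pct:.2f}%")
```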

The process overview shows vpxenc as the dominant process.

450 processes
	 96 vpxenc               23832.39   254.91
	 68 clinfo                  17.20     5.33
	 38 vulkaninfo               1.14     1.14
	  4 vulkani:disk$0           0.12     0.12
	  6 glxinfo:gdrv0            0.10     0.04
	  6 glxinfo:gl0              0.10     0.04
	  6 clang                    0.10     0.01
	  6 php                      0.07     0.17
	  2 llvmpipe-0               0.06     0.06
	  2 llvmpipe-1               0.06     0.06
	  2 llvmpipe-10              0.06     0.06
	  2 llvmpipe-11              0.06     0.06
	  2 llvmpipe-12              0.06     0.06
	  2 llvmpipe-13              0.06     0.06
	  2 llvmpipe-14              0.06     0.06
	  2 llvmpipe-15              0.06     0.06
	  2 llvmpipe-2               0.06     0.06
	  2 llvmpipe-3               0.06     0.06
	  2 llvmpipe-4               0.06     0.06
	  2 llvmpipe-5               0.06     0.06
	  2 llvmpipe-6               0.06     0.06
	  2 llvmpipe-7               0.06     0.06
	  2 llvmpipe-8               0.06     0.06
	  2 llvmpipe-9               0.06     0.06
	  2 glxinfo                  0.06     0.02
	  2 glxinfo:cs0              0.06     0.02
	  2 glxinfo:disk$0           0.06     0.02
	  2 glxinfo:sh0              0.06     0.02
	  2 glxinfo:shlo0            0.06     0.02
	  3 rocminfo                 0.03     0.00
	  1 lspci                    0.00     0.02
	 88 sh                       0.00     0.00
	 13 gcc                      0.00     0.00
	 13 gsettings                0.00     0.00
	  8 stat                     0.00     0.00
	  8 systemd-detect-          0.00     0.00
	  6 llvm-link                0.00     0.00
	  5 phoronix-test-s          0.00     0.00
	  2 cc                       0.00     0.00
	  2 gmain                    0.00     0.00
	  2 lscpu                    0.00     0.00
	  2 uname                    0.00     0.00
	  2 which                    0.00     0.00
	  2 xset                     0.00     0.00
	  1 date                     0.00     0.00
	  1 dconf worker             0.00     0.00
	  1 dirname                  0.00     0.00
	  1 dmesg                    0.00     0.00
	  1 dmidecode                0.00     0.00
	  1 grep                     0.00     0.00
	  1 ifconfig                 0.00     0.00
	  1 ip                       0.00     0.00
	  1 lsmod                    0.00     0.00
	  1 mktemp                   0.00     0.00
	  1 ps                       0.00     0.00
	  1 qdbus                    0.00     0.00
	  1 readlink                 0.00     0.00
	  1 realpath                 0.00     0.00
	  1 sed                      0.00     0.00
	  1 sort                     0.00     0.00
	  1 stty                     0.00     0.00
	  1 systemctl                0.00     0.00
	  1 template.sh              0.00     0.00
	  1 wc                       0.00     0.00
	  1 xrandr                   0.00     0.00
0 processes running
47 maximum processes

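Computation structure is straightforward: a top-level vpxenc spawns a child, which about half a second later starts a set of workers that all run until the encode finishes.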

      333831) vpxenc           cpu=11 start=6.65  finish=91.46
        333832) vpxenc           cpu=11 start=6.66  finish=91.46
          333833) vpxenc           cpu=2 start=7.15  finish=91.41
          333834) vpxenc           cpu=12 start=7.15  finish=91.41
          333835) vpxenc           cpu=7 start=7.16  finish=91.41
          333836) vpxenc           cpu=1 start=7.16  finish=91.41
          333837) vpxenc           cpu=5 start=7.16  finish=91.41
          333838) vpxenc           cpu=6 start=7.16  finish=91.41
          333839) vpxenc           cpu=0 start=7.17  finish=91.41
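
The tree listing above can be turned into per-task lifetimes and nesting depths with a short script. This is only a sketch under assumptions: the regular expression is inferred from the lines shown here, two spaces of indentation are assumed per tree level, and `parse_tree` is a hypothetical helper rather than part of the tracing tool.

```python
import re

# Pattern inferred from the listing: "<pid>) <name> cpu=<n> start=<t> finish=<t>"
LINE_RE = re.compile(
    r"^(?P<indent>\s*)(?P<pid>\d+)\)\s+(?P<name>\S+)\s+"
    r"cpu=(?P<cpu>\d+)\s+start=(?P<start>[\d.]+)\s+finish=(?P<finish>[\d.]+)"
)

def parse_tree(text):
    """Return one dict per task with pid, cpu, start, finish and depth."""
    entries = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # skip anything that is not a task line
        entries.append({
            "pid": int(m["pid"]),
            "name": m["name"],
            "cpu": int(m["cpu"]),
            "start": float(m["start"]),
            "finish": float(m["finish"]),
            "indent": len(m["indent"]),
        })
    if entries:
        base = min(e["indent"] for e in entries)
        for e in entries:
            # Assumption: each tree level adds two spaces of indentation.
            e["depth"] = (e["indent"] - base) // 2
    return entries

SAMPLE = "\n".join([
    "      333831) vpxenc           cpu=11 start=6.65  finish=91.46",
    "        333832) vpxenc           cpu=11 start=6.66  finish=91.46",
    "          333833) vpxenc           cpu=2 start=7.15  finish=91.41",
    "          333834) vpxenc           cpu=12 start=7.15  finish=91.41",
])

entries = parse_tree(SAMPLE)
for e in entries:
    lifetime = e["finish"] - e["start"]
    print(f"{'  ' * e['depth']}{e['pid']} cpu={e['cpu']} ran {lifetime:.2f}s")
```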