This benchmark exercises an open-source image manipulation program. It runs four different tests with slightly different profiles.

This looks like a mostly single-threaded program, which accounts for the low on-CPU time: on average only about 2 of the 16 cores are busy. The code is somewhat branchy, with a moderate amount of floating point. Backend stalls, particularly memory access, take the largest share of time.
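
As a quick check on the on-CPU claim, the sketch below recomputes the on_cpu figure from the utime, stime and elapsed values of the first run. The formula (CPU seconds divided by wall-clock seconds, then by core count) is an assumption inferred from the numbers, not taken from the tool's source, and the function name on_cpu_fraction is hypothetical.

```python
def on_cpu_fraction(utime, stime, elapsed, cores):
    """Return (average busy cores, fraction of the whole machine in use).

    Assumption: on_cpu = (utime + stime) / elapsed / cores.
    """
    busy_cores = (utime + stime) / elapsed
    return busy_cores, busy_cores / cores


# Values copied from the first counter dump (16-core machine).
busy, frac = on_cpu_fraction(utime=366.606, stime=141.072,
                             elapsed=262.274, cores=16)
print(f"{busy:.2f} cores busy, on_cpu = {frac:.3f}")
# prints "1.94 cores busy, on_cpu = 0.121"
```

The result matches the "1.94 / 16 cores" annotation, which is consistent with a workload that rarely keeps more than two threads running at once.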

elapsed              262.274
on_cpu               0.121          # 1.94 / 16 cores
utime                366.606
stime                141.072
nvcsw                4489040        # 99.83%
nivcsw               7840           # 0.17%
inblock              262848         # 1002.19/sec
onblock              2932088        # 11179.48/sec
cpu-clock            504118685148   # 504.119 seconds
task-clock           505485321404   # 505.485 seconds
page faults          50883528       # 100662.721/sec
context switches     4497375        # 8897.143/sec
cpu migrations       10959          # 21.680/sec
major page faults    729            # 1.442/sec
minor page faults    50882799       # 100661.279/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             361656385246   # 143.514 branches per 1000 inst
branch misses        13952901330    # 3.86% branch miss
conditional          267322482059   # 106.080 conditional branches per 1000 inst
indirect             7525360183     # 2.986 indirect branches per 1000 inst
cpu-cycles           2188314398268  # 0.53 GHz
instructions         2509486134649  # 1.15 IPC
slots                4384834911768  #
retiring             844543083364   # 19.3% (20.6%)
-- ucode             2508221097     #     0.1%
-- fastpath          842034862267   #    19.2%
frontend             985978905389   # 22.5% (24.0%)
-- latency           717473756388   #    16.4%
-- bandwidth         268505149001   #     6.1%
backend              2164748146131  # 49.4% (52.7%)
-- cpu               542851638476   #    12.4%
-- memory            1621896507655  #    37.0%
speculation          113713583073   #  2.6% ( 2.8%)
-- branch mispredict 107823545927   #     2.5%
-- pipeline restart  5890037146     #     0.1%
smt-contention       275354088418   #  6.3% ( 0.0%)
cpu-cycles           2177473596639  # 0.53 GHz
instructions         2498720982907  # 1.15 IPC
instructions         833123582177   # 23.471 l2 access per 1000 inst
l2 hit from l1       14295187339    # 21.63% l2 miss
l2 miss from l1      1897261648     #
l2 hit from l2 pf    2926997800     #
l3 hit from l2 pf    808079008      #
l3 miss from l2 pf   1523752563     #
instructions         835136271658   # 168.563 float per 1000 inst
float 512            293            # 0.000 AVX-512 per 1000 inst
float 256            276            # 0.000 AVX-256 per 1000 inst
float 128            140773377324   # 168.563 AVX-128 per 1000 inst
float MMX            0              # 0.000 MMX per 1000 inst
float scalar         0              # 0.000 scalar per 1000 inst
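
The top-down lines above carry two percentages each. A minimal sketch of how they appear to be derived follows. It assumes the first value is the share of all slots and the parenthesized value is the share of slots left after subtracting smt-contention. That reading is an inference from the numbers, and the function name topdown_shares is hypothetical.

```python
def topdown_shares(slots, smt_contention, categories):
    """Return {name: (share of all slots, share excluding SMT contention)} in percent."""
    usable = slots - smt_contention  # assumption: contended slots are excluded
    return {
        name: (100.0 * count / slots, 100.0 * count / usable)
        for name, count in categories.items()
    }


# Top-level categories copied from the first counter dump.
first_run = {
    "retiring": 844543083364,
    "frontend": 985978905389,
    "backend": 2164748146131,
    "speculation": 113713583073,
}

shares = topdown_shares(slots=4384834911768,
                        smt_contention=275354088418,
                        categories=first_run)
for name, (raw, normalized) in shares.items():
    print(f"{name:12s} {raw:5.1f}% ({normalized:5.1f}%)")
```

Under this assumption the backend comes out at 49.4% of all slots and 52.7% of the uncontended slots, in line with the listing.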

Intel metrics

elapsed              323.082
on_cpu               0.131          # 2.10 / 16 cores
utime                531.315
stime                146.991
nvcsw                10503543       # 99.85%
nivcsw               15479          # 0.15%
inblock              8              # 0.02/sec
onblock              2931888        # 9074.76/sec
cpu-clock            670147573846   # 670.148 seconds
task-clock           671777403316   # 671.777 seconds
page faults          50351069       # 74952.013/sec
context switches     10519619       # 15659.382/sec
cpu migrations       30361          # 45.195/sec
major page faults    1              # 0.001/sec
minor page faults    50351068       # 74952.012/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             342678860344   # 135.988 branches per 1000 inst
branch misses        3385141914     # 0.99% branch miss
conditional          342679096184   # 135.988 conditional branches per 1000 inst
indirect             49044340836    # 19.463 indirect branches per 1000 inst
slots                6042017136368  #
retiring             2057791788255  # 34.1% (34.1%)
-- ucode             206973543335   #     3.4%
-- fastpath          1850818244920  #    30.6%
frontend             869596524785   # 14.4% (14.4%)
-- latency           441086010050   #     7.3%
-- bandwidth         428510514735   #     7.1%
backend              2531687347647  # 41.9% (41.9%)
-- cpu               625305575545   #    10.3%
-- memory            1906381772102  #    31.6%
speculation          609068467938   # 10.1% (10.1%)
-- branch mispredict 488700891743   #     8.1%
-- pipeline restart  120367576195   #     2.0%
smt-contention       0              #  0.0% ( 0.0%)
cpu-cycles           3264854609930  # 0.61 GHz
instructions         4152760303612  # 1.27 IPC
l2 access            47561932109    # 20.313 l2 access per 1000 inst
l2 miss              20055987975    # 42.17% l2 miss
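
To compare the two runs on equal terms, the ratio annotations can be recomputed from the raw counters. The sketch below does this for IPC and branch miss rate in both runs, and for the L2 miss rate of the second run only, since the first dump reports L2 traffic in a different form. The helper name ratio_metrics and the dictionary layout are hypothetical.

```python
def ratio_metrics(c):
    """Derive ratio metrics from a dict of raw counter values."""
    out = {
        "ipc": c["instructions"] / c["cpu-cycles"],
        "branch_miss_pct": 100.0 * c["branch misses"] / c["branches"],
    }
    # The L2 miss rate is only computed when both counters are present.
    if "l2 access" in c and "l2 miss" in c:
        out["l2_miss_pct"] = 100.0 * c["l2 miss"] / c["l2 access"]
    return out


runs = {
    "first run": {
        "cpu-cycles": 2188314398268, "instructions": 2509486134649,
        "branches": 361656385246, "branch misses": 13952901330,
    },
    "second run (Intel metrics)": {
        "cpu-cycles": 3264854609930, "instructions": 4152760303612,
        "branches": 342678860344, "branch misses": 3385141914,
        "l2 access": 47561932109, "l2 miss": 20055987975,
    },
}

for label, counters in runs.items():
    m = ratio_metrics(counters)
    line = f"{label}: IPC {m['ipc']:.2f}, branch miss {m['branch_miss_pct']:.2f}%"
    if "l2_miss_pct" in m:
        line += f", L2 miss {m['l2_miss_pct']:.2f}%"
    print(line)
```

Some per-1000-instruction annotations in the dumps do not follow from the instruction counts shown next to them and presumably come from separate counter groups, so they are left out here.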

The process profile shows a large number of “worker” threads. They seem to be launched in parallel but are not on the CPU at the same time.

7389 processes
	5580 worker                5202.05  1937.17
	420 gdbus                  348.48   129.47
	425 gmain                  348.39   129.44
	 36 gimp                   292.85    67.87
	 12 async                  292.80    67.83
	360 file-jpeg               53.88    61.30
	 12 bzip2                   11.24     0.14
	 12 async-ind                1.99     1.98
	 12 xz                       1.94     0.15
	 36 script-fu                1.62     0.24
	 38 vulkaninfo               0.75     1.14
	  6 glxinfo:gdrv0            0.09     0.07
	  6 php                      0.08     0.13
	  4 vulkani:disk$0           0.08     0.12
	  2 glxinfo                  0.06     0.03
	  2 glxinfo:cs0              0.05     0.03
	  2 glxinfo:disk$0           0.05     0.03
	  2 glxinfo:sh0              0.05     0.03
	  2 glxinfo:shlo0            0.05     0.03
	  2 llvmpipe-0               0.04     0.06
	  2 llvmpipe-1               0.04     0.06
	  2 llvmpipe-10              0.04     0.06
	  2 llvmpipe-11              0.04     0.06
	  2 llvmpipe-12              0.04     0.06
	  2 llvmpipe-13              0.04     0.06
	  2 llvmpipe-14              0.04     0.06
	  2 llvmpipe-15              0.04     0.06
	  2 llvmpipe-2               0.04     0.06
	  2 llvmpipe-3               0.04     0.06
	  2 llvmpipe-4               0.04     0.06
	  2 llvmpipe-5               0.04     0.06
	  2 llvmpipe-6               0.04     0.06
	  2 llvmpipe-7               0.04     0.06
	  2 llvmpipe-8               0.04     0.06
	  2 llvmpipe-9               0.04     0.06
	  6 clang                    0.03     0.07
	 12 rawtherapee              0.03     0.02
	 24 tar                      0.02     1.72
	 12 swap writer              0.00   292.80
	 12 [pango] FcInit           0.00     2.09
	  1 lspci                    0.00     0.03
	107 sh                       0.00     0.00
	 36 file-darktable           0.00     0.00
	 16 bash                     0.00     0.00
	 16 rm                       0.00     0.00
	 12 awk                      0.00     0.00
	 12 file-glob                0.00     0.00
	 12 file-heif                0.00     0.00
	 12 file-rawtherape          0.00     0.00
	 12 gcc                      0.00     0.00
	 12 head                     0.00     0.00
	  9 stty                     0.00     0.00
	  8 gsettings                0.00     0.00
	  8 stat                     0.00     0.00
	  8 systemd-detect-          0.00     0.00
	  6 llvm-link                0.00     0.00
	  5 phoronix-test-s          0.00     0.00
	  3 dconf worker             0.00     0.00
	  2 lscpu                    0.00     0.00
	  2 uname                    0.00     0.00
	  2 which                    0.00     0.00
	  2 xset                     0.00     0.00
	  1 cc                       0.00     0.00
	  1 date                     0.00     0.00
	  1 dirname                  0.00     0.00
	  1 dmesg                    0.00     0.00
	  1 dmidecode                0.00     0.00
	  1 grep                     0.00     0.00
	  1 ifconfig                 0.00     0.00
	  1 ip                       0.00     0.00
	  1 lsmod                    0.00     0.00
	  1 mktemp                   0.00     0.00
	  1 ps                       0.00     0.00
	  1 readlink                 0.00     0.00
	  1 realpath                 0.00     0.00
	  1 sed                      0.00     0.00
	  1 sort                     0.00     0.00
	  1 systemctl                0.00     0.00
	  1 template.sh              0.00     0.00
	  1 wc                       0.00     0.00
	  1 xrandr                   0.00     0.00
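
To quantify how much of the process list the worker threads make up, the rows can be parsed mechanically. The sketch below is one way to do that, assuming each row is a count, a name that may contain spaces, and two trailing numeric columns. The meaning of those two columns is not stated in the listing, so they are kept under the neutral, hypothetical names first and second.

```python
from typing import NamedTuple


class ProcRow(NamedTuple):
    count: int
    name: str
    first: float   # first numeric column, meaning not stated in the listing
    second: float  # second numeric column, meaning not stated in the listing


def parse_row(line: str) -> ProcRow:
    """Split a row from the right so names containing spaces stay intact."""
    head, first, second = line.strip().rsplit(None, 2)
    count, name = head.split(None, 1)
    return ProcRow(int(count), name.strip(), float(first), float(second))


# A few rows copied from the listing above.
sample = """\
	5580 worker                5202.05  1937.17
	420 gdbus                  348.48   129.47
	 12 swap writer              0.00   292.80
	 12 [pango] FcInit           0.00     2.09
"""

rows = [parse_row(line) for line in sample.splitlines() if line.strip()]
total = 7389  # from the "7389 processes" header line
workers = next(r for r in rows if r.name == "worker")
print(f"worker entries: {100.0 * workers.count / total:.1f}% of all entries")
# prints "worker entries: 75.5% of all entries"
```

Splitting from the right keeps names such as "swap writer" and "[pango] FcInit" in one piece, which a plain whitespace split would break apart.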