GEGL is a generic graphics library used by GIMP and by applications such as GNOME Photos. The benchmark runs nine different operations. It looks mostly single-threaded, with small regions of parallel operation.

The topdown profile differs between the workloads, and some of them show a surprisingly large number of branch stalls.

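AMD metrics show only ~2 of 16 cores in use (on_cpu 0.123), moderate floating point (about 68 AVX-128 operations per 1000 instructions), and some memory stalls (23.6% of slots) and frontend latency (22.9% of slots).
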
elapsed 1299.526
on_cpu 0.123 # 1.96 / 16 cores
utime 1684.132
stime 865.189
nvcsw 14504168 # 99.47%
nivcsw 76810 # 0.53%
inblock 472 # 0.36/sec
onblock 25580472 # 19684.47/sec
cpu-clock 2542914192252 # 2542.914 seconds
task-clock 2547801640165 # 2547.802 seconds
page faults 69417327 # 27245.970/sec
context switches 14586272 # 5725.042/sec
cpu migrations 28916 # 11.349/sec
major page faults 2220 # 0.871/sec
minor page faults 69415107 # 27245.099/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 2308099026992 # 154.155 branches per 1000 inst
branch misses 132254282252 # 5.73% branch miss
conditional 1758200020186 # 117.428 conditional branches per 1000 inst
indirect 41308870165 # 2.759 indirect branches per 1000 inst
cpu-cycles 10462124659789 # 0.50 GHz
instructions 14863121118375 # 1.42 IPC
slots 21295959429906 #
retiring 5069339259379 # 23.8% (25.6%)
-- ucode 13818688365 # 0.1%
-- fastpath 5055520571014 # 23.7%
frontend 6552525067104 # 30.8% (33.0%)
-- latency 4872044219046 # 22.9%
-- bandwidth 1680480848058 # 7.9%
backend 6746815575381 # 31.7% (34.0%)
-- cpu 1716964005492 # 8.1%
-- memory 5029851569889 # 23.6%
speculation 1456392663684 # 6.8% ( 7.3%)
-- branch mispredict 1436077332204 # 6.7%
-- pipeline restart 20315331480 # 0.1%
smt-contention 1468796759278 # 6.9% ( 0.0%)
cpu-cycles 10463773934761 # 0.50 GHz
instructions 14873614224957 # 1.42 IPC
instructions 4977429685254 # 20.651 l2 access per 1000 inst
l2 hit from l1 69830506371 # 6.94% l2 miss
l2 miss from l1 4094424247 #
l2 hit from l2 pf 29917464683 #
l3 hit from l2 pf 1040649343 #
l3 miss from l2 pf 1999101580 #
instructions 4983393093725 # 68.287 float per 1000 inst
float 512 236 # 0.000 AVX-512 per 1000 inst
float 256 588 # 0.000 AVX-256 per 1000 inst
float 128 340302305646 # 68.287 AVX-128 per 1000 inst
float MMX 0 # 0.000 MMX per 1000 inst
float scalar 0 # 0.000 scalar per 1000 inst
instructions 11764565952558 #
opcache 2579613405100 # 219.270 opcache per 1000 inst
opcache miss 298925598456 # 11.6% opcache miss rate
l1 dTLB miss 13219511975 # 1.124 L1 dTLB per 1000 inst
l2 dTLB miss 844253850 # 0.072 L2 dTLB per 1000 inst
instructions 14934018704737 #
icache 850810850683 # 56.971 icache per 1000 inst
icache miss 32271720763 # 3.8% icache miss rate
l1 iTLB miss 2431998815 # 0.163 L1 iTLB per 1000 inst
l2 iTLB miss 0 # 0.000 L2 iTLB per 1000 inst
tlb flush 11032788 # 0.001 TLB flush per 1000 inst
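
The derived values in the comment column can be reproduced from the raw counters. The sketch below recomputes a few of them from the AMD dump. The formulas are inferred from the numbers rather than taken from the tool's source, so treat them as assumptions: on_cpu is assumed to be (utime + stime) / elapsed divided by the core count, IPC is instructions per cycle, and the branch miss rate is misses over branches.

```python
# Raw values copied from the AMD dump above.
elapsed = 1299.526   # wall-clock seconds
utime = 1684.132     # user CPU seconds
stime = 865.189      # system CPU seconds
ncpus = 16           # from the "/ 16 cores" annotation

# Assumed formula: average number of busy cores, then the fraction of all cores.
busy_cores = (utime + stime) / elapsed
on_cpu = busy_cores / ncpus

cycles = 10462124659789
instructions = 14863121118375
ipc = instructions / cycles

branches = 2308099026992
branch_misses = 132254282252
miss_rate = branch_misses / branches

print(f"busy cores {busy_cores:.2f}, on_cpu {on_cpu:.3f}")
print(f"IPC {ipc:.2f}, branch miss {miss_rate:.2%}")
```

These match the 1.96 cores, 0.123 on_cpu, 1.42 IPC and 5.73% branch miss printed in the dump. Some per-1000-instruction figures do not reproduce exactly from the first instruction count, since each counter group lists its own instructions value.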
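
Intel metrics show a much larger share of slots lost to branch mispredicts: 20.4% versus 6.7% on AMD, even though the measured branch miss rate is lower (2.07% versus 5.73%).
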
elapsed 1401.041
on_cpu 0.128 # 2.05 / 16 cores
utime 1972.594
stime 906.348
nvcsw 36522641 # 98.90%
nivcsw 406132 # 1.10%
inblock 71376 # 50.94/sec
onblock 24186208 # 17263.02/sec
cpu-clock 2856527054590 # 2856.527 seconds
task-clock 2862881732391 # 2862.882 seconds
page faults 66834847 # 23345.305/sec
context switches 36934938 # 12901.315/sec
cpu migrations 68489 # 23.923/sec
major page faults 1735 # 0.606/sec
minor page faults 66833112 # 23344.699/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 2440547647375 # 150.034 branches per 1000 inst
branch misses 50612278990 # 2.07% branch miss
conditional 2440548597519 # 150.034 conditional branches per 1000 inst
indirect 346999416185 # 21.332 indirect branches per 1000 inst
slots 32977774931438 #
retiring 12575828621225 # 38.1% (38.1%)
-- ucode 933363109249 # 2.8%
-- fastpath 11642465511976 # 35.3%
frontend 5511408969044 # 16.7% (16.7%)
-- latency 2281620609717 # 6.9%
-- bandwidth 3229788359327 # 9.8%
backend 7993444608111 # 24.2% (24.2%)
-- cpu 3686834556596 # 11.2%
-- memory 4306610051515 # 13.1%
speculation 6983503569301 # 21.2% (21.2%) high
-- branch mispredict 6736445909989 # 20.4%
-- pipeline restart 247057659312 # 0.7%
smt-contention 0 # 0.0% ( 0.0%)
cpu-cycles 9444859932545 # 0.39 GHz
instructions 17878506070144 # 1.89 IPC
l2 access 182618165893 # 12.583 l2 access per 1000 inst
l2 miss 26117313118 # 14.30% l2 miss
cpu-cycles 7566300983596 # 24.7% memory latency
load stalls 1710512533932 # 7.7% l1 bound
l1 miss 1130422828617 # 6.4% l2 bound
l2 miss 643690694173 # 7.4% l3 bound
l3 miss 80953651660 # 1.1% dram bound
store_stalls 155326487647 # 2.1% store bound
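
To compare the two topdown breakdowns side by side, each category can be expressed as a fraction of the slots counter. The sketch below does this for both dumps. The helper name `topdown_shares` is hypothetical, and the rescaling step rests on an assumption inferred from the numbers: the parenthesised column appears to be the same share with smt-contention slots excluded.

```python
def topdown_shares(slots, **categories):
    """Return each category as a fraction of total pipeline slots."""
    return {name: count / slots for name, count in categories.items()}

# Counter values copied from the AMD and Intel dumps.
amd = topdown_shares(
    21295959429906,
    retiring=5069339259379,
    frontend=6552525067104,
    backend=6746815575381,
    speculation=1456392663684,
    smt_contention=1468796759278,
)
intel = topdown_shares(
    32977774931438,
    retiring=12575828621225,
    frontend=5511408969044,
    backend=7993444608111,
    speculation=6983503569301,
    smt_contention=0,
)

for name in amd:
    print(f"{name:15s} AMD {amd[name]:6.1%}  Intel {intel[name]:6.1%}")

# Assumption: the parenthesised column rescales by the non-SMT share of slots.
non_smt = 1.0 - amd["smt_contention"]
amd_rescaled = {k: v / non_smt for k, v in amd.items() if k != "smt_contention"}
print({k: f"{v:.1%}" for k, v in amd_rescaled.items()})
```

The categories sum to roughly 100% of slots on both machines, with small deviations that are plausibly due to counter multiplexing.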

Process summary shows time spent in both gegl and worker processes, which together account for 21627 of the 23701 processes.

23701 processes
15147 gegl 23527.85 11046.31
6480 worker 21431.51 9087.18
434 gmain 1429.63 607.01
432 gdbus 1429.63 607.01
68 clinfo 15.87 6.66
38 vulkaninfo 1.33 1.14
4 vulkani:disk$0 0.14 0.12
6 php 0.12 0.24
2 llvmpipe-0 0.07 0.06
2 llvmpipe-1 0.07 0.06
2 llvmpipe-10 0.07 0.06
2 llvmpipe-11 0.07 0.06
2 llvmpipe-12 0.07 0.06
2 llvmpipe-13 0.07 0.06
2 llvmpipe-14 0.07 0.06
2 llvmpipe-15 0.07 0.06
2 llvmpipe-2 0.07 0.06
2 llvmpipe-3 0.07 0.06
2 llvmpipe-4 0.07 0.06
2 llvmpipe-5 0.07 0.06
2 llvmpipe-6 0.07 0.06
2 llvmpipe-7 0.07 0.06
2 llvmpipe-8 0.07 0.06
2 llvmpipe-9 0.07 0.06
6 clang 0.04 0.08
3 rocminfo 0.03 0.00
432 swap writer 0.00 1428.11
432 [pango] FcInit 0.00 98.85
1 lspci 0.00 0.02
99 sh 0.00 0.00
13 gsettings 0.00 0.00
12 gcc 0.00 0.00
8 stat 0.00 0.00
8 systemd-detect- 0.00 0.00
6 llvm-link 0.00 0.00
5 glxinfo 0.00 0.00
5 phoronix-test-s 0.00 0.00
2 grep 0.00 0.00
2 lscpu 0.00 0.00
2 setterm 0.00 0.00
2 uname 0.00 0.00
2 which 0.00 0.00
1 cc 0.00 0.00
1 date 0.00 0.00
1 dconf worker 0.00 0.00
1 dirname 0.00 0.00
1 dmesg 0.00 0.00
1 dmidecode 0.00 0.00
1 ifconfig 0.00 0.00
1 ip 0.00 0.00
1 lsmod 0.00 0.00
1 mktemp 0.00 0.00
1 ps 0.00 0.00
1 qdbus 0.00 0.00
1 readlink 0.00 0.00
1 realpath 0.00 0.00
1 sed 0.00 0.00
1 sort 0.00 0.00
1 stty 0.00 0.00
1 systemctl 0.00 0.00
1 template.sh 0.00 0.00
1 wc 0.00 0.00
0 processes running
47 maximum processes
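
A summary like this is easy to post-process. The sketch below parses a few of the rows and totals the gegl and worker counts. The two numeric columns are not labelled in the dump, so the field names `col_a` and `col_b` are placeholders, and `parse_row` is a hypothetical helper, not part of any tool.

```python
from typing import NamedTuple

class Row(NamedTuple):
    count: int
    name: str
    col_a: float  # first unlabelled numeric column
    col_b: float  # second unlabelled numeric column

def parse_row(line: str) -> Row:
    # Names may contain spaces ("swap writer"), so take the count from the
    # left and the two numbers from the right; the rest is the name.
    count, rest = line.split(maxsplit=1)
    name, col_a, col_b = rest.rsplit(maxsplit=2)
    return Row(int(count), name, float(col_a), float(col_b))

# A few rows copied from the summary above.
sample = """\
15147 gegl 23527.85 11046.31
6480 worker 21431.51 9087.18
434 gmain 1429.63 607.01
432 gdbus 1429.63 607.01
432 swap writer 0.00 1428.11
432 [pango] FcInit 0.00 98.85
"""

rows = [parse_row(line) for line in sample.splitlines()]
total = 23701  # from the "23701 processes" line
busy = sum(r.count for r in rows if r.name in ("gegl", "worker"))
print(f"gegl + worker: {busy} of {total} ({busy / total:.1%})")
```

Splitting from both ends keeps multi-word names such as "[pango] FcInit" intact without needing a fixed column width.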

Example of a computation block with many short-run processes: 19 of the 54 entries below start and finish at the same timestamp.

470964) gegl cpu=0 start=6.48 finish=6.91
470966) worker cpu=11 start=6.51 finish=6.91
470967) worker cpu=6 start=6.51 finish=6.91
470968) worker cpu=7 start=6.51 finish=6.91
470969) worker cpu=2 start=6.51 finish=6.91
470970) worker cpu=5 start=6.51 finish=6.91
470971) worker cpu=1 start=6.51 finish=6.91
470972) worker cpu=3 start=6.51 finish=6.91
470973) worker cpu=4 start=6.51 finish=6.91
470974) worker cpu=8 start=6.51 finish=6.91
470975) worker cpu=15 start=6.51 finish=6.91
470976) worker cpu=10 start=6.51 finish=6.91
470977) worker cpu=13 start=6.51 finish=6.91
470978) worker cpu=14 start=6.51 finish=6.91
470979) worker cpu=12 start=6.51 finish=6.91
470980) worker cpu=9 start=6.51 finish=6.91
470981) gegl cpu=4 start=6.53 finish=6.53
470982) gegl cpu=0 start=6.53 finish=6.53
470983) gegl cpu=5 start=6.53 finish=6.53
470984) gegl cpu=3 start=6.53 finish=6.53
470985) gegl cpu=15 start=6.53 finish=6.53
470986) gegl cpu=1 start=6.53 finish=6.53
470987) gegl cpu=10 start=6.53 finish=6.53
470988) gegl cpu=14 start=6.53 finish=6.53
470989) gegl cpu=12 start=6.53 finish=6.53
470990) gegl cpu=13 start=6.53 finish=6.53
470991) gegl cpu=8 start=6.53 finish=6.53
470992) gegl cpu=9 start=6.53 finish=6.53
470993) gegl cpu=11 start=6.53 finish=6.53
470994) gegl cpu=6 start=6.53 finish=6.53
470995) gegl cpu=7 start=6.53 finish=6.53
470996) gegl cpu=1 start=6.60 finish=6.91
470997) gegl cpu=12 start=6.60 finish=6.91
470998) gegl cpu=13 start=6.60 finish=6.91
470999) gegl cpu=3 start=6.60 finish=6.91
471000) gegl cpu=6 start=6.60 finish=6.91
471001) gegl cpu=7 start=6.60 finish=6.91
471002) gegl cpu=2 start=6.60 finish=6.91
471003) gegl cpu=8 start=6.60 finish=6.91
471004) gegl cpu=9 start=6.60 finish=6.91
471005) gegl cpu=4 start=6.60 finish=6.91
471006) gegl cpu=5 start=6.60 finish=6.91
471007) gegl cpu=11 start=6.60 finish=6.91
471008) gegl cpu=15 start=6.60 finish=6.91
471009) gegl cpu=14 start=6.60 finish=6.91
471010) gegl cpu=10 start=6.60 finish=6.91
471011) [pango] FcInit cpu=-1 start=6.64 finish=6.66
471012) gegl cpu=0 start=6.66 finish=6.66
471013) gegl cpu=0 start=6.66 finish=6.66
471014) gegl cpu=0 start=6.66 finish=6.66
471015) gegl cpu=9 start=6.66 finish=6.66
471016) gmain cpu=1 start=6.66 finish=6.91
471017) gdbus cpu=2 start=6.66 finish=6.91
471018) swap writer cpu=-1 start=6.67 finish=6.91
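
The short-run entries can be picked out mechanically. The sketch below parses an excerpt of the trace and flags entries whose start and finish are equal at the two-decimal resolution shown. The regular expression and the `parse` helper are illustrative assumptions about the line layout, not the format definition of the tracing tool.

```python
import re

# Assumed layout: "<id>) <name> cpu=<n> start=<t> finish=<t>"; names may contain spaces.
LINE = re.compile(r"(\d+)\) (.+) cpu=(-?\d+) start=([\d.]+) finish=([\d.]+)")

def parse(line):
    m = LINE.fullmatch(line.strip())
    if m is None:
        raise ValueError(f"unrecognised line: {line!r}")
    ident, name, cpu, start, finish = m.groups()
    return int(ident), name, int(cpu), float(start), float(finish)

# Excerpt copied from the trace above.
excerpt = """\
470964) gegl cpu=0 start=6.48 finish=6.91
470966) worker cpu=11 start=6.51 finish=6.91
470981) gegl cpu=4 start=6.53 finish=6.53
470982) gegl cpu=0 start=6.53 finish=6.53
470996) gegl cpu=1 start=6.60 finish=6.91
471011) [pango] FcInit cpu=-1 start=6.64 finish=6.66
471012) gegl cpu=0 start=6.66 finish=6.66
471018) swap writer cpu=-1 start=6.67 finish=6.91
"""

entries = [parse(line) for line in excerpt.splitlines()]
# Treat anything shorter than half the timestamp resolution as short-run.
short = [e for e in entries if e[4] - e[3] < 0.005]
print(f"{len(short)} of {len(entries)} entries are short-run")
for ident, name, cpu, start, finish in short:
    print(ident, name, f"cpu={cpu}", f"t={start:.2f}")
```

Running the same filter over the full block should pick out the two runs of gegl entries at 6.53 and 6.66.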
