A photography workflow program. It appears to be mostly single-threaded, with short multi-threaded sections. There are eight workloads overall.

The topdown profile shows backend stalls as the largest category, though the mix varies over the course of the workload.

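The AMD metrics show about 2.5 of 16 cores busy on average, a low frontend stall rate (7.9% of slots), and backend stalls that are mostly memory-bound (30.7% memory versus 18.5% CPU). The workload contains a large amount of floating-point code: the float rate is about 541 per 1000 instructions, almost all of it counted as 128-bit.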
elapsed 318.710
on_cpu 0.156 # 2.50 / 16 cores
utime 737.522
stime 58.055
nvcsw 35946 # 62.29%
nivcsw 21758 # 37.71%
inblock 8 # 0.03/sec
onblock 2734304 # 8579.29/sec
cpu-clock 796090077692 # 796.090 seconds
task-clock 796163209420 # 796.163 seconds
page faults 19068517 # 23950.513/sec
context switches 58917 # 74.001/sec
cpu migrations 1547 # 1.943/sec
major page faults 1013 # 1.272/sec
minor page faults 19067504 # 23949.240/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 268134978685 # 79.288 branches per 1000 inst
branch misses 7152723323 # 2.67% branch miss
conditional 200250495977 # 59.215 conditional branches per 1000 inst
indirect 10094382421 # 2.985 indirect branches per 1000 inst
cpu-cycles 3209534826821 # 0.58 GHz
instructions 3434442494493 # 1.07 IPC
slots 6459127457736 #
retiring 1178756730721 # 18.2% (23.8%)
-- ucode 1722488775 # 0.0%
-- fastpath 1177034241946 # 18.2%
frontend 509373878078 # 7.9% (10.3%)
-- latency 331597892352 # 5.1%
-- bandwidth 177775985726 # 2.8%
backend 3179350583579 # 49.2% (64.1%)
-- cpu 1196276616296 # 18.5%
-- memory 1983073967283 # 30.7%
speculation 91606138822 # 1.4% ( 1.8%)
-- branch mispredict 89770227189 # 1.4%
-- pipeline restart 1835911633 # 0.0%
smt-contention 1500034133018 # 23.2% ( 0.0%)
cpu-cycles 3004312300770 # 0.75 GHz
instructions 3172733040890 # 1.06 IPC
instructions 1057887047426 # 35.418 l2 access per 1000 inst
l2 hit from l1 21896760191 # 15.47% l2 miss
l2 miss from l1 2039049679 #
l2 hit from l2 pf 11815001082 #
l3 hit from l2 pf 1765059884 #
l3 miss from l2 pf 1991532654 #
instructions 1058376511518 # 540.731 float per 1000 inst
float 512 117 # 0.000 AVX-512 per 1000 inst
float 256 1006 # 0.000 AVX-256 per 1000 inst
float 128 572297164157 # 540.731 AVX-128 per 1000 inst
float MMX 0 # 0.000 MMX per 1000 inst
float scalar 90 # 0.000 scalar per 1000 inst
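The two percentages printed for each topdown category in the AMD listing can be reproduced from the raw slot counts. The sketch below assumes the first figure is the share of all slots and the parenthesized figure is the share after smt-contention slots are removed. That reading is inferred from the numbers themselves, not from the tool's documentation, and the function name is illustrative.

```python
# Slot counts copied from the AMD topdown listing above.
SLOTS = 6459127457736
SMT_CONTENTION = 1500034133018
CATEGORIES = {
    "retiring": 1178756730721,
    "frontend": 509373878078,
    "backend": 3179350583579,
    "speculation": 91606138822,
}


def topdown_shares(slots, smt_contention, categories):
    """Return {name: (pct_of_all_slots, pct_excluding_smt_contention)}.

    Assumption: the parenthesized value in the listing uses
    (slots - smt_contention) as its denominator.
    """
    usable = slots - smt_contention
    return {
        name: (100.0 * count / slots, 100.0 * count / usable)
        for name, count in categories.items()
    }


if __name__ == "__main__":
    shares = topdown_shares(SLOTS, SMT_CONTENTION, CATEGORIES)
    for name, (raw, adjusted) in shares.items():
        print(f"{name:12s} {raw:5.1f}% ({adjusted:5.1f}%)")
```

Under that assumption, backend stalls account for roughly 64% of the slots not lost to SMT contention, which matches the parenthesized value in the listing.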
Intel metrics
elapsed 312.031
on_cpu 0.252 # 4.03 / 16 cores
utime 1208.602
stime 47.772
nvcsw 30863 # 59.03%
nivcsw 21420 # 40.97%
inblock 221888 # 711.11/sec
onblock 2140232 # 6859.03/sec
cpu-clock 1256750837194 # 1256.751 seconds
task-clock 1256786792414 # 1256.787 seconds
page faults 22902743 # 18223.252/sec
context switches 53430 # 42.513/sec
cpu migrations 3496 # 2.782/sec
major page faults 1326 # 1.055/sec
minor page faults 22901417 # 18222.197/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 376573690433 # 67.205 branches per 1000 inst
branch misses 3641086802 # 0.97% branch miss
conditional 376573750433 # 67.205 conditional branches per 1000 inst
indirect 111927884967 # 19.975 indirect branches per 1000 inst
slots 6764155658858 #
retiring 2794454922645 # 41.3% (41.3%)
-- ucode 122503399321 # 1.8%
-- fastpath 2671951523324 # 39.5%
frontend 1042775302935 # 15.4% (15.4%)
-- latency 870858587139 # 12.9%
-- bandwidth 171916715796 # 2.5%
backend 2616276822011 # 38.7% (38.7%)
-- cpu 1470293134285 # 21.7%
-- memory 1145983687726 # 16.9%
speculation 337831189831 # 5.0% ( 5.0%)
-- branch mispredict 300555284759 # 4.4%
-- pipeline restart 37275905072 # 0.6%
smt-contention 0 # 0.0% ( 0.0%)
cpu-cycles 3048556218080 # 0.76 GHz
instructions 4131891600459 # 1.36 IPC
l2 access 63172361667 # 22.699 l2 access per 1000 inst
l2 miss 16933081355 # 26.80% l2 miss
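The on_cpu lines in both listings are consistent with dividing total CPU time (utime + stime) by elapsed time multiplied by the core count. The sketch below recomputes them for the AMD and Intel runs. The function name and the default of 16 cores are illustrative choices taken from the "/ 16 cores" annotation, not part of the profiling tool.

```python
def on_cpu(utime, stime, elapsed, ncores=16):
    """Return (fraction of total core time busy, average busy cores).

    Assumption: on_cpu = (utime + stime) / (elapsed * ncores), which
    reproduces the values shown in the listings.
    """
    busy = utime + stime
    fraction = busy / (elapsed * ncores)
    return fraction, fraction * ncores


if __name__ == "__main__":
    runs = {
        "AMD": (737.522, 58.055, 318.710),
        "Intel": (1208.602, 47.772, 312.031),
    }
    for label, (utime, stime, elapsed) in runs.items():
        fraction, cores = on_cpu(utime, stime, elapsed)
        print(f"{label:5s} on_cpu {fraction:.3f} # {cores:.2f} / 16 cores")
```

Elapsed time is nearly the same on the two systems, so the higher Intel figure (about 4.0 cores versus 2.5) comes from more CPU time being spent, not from a shorter run.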
The process overview shows the work being driven from darktable-cli, plus three processes for the primary operations. In this and other profiles, it is interesting that “lua thread” shows a large amount of system time and no user time.
2083 processes
1253 darktable-cli 22637.81 1777.15
37 gdbus 751.43 58.06
37 pool-darktable- 751.43 58.06
40 gmain 751.42 58.06
238 clinfo 63.17 22.21
38 vulkaninfo 0.94 1.33
6 php 0.12 0.22
6 glxinfo:gdrv0 0.11 0.07
6 glxinfo:gl0 0.11 0.07
4 vulkani:disk$0 0.10 0.14
2 glxinfo 0.06 0.04
2 llvmpipe-0 0.05 0.07
2 llvmpipe-1 0.05 0.07
2 llvmpipe-10 0.05 0.07
2 llvmpipe-11 0.05 0.07
2 llvmpipe-12 0.05 0.07
2 llvmpipe-13 0.05 0.07
2 llvmpipe-14 0.05 0.07
2 llvmpipe-15 0.05 0.07
2 llvmpipe-2 0.05 0.07
2 llvmpipe-3 0.05 0.07
2 llvmpipe-4 0.05 0.07
2 llvmpipe-5 0.05 0.07
2 llvmpipe-6 0.05 0.07
2 llvmpipe-7 0.05 0.07
2 llvmpipe-8 0.05 0.07
2 llvmpipe-9 0.05 0.07
2 glxinfo:cs0 0.05 0.03
2 glxinfo:disk$0 0.05 0.03
2 glxinfo:sh0 0.05 0.03
2 glxinfo:shlo0 0.05 0.03
3 rocminfo 0.03 0.00
6 clang 0.02 0.06
1 lspci 0.01 0.01
37 lua thread 0.00 751.28
1 ps 0.00 0.01
100 sh 0.00 0.00
37 awk 0.00 0.00
37 darktable 0.00 0.00
37 head 0.00 0.00
37 rm 0.00 0.00
12 gcc 0.00 0.00
12 gsettings 0.00 0.00
8 stat 0.00 0.00
8 systemd-detect- 0.00 0.00
6 llvm-link 0.00 0.00
5 phoronix-test-s 0.00 0.00
2 lscpu 0.00 0.00
2 uname 0.00 0.00
2 which 0.00 0.00
2 xset 0.00 0.00
1 cc 0.00 0.00
1 date 0.00 0.00
1 dconf worker 0.00 0.00
1 dirname 0.00 0.00
1 dmesg 0.00 0.00
1 dmidecode 0.00 0.00
1 grep 0.00 0.00
1 ifconfig 0.00 0.00
1 ip 0.00 0.00
1 lsmod 0.00 0.00
1 mktemp 0.00 0.00
1 qdbus 0.00 0.00
1 readlink 0.00 0.00
1 realpath 0.00 0.00
1 sed 0.00 0.00
1 sort 0.00 0.00
1 stty 0.00 0.00
1 systemctl 0.00 0.00
1 template.sh 0.00 0.00
1 wc 0.00 0.00
1 xrandr 0.00 0.00
0 processes running
47 maximum processes
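Entries like “lua thread” can be picked out of the overview mechanically. The sketch below parses a few rows copied from the listing and flags names with zero user time and non-trivial system time. It assumes the two numeric columns are user time followed by system time (inferred from the remark about “lua thread”). The units are not stated in the listing, and the threshold of 1.0 is an arbitrary illustration.

```python
import re

# Matches "<count> <name> <number> <number>"; names may contain spaces.
ROW = re.compile(r"^(\d+)\s+(.+?)\s+(\d+\.\d+)\s+(\d+\.\d+)$")

# A few rows copied from the process overview above.
SAMPLE = """\
2083 processes
1253 darktable-cli 22637.81 1777.15
37 gdbus 751.43 58.06
3 rocminfo 0.03 0.00
37 lua thread 0.00 751.28
1 ps 0.00 0.01
100 sh 0.00 0.00
0 processes running
"""


def parse_overview(text):
    """Return a list of (count, name, user, system) tuples."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match is None:
            continue  # summary lines such as "2083 processes"
        count, name, user, system = match.groups()
        rows.append((int(count), name, float(user), float(system)))
    return rows


def system_only(rows, min_system=1.0):
    """Names with no user time and at least min_system system time."""
    return [
        name
        for _, name, user, system in rows
        if user == 0.0 and system >= min_system
    ]


if __name__ == "__main__":
    print(system_only(parse_overview(SAMPLE)))
```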
Example computation block
2442841) darktable cpu=9 start=6.38 finish=13.11
2442842) rm cpu=14 start=6.38 finish=6.39
2442843) darktable-cli cpu=2 start=6.39 finish=13.04
2442844) gmain cpu=6 start=6.43 finish=13.04
2442845) gdbus cpu=8 start=6.43 finish=13.04
2442848) darktable-cli cpu=11 start=6.51 finish=13.04
2442849) darktable-cli cpu=4 start=6.51 finish=6.51
2442850) darktable-cli cpu=7 start=6.51 finish=6.51
2442851) darktable-cli cpu=13 start=6.51 finish=13.03
2442852) lua thread cpu=-1 start=6.97 finish=12.95
2442853) pool-darktable- cpu=14 start=6.97 finish=13.04
2442854) darktable-cli cpu=4 start=7.11 finish=13.04
2442855) darktable-cli cpu=5 start=7.11 finish=13.04
2442856) darktable-cli cpu=7 start=7.11 finish=13.04
2442857) darktable-cli cpu=9 start=7.11 finish=13.04
2442858) darktable-cli cpu=1 start=7.11 finish=13.04
2442859) darktable-cli cpu=3 start=7.11 finish=13.04
2442860) darktable-cli cpu=15 start=7.11 finish=13.04
2442861) darktable-cli cpu=10 start=7.11 finish=13.04
2442862) darktable-cli cpu=12 start=7.11 finish=13.04
2442863) darktable-cli cpu=0 start=7.11 finish=13.04
2442864) darktable-cli cpu=13 start=7.11 finish=13.04
2442865) darktable-cli cpu=8 start=7.11 finish=13.04
2442866) darktable-cli cpu=14 start=7.11 finish=13.04
2442867) darktable-cli cpu=3 start=7.11 finish=13.04
2442868) darktable-cli cpu=13 start=7.11 finish=13.04
2442873) darktable-cli cpu=3 start=13.08 finish=13.10
2442874) head cpu=13 start=13.08 finish=13.10
2442875) awk cpu=0 start=13.08 finish=13.10
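The trace above can be turned into a count of concurrently live threads, which makes the short multi-threaded burst visible. The sketch below parses a subset of the lines and counts the entries alive at a given time. It assumes start and finish are timestamps on a common clock (the units are not stated), and the helper names are illustrative.

```python
import re

# Matches "<id>) <name> cpu=<n> start=<t> finish=<t>"; names may contain spaces.
LINE = re.compile(
    r"^(\d+)\)\s+(.+?)\s+cpu=(-?\d+)\s+start=([\d.]+)\s+finish=([\d.]+)$"
)

# A subset of the lines from the computation block above.
SAMPLE = """\
2442841) darktable cpu=9 start=6.38 finish=13.11
2442842) rm cpu=14 start=6.38 finish=6.39
2442843) darktable-cli cpu=2 start=6.39 finish=13.04
2442849) darktable-cli cpu=4 start=6.51 finish=6.51
2442852) lua thread cpu=-1 start=6.97 finish=12.95
2442854) darktable-cli cpu=4 start=7.11 finish=13.04
2442855) darktable-cli cpu=5 start=7.11 finish=13.04
2442874) head cpu=13 start=13.08 finish=13.10
"""


def parse_block(text):
    """Return one dict per trace line with id, name, cpu, start, finish."""
    records = []
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match is None:
            continue
        ident, name, cpu, start, finish = match.groups()
        records.append({
            "id": int(ident),
            "name": name,
            "cpu": int(cpu),
            "start": float(start),
            "finish": float(finish),
        })
    return records


def alive_at(records, t):
    """Count entries whose [start, finish) interval contains t."""
    return sum(1 for r in records if r["start"] <= t < r["finish"])


if __name__ == "__main__":
    records = parse_block(SAMPLE)
    for t in (6.385, 10.0, 13.09):
        print(f"t={t}: {alive_at(records, t)} alive")
```

Run against the full block instead of this subset, the same count would include every darktable-cli thread that starts at 7.11 and finishes at 13.04.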
