Performance of Blender, the open source 3D modeling tool. This benchmark has a longer runtime, so the CPU versions of both the barbershop and BMW27 scenes were tested. The profiles below differ slightly, though the overall ordering is similar.

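The AMD metrics show a relatively high branch misprediction rate (2.20%). The code has a fair amount of floating point (about 351 float operations per 1000 instructions, essentially all 128-bit) and a moderate number of branches (about 111 per 1000 instructions).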
elapsed 4301.794
on_cpu 0.988 # 15.81 / 16 cores
utime 67988.943
stime 40.657
nvcsw 93218 # 14.22%
nivcsw 562276 # 85.78%
inblock 23304 # 5.42/sec
onblock 14896 # 3.46/sec
cpu-clock 68033985522749 # 68033.986 seconds
task-clock 68034524981502 # 68034.525 seconds
page faults 15706060 # 230.854/sec
context switches 676776 # 9.948/sec
cpu migrations 7545 # 0.111/sec
major page faults 188 # 0.003/sec
minor page faults 15705872 # 230.851/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 35293093083193 # 110.727 branches per 1000 inst
branch misses 774791487242 # 2.20% branch miss
conditional 23242796360354 # 72.921 conditional branches per 1000 inst
indirect 3186917069241 # 9.999 indirect branches per 1000 inst
cpu-cycles 277679476584227 # 4.03 GHz
instructions 318780360725662 # 1.15 IPC
slots 555266778052170 #
retiring 112176312261701 # 20.2% (27.1%)
-- ucode 590561413008 # 0.1%
-- fastpath 111585750848693 # 20.1%
frontend 110144593394679 # 19.8% (26.6%)
-- latency 73569126550464 # 13.2%
-- bandwidth 36575466844215 # 6.6%
backend 175308374563980 # 31.6% (42.3%)
-- cpu 68402426527667 # 12.3%
-- memory 106905948036313 # 19.3%
speculation 16699046042329 # 3.0% ( 4.0%)
-- branch mispredict 16262790880833 # 2.9%
-- pipeline restart 436255161496 # 0.1%
smt-contention 140938216831180 # 25.4% ( 0.0%)
cpu-cycles 278774223790977 # 4.03 GHz
instructions 318786777948843 # 1.14 IPC
instructions 106255663700201 # 63.306 l2 access per 1000 inst
l2 hit from l1 6330267339905 # 8.18% l2 miss
l2 miss from l1 354154464303 #
l2 hit from l2 pf 200240252262 #
l3 hit from l2 pf 144019200939 #
l3 miss from l2 pf 52074843805 #
instructions 106229525090169 # 350.889 float per 1000 inst
float 512 47 # 0.000 AVX-512 per 1000 inst
float 256 1188 # 0.000 AVX-256 per 1000 inst
float 128 37274753429769 # 350.889 AVX-128 per 1000 inst
float MMX 0 # 0.000 MMX per 1000 inst
float scalar 188 # 0.000 scalar per 1000 inst
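The derived values printed after the `#` in the listing can be recomputed from the raw counters. The Python sketch below does this for three of them. It assumes that on_cpu is (utime + stime) divided by elapsed time times the core count; that formula matches the printed 0.988 but is inferred from the numbers, not taken from the tool's source. The dictionary keys and the function name are illustrative.

```python
# Raw counter values copied from the AMD listing.
amd = {
    "elapsed": 4301.794,
    "utime": 67988.943,
    "stime": 40.657,
    "branches": 35293093083193,
    "branch_misses": 774791487242,
    "cpu_cycles": 277679476584227,
    "instructions": 318780360725662,
}
NUM_CORES = 16  # from the "/ 16 cores" annotation in the listing


def derived_metrics(counters, cores):
    """Recompute a few ratios from raw counters (assumed formulas)."""
    busy = counters["utime"] + counters["stime"]
    return {
        # Fraction of available core time spent running (assumption).
        "on_cpu": busy / (counters["elapsed"] * cores),
        # Mispredicted branches as a percentage of all branches.
        "branch_miss_pct": 100.0 * counters["branch_misses"] / counters["branches"],
        # Instructions retired per CPU cycle.
        "ipc": counters["instructions"] / counters["cpu_cycles"],
    }


metrics = derived_metrics(amd, NUM_CORES)
for name, value in metrics.items():
    print(f"{name:16s} {value:.3f}")
```

Rounded to the precision used in the listing, these values agree with the printed 0.988 on_cpu, 2.20% branch miss and 1.15 IPC.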
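The corresponding Intel metrics show an even higher level of branch misprediction: 3.48% of branches are missed (versus 2.20% on AMD), and speculation accounts for 13.1% of slots (versus 3.0%).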
elapsed 6447.702
on_cpu 0.989 # 15.83 / 16 cores
utime 102048.975
stime 30.174
nvcsw 86509 # 12.32%
nivcsw 615573 # 87.68%
inblock 81288 # 12.61/sec
onblock 15128 # 2.35/sec
cpu-clock 102082685570860 # 102082.686 seconds
task-clock 102083155121057 # 102083.155 seconds
page faults 15755958 # 154.344/sec
context switches 734082 # 7.191/sec
cpu migrations 20384 # 0.200/sec
major page faults 1398 # 0.014/sec
minor page faults 15754560 # 154.331/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 35287152996927 # 110.702 branches per 1000 inst
branch misses 1227056009226 # 3.48% branch miss
conditional 35287153019519 # 110.702 conditional branches per 1000 inst
indirect 10658847038465 # 33.439 indirect branches per 1000 inst
slots 450860864981234 #
retiring 184934949459959 # 41.0% (41.0%)
-- ucode 14363091429244 # 3.2%
-- fastpath 170571858030715 # 37.8%
frontend 133446042742152 # 29.6% (29.6%)
-- latency 70249860801628 # 15.6%
-- bandwidth 63196181940524 # 14.0%
backend 69647195865492 # 15.4% (15.4%)
-- cpu 32408832374285 # 7.2%
-- memory 37238363491207 # 8.3%
speculation 59237600057642 # 13.1% (13.1%)
-- branch mispredict 58474019541106 # 13.0%
-- pipeline restart 763580516536 # 0.2%
smt-contention 0 # 0.0% ( 0.0%)
cpu-cycles 289398136086046 # 2.79 GHz
instructions 342048750860200 # 1.18 IPC
l2 access 11054117043179 # 62.596 l2 access per 1000 inst
l2 miss 1604086759113 # 14.51% l2 miss
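Each topdown row in the two listings carries two percentages. The numbers are consistent with the first being the share of all slots and the second being the share of slots left after removing smt-contention. This reading is inferred from the data, not from the tool's documentation, and the function and variable names in this Python sketch are illustrative.

```python
def topdown_shares(slots, smt_contention, categories):
    """Return {name: (pct_of_all_slots, pct_excluding_smt)} per category."""
    usable = slots - smt_contention  # slots not lost to the sibling thread
    return {
        name: (100.0 * count / slots, 100.0 * count / usable)
        for name, count in categories.items()
    }


# Values copied from the AMD listing.
amd_topdown = topdown_shares(
    slots=555266778052170,
    smt_contention=140938216831180,
    categories={
        "retiring": 112176312261701,
        "frontend": 110144593394679,
        "backend": 175308374563980,
        "speculation": 16699046042329,
    },
)

# Values copied from the Intel listing (smt-contention is reported as 0).
intel_topdown = topdown_shares(
    slots=450860864981234,
    smt_contention=0,
    categories={
        "retiring": 184934949459959,
        "frontend": 133446042742152,
        "backend": 69647195865492,
        "speculation": 59237600057642,
    },
)

for label, result in (("AMD", amd_topdown), ("Intel", intel_topdown)):
    for name, (of_all, of_usable) in result.items():
        print(f"{label:5s} {name:12s} {of_all:5.1f}% ({of_usable:5.1f}%)")
```

Under this assumption the AMD retiring row comes out as 20.2% (27.1%), matching the listing, and the two Intel columns are identical because no slots are attributed to smt-contention there.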
We had a crash towards the end of the process tree, but overall blender and jemalloc_bg_thd account for almost all of the time.
583 processes
283 blender 2261250.75 1204.25
20 jemalloc_bg_thd 190773.20 104.68
19 vulkaninfo 0.38 0.57
2 vulkani:disk$0 0.04 0.06
6 clang 0.03 0.04
1 llvmpipe-0 0.02 0.03
1 llvmpipe-1 0.02 0.03
1 llvmpipe-10 0.02 0.03
1 llvmpipe-11 0.02 0.03
1 llvmpipe-12 0.02 0.03
1 llvmpipe-13 0.02 0.03
1 llvmpipe-14 0.02 0.03
1 llvmpipe-15 0.02 0.03
1 llvmpipe-2 0.02 0.03
1 llvmpipe-3 0.02 0.03
1 llvmpipe-4 0.02 0.03
1 llvmpipe-5 0.02 0.03
1 llvmpipe-6 0.02 0.03
1 llvmpipe-7 0.02 0.03
1 llvmpipe-8 0.02 0.03
1 llvmpipe-9 0.02 0.03
72 sh 0.00 0.00
12 gcc 0.00 0.00
9 stty 0.00 0.00
8 systemd-detect- 0.00 0.00
7 gsettings 0.00 0.00
7 stat 0.00 0.00
6 llvm-link 0.00 0.00
6 xdg-user-dir 0.00 0.00
5 gmain 0.00 0.00
5 rm 0.00 0.00
4 phoronix-test-s 0.00 0.00
3 dconf worker 0.00 0.00
3 glxinfo 0.00 0.00
2 which 0.00 0.00
1 date 0.00 0.00
1 dirname 0.00 0.00
1 grep 0.00 0.00
1 ifconfig 0.00 0.00
1 ip 0.00 0.00
1 lscpu 0.00 0.00
1 mktemp 0.00 0.00
1 ps 0.00 0.00
1 readlink 0.00 0.00
1 realpath 0.00 0.00
1 sed 0.00 0.00
1 setterm 0.00 0.00
1 sort 0.00 0.00
1 systemctl 0.00 0.00
1 template.sh 0.00 0.00
1 wc 0.00 0.00
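A summary like the one above can be ranked programmatically. The Python sketch below parses a few rows copied from the listing and computes each command's share of the first numeric column. The source does not label the two numeric columns, so they are kept as unnamed values here, and the function name is hypothetical. Command names may contain spaces (for example `dconf worker`), so the command is taken as everything between the leading count and the last two fields.

```python
SAMPLE = """\
583 processes
283 blender 2261250.75 1204.25
20 jemalloc_bg_thd 190773.20 104.68
19 vulkaninfo 0.38 0.57
3 dconf worker 0.00 0.00
"""


def parse_summary(text):
    """Return a list of (command, count, first_value, second_value) rows."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 4:
            continue  # skips the "583 processes" header and blank lines
        try:
            count = int(parts[0])
            first, second = float(parts[-2]), float(parts[-1])
        except ValueError:
            continue  # not a data row
        command = " ".join(parts[1:-2])
        rows.append((command, count, first, second))
    return rows


rows = parse_summary(SAMPLE)
total = sum(first for _, _, first, _ in rows)
for command, count, first, _ in sorted(rows, key=lambda r: r[2], reverse=True):
    share = 100.0 * first / total if total else 0.0
    print(f"{command:16s} count={count:4d} share={share:6.2f}%")
```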
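The core of the computation follows a repeating pattern in which a group of blender processes is started together, one per core, along with a few jemalloc_bg_thd background threads. One such group is shown below.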
561138) blender start=285.39 finish=421.43
561139) blender start=285.39 finish=421.41
561140) jemalloc_bg_thd start=285.44 finish=421.41
561162) jemalloc_bg_thd start=285.58 finish=421.41
561166) jemalloc_bg_thd start=285.58 finish=421.41
561171) jemalloc_bg_thd start=285.58 finish=421.41
561141) blender start=285.46 finish=421.39
561142) blender start=285.46 finish=421.39
561143) blender start=285.46 finish=421.39
561144) blender start=285.46 finish=421.39
561145) blender start=285.46 finish=421.39
561146) blender start=285.46 finish=421.39
561147) blender start=285.46 finish=421.39
561148) blender start=285.46 finish=421.39
561149) blender start=285.46 finish=421.39
561150) blender start=285.46 finish=421.39
561151) blender start=285.46 finish=421.39
561152) blender start=285.46 finish=421.39
561153) blender start=285.46 finish=421.39
561154) blender start=285.46 finish=421.39
561155) blender start=285.46 finish=421.39
561156) blender start=285.46 finish=421.39
561157) sh start=285.48 finish=285.48
561158) xdg-user-dir start=285.48 finish=285.48
561159) blender start=285.58 finish=421.41
561161) blender start=285.58 finish=421.41
561165) blender start=285.58 finish=421.41
561170) blender start=285.58 finish=421.41
561175) blender start=285.60 finish=421.41
561176) blender start=285.60 finish=421.41
561164) blender start=285.58 finish=421.41
561169) ?? start=285.58 finish=0.00
561174) blender start=285.60 finish=421.41
561173) blender start=285.58 finish=421.41
561160) blender start=285.58 finish=421.41
561163) blender start=285.58 finish=421.41
561168) blender start=285.58 finish=421.41
561172) blender start=285.58 finish=421.41
561167) blender start=285.58 finish=421.41
561178) blender start=285.81 finish=421.40
561179) blender start=285.81 finish=421.40
561180) blender start=285.81 finish=421.40
561181) blender start=285.81 finish=421.40
561182) blender start=285.81 finish=421.40
561183) blender start=285.81 finish=421.40
561184) blender start=285.81 finish=421.40
561185) blender start=285.82 finish=421.40
561186) blender start=285.82 finish=421.40
561187) blender start=285.82 finish=421.40
561188) blender start=285.82 finish=421.40
561189) blender start=285.82 finish=421.40
561190) blender start=285.82 finish=421.40
561191) blender start=285.82 finish=421.40
561192) blender start=285.82 finish=421.40
561177) blender start=285.77 finish=421.12
561195) blender start=421.13 finish=421.37
561196) blender start=421.13 finish=421.37
561197) blender start=421.13 finish=421.37
561198) blender start=421.13 finish=421.37
561199) blender start=421.13 finish=421.37
561200) blender start=421.13 finish=421.37
561201) blender start=421.13 finish=421.37
561202) blender start=421.13 finish=421.37
561203) blender start=421.13 finish=421.37
561204) blender start=421.13 finish=421.37
561205) blender start=421.13 finish=421.37
561206) blender start=421.13 finish=421.37
561207) blender start=421.13 finish=421.37
561208) blender start=421.13 finish=421.37
561209) blender start=421.13 finish=421.37
561210) blender start=421.13 finish=421.37
561211) rm start=421.43 finish=421.43
561212) sh start=421.44 finish=421.44
561213) sh start=421.44 finish=421.44
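The pattern in this timeline is easier to see once the entries are parsed and split by how long they ran. The Python sketch below does that for a handful of lines copied from the listing. It assumes the start and finish values are seconds since the start of the run, and it treats a finish earlier than the start (such as the `finish=0.00` on the `??` entry) as an unknown end time. The 1-second threshold and all names are illustrative choices.

```python
import re
from collections import Counter

SAMPLE = """\
561138) blender start=285.39 finish=421.43
561140) jemalloc_bg_thd start=285.44 finish=421.41
561157) sh start=285.48 finish=285.48
561169) ?? start=285.58 finish=0.00
561195) blender start=421.13 finish=421.37
561211) rm start=421.43 finish=421.43
"""

LINE_RE = re.compile(r"^(\d+)\)\s+(.+?)\s+start=([\d.]+)\s+finish=([\d.]+)$")


def parse_timeline(text):
    """Parse 'pid) name start=S finish=F' lines into dictionaries."""
    procs = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        pid, name, start, finish = match.groups()
        start, finish = float(start), float(finish)
        # A finish before the start means the end was never recorded.
        duration = finish - start if finish >= start else None
        procs.append({"pid": int(pid), "name": name,
                      "start": start, "finish": finish, "duration": duration})
    return procs


procs = parse_timeline(SAMPLE)
LONG_THRESHOLD = 1.0  # assumed cutoff separating workers from helpers
long_running = Counter(p["name"] for p in procs
                       if p["duration"] is not None and p["duration"] > LONG_THRESHOLD)
unknown_end = [p["pid"] for p in procs if p["duration"] is None]

print("long-running:", dict(long_running))
print("unknown end:", unknown_end)
```

Applied to the full listing, the same grouping would separate the long-lived render workers from the short blender processes that start at 421.13 and from helper commands such as sh and rm.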
