Performance of Blender, the open source 3D modeling suite. This one has a longer runtime, so I tested the CPU versions of both the barbershop and BMW27 scenes. The profiles below differ slightly, though the overall ordering is similar.

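The AMD metrics show a relatively high branch misprediction rate: 2.20% of branches miss, costing 2.9% of pipeline slots. The code has a fair amount of floating point (about 351 per 1000 instructions, nearly all AVX-128) and a medium level of branches (about 111 per 1000 instructions).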

elapsed              4301.794
on_cpu               0.988          # 15.81 / 16 cores
utime                67988.943
stime                40.657
nvcsw                93218          # 14.22%
nivcsw               562276         # 85.78%
inblock              23304          # 5.42/sec
onblock              14896          # 3.46/sec
cpu-clock            68033985522749 # 68033.986 seconds
task-clock           68034524981502 # 68034.525 seconds
page faults          15706060       # 230.854/sec
context switches     676776         # 9.948/sec
cpu migrations       7545           # 0.111/sec
major page faults    188            # 0.003/sec
minor page faults    15705872       # 230.851/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             35293093083193 # 110.727 branches per 1000 inst
branch misses        774791487242   # 2.20% branch miss
conditional          23242796360354 # 72.921 conditional branches per 1000 inst
indirect             3186917069241  # 9.999 indirect branches per 1000 inst
cpu-cycles           277679476584227 # 4.03 GHz
instructions         318780360725662 # 1.15 IPC
slots                555266778052170 #
retiring             112176312261701 # 20.2% (27.1%)
-- ucode             590561413008   #     0.1%
-- fastpath          111585750848693 #    20.1%
frontend             110144593394679 # 19.8% (26.6%)
-- latency           73569126550464 #    13.2%
-- bandwidth         36575466844215 #     6.6%
backend              175308374563980 # 31.6% (42.3%)
-- cpu               68402426527667 #    12.3%
-- memory            106905948036313 #    19.3%
speculation          16699046042329 #  3.0% ( 4.0%)
-- branch mispredict 16262790880833 #     2.9%
-- pipeline restart  436255161496   #     0.1%
smt-contention       140938216831180 # 25.4% ( 0.0%)
cpu-cycles           278774223790977 # 4.03 GHz
instructions         318786777948843 # 1.14 IPC
instructions         106255663700201 # 63.306 l2 access per 1000 inst
l2 hit from l1       6330267339905  # 8.18% l2 miss
l2 miss from l1      354154464303   #
l2 hit from l2 pf    200240252262   #
l3 hit from l2 pf    144019200939   #
l3 miss from l2 pf   52074843805    #
instructions         106229525090169 # 350.889 float per 1000 inst
float 512            47             # 0.000 AVX-512 per 1000 inst
float 256            1188           # 0.000 AVX-256 per 1000 inst
float 128            37274753429769 # 350.889 AVX-128 per 1000 inst
float MMX            0              # 0.000 MMX per 1000 inst
float scalar         188            # 0.000 scalar per 1000 inst
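
The values after each `#` are derived from the raw counters. As a rough cross-check, a few of them can be recomputed from the AMD numbers above. This is a minimal sketch that assumes the annotations are simple ratios; the tool's exact formulas are not given here. Some annotations (branches per 1000 instructions, GHz) appear to use a different instruction or time base, so they are left out. The names `AMD`, `CORES` and `derived` are hypothetical.

```python
# Raw AMD counters copied from the report above.
AMD = {
    "elapsed": 4301.794,        # wall-clock seconds
    "utime": 67988.943,         # user CPU seconds
    "stime": 40.657,            # system CPU seconds
    "nvcsw": 93218,             # voluntary context switches
    "nivcsw": 562276,           # involuntary context switches
    "branches": 35293093083193,
    "branch_misses": 774791487242,
    "cpu_cycles": 277679476584227,
    "instructions": 318780360725662,
    "group_instructions": 106229525090169,  # instruction count printed with the float group
    "float_128": 37274753429769,
}
CORES = 16  # from the "/ 16 cores" annotation


def derived(c, cores=CORES):
    """Recompute some '#' annotations as simple ratios (assumed formulas)."""
    busy = (c["utime"] + c["stime"]) / c["elapsed"]  # average number of busy cores
    return {
        "cores_busy": busy,
        "on_cpu": busy / cores,
        "voluntary_pct": 100 * c["nvcsw"] / (c["nvcsw"] + c["nivcsw"]),
        "branch_miss_pct": 100 * c["branch_misses"] / c["branches"],
        "ipc": c["instructions"] / c["cpu_cycles"],
        "avx128_per_1000": 1000 * c["float_128"] / c["group_instructions"],
    }


if __name__ == "__main__":
    for name, value in derived(AMD).items():
        print(f"{name:16s} {value:10.3f}")
```

Under these assumptions the sketch reproduces 15.81 busy cores, 0.988 on_cpu, 14.22% voluntary switches, a 2.20% branch miss rate, 1.15 IPC and 350.889 AVX-128 per 1000 instructions.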

The corresponding Intel metrics show an even higher level of branch misprediction: 3.48% of branches miss, and mispredicts account for 13.0% of pipeline slots versus 2.9% on AMD.

elapsed              6447.702
on_cpu               0.989          # 15.83 / 16 cores
utime                102048.975
stime                30.174
nvcsw                86509          # 12.32%
nivcsw               615573         # 87.68%
inblock              81288          # 12.61/sec
onblock              15128          # 2.35/sec
cpu-clock            102082685570860 # 102082.686 seconds
task-clock           102083155121057 # 102083.155 seconds
page faults          15755958       # 154.344/sec
context switches     734082         # 7.191/sec
cpu migrations       20384          # 0.200/sec
major page faults    1398           # 0.014/sec
minor page faults    15754560       # 154.331/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             35287152996927 # 110.702 branches per 1000 inst
branch misses        1227056009226  # 3.48% branch miss
conditional          35287153019519 # 110.702 conditional branches per 1000 inst
indirect             10658847038465 # 33.439 indirect branches per 1000 inst
slots                450860864981234 #
retiring             184934949459959 # 41.0% (41.0%)
-- ucode             14363091429244 #     3.2%
-- fastpath          170571858030715 #    37.8%
frontend             133446042742152 # 29.6% (29.6%)
-- latency           70249860801628 #    15.6%
-- bandwidth         63196181940524 #    14.0%
backend              69647195865492 # 15.4% (15.4%)
-- cpu               32408832374285 #     7.2%
-- memory            37238363491207 #     8.3%
speculation          59237600057642 # 13.1% (13.1%)
-- branch mispredict 58474019541106 #    13.0%
-- pipeline restart  763580516536   #     0.2%
smt-contention       0              #  0.0% ( 0.0%)
cpu-cycles           289398136086046 # 2.79 GHz
instructions         342048750860200 # 1.18 IPC
l2 access            11054117043179 # 62.596 l2 access per 1000 inst
l2 miss              1604086759113  # 14.51% l2 miss
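
The top-down rows (retiring, frontend, backend, speculation) are fractions of pipeline slots, which makes the two machines directly comparable. Below is a minimal sketch that recomputes them from the raw slot counts above. On AMD, the figures in parentheses appear to be shares of the slots left after removing SMT contention. That reading is my inference, because it reproduces the printed values; the tool does not state it. Intel reports zero SMT contention, so its two figures coincide. The `TopDown` class is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TopDown:
    slots: int
    retiring: int
    frontend: int
    backend: int
    speculation: int
    smt_contention: int = 0

    def shares(self, exclude_smt=False):
        """Percent of slots per category; optionally drop SMT-contention slots
        from the denominator (assumed meaning of the parenthesized figures)."""
        total = self.slots - (self.smt_contention if exclude_smt else 0)
        cats = {
            "retiring": self.retiring,
            "frontend": self.frontend,
            "backend": self.backend,
            "speculation": self.speculation,
        }
        return {k: 100 * v / total for k, v in cats.items()}


# Slot counts copied from the AMD and Intel reports.
AMD = TopDown(555266778052170, 112176312261701, 110144593394679,
              175308374563980, 16699046042329, 140938216831180)
INTEL = TopDown(450860864981234, 184934949459959, 133446042742152,
                69647195865492, 59237600057642)

if __name__ == "__main__":
    for label, td in (("AMD", AMD), ("Intel", INTEL)):
        raw, adj = td.shares(), td.shares(exclude_smt=True)
        for k in raw:
            print(f"{label:5s} {k:11s} {raw[k]:5.1f}% ({adj[k]:5.1f}%)")
```

Side by side, speculation (mostly branch mispredicts) costs about 13% of slots on Intel against about 3% on AMD. AMD loses a larger share to backend stalls and SMT contention.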

We had a crash towards the end of the process tree, but overall, blender and jemalloc_bg_thd account for most of the time.

583 processes
	283 blender              2261250.75  1204.25
	 20 jemalloc_bg_thd      190773.20   104.68
	 19 vulkaninfo               0.38     0.57
	  2 vulkani:disk$0           0.04     0.06
	  6 clang                    0.03     0.04
	  1 llvmpipe-0               0.02     0.03
	  1 llvmpipe-1               0.02     0.03
	  1 llvmpipe-10              0.02     0.03
	  1 llvmpipe-11              0.02     0.03
	  1 llvmpipe-12              0.02     0.03
	  1 llvmpipe-13              0.02     0.03
	  1 llvmpipe-14              0.02     0.03
	  1 llvmpipe-15              0.02     0.03
	  1 llvmpipe-2               0.02     0.03
	  1 llvmpipe-3               0.02     0.03
	  1 llvmpipe-4               0.02     0.03
	  1 llvmpipe-5               0.02     0.03
	  1 llvmpipe-6               0.02     0.03
	  1 llvmpipe-7               0.02     0.03
	  1 llvmpipe-8               0.02     0.03
	  1 llvmpipe-9               0.02     0.03
	 72 sh                       0.00     0.00
	 12 gcc                      0.00     0.00
	  9 stty                     0.00     0.00
	  8 systemd-detect-          0.00     0.00
	  7 gsettings                0.00     0.00
	  7 stat                     0.00     0.00
	  6 llvm-link                0.00     0.00
	  6 xdg-user-dir             0.00     0.00
	  5 gmain                    0.00     0.00
	  5 rm                       0.00     0.00
	  4 phoronix-test-s          0.00     0.00
	  3 dconf worker             0.00     0.00
	  3 glxinfo                  0.00     0.00
	  2 which                    0.00     0.00
	  1 date                     0.00     0.00
	  1 dirname                  0.00     0.00
	  1 grep                     0.00     0.00
	  1 ifconfig                 0.00     0.00
	  1 ip                       0.00     0.00
	  1 lscpu                    0.00     0.00
	  1 mktemp                   0.00     0.00
	  1 ps                       0.00     0.00
	  1 readlink                 0.00     0.00
	  1 realpath                 0.00     0.00
	  1 sed                      0.00     0.00
	  1 setterm                  0.00     0.00
	  1 sort                     0.00     0.00
	  1 systemctl                0.00     0.00
	  1 template.sh              0.00     0.00
	  1 wc                       0.00     0.00
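
One way to quantify "most of the time" is to parse this summary and compute each name's share of a column. The columns are unlabeled, so this minimal sketch *assumes* the first numeric column is a time total. Only an excerpt of the rows is embedded. The helpers `parse_summary` and `shares` are hypothetical.

```python
# Excerpt of the summary above: count, process name, then two unlabeled columns.
SUMMARY = """\
283 blender              2261250.75  1204.25
 20 jemalloc_bg_thd      190773.20   104.68
 19 vulkaninfo               0.38     0.57
  2 vulkani:disk$0           0.04     0.06
  6 clang                    0.03     0.04
  1 llvmpipe-0               0.02     0.03
 72 sh                       0.00     0.00
  3 dconf worker             0.00     0.00
"""


def parse_summary(text):
    """Return (count, name, col1, col2) tuples; names may contain spaces."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        count, rest = line.split(None, 1)
        name, col1, col2 = rest.rsplit(None, 2)  # split numbers off the right
        rows.append((int(count), name, float(col1), float(col2)))
    return rows


def shares(rows):
    """Fraction of the first numeric column (assumed: time) per process name."""
    total = sum(r[2] for r in rows)
    return {name: col1 / total for _, name, col1, _ in rows}


if __name__ == "__main__":
    for name, frac in sorted(shares(parse_summary(SUMMARY)).items(),
                             key=lambda kv: -kv[1]):
        print(f"{name:16s} {100 * frac:6.2f}%")
```

Splitting the numbers off the right-hand end keeps multi-word names such as `dconf worker` intact.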

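The core of the computation follows a repeating pattern in which several processes are started together, one per core. For example, process 561139 starts 16 blender children at 285.46 and another 16 at 421.13.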

      561138) blender start=285.39 finish=421.43
        561139) blender start=285.39 finish=421.41
          561140) jemalloc_bg_thd start=285.44 finish=421.41
            561162) jemalloc_bg_thd start=285.58 finish=421.41
            561166) jemalloc_bg_thd start=285.58 finish=421.41
            561171) jemalloc_bg_thd start=285.58 finish=421.41
          561141) blender start=285.46 finish=421.39
          561142) blender start=285.46 finish=421.39
          561143) blender start=285.46 finish=421.39
          561144) blender start=285.46 finish=421.39
          561145) blender start=285.46 finish=421.39
          561146) blender start=285.46 finish=421.39
          561147) blender start=285.46 finish=421.39
          561148) blender start=285.46 finish=421.39
          561149) blender start=285.46 finish=421.39
          561150) blender start=285.46 finish=421.39
          561151) blender start=285.46 finish=421.39
          561152) blender start=285.46 finish=421.39
          561153) blender start=285.46 finish=421.39
          561154) blender start=285.46 finish=421.39
          561155) blender start=285.46 finish=421.39
          561156) blender start=285.46 finish=421.39
          561157) sh start=285.48 finish=285.48
            561158) xdg-user-dir start=285.48 finish=285.48
          561159) blender start=285.58 finish=421.41
            561161) blender start=285.58 finish=421.41
              561165) blender start=285.58 finish=421.41
              561170) blender start=285.58 finish=421.41
                561175) blender start=285.60 finish=421.41
                561176) blender start=285.60 finish=421.41
            561164) blender start=285.58 finish=421.41
              561169) ?? start=285.58 finish=0.00 
                561174) blender start=285.60 finish=421.41
              561173) blender start=285.58 finish=421.41
          561160) blender start=285.58 finish=421.41
            561163) blender start=285.58 finish=421.41
              561168) blender start=285.58 finish=421.41
              561172) blender start=285.58 finish=421.41
            561167) blender start=285.58 finish=421.41
              561178) blender start=285.81 finish=421.40
              561179) blender start=285.81 finish=421.40
              561180) blender start=285.81 finish=421.40
              561181) blender start=285.81 finish=421.40
              561182) blender start=285.81 finish=421.40
              561183) blender start=285.81 finish=421.40
              561184) blender start=285.81 finish=421.40
              561185) blender start=285.82 finish=421.40
              561186) blender start=285.82 finish=421.40
              561187) blender start=285.82 finish=421.40
              561188) blender start=285.82 finish=421.40
              561189) blender start=285.82 finish=421.40
              561190) blender start=285.82 finish=421.40
              561191) blender start=285.82 finish=421.40
              561192) blender start=285.82 finish=421.40
          561177) blender start=285.77 finish=421.12
          561195) blender start=421.13 finish=421.37
          561196) blender start=421.13 finish=421.37
          561197) blender start=421.13 finish=421.37
          561198) blender start=421.13 finish=421.37
          561199) blender start=421.13 finish=421.37
          561200) blender start=421.13 finish=421.37
          561201) blender start=421.13 finish=421.37
          561202) blender start=421.13 finish=421.37
          561203) blender start=421.13 finish=421.37
          561204) blender start=421.13 finish=421.37
          561205) blender start=421.13 finish=421.37
          561206) blender start=421.13 finish=421.37
          561207) blender start=421.13 finish=421.37
          561208) blender start=421.13 finish=421.37
          561209) blender start=421.13 finish=421.37
          561210) blender start=421.13 finish=421.37
        561211) rm start=421.43 finish=421.43
      561212) sh start=421.44 finish=421.44
        561213) sh start=421.44 finish=421.44
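
The one-per-core pattern can be checked mechanically. Below is a minimal sketch that parses an excerpt of the tree above, using indentation to determine each process's parent. It then groups the blender children of 561139 by start time. The line format is taken from this output. The parser (`parse_tree`, `waves`) is hypothetical and not part of the tracing tool.

```python
import re
from collections import Counter, namedtuple

# Excerpt of the process tree above; indentation encodes parent/child.
TREE = """\
      561138) blender start=285.39 finish=421.43
        561139) blender start=285.39 finish=421.41
          561140) jemalloc_bg_thd start=285.44 finish=421.41
            561162) jemalloc_bg_thd start=285.58 finish=421.41
          561141) blender start=285.46 finish=421.39
          561142) blender start=285.46 finish=421.39
          561143) blender start=285.46 finish=421.39
          561144) blender start=285.46 finish=421.39
          561145) blender start=285.46 finish=421.39
          561146) blender start=285.46 finish=421.39
          561147) blender start=285.46 finish=421.39
          561148) blender start=285.46 finish=421.39
          561149) blender start=285.46 finish=421.39
          561150) blender start=285.46 finish=421.39
          561151) blender start=285.46 finish=421.39
          561152) blender start=285.46 finish=421.39
          561153) blender start=285.46 finish=421.39
          561154) blender start=285.46 finish=421.39
          561155) blender start=285.46 finish=421.39
          561156) blender start=285.46 finish=421.39
          561157) sh start=285.48 finish=285.48
            561158) xdg-user-dir start=285.48 finish=285.48
          561177) blender start=285.77 finish=421.12
          561195) blender start=421.13 finish=421.37
          561196) blender start=421.13 finish=421.37
          561197) blender start=421.13 finish=421.37
          561198) blender start=421.13 finish=421.37
          561199) blender start=421.13 finish=421.37
          561200) blender start=421.13 finish=421.37
          561201) blender start=421.13 finish=421.37
          561202) blender start=421.13 finish=421.37
          561203) blender start=421.13 finish=421.37
          561204) blender start=421.13 finish=421.37
          561205) blender start=421.13 finish=421.37
          561206) blender start=421.13 finish=421.37
          561207) blender start=421.13 finish=421.37
          561208) blender start=421.13 finish=421.37
          561209) blender start=421.13 finish=421.37
          561210) blender start=421.13 finish=421.37
        561211) rm start=421.43 finish=421.43
"""

LINE_RE = re.compile(r"^(\s*)(\d+)\) (\S+) start=([\d.]+) finish=([\d.]+)")
Proc = namedtuple("Proc", "pid name start finish parent")


def parse_tree(text):
    """Parse indented 'pid) name start=.. finish=..' lines into Proc records."""
    procs, stack = [], []  # stack holds (indent, pid) for the current ancestry
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        indent, pid, name, start, finish = m.groups()
        depth = len(indent)
        while stack and stack[-1][0] >= depth:  # leave finished subtrees
            stack.pop()
        parent = stack[-1][1] if stack else None
        procs.append(Proc(int(pid), name, float(start), float(finish), parent))
        stack.append((depth, int(pid)))
    return procs


def waves(procs, parent, name):
    """Count children of `parent` named `name`, grouped by start time."""
    counts = Counter(p.start for p in procs if p.parent == parent and p.name == name)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    print(waves(parse_tree(TREE), 561139, "blender"))
```

On this excerpt, the grouping shows two waves of 16 blender children, at 285.46 and 421.13, which matches the 16 cores. A single straggler starts at 285.77.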