Embree is a set of ray tracing kernels. In the test below, I run three workloads, which show slight differences between the first workload and the other two. The workloads are listed as AVX-512 capable, but I don’t see those instructions in my trace, most likely because I am using default compilation rather than -march=native; that is an experiment to try in the future. It is floating-point-intensive code with most time spent in backend memory operations. Branch misprediction is also higher than average.

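AMD metrics show a relatively low IPC (0.49) and a large share of backend stalls (53.3% of slots, mostly memory). The floating-point code is predominantly 128-bit (AVX-128), at roughly 296 of every 1000 instructions.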

elapsed              589.246
on_cpu               0.871          # 13.93 / 16 cores
utime                8196.986
stime                11.873
nvcsw                93191          # 41.33%
nivcsw               132272         # 58.67%
inblock              2348544        # 3985.68/sec
onblock              1648           # 2.80/sec
cpu-clock            8210676277740  # 8210.676 seconds
task-clock           8210826881166  # 8210.827 seconds
page faults          2579835        # 314.199/sec
context switches     228219         # 27.795/sec
cpu migrations       657            # 0.080/sec
major page faults    111            # 0.014/sec
minor page faults    2579724        # 314.186/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             1362232904120  # 80.306 branches per 1000 inst
branch misses        184544546458   # 13.55% branch miss
conditional          1140352434370  # 67.226 conditional branches per 1000 inst
indirect             12415374703    # 0.732 indirect branches per 1000 inst
cpu-cycles           34845974472315 # 3.69 GHz
instructions         16962277518169 # 0.49 IPC
slots                69674090415768 #
retiring             7581529448521  # 10.9% (13.5%)
-- ucode             24936273893    #     0.0%
-- fastpath          7556593174628  #    10.8%
frontend             7598757925485  # 10.9% (13.5%)
-- latency           6124859496648  #     8.8%
-- bandwidth         1473898428837  #     2.1%
backend              37121990917073 # 53.3% (65.9%)
-- cpu               9400871625693  #    13.5%
-- memory            27721119291380 #    39.8%
speculation          3987229045557  #  5.7% ( 7.1%)
-- branch mispredict 3967073762620  #     5.7%
-- pipeline restart  20155282937    #     0.0%
smt-contention       13384540352281 # 19.2% ( 0.0%)
cpu-cycles           34862170519282 # 3.68 GHz
instructions         16966704924565 # 0.49 IPC
instructions         5654335743036  # 98.697 l2 access per 1000 inst
l2 hit from l1       440791291703   # 25.36% l2 miss
l2 miss from l1      93098250807    #
l2 hit from l2 pf    68873331326    #
l3 hit from l2 pf    20579962754    #
l3 miss from l2 pf   27819968309    #
instructions         5652446118217  # 296.440 float per 1000 inst
float 512            91             # 0.000 AVX-512 per 1000 inst
float 256            2921024696     # 0.517 AVX-256 per 1000 inst
float 128            1672690446523  # 295.923 AVX-128 per 1000 inst
float MMX            0              # 0.000 MMX per 1000 inst
float scalar         0              # 0.000 scalar per 1000 inst
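
The annotations after the `#` in the AMD listing are simple ratios of the raw counters. As a rough sketch, the Python below recomputes a few of them from values copied out of the listing; the variable and function names are mine, not the measurement tool's, and the core count of 16 is taken from the "/ 16 cores" annotation.

```python
# Raw counters copied from the AMD listing above.
elapsed = 589.246            # wall-clock seconds
utime = 8196.986             # user CPU seconds
stime = 11.873               # system CPU seconds
cores = 16                   # from the "/ 16 cores" annotation

branches = 1362232904120
branch_misses = 184544546458
cpu_cycles = 34845974472315
instructions = 16962277518169


def derived_metrics():
    """Recompute some of the annotated values from the raw counters."""
    busy_cores = (utime + stime) / elapsed
    return {
        "busy_cores": busy_cores,                   # listed as 13.93
        "on_cpu": busy_cores / cores,               # listed as 0.871
        "ipc": instructions / cpu_cycles,           # listed as 0.49
        "branch_miss_pct": 100.0 * branch_misses / branches,  # listed as 13.55%
    }


if __name__ == "__main__":
    for name, value in derived_metrics().items():
        print(f"{name:16s} {value:8.3f}")
```

The same ratios apply to the Intel listing by substituting its counters.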

Intel metrics show a large percentage of slots lost to branch mispredictions (22.9%).

elapsed              1463.161
on_cpu               0.904          # 14.46 / 16 cores
utime                21137.635
stime                18.313
nvcsw                368397         # 55.12%
nivcsw               299967         # 44.88%
inblock              0              # 0.00/sec
onblock              1912           # 1.31/sec
cpu-clock            21156464346568 # 21156.464 seconds
task-clock           21156834367225 # 21156.834 seconds
page faults          3295364        # 155.759/sec
context switches     675478         # 31.927/sec
cpu migrations       70820          # 3.347/sec
major page faults    0              # 0.000/sec
minor page faults    3295364        # 155.759/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             3008212425185  # 71.821 branches per 1000 inst
branch misses        441975034062   # 14.69% branch miss
conditional          3008212445665  # 71.821 conditional branches per 1000 inst
indirect             610431103988   # 14.574 indirect branches per 1000 inst
slots                52138940679776 #
retiring             13804850785894 # 26.5% (26.5%)
-- ucode             1354930255474  #     2.6%
-- fastpath          12449920530420 #    23.9%
frontend             8695611248482  # 16.7% (16.7%)
-- latency           5828517798001  #    11.2%
-- bandwidth         2867093450481  #     5.5%
backend              17257445503008 # 33.1% (33.1%)
-- cpu               5308618896422  #    10.2%
-- memory            11948826606586 #    22.9%
speculation          11980383511891 # 23.0% (23.0%)
-- branch mispredict 11929015562360 #    22.9%
-- pipeline restart  51367949531    #     0.1%
smt-contention       0              #  0.0% ( 0.0%)
cpu-cycles           33660387719224 # 2.61 GHz
instructions         26349322565350 # 0.78 IPC
l2 access            824384239858   # 61.635 l2 access per 1000 inst
l2 miss              320741599938   # 38.91% l2 miss
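
To compare the two top-down breakdowns side by side, the sketch below recomputes each category's share of slots. It assumes, based on the arithmetic of the AMD listing, that the parenthesized percentage is the share after SMT-contention slots are removed from the total; the function name and dictionary layout are illustrative only.

```python
def topdown_shares(slots, categories, smt_contention=0):
    """Return {name: (pct_of_all_slots, pct_excluding_smt_contention)}."""
    usable = slots - smt_contention
    return {
        name: (100.0 * count / slots, 100.0 * count / usable)
        for name, count in categories.items()
    }


# Counters copied from the AMD listing.
AMD = topdown_shares(
    slots=69674090415768,
    categories={
        "retiring": 7581529448521,
        "frontend": 7598757925485,
        "backend": 37121990917073,
        "speculation": 3987229045557,
    },
    smt_contention=13384540352281,
)

# Counters copied from the Intel listing (smt-contention is 0 there).
INTEL = topdown_shares(
    slots=52138940679776,
    categories={
        "retiring": 13804850785894,
        "frontend": 8695611248482,
        "backend": 17257445503008,
        "speculation": 11980383511891,
    },
)

if __name__ == "__main__":
    for name in AMD:
        amd_all, amd_usable = AMD[name]
        intel_all, _ = INTEL[name]
        print(f"{name:12s} AMD {amd_all:5.1f}% ({amd_usable:5.1f}%)  Intel {intel_all:5.1f}%")
```

Laid out this way, the contrast is easier to see: the AMD run is dominated by backend stalls, while the Intel run loses a much larger share of slots to speculation.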

Almost all process time is spent in the embree_pathtrac program.

504 processes
	135 embree_pathtrac      123173.29   153.69
	 64 clinfo                  11.20     2.88
	 38 vulkaninfo               0.75     1.32
	  6 glxinfo:gdrv0            0.17     0.03
	  6 php                      0.08     0.18
	  4 vulkani:disk$0           0.08     0.14
	  2 glxinfo                  0.07     0.01
	  2 glxinfo:cs0              0.07     0.01
	  2 glxinfo:disk$0           0.07     0.01
	  2 glxinfo:sh0              0.07     0.01
	  2 glxinfo:shlo0            0.07     0.01
	  2 llvmpipe-0               0.04     0.07
	  2 llvmpipe-1               0.04     0.07
	  2 llvmpipe-10              0.04     0.07
	  2 llvmpipe-11              0.04     0.07
	  2 llvmpipe-12              0.04     0.07
	  2 llvmpipe-13              0.04     0.07
	  2 llvmpipe-14              0.04     0.07
	  2 llvmpipe-15              0.04     0.07
	  2 llvmpipe-2               0.04     0.07
	  2 llvmpipe-3               0.04     0.07
	  2 llvmpipe-4               0.04     0.07
	  2 llvmpipe-5               0.04     0.07
	  2 llvmpipe-6               0.04     0.07
	  2 llvmpipe-7               0.04     0.07
	  2 llvmpipe-8               0.04     0.07
	  2 llvmpipe-9               0.04     0.07
	  6 clang                    0.03     0.03
	  1 lspci                    0.00     0.03
	 92 sh                       0.00     0.00
	 12 gcc                      0.00     0.00
	 10 gsettings                0.00     0.00
	  9 embree                   0.00     0.00
	  9 stty                     0.00     0.00
	  8 stat                     0.00     0.00
	  8 systemd-detect-          0.00     0.00
	  6 llvm-link                0.00     0.00
	  5 phoronix-test-s          0.00     0.00
	  4 gmain                    0.00     0.00
	  2 dconf worker             0.00     0.00
	  2 lscpu                    0.00     0.00
	  2 uname                    0.00     0.00
	  2 which                    0.00     0.00
	  2 xset                     0.00     0.00
	  1 cc                       0.00     0.00
	  1 date                     0.00     0.00
	  1 dirname                  0.00     0.00
	  1 dmesg                    0.00     0.00
	  1 dmidecode                0.00     0.00
	  1 grep                     0.00     0.00
	  1 ifconfig                 0.00     0.00
	  1 ip                       0.00     0.00
	  1 lsmod                    0.00     0.00
	  1 mktemp                   0.00     0.00
	  1 ps                       0.00     0.00
	  1 qdbus                    0.00     0.00
	  1 readlink                 0.00     0.00
	  1 realpath                 0.00     0.00
	  1 sed                      0.00     0.00
	  1 sort                     0.00     0.00
	  1 systemctl                0.00     0.00
	  1 template.sh              0.00     0.00
	  1 wc                       0.00     0.00
	  1 xrandr                   0.00     0.00
9 processes running
56 maximum processes
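
A table like this can be summarized with a few lines of Python. The sketch below is only an illustration: the listing does not label its two numeric columns, so the code treats them as two opaque time totals, and the helper names are hypothetical. It uses a short excerpt of the rows above, including one whose command name contains a space.

```python
def parse_row(line):
    """Split one row into (count, command, first_value, second_value).

    The last two fields are numeric; everything between the leading count
    and those fields is the command name, which may contain spaces.
    """
    head, first, second = line.strip().rsplit(None, 2)
    count, command = head.split(None, 1)
    return int(count), command, float(first), float(second)


def share_of_first_column(rows, command):
    """Percentage of the first numeric column attributed to one command."""
    parsed = [parse_row(r) for r in rows if r.strip()]
    total = sum(p[2] for p in parsed)
    target = sum(p[2] for p in parsed if p[1] == command)
    return 100.0 * target / total


# Excerpt of the table above, not the full listing.
SAMPLE = """\
135 embree_pathtrac      123173.29   153.69
 64 clinfo                  11.20     2.88
 38 vulkaninfo               0.75     1.32
  2 dconf worker             0.00     0.00
""".splitlines()

if __name__ == "__main__":
    pct = share_of_first_column(SAMPLE, "embree_pathtrac")
    print(f"embree_pathtrac share: {pct:.2f}%")
```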

The following is an example of the recurring process pattern. We occasionally miss an “exit” event, which is why 9 processes were still listed as running above.

      6117) embree start=137.17 finish=198.93
        6118) embree_pathtrac start=137.17 finish=198.92
          6119) embree_pathtrac start=137.37 finish=198.92
            6121) embree_pathtrac start=137.37 finish=198.92
              6125) embree_pathtrac start=137.37 finish=198.92
              6129) embree_pathtrac start=137.37 finish=198.92
            6123) embree_pathtrac start=137.37 finish=198.92
              6130) embree_pathtrac start=137.37 finish=198.92
              6131) embree_pathtrac start=137.37 finish=198.92
          6120) embree_pathtrac start=137.37 finish=198.92
            6122) embree_pathtrac start=137.37 finish=198.92
              6124) embree_pathtrac start=137.37 finish=198.92
                6128) embree_pathtrac start=137.37 finish=198.92
                6132) embree_pathtrac start=137.37 finish=198.92
              6126) embree_pathtrac start=137.37 finish=198.92
                6133) embree_pathtrac start=137.37 finish=198.92
            6127) embree_pathtrac start=137.37 finish=198.92
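
The indentation in this listing encodes the parent/child relationship, so the tree can be recovered mechanically. The sketch below assumes two spaces of indentation per level and the `pid) name start=… finish=…` line shape shown above; the regular expression and function name are my own, not part of the tracing tool.

```python
import re

# One line of the listing: indentation, pid, command name, start and finish times.
ROW = re.compile(r"^(\s*)(\d+)\) (\S+) start=([\d.]+) finish=([\d.]+)\s*$")

TREE = """\
6117) embree start=137.17 finish=198.93
  6118) embree_pathtrac start=137.17 finish=198.92
    6119) embree_pathtrac start=137.37 finish=198.92
      6121) embree_pathtrac start=137.37 finish=198.92
        6125) embree_pathtrac start=137.37 finish=198.92
        6129) embree_pathtrac start=137.37 finish=198.92
      6123) embree_pathtrac start=137.37 finish=198.92
        6130) embree_pathtrac start=137.37 finish=198.92
        6131) embree_pathtrac start=137.37 finish=198.92
    6120) embree_pathtrac start=137.37 finish=198.92
      6122) embree_pathtrac start=137.37 finish=198.92
        6124) embree_pathtrac start=137.37 finish=198.92
          6128) embree_pathtrac start=137.37 finish=198.92
          6132) embree_pathtrac start=137.37 finish=198.92
        6126) embree_pathtrac start=137.37 finish=198.92
          6133) embree_pathtrac start=137.37 finish=198.92
      6127) embree_pathtrac start=137.37 finish=198.92
"""


def parse_tree(text, indent_step=2):
    """Return one dict per line with pid, name, depth and duration in seconds."""
    nodes = []
    base = None
    for line in text.splitlines():
        m = ROW.match(line)
        if m is None:
            continue  # skip blank or unrecognized lines
        indent = len(m.group(1))
        if base is None:
            base = indent  # depth is measured relative to the first entry
        nodes.append({
            "pid": int(m.group(2)),
            "name": m.group(3),
            "depth": (indent - base) // indent_step,
            "duration": float(m.group(5)) - float(m.group(4)),
        })
    return nodes


if __name__ == "__main__":
    nodes = parse_tree(TREE)
    workers = [n for n in nodes if n["name"] == "embree_pathtrac"]
    print(len(workers), "embree_pathtrac entries, max depth",
          max(n["depth"] for n in nodes))
```

For this example the parser finds 16 embree_pathtrac entries under the embree launcher, which lines up with the 16 cores reported in the on_cpu annotation.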