Test of a ray tracing engine. This test covers the core engine; the ospray-studio test covers the application built around it. There are four workloads with slightly different characteristics. Both backend CPU and backend memory stalls contribute.

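AMD metrics show heavily floating-point code (about 530 float per 1000 instructions, almost all of it 128-bit), with backend stalls split almost evenly between CPU (19.2%) and memory (18.9%). The branch miss rate of 1.60% is slightly below average.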

elapsed              1971.476
on_cpu               0.863          # 13.80 / 16 cores
utime                27152.145
stime                61.060
nvcsw                1340682        # 75.51%
nivcsw               434716         # 24.49%
inblock              2744           # 1.39/sec
onblock              2312           # 1.17/sec
cpu-clock            27206812961906 # 27206.813 seconds
task-clock           27208381390391 # 27208.381 seconds
page faults          8487526        # 311.945/sec
context switches     1785005        # 65.605/sec
cpu migrations       9640           # 0.354/sec
major page faults    187            # 0.007/sec
minor page faults    8487330        # 311.938/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             5426414056343  # 84.375 branches per 1000 inst
branch misses        86955920293    # 1.60% branch miss
conditional          3427687159291  # 53.297 conditional branches per 1000 inst
indirect             467592274081   # 7.271 indirect branches per 1000 inst
cpu-cycles           115746926713087 # 3.68 GHz
instructions         64192430504369 # 0.55 IPC
slots                231486951377550 #
retiring             51778023564919 # 22.4% (26.0%)
-- ucode             681253365950   #     0.3%
-- fastpath          51096770198969 #    22.1%
frontend             55786361849112 # 24.1% (28.0%)
-- latency           50690132046288 #    21.9%
-- bandwidth         5096229802824  #     2.2%
backend              88277200554514 # 38.1% (44.3%)
-- cpu               44531804002903 #    19.2%
-- memory            43745396551611 #    18.9%
speculation          3571024998154  #  1.5% ( 1.8%)
-- branch mispredict 2245783755052  #     1.0%
-- pipeline restart  1325241243102  #     0.6%
smt-contention       32073952477589 # 13.9% ( 0.0%)
cpu-cycles           115958248469394 # 3.67 GHz
instructions         64226418211743 # 0.55 IPC
instructions         21406027497349 # 33.924 l2 access per 1000 inst
l2 hit from l1       618447294645   # 6.06% l2 miss
l2 miss from l1      25831918168    #
l2 hit from l2 pf    89533704038    #
l3 hit from l2 pf    12514580391    #
l3 miss from l2 pf   5675181497     #
instructions         21400025702796 # 529.504 float per 1000 inst
float 512            63             # 0.000 AVX-512 per 1000 inst
float 256            186015033837   # 8.692 AVX-256 per 1000 inst
float 128            11145392088233 # 520.812 AVX-128 per 1000 inst
float MMX            0              # 0.000 MMX per 1000 inst
float scalar         0              # 0.000 scalar per 1000 inst
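
The two percentages printed on each top-down line above can be reproduced from the raw slot counts. The sketch below assumes that the first number is the share of all slots and that the parenthesized number is the share after SMT-contention slots are excluded. That reading matches the arithmetic of the listing but is not stated by the tool itself. The function name `topdown_shares` is hypothetical.

```python
def topdown_shares(slots, categories, smt_contention=0):
    """Return {name: (share of all slots, share of non-SMT slots)} in percent."""
    usable = slots - smt_contention  # slots left after removing SMT contention
    return {
        name: (100.0 * count / slots, 100.0 * count / usable)
        for name, count in categories.items()
    }


# Slot counts copied from the AMD listing above.
AMD_SLOTS = 231486951377550
AMD_SMT_CONTENTION = 32073952477589
AMD_CATEGORIES = {
    "retiring": 51778023564919,
    "frontend": 55786361849112,
    "backend": 88277200554514,
    "speculation": 3571024998154,
}

amd_shares = topdown_shares(AMD_SLOTS, AMD_CATEGORIES, AMD_SMT_CONTENTION)

if __name__ == "__main__":
    for name, (raw, adjusted) in amd_shares.items():
        print(f"{name:12s} {raw:5.1f}% ({adjusted:5.1f}%)")
```

When `smt_contention` is 0 the two columns are identical by construction.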

Intel metrics

elapsed              2026.796
on_cpu               0.834          # 13.34 / 16 cores
utime                26992.039
stime                55.101
nvcsw                1612258        # 76.84%
nivcsw               485818         # 23.16%
inblock              28752          # 14.19/sec
onblock              2304           # 1.14/sec
cpu-clock            27025830249018 # 27025.830 seconds
task-clock           27027955699446 # 27027.956 seconds
page faults          7666166        # 283.638/sec
context switches     2107974        # 77.992/sec
cpu migrations       178019         # 6.586/sec
major page faults    332            # 0.012/sec
minor page faults    7665834        # 283.626/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             6452668168556  # 74.029 branches per 1000 inst
branch misses        107582308263   # 1.67% branch miss
conditional          6452668190860  # 74.029 conditional branches per 1000 inst
indirect             1379306396813  # 15.824 indirect branches per 1000 inst
slots                133319314983128 #
retiring             68712786666910 # 51.5% (51.5%)
-- ucode             10062728100131 #     7.5%
-- fastpath          58650058566779 #    44.0%
frontend             27729511428938 # 20.8% (20.8%)
-- latency           17199350954800 #    12.9%
-- bandwidth         10530160474138 #     7.9%
backend              28375398569037 # 21.3% (21.3%)
-- cpu               17817810359036 #    13.4%
-- memory            10557588210001 #     7.9%
speculation          8164958670901  #  6.1% ( 6.1%)
-- branch mispredict 7964623713592  #     6.0%
-- pipeline restart  200334957309   #     0.2%
smt-contention       0              #  0.0% ( 0.0%)
cpu-cycles           85119684339755 # 2.63 GHz
instructions         120822853947546 # 1.42 IPC
l2 access            865536962026   # 14.277 l2 access per 1000 inst
l2 miss              150276868086   # 17.36% l2 miss
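
As a cross-check on the two listings, the on_cpu and IPC figures can be recomputed from the raw values. This is a minimal sketch that assumes on_cpu is (utime + stime) / elapsed divided by the 16 cores shown in the listing, and that IPC is instructions divided by cpu-cycles. Both formulas are inferred from the printed numbers rather than taken from the tool's source. The helper names are hypothetical.

```python
def on_cpu(utime, stime, elapsed, cores):
    """Return (average busy cores, fraction of the machine kept busy)."""
    busy_cores = (utime + stime) / elapsed
    return busy_cores, busy_cores / cores


def ipc(instructions, cycles):
    """Instructions retired per CPU cycle."""
    return instructions / cycles


# Values copied from the AMD and Intel listings above.
RUNS = {
    "AMD": dict(elapsed=1971.476, utime=27152.145, stime=61.060,
                cycles=115746926713087, instructions=64192430504369),
    "Intel": dict(elapsed=2026.796, utime=26992.039, stime=55.101,
                  cycles=85119684339755, instructions=120822853947546),
}

if __name__ == "__main__":
    for name, r in RUNS.items():
        busy, frac = on_cpu(r["utime"], r["stime"], r["elapsed"], cores=16)
        rate = ipc(r["instructions"], r["cycles"])
        print(f"{name:5s} on_cpu={frac:.3f} ({busy:.2f} / 16 cores) IPC={rate:.2f}")
```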

Process tree information

681 processes
	287 ospBenchmark         433361.72   847.31
	 64 clinfo                  11.52     2.56
	 38 vulkaninfo               0.76     1.33
	  6 php                      0.16     0.25
	  6 glxinfo:gdrv0            0.10     0.10
	  4 vulkani:disk$0           0.08     0.14
	  6 clang                    0.05     0.02
	  2 llvmpipe-0               0.04     0.07
	  2 llvmpipe-1               0.04     0.07
	  2 llvmpipe-10              0.04     0.07
	  2 llvmpipe-11              0.04     0.07
	  2 llvmpipe-12              0.04     0.07
	  2 llvmpipe-13              0.04     0.07
	  2 llvmpipe-14              0.04     0.07
	  2 llvmpipe-15              0.04     0.07
	  2 llvmpipe-2               0.04     0.07
	  2 llvmpipe-3               0.04     0.07
	  2 llvmpipe-4               0.04     0.07
	  2 llvmpipe-5               0.04     0.07
	  2 llvmpipe-6               0.04     0.07
	  2 llvmpipe-7               0.04     0.07
	  2 llvmpipe-8               0.04     0.07
	  2 llvmpipe-9               0.04     0.07
	  2 glxinfo                  0.04     0.04
	  2 glxinfo:cs0              0.04     0.04
	  2 glxinfo:disk$0           0.04     0.04
	  2 glxinfo:sh0              0.04     0.04
	  2 glxinfo:shlo0            0.04     0.04
	  1 lspci                    0.01     0.03
	 98 sh                       0.00     0.00
	 19 sed                      0.00     0.00
	 18 ospray                   0.00     0.00
	 12 gcc                      0.00     0.00
	  9 gsettings                0.00     0.00
	  9 stty                     0.00     0.00
	  8 stat                     0.00     0.00
	  8 systemd-detect-          0.00     0.00
	  6 llvm-link                0.00     0.00
	  5 phoronix-test-s          0.00     0.00
	  4 gmain                    0.00     0.00
	  3 dconf worker             0.00     0.00
	  2 lscpu                    0.00     0.00
	  2 uname                    0.00     0.00
	  2 which                    0.00     0.00
	  2 xset                     0.00     0.00
	  1 cc                       0.00     0.00
	  1 date                     0.00     0.00
	  1 dirname                  0.00     0.00
	  1 dmesg                    0.00     0.00
	  1 dmidecode                0.00     0.00
	  1 grep                     0.00     0.00
	  1 ifconfig                 0.00     0.00
	  1 ip                       0.00     0.00
	  1 lsmod                    0.00     0.00
	  1 mktemp                   0.00     0.00
	  1 ps                       0.00     0.00
	  1 qdbus                    0.00     0.00
	  1 readlink                 0.00     0.00
	  1 realpath                 0.00     0.00
	  1 sort                     0.00     0.00
	  1 systemctl                0.00     0.00
	  1 template.sh              0.00     0.00
	  1 wc                       0.00     0.00
	  1 xrandr                   0.00     0.00
1 processes running
48 maximum processes

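The core run forms a hierarchy of threads: one ospray launcher, 16 ospBenchmark entries that each report a distinct CPU (0 through 15) and run from about 126.98 to 242.89, and a trailing sed: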

      437206) ospray           cpu=8 start=126.97 finish=242.90
        437207) ospBenchmark     cpu=1 start=126.97 finish=242.89
          437208) ospBenchmark     cpu=15 start=126.98 finish=242.89
            437210) ospBenchmark     cpu=7 start=126.98 finish=242.89
              437213) ospBenchmark     cpu=11 start=126.98 finish=242.89
              437216) ospBenchmark     cpu=8 start=126.98 finish=242.89
                437222) ospBenchmark     cpu=6 start=126.98 finish=242.89
            437211) ospBenchmark     cpu=2 start=126.98 finish=242.89
              437214) ospBenchmark     cpu=12 start=126.98 finish=242.89
                437215) ospBenchmark     cpu=9 start=126.98 finish=242.89
                437219) ospBenchmark     cpu=10 start=126.98 finish=242.89
              437217) ospBenchmark     cpu=13 start=126.98 finish=242.89
          437209) ospBenchmark     cpu=3 start=126.98 finish=242.89
            437212) ospBenchmark     cpu=5 start=126.98 finish=242.89
              437220) ospBenchmark     cpu=0 start=126.98 finish=242.89
              437221) ospBenchmark     cpu=4 start=126.98 finish=242.89
            437218) ospBenchmark     cpu=14 start=126.98 finish=242.89
        437224) sed              cpu=2 start=242.90 finish=242.90
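
The nesting in the listing above is carried only by indentation, so parent/child links have to be recovered from it. The following is a minimal sketch, assuming each line has the form `<pid>) <name> cpu=N start=S finish=F` and that deeper indentation means "child of the nearest shallower line above". The names `parse_tree` and `SAMPLE` are hypothetical, and the sample is a subset of the listing.

```python
import re

# indent, pid, name, cpu, start, finish
LINE_RE = re.compile(
    r"^(\s*)(\d+)\)\s+(.+?)\s+cpu=(\d+)\s+start=([\d.]+)\s+finish=([\d.]+)"
)


def parse_tree(text):
    """Return {pid: parent pid or None}, derived from indentation depth."""
    parents = {}
    stack = []  # (indent, pid) for the current chain of ancestors
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # skip anything that is not a tree line
        indent, pid = len(m.group(1)), int(m.group(2))
        # Drop entries at the same or deeper indent: they are not ancestors.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        parents[pid] = stack[-1][1] if stack else None
        stack.append((indent, pid))
    return parents


SAMPLE = """\
      437206) ospray           cpu=8 start=126.97 finish=242.90
        437207) ospBenchmark     cpu=1 start=126.97 finish=242.89
          437208) ospBenchmark     cpu=15 start=126.98 finish=242.89
            437210) ospBenchmark     cpu=7 start=126.98 finish=242.89
          437209) ospBenchmark     cpu=3 start=126.98 finish=242.89
        437224) sed              cpu=2 start=242.90 finish=242.90
"""

if __name__ == "__main__":
    for pid, parent in parse_tree(SAMPLE).items():
        print(pid, "<-", parent)
```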