A simple path-tracer/ray-tracer with one workload. Looks like it runs in ~20 seconds per iteration.

Topdown profile shows a fairly high retirement rate with some backend stalls.
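The parenthesized topdown figures in the AMD dump below appear to be each category divided by the slots that were not lost to SMT contention. The four level-1 categories plus smt-contention add up almost exactly to the slots count, which supports this reading. Below is a minimal Python sketch that reproduces both columns from the raw counters under that assumption. The dictionary keys and the function name are illustrative, not part of the profiling tool.

```python
# Sketch: reproduce the AMD level-1 topdown percentages from raw slot counts.
# Assumption: the parenthesized figures use (slots - smt-contention) as the
# denominator, i.e. slots given to the sibling hyperthread are excluded.

AMD_COUNTERS = {
    "slots": 4_390_587_834_432,
    "retiring": 1_393_458_662_508,
    "frontend": 351_318_571_484,
    "backend": 653_236_524_449,
    "speculation": 212_678_142_692,
    "smt-contention": 1_779_890_319_064,
}


def topdown_level1(c: dict[str, int]) -> dict[str, tuple[float, float]]:
    """Return {category: (% of all slots, % of non-SMT-contention slots)}."""
    slots = c["slots"]
    usable = slots - c["smt-contention"]
    return {
        cat: (100.0 * c[cat] / slots, 100.0 * c[cat] / usable)
        for cat in ("retiring", "frontend", "backend", "speculation")
    }


if __name__ == "__main__":
    for cat, (raw, norm) in topdown_level1(AMD_COUNTERS).items():
        print(f"{cat:12s} {raw:5.1f}% ({norm:4.1f}%)")
```

Run as-is, this prints the same 31.7% (53.4%) retiring and 14.9% (25.0%) backend split as the dump, and the normalized column sums to 100%.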

AMD metrics confirm a 53% retirement rate, with low L2 traffic (9.1 L2 accesses per 1000 instructions) and a low opcache miss rate (0.4%). This looks like small kernels overall.
elapsed 56.575
on_cpu 0.650 # 10.41 / 16 cores
utime 587.145
stime 1.550
nvcsw 2224 # 26.55%
nivcsw 6153 # 73.45%
inblock 0 # 0.00/sec
onblock 38216 # 675.49/sec
cpu-clock 588988945689 # 588.989 seconds
task-clock 588994392600 # 588.994 seconds
page faults 403434 # 684.954/sec
context switches 8442 # 14.333/sec
cpu migrations 285 # 0.484/sec
major page faults 3 # 0.005/sec
minor page faults 403431 # 684.949/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 422033102895 # 95.619 branches per 1000 inst
branch misses 8767528965 # 2.08% branch miss
conditional 390398110502 # 88.452 conditional branches per 1000 inst
indirect 2995398424 # 0.679 indirect branches per 1000 inst
cpu-cycles 2193517209245 # 2.43 GHz
instructions 4412484677562 # 2.01 IPC
slots 4390587834432 #
retiring 1393458662508 # 31.7% (53.4%)
-- ucode 958018754 # 0.0%
-- fastpath 1392500643754 # 31.7%
frontend 351318571484 # 8.0% (13.5%)
-- latency 217424991282 # 5.0%
-- bandwidth 133893580202 # 3.0%
backend 653236524449 # 14.9% (25.0%)
-- cpu 596466322859 # 13.6%
-- memory 56770201590 # 1.3%
speculation 212678142692 # 4.8% ( 8.1%)
-- branch mispredict 212394334311 # 4.8%
-- pipeline restart 283808381 # 0.0%
smt-contention 1779890319064 # 40.5% ( 0.0%)
cpu-cycles 2194363711223 # 2.43 GHz
instructions 4409314620701 # 2.01 IPC
instructions 1472643771017 # 9.090 l2 access per 1000 inst
l2 hit from l1 9590644993 # 0.66% l2 miss
l2 miss from l1 54619276 #
l2 hit from l2 pf 3761503405 #
l3 hit from l2 pf 24597536 #
l3 miss from l2 pf 9534127 #
instructions 1470855146556 # 76.471 float per 1000 inst
float 512 80 # 0.000 AVX-512 per 1000 inst
float 256 610 # 0.000 AVX-256 per 1000 inst
float 128 112477303101 # 76.471 AVX-128 per 1000 inst
float MMX 0 # 0.000 MMX per 1000 inst
float scalar 0 # 0.000 scalar per 1000 inst
instructions 4413386672400 #
opcache 621787364163 # 140.887 opcache per 1000 inst
opcache miss 2223793132 # 0.4% opcache miss rate
l1 dTLB miss 156568437 # 0.035 L1 dTLB per 1000 inst
l2 dTLB miss 9987439 # 0.002 L2 dTLB per 1000 inst
instructions 4413463173734 #
icache 4075752160 # 0.923 icache per 1000 inst
icache miss 859116413 # 21.1% icache miss rate
l1 iTLB miss 67570423 # 0.015 L1 iTLB per 1000 inst
l2 iTLB miss 0 # 0.000 L2 iTLB per 1000 inst
tlb flush 29434 # 0.000 TLB flush per 1000 inst
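The derived columns in the AMD dump can be checked against the raw counts. The following Python sketch is only an approximation of the tool's logic. It assumes on_cpu is (utime + stime) / elapsed spread over 16 logical CPUs, and that miss rates are misses divided by accesses. These definitions are inferred from the numbers, not taken from the profiling tool's documentation, and the helper names are hypothetical.

```python
# Sketch: recompute a few derived ratios from the AMD dump's raw counts.
# The ratio definitions are inferred from the printed numbers (assumptions).

CORES = 16  # assumption: 16 logical CPUs, matching the "/ 16 cores" note


def on_cpu(utime: float, stime: float, elapsed: float,
           cores: int = CORES) -> tuple[float, float]:
    """Return (average busy cores, fraction of all cores busy)."""
    busy = (utime + stime) / elapsed
    return busy, busy / cores


def ratio(num: int, den: int, scale: float = 100.0) -> float:
    """num / den scaled: 100 for a percentage, 1000 for 'per 1000 inst'."""
    return scale * num / den


if __name__ == "__main__":
    busy, frac = on_cpu(587.145, 1.550, 56.575)
    print(f"on_cpu        {frac:.3f}  ({busy:.2f} / {CORES} cores)")
    print(f"IPC           {4_412_484_677_562 / 2_193_517_209_245:.2f}")
    print(f"branch miss   {ratio(8_767_528_965, 422_033_102_895):.2f}%")
    print(f"opcache miss  {ratio(2_223_793_132, 621_787_364_163):.1f}%")
    print(f"icache miss   {ratio(859_116_413, 4_075_752_160):.1f}%")
    print(f"AVX-128/1k    {ratio(112_477_303_101, 1_470_855_146_556, 1000.0):.3f}")
```

With the Intel run's figures, on_cpu(4456.398, 2.584, 361.295) gives the same 0.771 (12.34 / 16 cores) that the Intel dump reports.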
Intel metrics
elapsed 361.295
on_cpu 0.771 # 12.34 / 16 cores
utime 4456.398
stime 2.584
nvcsw 2851 # 6.15%
nivcsw 43486 # 93.85%
inblock 8 # 0.02/sec
onblock 133024 # 368.19/sec
cpu-clock 4460238106346 # 4460.238 seconds
task-clock 4460254922589 # 4460.255 seconds
page faults 1500897 # 336.505/sec
context switches 47717 # 10.698/sec
cpu migrations 531 # 0.119/sec
major page faults 1 # 0.000/sec
minor page faults 1500896 # 336.505/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 2104213340106 # 95.285 branches per 1000 inst
branch misses 46948434651 # 2.23% branch miss
conditional 2104213383082 # 95.285 conditional branches per 1000 inst
indirect 674160895320 # 30.528 indirect branches per 1000 inst
slots 8682095488862 #
retiring 5295681969594 # 61.0% (61.0%) high
-- ucode 276375704863 # 3.2%
-- fastpath 5019306264731 # 57.8%
frontend 1145825338892 # 13.2% (13.2%)
-- latency 767452449161 # 8.8%
-- bandwidth 378372889731 # 4.4%
backend 762062648350 # 8.8% ( 8.8%) low
-- cpu 262379951747 # 3.0%
-- memory 499682696603 # 5.8%
speculation 1491692962110 # 17.2% (17.2%) high
-- branch mispredict 1488388813726 # 17.1%
-- pipeline restart 3304148384 # 0.0%
smt-contention 0 # 0.0% ( 0.0%)
cpu-cycles 5520830012827 # 1.76 GHz
instructions 10513725252527 # 1.90 IPC
l2 access 4365224473 # 0.602 l2 access per 1000 inst
l2 miss 764411421 # 17.51% l2 miss
cpu-cycles 2855102875288 # 15.1% memory latency
load stalls 431689352949 # 14.9% l1 bound
l1 miss 6670996467 # 0.2% l2 bound
l2 miss 1734383138 # 0.0% l3 bound
l3 miss 729430767 # 0.0% dram bound
store_stalls 430419365 # 0.0% store bound
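The memory-bound breakdown at the end of the Intel dump appears to follow the usual stall-cycle subtraction. In this scheme, each cache level's share is the stall cycles attributed to misses at that level minus the stall cycles for misses one level further out, divided by total cycles. The sketch below assumes that "load stalls", "l1 miss", "l2 miss" and "l3 miss" correspond to Intel's CYCLE_ACTIVITY.STALLS_MEM_ANY, STALLS_L1D_MISS, STALLS_L2_MISS and STALLS_L3_MISS stall-cycle counts. That mapping is a guess based on the numbers; the tool's output does not state it.

```python
# Sketch: reproduce the Intel memory-bound breakdown from stall-cycle counts.
# Assumption: load_stalls / l1_miss / l2_miss / l3_miss are cycles stalled on
# any load / an L1D miss / an L2 miss / an L3 miss (CYCLE_ACTIVITY-style).


def memory_bound(cycles: int, load_stalls: int, l1_miss: int, l2_miss: int,
                 l3_miss: int, store_stalls: int) -> dict[str, float]:
    """Split load stall cycles by the deepest level missed, as % of cycles."""

    def pct(n: int) -> float:
        return 100.0 * n / cycles

    return {
        "memory latency": pct(load_stalls),
        "l1 bound": pct(load_stalls - l1_miss),  # stalled, but L1 hit
        "l2 bound": pct(l1_miss - l2_miss),      # missed L1, hit L2
        "l3 bound": pct(l2_miss - l3_miss),      # missed L2, hit L3
        "dram bound": pct(l3_miss),              # missed L3
        "store bound": pct(store_stalls),
    }


if __name__ == "__main__":
    breakdown = memory_bound(
        cycles=2_855_102_875_288,
        load_stalls=431_689_352_949,
        l1_miss=6_670_996_467,
        l2_miss=1_734_383_138,
        l3_miss=729_430_767,
        store_stalls=430_419_365,
    )
    for name, value in breakdown.items():
        print(f"{name:15s} {value:4.1f}%")
```

Under this reading, nearly all of the 15.1% memory-latency stalls are L1-bound, and the L2, L3 and DRAM shares round to 0.2% or less.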
Process overview, with a.out as the name of the benchmark process.
547 processes
147 a.out 12846.28 1.50
68 clinfo 15.88 6.98
6 cc1plus 2.05 0.40
38 vulkaninfo 1.34 1.15
3 lto1-ltrans 0.94 0.05
4 vulkani:disk$0 0.14 0.13
6 glxinfo:gdrv0 0.08 0.10
6 glxinfo:gl0 0.08 0.10
3 ld 0.08 0.02
2 llvmpipe-0 0.07 0.07
2 llvmpipe-1 0.07 0.07
2 llvmpipe-10 0.07 0.07
2 llvmpipe-11 0.07 0.07
2 llvmpipe-12 0.07 0.07
2 llvmpipe-13 0.07 0.07
2 llvmpipe-14 0.07 0.07
2 llvmpipe-15 0.07 0.07
2 llvmpipe-2 0.07 0.07
2 llvmpipe-3 0.07 0.07
2 llvmpipe-4 0.07 0.07
2 llvmpipe-5 0.07 0.07
2 llvmpipe-6 0.07 0.07
2 llvmpipe-7 0.07 0.07
2 llvmpipe-8 0.07 0.07
2 llvmpipe-9 0.07 0.07
7 python3 0.07 0.00
6 php 0.06 0.06
9 as 0.06 0.01
2 glxinfo 0.05 0.04
6 clang 0.04 0.08
2 glxinfo:cs0 0.04 0.04
2 glxinfo:disk$0 0.04 0.04
2 glxinfo:sh0 0.04 0.04
2 glxinfo:shlo0 0.04 0.04
3 rocminfo 0.03 0.00
3 lto1-wpa 0.01 0.01
1 lspci 0.01 0.01
1 ps 0.00 0.01
84 sh 0.00 0.00
13 gcc 0.00 0.00
12 gsettings 0.00 0.00
9 g++ 0.00 0.00
8 stat 0.00 0.00
8 systemd-detect- 0.00 0.00
6 llvm-link 0.00 0.00
5 phoronix-test-s 0.00 0.00
3 collect2 0.00 0.00
3 gmain 0.00 0.00
3 lto-wrapper 0.00 0.00
3 rays1bench 0.00 0.00
2 cc 0.00 0.00
2 lscpu 0.00 0.00
2 uname 0.00 0.00
2 which 0.00 0.00
2 xset 0.00 0.00
1 date 0.00 0.00
1 dconf worker 0.00 0.00
1 dirname 0.00 0.00
1 dmesg 0.00 0.00
1 dmidecode 0.00 0.00
1 grep 0.00 0.00
1 ifconfig 0.00 0.00
1 ip 0.00 0.00
1 lsmod 0.00 0.00
1 mktemp 0.00 0.00
1 python 0.00 0.00
1 qdbus 0.00 0.00
1 readlink 0.00 0.00
1 realpath 0.00 0.00
1 sed 0.00 0.00
1 sort 0.00 0.00
1 stty 0.00 0.00
1 systemctl 0.00 0.00
1 template.sh 0.00 0.00
1 wc 0.00 0.00
1 xrandr 0.00 0.00
0 processes running
47 maximum processes
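The process overview has no header row. Reading the columns as task count, name, user seconds and system seconds is an assumption, but it makes the table easy to rank. Below is a minimal Python sketch with only a few rows pasted in; the function names are hypothetical. Names containing spaces, such as "dconf worker", are handled by taking the first field as the count and the last two as times.

```python
# Sketch: parse the process overview and rank processes by CPU time.
# Assumption: columns are task count, name, user seconds, system seconds.
# Only a few rows are pasted; paste the full table to rank everything.

OVERVIEW = """\
147 a.out 12846.28 1.50
68 clinfo 15.88 6.98
6 cc1plus 2.05 0.40
38 vulkaninfo 1.34 1.15
1 dconf worker 0.00 0.00
"""


def parse(text: str) -> list[tuple[str, int, float]]:
    """Return (name, task count, user + system seconds) for each row."""
    rows = []
    for line in text.splitlines():
        # First field is the count, last two are times, the rest is the name.
        count, *name, user, system = line.split()
        rows.append((" ".join(name), int(count), float(user) + float(system)))
    return rows


def ranked_shares(rows: list[tuple[str, int, float]]) -> list[tuple[str, float]]:
    """Return (name, fraction of total CPU seconds), largest first."""
    total = sum(cpu for _, _, cpu in rows)
    return sorted(((name, cpu / total) for name, _, cpu in rows),
                  key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, share in ranked_shares(parse(OVERVIEW)):
        print(f"{name:15s} {100 * share:6.2f}%")
```

Applied to the full table under the same column reading, a.out accounts for well over 99% of the CPU seconds. The helper processes (clinfo, vulkaninfo, the compiler toolchain) are noise by comparison.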
Computation blocks
1052322) rays1bench cpu=6 start=5.64 finish=19.07
1052323) python3 cpu=4 start=5.65 finish=19.07
1052324) python3 cpu=5 start=5.67 finish=5.68
1052325) g++ cpu=7 start=5.68 finish=7.00
1052326) cc1plus cpu=5 start=5.69 finish=6.25
1052327) as cpu=8 start=6.26 finish=6.28
1052328) cc1plus cpu=2 start=6.28 finish=6.56
1052329) as cpu=8 start=6.57 finish=6.57
1052330) collect2 cpu=13 start=6.57 finish=7.00
1052331) ld cpu=13 start=6.58 finish=7.00
1052332) lto-wrapper cpu=8 start=6.59 finish=6.97
1052333) g++ cpu=6 start=6.59 finish=6.61
1052334) lto1-wpa cpu=9 start=6.59 finish=6.60
1052335) g++ cpu=4 start=6.61 finish=6.97
1052336) lto1-ltrans cpu=6 start=6.61 finish=6.94
1052337) as cpu=15 start=6.94 finish=6.97
1052338) a.out cpu=11 start=7.00 finish=19.06
1052339) a.out cpu=1 start=7.01 finish=7.93
1052340) a.out cpu=2 start=7.01 finish=7.93
1052341) a.out cpu=11 start=7.01 finish=7.93
1052342) a.out cpu=4 start=7.01 finish=7.93
1052343) a.out cpu=7 start=7.01 finish=7.93
1052344) a.out cpu=0 start=7.01 finish=7.93
1052345) a.out cpu=6 start=7.01 finish=7.93
1052346) a.out cpu=13 start=7.01 finish=7.93
1052347) a.out cpu=5 start=7.01 finish=7.93
1052348) a.out cpu=8 start=7.01 finish=7.93
1052349) a.out cpu=3 start=7.01 finish=7.93
1052350) a.out cpu=12 start=7.01 finish=7.93
1052351) a.out cpu=15 start=7.01 finish=7.93
1052352) a.out cpu=9 start=7.01 finish=7.93
1052353) a.out cpu=14 start=7.01 finish=7.93
1052354) a.out cpu=10 start=7.01 finish=7.93
1052355) a.out cpu=13 start=7.93 finish=9.98
1052356) a.out cpu=15 start=7.93 finish=9.98
1052357) a.out cpu=12 start=7.93 finish=9.98
1052358) a.out cpu=6 start=7.93 finish=9.98
1052359) a.out cpu=9 start=7.93 finish=9.98
1052360) a.out cpu=3 start=7.93 finish=9.98
1052361) a.out cpu=0 start=7.93 finish=9.98
1052362) a.out cpu=2 start=7.93 finish=9.97
1052363) a.out cpu=4 start=7.93 finish=9.98
1052364) a.out cpu=7 start=7.93 finish=9.98
1052365) a.out cpu=5 start=7.93 finish=9.98
1052366) a.out cpu=8 start=7.93 finish=9.98
1052367) a.out cpu=14 start=7.93 finish=9.98
1052368) a.out cpu=1 start=7.93 finish=9.98
1052369) a.out cpu=11 start=7.93 finish=9.98
1052370) a.out cpu=10 start=7.93 finish=9.98
1052371) a.out cpu=14 start=9.98 finish=19.05
1052372) a.out cpu=7 start=9.98 finish=19.05
1052373) a.out cpu=0 start=9.98 finish=19.06
1052374) a.out cpu=1 start=9.98 finish=19.06
1052375) a.out cpu=5 start=9.98 finish=19.05
1052376) a.out cpu=2 start=9.98 finish=19.05
1052377) a.out cpu=12 start=9.98 finish=19.05
1052378) a.out cpu=3 start=9.98 finish=19.05
1052379) a.out cpu=10 start=9.98 finish=19.05
1052380) a.out cpu=13 start=9.98 finish=19.04
1052381) a.out cpu=6 start=9.98 finish=19.04
1052382) a.out cpu=15 start=9.98 finish=19.05
1052383) a.out cpu=8 start=9.98 finish=19.06
1052384) a.out cpu=9 start=9.98 finish=19.06
1052385) a.out cpu=4 start=9.98 finish=19.04
1052386) a.out cpu=11 start=9.98 finish=19.05
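The computation blocks are easier to read when the a.out tasks are grouped by start time. Below is a small Python sketch that does this, with only a handful of lines from the listing pasted in. The regular expression assumes the exact "pid) name cpu=N start=S finish=F" layout shown above, and the function name is hypothetical.

```python
import re
from collections import defaultdict

# Sketch: group tasks from the "Computation blocks" listing into waves that
# share a start time. Only a few lines are pasted here as a sample.

LISTING = """\
1052338) a.out cpu=11 start=7.00 finish=19.06
1052339) a.out cpu=1 start=7.01 finish=7.93
1052340) a.out cpu=2 start=7.01 finish=7.93
1052355) a.out cpu=13 start=7.93 finish=9.98
1052362) a.out cpu=2 start=7.93 finish=9.97
1052371) a.out cpu=14 start=9.98 finish=19.05
1052373) a.out cpu=0 start=9.98 finish=19.06
"""

LINE = re.compile(r"(\d+)\) (\S+) cpu=(\d+) start=([\d.]+) finish=([\d.]+)")


def waves(text: str, name: str = "a.out") -> list[tuple[float, float, int, int]]:
    """Return (start, last finish, task count, distinct CPUs) per start time."""
    groups = defaultdict(list)
    for m in LINE.finditer(text):
        _, proc, cpu, start, finish = m.groups()
        if proc == name:
            groups[start].append((int(cpu), float(finish)))
    result = []
    for start, tasks in sorted(groups.items(), key=lambda kv: float(kv[0])):
        end = max(f for _, f in tasks)
        result.append((float(start), end, len(tasks), len({c for c, _ in tasks})))
    return result


if __name__ == "__main__":
    for start, end, n, cpus in waves(LISTING):
        print(f"start={start:5.2f} dur={end - start:5.2f}s tasks={n} cpus={cpus}")
```

On the full listing this shows one long-lived task (1052338, 7.00 to 19.06 s) plus three waves of 16 tasks, each spread over all 16 CPUs. The waves run from 7.01 to 7.93 s, 7.93 to 9.98 s and 9.98 to about 19.06 s.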
