A 3D software renderer that uses OpenMP and Intel Threading Building Blocks. This test has one workload. The workload is multi-threaded and runs quickly.

Topdown metrics are dominated by backend stalls.

AMD metrics confirm the backend stalls and show that the workload is more CPU-bound than memory-bound. This is floating-point code, almost entirely 128-bit vector (AVX-128) operations.
elapsed 96.469
on_cpu 0.824 # 13.19 / 16 cores
utime 1271.434
stime 0.828
nvcsw 31883 # 75.38%
nivcsw 10411 # 24.62%
inblock 8 # 0.08/sec
onblock 12632 # 130.94/sec
cpu-clock 1272444095522 # 1272.444 seconds
task-clock 1272459019517 # 1272.459 seconds
page faults 145878 # 114.643/sec
context switches 42600 # 33.478/sec
cpu migrations 279 # 0.219/sec
major page faults 2 # 0.002/sec
minor page faults 145876 # 114.641/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 391349607640 # 89.894 branches per 1000 inst
branch misses 10754257318 # 2.75% branch miss
conditional 305970757981 # 70.283 conditional branches per 1000 inst
indirect 11845354997 # 2.721 indirect branches per 1000 inst
cpu-cycles 5344543691186 # 3.46 GHz
instructions 4348276881357 # 0.81 IPC
slots 10693312025760 #
retiring 1604929468240 # 15.0% (22.3%)
-- ucode 12995719283 # 0.1%
-- fastpath 1591933748957 # 14.9%
frontend 574101577622 # 5.4% ( 8.0%)
-- latency 236084046810 # 2.2%
-- bandwidth 338017530812 # 3.2%
backend 4809901133590 # 45.0% (66.9%)
-- cpu 2734706267287 # 25.6%
-- memory 2075194866303 # 19.4%
speculation 199470645645 # 1.9% ( 2.8%)
-- branch mispredict 198054692143 # 1.9%
-- pipeline restart 1415953502 # 0.0%
smt-contention 3504894008324 # 32.8% ( 0.0%)
cpu-cycles 5328975712732 # 3.46 GHz
instructions 4357225211014 # 0.82 IPC
instructions 1451050978351 # 13.451 l2 access per 1000 inst
l2 hit from l1 13882641022 # 41.79% l2 miss
l2 miss from l1 4418378075 #
l2 hit from l2 pf 1897404366 #
l3 hit from l2 pf 3504862131 #
l3 miss from l2 pf 233107773 #
instructions 1447220853101 # 320.239 float per 1000 inst
float 512 44 # 0.000 AVX-512 per 1000 inst
float 256 604 # 0.000 AVX-256 per 1000 inst
float 128 463456838102 # 320.239 AVX-128 per 1000 inst
float MMX 0 # 0.000 MMX per 1000 inst
float scalar 0 # 0.000 scalar per 1000 inst
instructions 4350710944191 #
opcache 553340397942 # 127.184 opcache per 1000 inst
opcache miss 4162732235 # 0.8% opcache miss rate
l1 dTLB miss 16224971580 # 3.729 L1 dTLB per 1000 inst
l2 dTLB miss 585000071 # 0.134 L2 dTLB per 1000 inst
instructions 4350509639256 #
icache 6002763036 # 1.380 icache per 1000 inst
icache miss 932442359 # 15.5% icache miss rate
l1 iTLB miss 3629974 # 0.001 L1 iTLB per 1000 inst
l2 iTLB miss 0 # 0.000 L2 iTLB per 1000 inst
tlb flush 17764 # 0.000 TLB flush per 1000 inst
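The topdown percentages above can be reproduced from the raw counts: each category is its count divided by `slots`. The parenthesized figures appear to use `slots` minus the `smt-contention` slots as the denominator. That is inferred from the numbers, not taken from tool documentation. This Python sketch recomputes on_cpu (assuming it is user plus system time over elapsed time, divided by 16 cores), IPC, the topdown shares, the backend CPU/memory split and the AVX-128 rate from values copied out of the AMD output.

```python
# AMD counter values copied from the output above.
ELAPSED = 96.469                  # wall-clock seconds
UTIME, STIME = 1271.434, 0.828    # user and system CPU seconds
CORES = 16

CYCLES = 5344543691186
INSTRUCTIONS = 4348276881357

SLOTS = 10693312025760
SMT_CONTENTION = 3504894008324
TOPDOWN = {
    "retiring": 1604929468240,
    "frontend": 574101577622,
    "backend": 4809901133590,
    "speculation": 199470645645,
}
BACKEND_SPLIT = {"cpu": 2734706267287, "memory": 2075194866303}

# The float counters are reported against their own instruction count.
FLOAT_INSTRUCTIONS = 1447220853101
FLOAT_128 = 463456838102


def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole


# Assumption: on_cpu = (user + system time) / elapsed time / number of cores.
on_cpu = (UTIME + STIME) / ELAPSED / CORES
ipc = INSTRUCTIONS / CYCLES

# Assumption: the parenthesized column excludes slots lost to SMT contention.
shares = {k: pct(v, SLOTS) for k, v in TOPDOWN.items()}
shares_no_smt = {k: pct(v, SLOTS - SMT_CONTENTION) for k, v in TOPDOWN.items()}
backend_split = {k: pct(v, SLOTS) for k, v in BACKEND_SPLIT.items()}
avx128_per_kinst = 1000.0 * FLOAT_128 / FLOAT_INSTRUCTIONS

print(f"on_cpu {on_cpu:.3f}, IPC {ipc:.2f}")
for name in TOPDOWN:
    print(f"{name:12s} {shares[name]:5.1f}% ({shares_no_smt[name]:4.1f}%)")
for name, value in backend_split.items():
    print(f"  backend/{name:<7s} {value:5.1f}%")
print(f"AVX-128 per 1000 inst: {avx128_per_kinst:.1f}")
```

Backend CPU stalls (25.6% of slots) exceed backend memory stalls (19.4%), which is the basis for calling the workload CPU-bound.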
Intel metrics show that most memory stalls occur at the L2 level.
elapsed 100.551
on_cpu 0.827 # 13.24 / 16 cores
utime 1329.997
stime 0.842
nvcsw 32996 # 67.94%
nivcsw 15571 # 32.06%
inblock 301688 # 3000.34/sec
onblock 1288 # 12.81/sec
cpu-clock 1330954352419 # 1330.954 seconds
task-clock 1330970775363 # 1330.971 seconds
page faults 113745 # 85.460/sec
context switches 48876 # 36.722/sec
cpu migrations 286 # 0.215/sec
major page faults 1595 # 1.198/sec
minor page faults 112150 # 84.262/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 343392892437 # 81.445 branches per 1000 inst
branch misses 10999047114 # 3.20% branch miss
conditional 343392903733 # 81.445 conditional branches per 1000 inst
indirect 86915724488 # 20.614 indirect branches per 1000 inst
slots 7225154861570 #
retiring 2377199852848 # 32.9% (32.9%)
-- ucode 222923249497 # 3.1%
-- fastpath 2154276603351 # 29.8%
frontend 856658332219 # 11.9% (11.9%)
-- latency 647601467795 # 9.0%
-- bandwidth 209056864424 # 2.9%
backend 3421909787471 # 47.4% (47.4%)
-- cpu 3113118947117 # 43.1%
-- memory 308790840354 # 4.3%
speculation 602695009453 # 8.3% ( 8.3%)
-- branch mispredict 591256229546 # 8.2%
-- pipeline restart 11438779907 # 0.2%
smt-contention 0 # 0.0% ( 0.0%)
cpu-cycles 2317565854634 # 1.46 GHz
instructions 2312562929728 # 1.00 IPC
l2 access 42519533920 # 18.604 l2 access per 1000 inst
l2 miss 21623683450 # 50.86% l2 miss
cpu-cycles 2291687404134 # 13.2% memory latency
load stalls 257807824450 # 0.3% l1 bound
l1 miss 251638988470 # 7.3% l2 bound
l2 miss 83527035396 # 3.5% l3 bound
l3 miss 2257341377 # 0.1% dram bound
store_stalls 44052125709 # 1.9% store bound
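The Intel memory-latency lines look like a cascade of stall counters: each level's share is the difference between successive counters divided by cycles, similar in spirit to Intel's top-down memory-bound breakdown. This sketch assumes that interpretation (inferred because it reproduces every printed percentage) and shows why L2 stands out: 7.3% of cycles versus 3.5% for L3.

```python
# Intel counter values copied from the memory-latency group above.
# Assumption: "load stalls", "l1 miss", "l2 miss" and "l3 miss" are stall-cycle
# counts (cycles stalled with a miss outstanding at that level); this is what
# makes the differences below match the printed percentages.
CYCLES = 2291687404134
LOAD_STALLS = 257807824450
L1_MISS_STALLS = 251638988470
L2_MISS_STALLS = 83527035396
L3_MISS_STALLS = 2257341377
STORE_STALLS = 44052125709


def pct(stall_cycles: int) -> float:
    """Stall cycles as a percentage of all cycles."""
    return 100.0 * stall_cycles / CYCLES


# Each cache level is charged the stall cycles that reached it but did not
# miss in it; DRAM is charged the L3-miss stalls.
bound = {
    "l1": pct(LOAD_STALLS - L1_MISS_STALLS),
    "l2": pct(L1_MISS_STALLS - L2_MISS_STALLS),
    "l3": pct(L2_MISS_STALLS - L3_MISS_STALLS),
    "dram": pct(L3_MISS_STALLS),
}
store_bound = pct(STORE_STALLS)
memory_latency = pct(LOAD_STALLS + STORE_STALLS)

for level, value in bound.items():
    print(f"{level:5s} bound {value:4.1f}%")
print(f"store bound {store_bound:4.1f}%, memory latency {memory_latency:4.1f}%")
print("largest load level:", max(bound, key=bound.get))
```

Because the levels telescope, their sum equals load stalls over cycles, so the memory-latency figure is simply load plus store stalls as a share of cycles.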
Process statistics show that almost all of the time is spent in the renderer process.
352 processes
48 renderer 20199.68 2.56
36 clinfo 4.11 2.07
38 vulkaninfo 1.14 1.15
4 vulkani:disk$0 0.12 0.13
2 llvmpipe-0 0.06 0.06
2 llvmpipe-1 0.06 0.06
2 llvmpipe-10 0.06 0.06
2 llvmpipe-11 0.06 0.06
2 llvmpipe-12 0.06 0.06
2 llvmpipe-13 0.06 0.06
2 llvmpipe-14 0.06 0.06
2 llvmpipe-15 0.06 0.06
2 llvmpipe-2 0.06 0.06
2 llvmpipe-3 0.06 0.06
2 llvmpipe-4 0.06 0.06
2 llvmpipe-5 0.06 0.06
2 llvmpipe-6 0.06 0.06
2 llvmpipe-7 0.06 0.06
2 llvmpipe-8 0.06 0.06
2 llvmpipe-9 0.06 0.06
6 clang 0.05 0.07
6 php 0.05 0.07
3 rocminfo 0.03 0.00
1 lspci 0.00 0.01
1 ps 0.00 0.01
84 sh 0.00 0.00
13 gcc 0.00 0.00
8 stat 0.00 0.00
8 systemd-detect- 0.00 0.00
7 gsettings 0.00 0.00
6 llvm-link 0.00 0.00
5 glxinfo 0.00 0.00
5 gmain 0.00 0.00
5 phoronix-test-s 0.00 0.00
4 dconf worker 0.00 0.00
3 ttsiod-renderer 0.00 0.00
2 cc 0.00 0.00
2 grep 0.00 0.00
2 lscpu 0.00 0.00
2 setterm 0.00 0.00
2 uname 0.00 0.00
2 which 0.00 0.00
1 date 0.00 0.00
1 dirname 0.00 0.00
1 dmesg 0.00 0.00
1 dmidecode 0.00 0.00
1 ifconfig 0.00 0.00
1 ip 0.00 0.00
1 lsmod 0.00 0.00
1 mktemp 0.00 0.00
1 qdbus 0.00 0.00
1 readlink 0.00 0.00
1 realpath 0.00 0.00
1 sed 0.00 0.00
1 sort 0.00 0.00
1 stty 0.00 0.00
1 systemctl 0.00 0.00
1 template.sh 0.00 0.00
1 wc 0.00 0.00
0 processes running
47 maximum processes
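To put a number on the renderer's share, this sketch sums the first time column per process family, folding the numbered `llvmpipe-N` threads into one entry. The column's units are not stated in the report, so treat the result as a relative share only. The row layout (count, name, two time columns) is an assumption based on how the table is printed.

```python
import re
from collections import defaultdict

# Rows from the process-statistics table above whose first time column is
# non-zero; the omitted 0.00 rows do not change the totals.
# Assumed layout: count, process name, two time columns (units not stated).
TABLE = """\
48 renderer 20199.68 2.56
36 clinfo 4.11 2.07
38 vulkaninfo 1.14 1.15
4 vulkani:disk$0 0.12 0.13
2 llvmpipe-0 0.06 0.06
2 llvmpipe-1 0.06 0.06
2 llvmpipe-10 0.06 0.06
2 llvmpipe-11 0.06 0.06
2 llvmpipe-12 0.06 0.06
2 llvmpipe-13 0.06 0.06
2 llvmpipe-14 0.06 0.06
2 llvmpipe-15 0.06 0.06
2 llvmpipe-2 0.06 0.06
2 llvmpipe-3 0.06 0.06
2 llvmpipe-4 0.06 0.06
2 llvmpipe-5 0.06 0.06
2 llvmpipe-6 0.06 0.06
2 llvmpipe-7 0.06 0.06
2 llvmpipe-8 0.06 0.06
2 llvmpipe-9 0.06 0.06
6 clang 0.05 0.07
6 php 0.05 0.07
3 rocminfo 0.03 0.00
"""


def family_totals(table: str) -> dict:
    """Sum the first time column per process family (llvmpipe-N -> llvmpipe)."""
    totals = defaultdict(float)
    for line in table.splitlines():
        parts = line.split()
        name = " ".join(parts[1:-2])           # names may contain spaces
        family = re.sub(r"-\d+$", "", name)    # fold numbered worker threads
        totals[family] += float(parts[-2])
    return dict(totals)


totals = family_totals(TABLE)
grand_total = sum(totals.values())
top = max(totals, key=totals.get)
print(f"{top}: {100.0 * totals[top] / grand_total:.2f}% of the first time column")
```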
The computation blocks are simple: one ttsiod-renderer process and 16 renderer threads, one per CPU, all finishing at the same time.
11887) ttsiod-renderer cpu=11 start=5.14 finish=32.24
11888) renderer cpu=14 start=5.14 finish=32.24
11889) renderer cpu=6 start=5.21 finish=32.24
11890) renderer cpu=15 start=5.21 finish=32.24
11891) renderer cpu=2 start=5.21 finish=32.24
11892) renderer cpu=4 start=5.21 finish=32.24
11893) renderer cpu=1 start=5.21 finish=32.24
11894) renderer cpu=13 start=5.21 finish=32.24
11895) renderer cpu=11 start=5.21 finish=32.24
11896) renderer cpu=0 start=5.21 finish=32.24
11897) renderer cpu=3 start=5.21 finish=32.24
11898) renderer cpu=8 start=5.21 finish=32.24
11899) renderer cpu=7 start=5.21 finish=32.24
11900) renderer cpu=12 start=5.21 finish=32.24
11901) renderer cpu=9 start=5.21 finish=32.24
11902) renderer cpu=10 start=5.21 finish=32.24
11903) renderer cpu=5 start=5.21 finish=32.24
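The thread list can be checked mechanically. This sketch parses each line, assuming the format `pid) name cpu=N start=S finish=F`, and confirms that the 16 `renderer` threads sit on 16 distinct CPUs and run concurrently. Reading `start` and `finish` as seconds is an assumption; the report does not give units.

```python
import re

# Thread list copied from above.
# Assumed format: "pid) name cpu=N start=S finish=F".
THREADS = """\
11887) ttsiod-renderer cpu=11 start=5.14 finish=32.24
11888) renderer cpu=14 start=5.14 finish=32.24
11889) renderer cpu=6 start=5.21 finish=32.24
11890) renderer cpu=15 start=5.21 finish=32.24
11891) renderer cpu=2 start=5.21 finish=32.24
11892) renderer cpu=4 start=5.21 finish=32.24
11893) renderer cpu=1 start=5.21 finish=32.24
11894) renderer cpu=13 start=5.21 finish=32.24
11895) renderer cpu=11 start=5.21 finish=32.24
11896) renderer cpu=0 start=5.21 finish=32.24
11897) renderer cpu=3 start=5.21 finish=32.24
11898) renderer cpu=8 start=5.21 finish=32.24
11899) renderer cpu=7 start=5.21 finish=32.24
11900) renderer cpu=12 start=5.21 finish=32.24
11901) renderer cpu=9 start=5.21 finish=32.24
11902) renderer cpu=10 start=5.21 finish=32.24
11903) renderer cpu=5 start=5.21 finish=32.24
"""

LINE = re.compile(r"(\d+)\) (\S+) cpu=(\d+) start=([\d.]+) finish=([\d.]+)")


def parse(text: str) -> list:
    """Return (pid, name, cpu, start, finish) tuples for each matching line."""
    rows = []
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            pid, name, cpu, start, finish = m.groups()
            rows.append((int(pid), name, int(cpu), float(start), float(finish)))
    return rows


workers = [t for t in parse(THREADS) if t[1] == "renderer"]
cpus = {t[2] for t in workers}
overlap_start = max(t[3] for t in workers)   # last worker to start
overlap_end = min(t[4] for t in workers)     # first worker to finish
print(f"{len(workers)} renderer threads on {len(cpus)} distinct CPUs")
print(f"all workers run concurrently from {overlap_start} to {overlap_end}")
```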
