An ambient occlusion renderer written in C. It has a single workload, which is single-threaded and runs quickly (about 29 seconds per invocation).

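The topdown profile shows a fair retirement rate (38.4% of slots on AMD, 44.2% on Intel), with most of the remaining slots lost to backend stalls and bad speculation. Branch mispredictions are high (18.2% of slots on AMD, 28.3% on Intel), while frontend stalls are low (10.8% and 5.1%).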

AMD metrics

elapsed              102.495
on_cpu               0.054          # 0.86 / 16 cores
utime                87.363
stime                1.020
nvcsw                1999           # 69.43%
nivcsw               880            # 30.57%
inblock              0              # 0.00/sec
onblock              86264          # 841.64/sec
cpu-clock            88410340018    # 88.410 seconds
task-clock           88413216623    # 88.413 seconds
page faults          304139         # 3439.972/sec
context switches     3221           # 36.431/sec
cpu migrations       283            # 3.201/sec
major page faults    2              # 0.023/sec
minor page faults    304137         # 3439.949/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             81795025595    # 92.951 branches per 1000 inst
branch misses        3180554743     # 3.89% branch miss
conditional          47929599290    # 54.467 conditional branches per 1000 inst
indirect             5305288022     # 6.029 indirect branches per 1000 inst
cpu-cycles           406522857441   # 0.25 GHz
instructions         877863716496   # 2.16 IPC
slots                815064673812   #
retiring             313009309772   # 38.4% (38.4%)
-- ucode             595913736      #     0.1%
-- fastpath          312413396036   #    38.3%
frontend             88237262487    # 10.8% (10.8%)
-- latency           48716465088    #     6.0%
-- bandwidth         39520797399    #     4.8%
backend              263992965803   # 32.4% (32.4%)
-- cpu               224705468131   #    27.6%
-- memory            39287497672    #     4.8%
speculation          149780891444   # 18.4% (18.4%) high
-- branch mispredict 148122841279   #    18.2%
-- pipeline restart  1658050165     #     0.2%
smt-contention       44009663       #  0.0% ( 0.0%)
cpu-cycles           406560579104   # 0.25 GHz
instructions         878913529305   # 2.16 IPC
instructions         293297866428   # 0.482 l2 access per 1000 inst
l2 hit from l1       121907588      # 18.04% l2 miss
l2 miss from l1      14912506       #
l2 hit from l2 pf    8978579        #
l3 hit from l2 pf    4321477        #
l3 miss from l2 pf   6292570        #
instructions         293187261114   # 286.414 float per 1000 inst
float 512            45             # 0.000 AVX-512 per 1000 inst
float 256            434            # 0.000 AVX-256 per 1000 inst
float 128            83972975802    # 286.414 AVX-128 per 1000 inst
float MMX            0              # 0.000 MMX per 1000 inst
float scalar         0              # 0.000 scalar per 1000 inst
instructions         2678177        #
opcache              992616         # 370.631 opcache per 1000 inst
opcache miss         535144         # 53.9% opcache miss rate
l1 dTLB miss         6691           # 2.498 L1 dTLB per 1000 inst
l2 dTLB miss         1192           # 0.445 L2 dTLB per 1000 inst
instructions         2716755        #
icache               1316655        # 484.643 icache per 1000 inst
icache miss          110449         #  8.4% icache miss rate
l1 iTLB miss         12             # 0.004 L1 iTLB per 1000 inst
l2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst
tlb flush            19             # 0.007 TLB flush per 1000 inst
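
The percentages in the topdown rows above appear to be each bucket's slot count divided by the total slot count. The following is a minimal sketch of that arithmetic using the AMD figures; the function and variable names are hypothetical and are not part of the profiling tool.

```python
# Top-level topdown slot counts copied from the AMD run above.
AMD_SLOTS = 815064673812
AMD_BUCKETS = {
    "retiring": 313009309772,
    "frontend": 88237262487,
    "backend": 263992965803,
    "speculation": 149780891444,
    "smt-contention": 44009663,
}


def topdown_shares(slots, buckets):
    """Return each bucket as a percentage of total slots, rounded to 0.1.

    Assumption: the tool reports count / slots; this is inferred from the
    numbers in the dump, not from the tool's source.
    """
    return {name: round(100.0 * count / slots, 1) for name, count in buckets.items()}


if __name__ == "__main__":
    for name, pct in topdown_shares(AMD_SLOTS, AMD_BUCKETS).items():
        print(f"{name:15s} {pct:5.1f}%")
```

Run against the AMD counts, this reproduces the reported top-level split, which is a quick way to confirm that the buckets account for essentially all slots.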

The Intel metrics show that most of the memory stall time is L1-bound (6.7% of the 7.0% memory latency), and hence backend memory stalls are also small (2.1% of slots).

elapsed              99.367
on_cpu               0.053          # 0.85 / 16 cores
utime                83.595
stime                0.708
nvcsw                1838           # 77.85%
nivcsw               523            # 22.15%
inblock              376            # 3.78/sec
onblock              75024          # 755.02/sec
cpu-clock            84314176287    # 84.314 seconds
task-clock           84316871118    # 84.317 seconds
page faults          293264         # 3478.118/sec
context switches     2682           # 31.809/sec
cpu migrations       256            # 3.036/sec
major page faults    3              # 0.036/sec
minor page faults    293261         # 3478.082/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             81437652894    # 92.693 branches per 1000 inst
branch misses        3150488706     # 3.87% branch miss
conditional          81437664862    # 92.693 conditional branches per 1000 inst
indirect             5324683966     # 6.061 indirect branches per 1000 inst
slots                1909779189608  #
retiring             844514250146   # 44.2% (44.2%)
-- ucode             22927963393    #     1.2%
-- fastpath          821586286753   #    43.0%
frontend             97708395771    #  5.1% ( 5.1%)
-- latency           47599342664    #     2.5%
-- bandwidth         50109053107    #     2.6%
backend              424540183774   # 22.2% (22.2%)
-- cpu               384651028950   #    20.1%
-- memory            39889154824    #     2.1%
speculation          543044850727   # 28.4% (28.4%) high
-- branch mispredict 540435652666   #    28.3%
-- pipeline restart  2609198061     #     0.1%
smt-contention       0              #  0.0% ( 0.0%)
cpu-cycles           318526215737   # 0.20 GHz
instructions         878918735087   # 2.76 IPC
l2 access            306437238      # 0.349 l2 access per 1000 inst
l2 miss              95837357       # 31.27% l2 miss
cpu-cycles           318387579528   #  7.0% memory latency
load stalls          22063711216    #  6.7% l1 bound
l1 miss              799531261      #  0.1% l2 bound
l2 miss              402304447      #  0.0% l3 bound
l3 miss              286747182      #  0.1% dram bound
store_stalls         159923946      #  0.1% store bound
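
Several of the ratios in the comment column can be rechecked directly from the raw counters. The sketch below does this for the Intel run, assuming IPC is instructions per cycle and that the miss rates are misses divided by total events; the dictionary keys and function name are hypothetical.

```python
# Raw counters copied from the Intel run above.
INTEL = {
    "cycles": 318526215737,
    "instructions": 878918735087,
    "branches": 81437652894,
    "branch_misses": 3150488706,
    "l2_access": 306437238,
    "l2_miss": 95837357,
}


def ratio_metrics(c):
    """Derive IPC and miss rates from raw counts (assumed formulas)."""
    return {
        "ipc": round(c["instructions"] / c["cycles"], 2),
        "branch_miss_pct": round(100.0 * c["branch_misses"] / c["branches"], 2),
        "l2_miss_pct": round(100.0 * c["l2_miss"] / c["l2_access"], 2),
    }


if __name__ == "__main__":
    for name, value in ratio_metrics(INTEL).items():
        print(f"{name:16s} {value}")
```

These three values match the figures reported in the dump. The per-level "bound" percentages are not rederived here because their formulas are not stated in the output.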

The process overview shows three invocations of “ao”.

354 processes
	  3 ao                      86.32     0.24
	 68 clinfo                  15.86     6.66
	 38 vulkaninfo               0.96     1.15
	  6 glxinfo:gdrv0            0.12     0.06
	  6 glxinfo:gl0              0.12     0.06
	  4 vulkani:disk$0           0.10     0.12
	  6 php                      0.06     0.08
	  2 glxinfo                  0.06     0.03
	  2 glxinfo:cs0              0.06     0.03
	  2 glxinfo:disk$0           0.06     0.03
	  2 glxinfo:sh0              0.06     0.03
	  2 glxinfo:shlo0            0.06     0.03
	  6 clang                    0.05     0.07
	  2 llvmpipe-0               0.05     0.06
	  2 llvmpipe-1               0.05     0.06
	  2 llvmpipe-10              0.05     0.06
	  2 llvmpipe-11              0.05     0.06
	  2 llvmpipe-12              0.05     0.06
	  2 llvmpipe-13              0.05     0.06
	  2 llvmpipe-14              0.05     0.06
	  2 llvmpipe-15              0.05     0.06
	  2 llvmpipe-2               0.05     0.06
	  2 llvmpipe-3               0.05     0.06
	  2 llvmpipe-4               0.05     0.06
	  2 llvmpipe-5               0.05     0.06
	  2 llvmpipe-6               0.05     0.06
	  2 llvmpipe-7               0.05     0.06
	  2 llvmpipe-8               0.05     0.06
	  2 llvmpipe-9               0.05     0.06
	  3 rocminfo                 0.03     0.00
	  1 lspci                    0.00     0.02
	  1 ps                       0.00     0.01
	 82 sh                       0.00     0.00
	 13 gcc                      0.00     0.00
	  8 gsettings                0.00     0.00
	  8 stat                     0.00     0.00
	  8 systemd-detect-          0.00     0.00
	  6 llvm-link                0.00     0.00
	  5 gmain                    0.00     0.00
	  5 phoronix-test-s          0.00     0.00
	  3 aobench                  0.00     0.00
	  3 dconf worker             0.00     0.00
	  2 cc                       0.00     0.00
	  2 lscpu                    0.00     0.00
	  2 uname                    0.00     0.00
	  2 which                    0.00     0.00
	  2 xset                     0.00     0.00
	  1 date                     0.00     0.00
	  1 dirname                  0.00     0.00
	  1 dmesg                    0.00     0.00
	  1 dmidecode                0.00     0.00
	  1 grep                     0.00     0.00
	  1 ifconfig                 0.00     0.00
	  1 ip                       0.00     0.00
	  1 lsmod                    0.00     0.00
	  1 mktemp                   0.00     0.00
	  1 qdbus                    0.00     0.00
	  1 readlink                 0.00     0.00
	  1 realpath                 0.00     0.00
	  1 sed                      0.00     0.00
	  1 sort                     0.00     0.00
	  1 stty                     0.00     0.00
	  1 systemctl                0.00     0.00
	  1 template.sh              0.00     0.00
	  1 wc                       0.00     0.00
	  1 xrandr                   0.00     0.00
0 processes running
47 maximum processes

Computation blocks

      356574) aobench          cpu=6 start=5.50  finish=34.37
        356575) ao               cpu=8 start=5.51  finish=34.36
      356578) aobench          cpu=15 start=38.37 finish=67.28
        356579) ao               cpu=0 start=38.38 finish=67.27
      356580) aobench          cpu=14 start=71.28 finish=100.13
        356581) ao               cpu=7 start=71.28 finish=100.12
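
The run time of each invocation can be read off the computation blocks by subtracting start from finish. The following is a small sketch that parses the lines above and does this for the "ao" processes; the regular expression and function name are hypothetical and assume the line layout shown here.

```python
import re

# Computation-block lines copied from the listing above.
BLOCKS = """
      356574) aobench          cpu=6 start=5.50  finish=34.37
        356575) ao               cpu=8 start=5.51  finish=34.36
      356578) aobench          cpu=15 start=38.37 finish=67.28
        356579) ao               cpu=0 start=38.38 finish=67.27
      356580) aobench          cpu=14 start=71.28 finish=100.13
        356581) ao               cpu=7 start=71.28 finish=100.12
"""

LINE = re.compile(r"(\d+)\)\s+(\S+)\s+cpu=(\d+)\s+start=([\d.]+)\s+finish=([\d.]+)")


def ao_durations(text, name="ao"):
    """Return finish - start (seconds, rounded to 0.01) for each matching process."""
    durations = []
    for line in text.splitlines():
        match = LINE.search(line)
        if match and match.group(2) == name:
            start, finish = float(match.group(4)), float(match.group(5))
            durations.append(round(finish - start, 2))
    return durations


if __name__ == "__main__":
    runs = ao_durations(BLOCKS)
    print(runs, round(sum(runs), 2))
```

The three runs each take just under 29 seconds and are separated by gaps of about 4 seconds. Assuming the first numeric column of the process overview is on-CPU seconds, their total is close to the 86.32 shown there for "ao".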