A cross-platform GPU stress test with OpenGL and Vulkan drivers. There are 160 combinations of drivers, screen sizes and MSAA settings. This test uses 1920×1200 resolution with OpenGL and all MSAA settings. The overall system view shows this is not a CPU benchmark: it runs with just a few threads.

The topdown profile shows that what little CPU time is used goes mostly to frontend stalls on AMD (59.2% of slots). On Intel the largest share is backend stalls (48.4%).

AMD metrics show that only about 1/30th of one core's time is used.

elapsed              1076.862
on_cpu               0.002          # 0.03 / 16 cores
utime                20.084
stime                16.793
nvcsw                162027         # 99.31%
nivcsw               1122           # 0.69%
inblock              8              # 0.01/sec
onblock              13640          # 12.67/sec
cpu-clock            32019749507    # 32.020 seconds
task-clock           32688331876    # 32.688 seconds
page faults          476048         # 14563.239/sec
context switches     168328         # 5149.483/sec
cpu migrations       1007           # 30.806/sec
major page faults    2              # 0.061/sec
minor page faults    476046         # 14563.178/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             8777901101     # 159.914 branches per 1000 inst
branch misses        365812203      # 4.17% branch miss
conditional          5205216990     # 94.828 conditional branches per 1000 inst
indirect             507105797      # 9.238 indirect branches per 1000 inst
cpu-cycles           53961154591    # 0.00 GHz
instructions         54738144413    # 1.01 IPC
slots                108780603978   #
retiring             19232090905    # 17.7% (17.7%)
-- ucode             52330631       #     0.0%
-- fastpath          19179760274    #    17.6%
frontend             64353407460    # 59.2% (59.3%) high
-- latency           58838576598    #    54.1%
-- bandwidth         5514830862     #     5.1%
backend              21298430222    # 19.6% (19.6%)
-- cpu               2146489744     #     2.0%
-- memory            19151940478    #    17.6%
speculation          3655643434     #  3.4% ( 3.4%)
-- branch mispredict 3554005285     #     3.3%
-- pipeline restart  101638149      #     0.1%
smt-contention       240343787      #  0.2% ( 0.0%)
cpu-cycles           54536598047    # 0.00 GHz
instructions         54631269920    # 1.00 IPC
instructions         18245618756    # 37.691 l2 access per 1000 inst
l2 hit from l1       615783822      # 43.13% l2 miss
l2 miss from l1      254071074      #
l2 hit from l2 pf    29368657       #
l3 hit from l2 pf    16021795       #
l3 miss from l2 pf   26518276       #
instructions         18197223178    # 62.137 float per 1000 inst
float 512            64             # 0.000 AVX-512 per 1000 inst
float 256            674            # 0.000 AVX-256 per 1000 inst
float 128            1130722912     # 62.137 AVX-128 per 1000 inst
float MMX            0              # 0.000 MMX per 1000 inst
float scalar         0              # 0.000 scalar per 1000 inst
instructions         54655192673    #
opcache              12529239376    # 229.242 opcache per 1000 inst
opcache miss         3057310348     # 24.4% opcache miss rate
l1 dTLB miss         105870486      # 1.937 L1 dTLB per 1000 inst
l2 dTLB miss         24007702       # 0.439 L2 dTLB per 1000 inst
instructions         54611990576    #
icache               7369764703     # 134.948 icache per 1000 inst
icache miss          1623602659     # 22.0% icache miss rate
l1 iTLB miss         39060559       # 0.715 L1 iTLB per 1000 inst
l2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst
tlb flush            25177          # 0.000 TLB flush per 1000 inst
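
The "1/30th of a core" figure can be checked from the first lines of the AMD dump. The sketch below assumes that on_cpu is total CPU time (utime + stime) divided by elapsed time and then by the core count. The profiling tool's actual formula is not stated here, so treat this as an illustration that happens to reproduce the reported values.

```python
# Figures copied from the AMD run above.
elapsed = 1076.862   # wall-clock seconds
utime = 20.084       # user CPU seconds
stime = 16.793       # system CPU seconds
cores = 16

# Assumed formula: CPU seconds per wall-clock second, i.e. average busy cores.
cores_busy = (utime + stime) / elapsed

# Assumed formula: the same value spread across all cores in the machine.
on_cpu = cores_busy / cores

print(f"cores busy: {cores_busy:.3f}")
print(f"on_cpu:     {on_cpu:.3f}")
```

Under these assumptions the result is about 0.034 of one core, or 0.002 of the whole 16-core machine, which agrees with the on_cpu line in the dump.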

Intel metrics

elapsed              1083.609
on_cpu               0.002          # 0.03 / 16 cores
utime                15.516
stime                15.960
nvcsw                116053         # 99.64%
nivcsw               423            # 0.36%
inblock              64             # 0.06/sec
onblock              2344           # 2.16/sec
cpu-clock            27100165512    # 27.100 seconds
task-clock           27755281301    # 27.755 seconds
page faults          375809         # 13540.090/sec
context switches     121705         # 4384.931/sec
cpu migrations       1051           # 37.867/sec
major page faults    1              # 0.036/sec
minor page faults    375808         # 13540.054/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             6642555358     # 150.585 branches per 1000 inst
branch misses        110237614      # 1.66% branch miss
conditional          6642575742     # 150.585 conditional branches per 1000 inst
indirect             478025670      # 10.837 indirect branches per 1000 inst
slots                195395490248   #
retiring             59217355013    # 30.3% (30.3%)
-- ucode             6707380516     #     3.4%
-- fastpath          52509974497    #    26.9%
frontend             30497378415    # 15.6% (15.6%)
-- latency           17340657935    #     8.9%
-- bandwidth         13156720480    #     6.7%
backend              94477580844    # 48.4% (48.4%)
-- cpu               28046200975    #    14.4%
-- memory            66431379869    #    34.0%
speculation          11809887438    #  6.0% ( 6.0%)
-- branch mispredict 10842411803    #     5.5%
-- pipeline restart  967475635      #     0.5%
smt-contention       0              #  0.0% ( 0.0%)
cpu-cycles           34678455881    # 0.00 GHz
instructions         46157951024    # 1.33 IPC
l2 access            1951534064     # 45.139 l2 access per 1000 inst
l2 miss              1285026344     # 65.85% l2 miss
cpu-cycles           32708750023    # 39.3% memory latency
load stalls          6497272907     #  3.9% l1 bound
l1 miss              5216356433     #  4.1% l2 bound
l2 miss              3877407117     #  2.1% l3 bound
l3 miss              3202485247     #  9.8% dram bound
store_stalls         6343834109     # 19.4% store bound
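
The top-level topdown percentages in both dumps appear to be each category's slot count divided by the total slots. The sketch below recomputes them from the raw counts copied from the two runs and reports the dominant category for each. The helper names `shares` and `dominant` are hypothetical, and the division by slots is an assumption about how the tool derives its percentages.

```python
# Raw slot counts copied from the AMD and Intel dumps above.
amd = {
    "slots": 108780603978,
    "retiring": 19232090905,
    "frontend": 64353407460,
    "backend": 21298430222,
    "speculation": 3655643434,
}
intel = {
    "slots": 195395490248,
    "retiring": 59217355013,
    "frontend": 30497378415,
    "backend": 94477580844,
    "speculation": 11809887438,
}

def shares(run):
    """Return each top-level category as a percentage of total slots (assumed formula)."""
    total = run["slots"]
    return {name: 100.0 * count / total
            for name, count in run.items() if name != "slots"}

def dominant(run):
    """Return the category with the largest share of slots."""
    pct = shares(run)
    return max(pct, key=pct.get)

for label, run in (("AMD", amd), ("Intel", intel)):
    pct = shares(run)
    line = ", ".join(f"{name} {value:.1f}%" for name, value in pct.items())
    print(f"{label}: {line} (dominant: {dominant(run)})")
```

With these inputs the AMD run comes out frontend-dominated and the Intel run backend-dominated, consistent with the percentages printed in the dumps.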

Overall, this is a GPU test and not a CPU test.