crafty is a quick-running, single-threaded chess benchmark.

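Topdown shows very high branch misprediction and low backend stalls: mispredicted branches cost 12.8% of slots on AMD and 21.9% on Intel, while backend stalls account for 14.3% and 8.7% respectively.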

AMD metrics
elapsed 85.026
on_cpu 0.049 # 0.79 / 16 cores
utime 66.219
stime 1.031
nvcsw 1987 # 80.06%
nivcsw 495 # 19.94%
inblock 0 # 0.00/sec
onblock 13520 # 159.01/sec
cpu-clock 67275625058 # 67.276 seconds
task-clock 67278460780 # 67.278 seconds
page faults 254497 # 3782.741/sec
context switches 2732 # 40.607/sec
cpu migrations 277 # 4.117/sec
major page faults 2 # 0.030/sec
minor page faults 254495 # 3782.711/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 103658642395 # 126.323 branches per 1000 inst
branch misses 3344279538 # 3.23% branch miss
conditional 81346335549 # 99.132 conditional branches per 1000 inst
indirect 699836262 # 0.853 indirect branches per 1000 inst
cpu-cycles 227536355149 # 0.22 GHz
instructions 612422018555 # 2.69 IPC
slots 457299337278 #
retiring 207822186201 # 45.4% (45.4%)
-- ucode 20268546 # 0.0%
-- fastpath 207801917655 # 45.4%
frontend 124436967971 # 27.2% (27.2%)
-- latency 79829054820 # 17.5%
-- bandwidth 44607913151 # 9.8%
backend 65528959985 # 14.3% (14.3%)
-- cpu 12232245149 # 2.7%
-- memory 53296714836 # 11.7%
speculation 59488082175 # 13.0% (13.0%)
-- branch mispredict 58719071362 # 12.8%
-- pipeline restart 769010813 # 0.2%
smt-contention 22893938 # 0.0% ( 0.0%)
cpu-cycles 227110461275 # 0.22 GHz
instructions 612833388726 # 2.70 IPC
instructions 204805663491 # 10.496 l2 access per 1000 inst
l2 hit from l1 1989973299 # 4.61% l2 miss
l2 miss from l1 56774383 #
l2 hit from l2 pf 117293988 #
l3 hit from l2 pf 27300597 #
l3 miss from l2 pf 15007672 #
instructions 204922878110 # 17.716 float per 1000 inst
float 512 50 # 0.000 AVX-512 per 1000 inst
float 256 618 # 0.000 AVX-256 per 1000 inst
float 128 3630465889 # 17.716 AVX-128 per 1000 inst
float MMX 0 # 0.000 MMX per 1000 inst
float scalar 10 # 0.000 scalar per 1000 inst
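
The derived figures in the comments above can be cross-checked from the raw counters. The following is a minimal sketch, not the tool that produced the listing: the function name `derived` and its parameter names are made up for illustration, and the formulas are assumptions inferred from the numbers (on_cpu as user plus system time over elapsed time, divided by the 16 cores; IPC as instructions over cycles; branch miss rate as misses over branches).

```python
def derived(elapsed, utime, stime, cycles, instructions,
            branches, branch_misses, cores=16):
    """Recompute a few derived metrics from raw counters (assumed formulas)."""
    busy = (utime + stime) / elapsed          # average number of busy cores
    return {
        "busy_cores": round(busy, 2),
        "on_cpu": round(busy / cores, 3),     # fraction of the whole machine
        "ipc": round(instructions / cycles, 2),
        "branch_miss_pct": round(100 * branch_misses / branches, 2),
    }

# Values copied from the AMD metrics listing.
amd = derived(
    elapsed=85.026, utime=66.219, stime=1.031,
    cycles=227536355149, instructions=612422018555,
    branches=103658642395, branch_misses=3344279538,
)
print(amd)
```

With the AMD counters this yields 0.79 busy cores, an on_cpu of 0.049, 2.69 IPC and a 3.23% branch miss rate, matching the listing. The per-1000-instruction rates are left out on purpose: the counter groups appear to be sampled separately, so they do not all share the same instruction count.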

Intel metrics
elapsed 78.685
on_cpu 0.052 # 0.83 / 16 cores
utime 64.831
stime 0.552
nvcsw 1881 # 81.78%
nivcsw 419 # 18.22%
inblock 8 # 0.10/sec
onblock 2008 # 25.52/sec
cpu-clock 65403411728 # 65.403 seconds
task-clock 65406142261 # 65.406 seconds
page faults 213026 # 3256.972/sec
context switches 2523 # 38.574/sec
cpu migrations 300 # 4.587/sec
major page faults 0 # 0.000/sec
minor page faults 213026 # 3256.972/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 77620340217 # 126.456 branches per 1000 inst
branch misses 2814845760 # 3.63% branch miss
conditional 77620352217 # 126.456 conditional branches per 1000 inst
indirect 546661136 # 0.891 indirect branches per 1000 inst
slots 1479746226446 #
retiring 586349958636 # 39.6% (39.6%)
-- ucode 23262388203 # 1.6%
-- fastpath 563087570433 # 38.1%
frontend 441928498060 # 29.9% (29.9%)
-- latency 219742463361 # 14.9%
-- bandwidth 222186034699 # 15.0%
backend 128165873654 # 8.7% ( 8.7%)
-- cpu 73050345133 # 4.9%
-- memory 55115528521 # 3.7%
speculation 330332085857 # 22.3% (22.3%)
-- branch mispredict 324491614752 # 21.9%
-- pipeline restart 5840471105 # 0.4%
smt-contention 0 # 0.0% ( 0.0%)
cpu-cycles 246727230707 # 0.20 GHz
instructions 613593772132 # 2.49 IPC
l2 access 4856487075 # 7.918 l2 access per 1000 inst
l2 miss 738889264 # 15.21% l2 miss
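
The top-level topdown percentages in both listings look like each category's slot count divided by the total slots. A small sketch under that assumption, with the counts copied from the two listings (the helper name `shares` is hypothetical):

```python
# Top-level topdown slot counts copied from the AMD and Intel listings.
AMD_SLOTS = 457299337278
AMD = {
    "retiring": 207822186201,
    "frontend": 124436967971,
    "backend": 65528959985,
    "speculation": 59488082175,
}

INTEL_SLOTS = 1479746226446
INTEL = {
    "retiring": 586349958636,
    "frontend": 441928498060,
    "backend": 128165873654,
    "speculation": 330332085857,
}

def shares(counts, slots):
    """Express each category as a percentage of all pipeline slots."""
    return {name: round(100 * value / slots, 1) for name, value in counts.items()}

amd_pct = shares(AMD, AMD_SLOTS)
intel_pct = shares(INTEL, INTEL_SLOTS)
for name in AMD:
    print(f"{name:12s} AMD {amd_pct[name]:5.1f}%   Intel {intel_pct[name]:5.1f}%")
```

Side by side, the speculation share is 13.0% on AMD against 22.3% on Intel, and the backend share is 14.3% against 8.7%. The absolute slot counts differ between the two machines, so only the percentages are comparable.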

Process structure shows just six invocations of crafty and a moderate percentage of overhead.
Core computation is simple
1954945) sh cpu=4 start=5.57 finish=5.57
1954946) stty cpu=1 start=5.57 finish=5.57
1954947) crafty-benchmar cpu=3 start=5.57 finish=21.44
1954948) crafty cpu=5 start=5.58 finish=21.43
1954951) crafty cpu=7 start=21.29 finish=21.43
1954952) crafty-benchmar cpu=3 start=25.44 finish=41.30
1954953) crafty cpu=12 start=25.45 finish=41.29
1954956) crafty cpu=13 start=41.14 finish=41.29
1954957) crafty-benchmar cpu=11 start=45.30 finish=61.09
1954958) crafty cpu=12 start=45.30 finish=61.09
1954959) crafty cpu=14 start=60.96 finish=61.09
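
The invocation count can be checked by parsing the listing above. This is a rough sketch that assumes each line has the form `pid) name cpu=N start=S finish=F`, with start and finish in seconds; the regular expression and the `parse` helper are illustrative and not part of any tool used here.

```python
import re

LISTING = """\
1954945) sh cpu=4 start=5.57 finish=5.57
1954946) stty cpu=1 start=5.57 finish=5.57
1954947) crafty-benchmar cpu=3 start=5.57 finish=21.44
1954948) crafty cpu=5 start=5.58 finish=21.43
1954951) crafty cpu=7 start=21.29 finish=21.43
1954952) crafty-benchmar cpu=3 start=25.44 finish=41.30
1954953) crafty cpu=12 start=25.45 finish=41.29
1954956) crafty cpu=13 start=41.14 finish=41.29
1954957) crafty-benchmar cpu=11 start=45.30 finish=61.09
1954958) crafty cpu=12 start=45.30 finish=61.09
1954959) crafty cpu=14 start=60.96 finish=61.09
"""

# Assumed line layout: "<pid>) <name> cpu=<n> start=<sec> finish=<sec>"
LINE = re.compile(r"(\d+)\)\s+(\S+)\s+cpu=(\d+)\s+start=([\d.]+)\s+finish=([\d.]+)")

def parse(listing):
    """Turn the process listing into a list of dicts, skipping unmatched lines."""
    procs = []
    for line in listing.splitlines():
        m = LINE.match(line.strip())
        if m:
            pid, name, cpu, start, finish = m.groups()
            procs.append({"pid": int(pid), "name": name, "cpu": int(cpu),
                          "start": float(start), "finish": float(finish)})
    return procs

procs = parse(LISTING)
crafty = [p for p in procs if p["name"] == "crafty"]
print(len(crafty), "crafty invocations")
for p in crafty:
    print(p["pid"], "ran for", round(p["finish"] - p["start"], 2), "s")
```

On this listing the sketch finds six crafty processes. Three of them run for roughly 15.8 seconds each and three for about 0.14 seconds each, one long and one short process per crafty-benchmar wrapper.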
