crafty is a quick-running, single-threaded chess benchmark.

Topdown shows very high branch misprediction (12.8% of slots on AMD, 21.9% on Intel) and low backend stalls (14.3% on AMD, 8.7% on Intel).

AMD metrics

elapsed              85.026
on_cpu               0.049          # 0.79 / 16 cores
utime                66.219
stime                1.031
nvcsw                1987           # 80.06%
nivcsw               495            # 19.94%
inblock              0              # 0.00/sec
onblock              13520          # 159.01/sec
cpu-clock            67275625058    # 67.276 seconds
task-clock           67278460780    # 67.278 seconds
page faults          254497         # 3782.741/sec
context switches     2732           # 40.607/sec
cpu migrations       277            # 4.117/sec
major page faults    2              # 0.030/sec
minor page faults    254495         # 3782.711/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             103658642395   # 126.323 branches per 1000 inst
branch misses        3344279538     # 3.23% branch miss
conditional          81346335549    # 99.132 conditional branches per 1000 inst
indirect             699836262      # 0.853 indirect branches per 1000 inst
cpu-cycles           227536355149   # 0.22 GHz
instructions         612422018555   # 2.69 IPC
slots                457299337278   #
retiring             207822186201   # 45.4% (45.4%)
-- ucode             20268546       #     0.0%
-- fastpath          207801917655   #    45.4%
frontend             124436967971   # 27.2% (27.2%)
-- latency           79829054820    #    17.5%
-- bandwidth         44607913151    #     9.8%
backend              65528959985    # 14.3% (14.3%)
-- cpu               12232245149    #     2.7%
-- memory            53296714836    #    11.7%
speculation          59488082175    # 13.0% (13.0%)
-- branch mispredict 58719071362    #    12.8%
-- pipeline restart  769010813      #     0.2%
smt-contention       22893938       #  0.0% ( 0.0%)
cpu-cycles           227110461275   # 0.22 GHz
instructions         612833388726   # 2.70 IPC
instructions         204805663491   # 10.496 l2 access per 1000 inst
l2 hit from l1       1989973299     # 4.61% l2 miss
l2 miss from l1      56774383       #
l2 hit from l2 pf    117293988      #
l3 hit from l2 pf    27300597       #
l3 miss from l2 pf   15007672       #
instructions         204922878110   # 17.716 float per 1000 inst
float 512            50             # 0.000 AVX-512 per 1000 inst
float 256            618            # 0.000 AVX-256 per 1000 inst
float 128            3630465889     # 17.716 AVX-128 per 1000 inst
float MMX            0              # 0.000 MMX per 1000 inst
float scalar         10             # 0.000 scalar per 1000 inst
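
The level-1 topdown percentages in the AMD listing appear to be each category's slot count divided by the total slot count. The following is a minimal sketch of that arithmetic, not the measurement tool's own code; the names `AMD_SLOTS`, `AMD_LEVEL1` and `topdown_shares` are hypothetical, and the numbers are copied from the listing above.

```python
# Level-1 topdown slot counts copied from the AMD listing above.
AMD_SLOTS = 457299337278
AMD_LEVEL1 = {
    "retiring": 207822186201,
    "frontend": 124436967971,
    "backend": 65528959985,
    "speculation": 59488082175,
    "smt-contention": 22893938,
}


def topdown_shares(level1, slots):
    """Return each category as a percentage of the total slot count."""
    return {name: 100.0 * count / slots for name, count in level1.items()}


shares = topdown_shares(AMD_LEVEL1, AMD_SLOTS)
for name, pct in shares.items():
    print(f"{name:<16} {pct:5.1f}%")
```

The five categories account for almost all of the slots, so the shares can be read as a breakdown of where issue capacity went.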

Intel metrics

elapsed              78.685
on_cpu               0.052          # 0.83 / 16 cores
utime                64.831
stime                0.552
nvcsw                1881           # 81.78%
nivcsw               419            # 18.22%
inblock              8              # 0.10/sec
onblock              2008           # 25.52/sec
cpu-clock            65403411728    # 65.403 seconds
task-clock           65406142261    # 65.406 seconds
page faults          213026         # 3256.972/sec
context switches     2523           # 38.574/sec
cpu migrations       300            # 4.587/sec
major page faults    0              # 0.000/sec
minor page faults    213026         # 3256.972/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             77620340217    # 126.456 branches per 1000 inst
branch misses        2814845760     # 3.63% branch miss
conditional          77620352217    # 126.456 conditional branches per 1000 inst
indirect             546661136      # 0.891 indirect branches per 1000 inst
slots                1479746226446  #
retiring             586349958636   # 39.6% (39.6%)
-- ucode             23262388203    #     1.6%
-- fastpath          563087570433   #    38.1%
frontend             441928498060   # 29.9% (29.9%)
-- latency           219742463361   #    14.9%
-- bandwidth         222186034699   #    15.0%
backend              128165873654   #  8.7% ( 8.7%)
-- cpu               73050345133    #     4.9%
-- memory            55115528521    #     3.7%
speculation          330332085857   # 22.3% (22.3%)
-- branch mispredict 324491614752   #    21.9%
-- pipeline restart  5840471105     #     0.4%
smt-contention       0              #  0.0% ( 0.0%)
cpu-cycles           246727230707   # 0.20 GHz
instructions         613593772132   # 2.49 IPC
l2 access            4856487075     # 7.918 l2 access per 1000 inst
l2 miss              738889264      # 15.21% l2 miss
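
Several of the annotated ratios in the Intel listing can be reproduced directly from the raw counts. The sketch below assumes the simple definitions (instructions per cycle, misses per branch, L2 misses per L2 access); the dictionary `INTEL` and the helper `ratio_pct` are hypothetical names, and the values are copied from the listing above.

```python
# Raw counts copied from the Intel listing above.
INTEL = {
    "cycles": 246727230707,
    "instructions": 613593772132,
    "branches": 77620340217,
    "branch_misses": 2814845760,
    "l2_access": 4856487075,
    "l2_miss": 738889264,
}


def ratio_pct(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


ipc = INTEL["instructions"] / INTEL["cycles"]
branch_miss_pct = ratio_pct(INTEL["branch_misses"], INTEL["branches"])
l2_miss_pct = ratio_pct(INTEL["l2_miss"], INTEL["l2_access"])

print(f"IPC             {ipc:.2f}")
print(f"branch miss     {branch_miss_pct:.2f}%")
print(f"l2 miss         {l2_miss_pct:.2f}%")
```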

Process structure shows just six invocations of crafty and a moderate percentage of overhead.

Core computation is simple

      1954945) sh               cpu=4 start=5.57  finish=5.57 
        1954946) stty             cpu=1 start=5.57  finish=5.57 
      1954947) crafty-benchmar  cpu=3 start=5.57  finish=21.44
        1954948) crafty           cpu=5 start=5.58  finish=21.43
          1954951) crafty           cpu=7 start=21.29 finish=21.43
      1954952) crafty-benchmar  cpu=3 start=25.44 finish=41.30
        1954953) crafty           cpu=12 start=25.45 finish=41.29
          1954956) crafty           cpu=13 start=41.14 finish=41.29
      1954957) crafty-benchmar  cpu=11 start=45.30 finish=61.09
        1954958) crafty           cpu=12 start=45.30 finish=61.09
          1954959) crafty           cpu=14 start=60.96 finish=61.09
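
The process tree above can be checked mechanically. The sketch below parses the listing with a regular expression written for this exact line layout (an assumption, since the tool's output format is not documented here), counts the `crafty` invocations, and computes the wall time of each `crafty-benchmar` run. `TREE`, `LINE` and `parse_tree` are hypothetical names.

```python
import re

# Process tree copied from the listing above.
TREE = """\
      1954945) sh               cpu=4 start=5.57  finish=5.57
        1954946) stty             cpu=1 start=5.57  finish=5.57
      1954947) crafty-benchmar  cpu=3 start=5.57  finish=21.44
        1954948) crafty           cpu=5 start=5.58  finish=21.43
          1954951) crafty           cpu=7 start=21.29 finish=21.43
      1954952) crafty-benchmar  cpu=3 start=25.44 finish=41.30
        1954953) crafty           cpu=12 start=25.45 finish=41.29
          1954956) crafty           cpu=13 start=41.14 finish=41.29
      1954957) crafty-benchmar  cpu=11 start=45.30 finish=61.09
        1954958) crafty           cpu=12 start=45.30 finish=61.09
          1954959) crafty           cpu=14 start=60.96 finish=61.09
"""

# One line: indentation, "pid)", command name, cpu, start and finish times.
LINE = re.compile(
    r"^\s*(?P<pid>\d+)\)\s+(?P<cmd>\S+)\s+cpu=(?P<cpu>\d+)"
    r"\s+start=(?P<start>[\d.]+)\s+finish=(?P<finish>[\d.]+)"
)


def parse_tree(text):
    """Parse each tree line into a dict; lines that do not match are skipped."""
    rows = []
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            rows.append({
                "pid": int(m["pid"]),
                "cmd": m["cmd"],
                "cpu": int(m["cpu"]),
                "start": float(m["start"]),
                "finish": float(m["finish"]),
            })
    return rows


rows = parse_tree(TREE)
crafty_count = sum(1 for r in rows if r["cmd"] == "crafty")
drivers = [r for r in rows if r["cmd"] == "crafty-benchmar"]
driver_seconds = [round(r["finish"] - r["start"], 2) for r in drivers]

print("crafty invocations:", crafty_count)
print("seconds per crafty-benchmar run:", driver_seconds)
```

Each of the three `crafty-benchmar` runs starts one long-lived `crafty` process, which in turn spawns a short-lived `crafty` child near the end, giving the six invocations noted above.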