An OpenMP program for solving the n-queens problem. It runs about 4x slower than the n-queens test, but it is not clear whether the board sizes are the same. Otherwise, the work is spread across all cores.

The topdown profile shows similar levels of frontend stalls and retiring (27.8% and 27.0% of slots on AMD). Backend stalls are low and branch mispredicts are high.

AMD metrics show little floating point (0.013 float instructions per 1000 instructions) and little L2 access (0.036 L2 accesses per 1000 instructions).

elapsed              200.507
on_cpu               0.920          # 14.71 / 16 cores
utime                2949.022
stime                1.235
nvcsw                2102           # 6.67%
nivcsw               29429          # 93.33%
inblock              0              # 0.00/sec
onblock              12568          # 62.68/sec
cpu-clock            2950319583663  # 2950.320 seconds
task-clock           2950333312474  # 2950.333 seconds
page faults          147869         # 50.119/sec
context switches     32365          # 10.970/sec
cpu migrations       287            # 0.097/sec
major page faults    5              # 0.002/sec
minor page faults    147864         # 50.118/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             2449635795430  # 122.266 branches per 1000 inst
branch misses        279557151552   # 11.41% branch miss
conditional          2016563015382  # 100.651 conditional branches per 1000 inst
indirect             53952584       # 0.003 indirect branches per 1000 inst
cpu-cycles           12290424378998 # 3.84 GHz
instructions         20036705544292 # 1.63 IPC
slots                24594741239610 #
retiring             6648865947598  # 27.0% (38.5%)
-- ucode             27142902       #     0.0%
-- fastpath          6648838804696  #    27.0%
frontend             6827133842947  # 27.8% (39.6%)
-- latency           5177461340538  #    21.1%
-- bandwidth         1649672502409  #     6.7%
backend              1625658350041  #  6.6% ( 9.4%) low
-- cpu               620818946652   #     2.5%
-- memory            1004839403389  #     4.1%
speculation          2155493716016  #  8.8% (12.5%) high
-- branch mispredict 2155483748047  #     8.8%
-- pipeline restart  9967969        #     0.0%
smt-contention       7337535654680  # 29.8% ( 0.0%)
cpu-cycles           12292939466217 # 3.84 GHz
instructions         20038933123611 # 1.63 IPC
instructions         6681724577089  # 0.036 l2 access per 1000 inst
l2 hit from l1       220989111      # 10.87% l2 miss
l2 miss from l1      16124182       #
l2 hit from l2 pf    10847456       #
l3 hit from l2 pf    5189423        #
l3 miss from l2 pf   5002627        #
instructions         6674737780939  # 0.013 float per 1000 inst
float 512            59             # 0.000 AVX-512 per 1000 inst
float 256            532            # 0.000 AVX-256 per 1000 inst
float 128            89238336       # 0.013 AVX-128 per 1000 inst
float MMX            0              # 0.000 MMX per 1000 inst
float scalar         0              # 0.000 scalar per 1000 inst
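
The annotations to the right of the `#` in the AMD dump can be recomputed from the raw counters. The following is a minimal sketch in Python, not the profiler's own code: the dictionary keys and the `derived_metrics` helper are illustrative names, and the reading of the parenthesized topdown percentages as "share of slots after removing SMT contention" is an assumption that happens to reproduce the printed values.

```python
# Raw counters copied from the AMD run above. The keys are illustrative
# names chosen for this sketch, not the profiler's own identifiers.
AMD = {
    "elapsed": 200.507,
    "utime": 2949.022,
    "stime": 1.235,
    "cores": 16,
    "cpu_cycles": 12290424378998,
    "instructions": 20036705544292,
    "branches": 2449635795430,
    "branch_misses": 279557151552,
    "slots": 24594741239610,
    "retiring": 6648865947598,
    "frontend": 6827133842947,
    "backend": 1625658350041,
    "speculation": 2155493716016,
    "smt_contention": 7337535654680,
}

TOPDOWN = ("retiring", "frontend", "backend", "speculation")


def derived_metrics(c):
    """Recompute the annotated values from the raw counters."""
    cores_busy = (c["utime"] + c["stime"]) / c["elapsed"]
    m = {
        "cores_busy": cores_busy,              # "14.71 / 16 cores"
        "on_cpu": cores_busy / c["cores"],     # fraction of the machine in use
        "ipc": c["instructions"] / c["cpu_cycles"],
        "branch_miss_pct": 100.0 * c["branch_misses"] / c["branches"],
    }
    # Assumption: the parenthesized percentages exclude SMT-contention slots.
    usable = c["slots"] - c["smt_contention"]
    for name in TOPDOWN:
        m[name] = 100.0 * c[name] / c["slots"]
        m[name + "_no_smt"] = 100.0 * c[name] / usable
    return m


metrics = derived_metrics(AMD)
for key, value in metrics.items():
    print(f"{key:20s} {value:8.3f}")
```

Under that assumption, retiring and frontend stalls each account for a little under 40% of the non-contended slots, which is what the "similar levels" remark at the top refers to.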

Intel metrics

elapsed              287.052
on_cpu               0.945          # 15.12 / 16 cores
utime                4338.559
stime                0.480
nvcsw                1880           # 5.72%
nivcsw               31014          # 94.28%
inblock              8              # 0.03/sec
onblock              1328           # 4.63/sec
cpu-clock            4339072593247  # 4339.073 seconds
task-clock           4339083116128  # 4339.083 seconds
page faults          137059         # 31.587/sec
context switches     34159          # 7.872/sec
cpu migrations       284            # 0.065/sec
major page faults    0              # 0.000/sec
minor page faults    137059         # 31.587/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             2449320104575  # 122.253 branches per 1000 inst
branch misses        290139523295   # 11.85% branch miss
conditional          2449320117951  # 122.253 conditional branches per 1000 inst
indirect             366867368214   # 18.312 indirect branches per 1000 inst
slots                21837969535640 #
retiring             8918818055099  # 40.8% (40.8%)
-- ucode             736827206      #     0.0%
-- fastpath          8918081227893  #    40.8%
frontend             2363902682752  # 10.8% (10.8%)
-- latency           1643610505164  #     7.5%
-- bandwidth         720292177588   #     3.3%
backend              1573606838551  #  7.2% ( 7.2%) low
-- cpu               1403575395407  #     6.4%
-- memory            170031443144   #     0.8%
speculation          9117519044335  # 41.8% (41.8%) high
-- branch mispredict 9117374039133  #    41.8%
-- pipeline restart  145005202      #     0.0%
smt-contention       0              #  0.0% ( 0.0%)
cpu-cycles           11206702565317 # 2.45 GHz
instructions         15216889946882 # 1.36 IPC
l2 access            304845877      # 0.031 l2 access per 1000 inst
l2 miss              89427914       # 29.34% l2 miss

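The process overview is dominated by m-queens.bin, with 48 instances (consistent with three runs of 16, given the 3 m-queens launchers). The remaining entries are short-lived system-information and build helpers.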

399 processes
	 48 m-queens.bin         47213.12     1.48
	 68 clinfo                  17.20     5.64
	 38 vulkaninfo               0.95     1.31
	  6 glxinfo:gdrv0            0.12     0.04
	  6 glxinfo:gl0              0.12     0.04
	  4 vulkani:disk$0           0.10     0.13
	  6 php                      0.06     0.07
	  2 glxinfo                  0.06     0.02
	  2 glxinfo:cs0              0.06     0.02
	  2 glxinfo:disk$0           0.06     0.02
	  2 glxinfo:sh0              0.06     0.02
	  2 glxinfo:shlo0            0.06     0.02
	  2 llvmpipe-0               0.05     0.07
	  2 llvmpipe-1               0.05     0.07
	  2 llvmpipe-10              0.05     0.07
	  2 llvmpipe-11              0.05     0.07
	  2 llvmpipe-12              0.05     0.07
	  2 llvmpipe-13              0.05     0.07
	  2 llvmpipe-14              0.05     0.07
	  2 llvmpipe-15              0.05     0.07
	  2 llvmpipe-2               0.05     0.07
	  2 llvmpipe-3               0.05     0.07
	  2 llvmpipe-4               0.05     0.07
	  2 llvmpipe-5               0.05     0.07
	  2 llvmpipe-6               0.05     0.07
	  2 llvmpipe-7               0.05     0.07
	  2 llvmpipe-8               0.05     0.07
	  2 llvmpipe-9               0.05     0.07
	  6 clang                    0.05     0.06
	  3 rocminfo                 0.03     0.00
	  1 lspci                    0.00     0.02
	  1 ps                       0.00     0.01
	 82 sh                       0.00     0.00
	 13 gcc                      0.00     0.00
	 13 gsettings                0.00     0.00
	  8 stat                     0.00     0.00
	  8 systemd-detect-          0.00     0.00
	  6 llvm-link                0.00     0.00
	  5 phoronix-test-s          0.00     0.00
	  3 m-queens                 0.00     0.00
	  2 cc                       0.00     0.00
	  2 gmain                    0.00     0.00
	  2 lscpu                    0.00     0.00
	  2 uname                    0.00     0.00
	  2 which                    0.00     0.00
	  2 xset                     0.00     0.00
	  1 date                     0.00     0.00
	  1 dconf worker             0.00     0.00
	  1 dirname                  0.00     0.00
	  1 dmesg                    0.00     0.00
	  1 dmidecode                0.00     0.00
	  1 grep                     0.00     0.00
	  1 ifconfig                 0.00     0.00
	  1 ip                       0.00     0.00
	  1 lsmod                    0.00     0.00
	  1 mktemp                   0.00     0.00
	  1 qdbus                    0.00     0.00
	  1 readlink                 0.00     0.00
	  1 realpath                 0.00     0.00
	  1 sed                      0.00     0.00
	  1 sort                     0.00     0.00
	  1 stty                     0.00     0.00
	  1 systemctl                0.00     0.00
	  1 template.sh              0.00     0.00
	  1 wc                       0.00     0.00
	  1 xrandr                   0.00     0.00
0 processes running
47 maximum processes

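Computation structure is simple: the m-queens launcher starts m-queens.bin, which runs 16 tasks, one per CPU, that all start and finish together.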

      115396) m-queens         cpu=14 start=5.71  finish=67.43
        115397) m-queens.bin     cpu=1 start=5.71  finish=67.43
          115398) m-queens.bin     cpu=10 start=5.71  finish=67.43
          115399) m-queens.bin     cpu=3 start=5.71  finish=67.43
          115400) m-queens.bin     cpu=15 start=5.71  finish=67.43
          115401) m-queens.bin     cpu=6 start=5.71  finish=67.43
          115402) m-queens.bin     cpu=0 start=5.71  finish=67.43
          115403) m-queens.bin     cpu=5 start=5.71  finish=67.43
          115404) m-queens.bin     cpu=4 start=5.71  finish=67.43
          115405) m-queens.bin     cpu=8 start=5.71  finish=67.43
          115406) m-queens.bin     cpu=12 start=5.71  finish=67.43
          115407) m-queens.bin     cpu=2 start=5.71  finish=67.43
          115408) m-queens.bin     cpu=11 start=5.71  finish=67.42
          115409) m-queens.bin     cpu=7 start=5.71  finish=67.42
          115410) m-queens.bin     cpu=13 start=5.71  finish=67.42
          115411) m-queens.bin     cpu=9 start=5.71  finish=67.42
          115412) m-queens.bin     cpu=14 start=5.71  finish=67.42
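
The tree above can be checked mechanically. The sketch below, in Python, parses the listing with a regular expression and confirms that the m-queens.bin tasks cover every CPU and share one start/finish window. The `TREE` string is the listing copied with its indentation removed; `LINE`, `tasks`, and `workers` are names invented for this example, and the line format is inferred from this one listing rather than from any tool documentation.

```python
import re

# Process tree copied from the listing above (indentation removed).
TREE = """\
115396) m-queens         cpu=14 start=5.71  finish=67.43
115397) m-queens.bin     cpu=1 start=5.71  finish=67.43
115398) m-queens.bin     cpu=10 start=5.71  finish=67.43
115399) m-queens.bin     cpu=3 start=5.71  finish=67.43
115400) m-queens.bin     cpu=15 start=5.71  finish=67.43
115401) m-queens.bin     cpu=6 start=5.71  finish=67.43
115402) m-queens.bin     cpu=0 start=5.71  finish=67.43
115403) m-queens.bin     cpu=5 start=5.71  finish=67.43
115404) m-queens.bin     cpu=4 start=5.71  finish=67.43
115405) m-queens.bin     cpu=8 start=5.71  finish=67.43
115406) m-queens.bin     cpu=12 start=5.71  finish=67.43
115407) m-queens.bin     cpu=2 start=5.71  finish=67.43
115408) m-queens.bin     cpu=11 start=5.71  finish=67.42
115409) m-queens.bin     cpu=7 start=5.71  finish=67.42
115410) m-queens.bin     cpu=13 start=5.71  finish=67.42
115411) m-queens.bin     cpu=9 start=5.71  finish=67.42
115412) m-queens.bin     cpu=14 start=5.71  finish=67.42
"""

# One entry per line: pid, command name, cpu, start time, finish time.
LINE = re.compile(
    r"(\d+)\)\s+(\S+)\s+cpu=(\d+)\s+start=([\d.]+)\s+finish=([\d.]+)"
)

tasks = [
    (int(pid), name, int(cpu), float(start), float(finish))
    for pid, name, cpu, start, finish in LINE.findall(TREE)
]

# Keep only the compute tasks, dropping the m-queens launcher.
workers = [t for t in tasks if t[1] == "m-queens.bin"]
cpus = sorted(t[2] for t in workers)
span = max(t[4] for t in workers) - min(t[3] for t in workers)

print(f"{len(workers)} m-queens.bin tasks on {len(set(cpus))} distinct CPUs")
print(f"wall-clock span of the run: {span:.2f}")
```

For this listing, the 16 m-queens.bin entries land on CPUs 0 through 15 with no repeats, and their start and finish times differ by at most 0.01.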