An OpenMP program for solving the n-queens problem. It runs ~4x slower than n-queens, but it is not clear whether the board sizes are the same. Otherwise, all processes run on all cores.

On AMD, the topdown profile shows similar levels of frontend stalls (27.8%) and retiring (27.0%). Backend stalls are low and branch mispredicts are high on both systems; on Intel, branch mispredicts account for 41.8% of slots.
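
To illustrate why an n-queens search tends to be branch-heavy with essentially no floating point, here is a minimal sketch of a bitmask backtracking counter. It is a generic textbook formulation, not the m-queens source, and the name `count_solutions` is hypothetical.

```python
def count_solutions(n: int) -> int:
    """Count placements of n non-attacking queens on an n x n board."""
    full = (1 << n) - 1  # one bit per column

    def place(cols: int, diag_l: int, diag_r: int) -> int:
        # cols, diag_l, diag_r mark columns attacked in the current row.
        if cols == full:  # a queen has been placed in every row
            return 1
        total = 0
        free = full & ~(cols | diag_l | diag_r)
        while free:  # trip count depends on the board state
            bit = free & -free  # lowest free column in this row
            free ^= bit
            total += place(
                cols | bit,
                ((diag_l | bit) << 1) & full,  # diagonals shift per row
                (diag_r | bit) >> 1,
            )
        return total

    return place(0, 0, 0)


if __name__ == "__main__":
    print(count_solutions(8))
```

The work is integer bit manipulation on a few machine words, with loop exits and recursion depth that depend on the data. That is consistent with the low L2 traffic and the high branch miss rates in the listings, although the real program may be structured differently.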

AMD metrics show little floating point (0.013 per 1000 instructions) and little L2 access (0.036 per 1000 instructions).
elapsed 200.507
on_cpu 0.920 # 14.71 / 16 cores
utime 2949.022
stime 1.235
nvcsw 2102 # 6.67%
nivcsw 29429 # 93.33%
inblock 0 # 0.00/sec
onblock 12568 # 62.68/sec
cpu-clock 2950319583663 # 2950.320 seconds
task-clock 2950333312474 # 2950.333 seconds
page faults 147869 # 50.119/sec
context switches 32365 # 10.970/sec
cpu migrations 287 # 0.097/sec
major page faults 5 # 0.002/sec
minor page faults 147864 # 50.118/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 2449635795430 # 122.266 branches per 1000 inst
branch misses 279557151552 # 11.41% branch miss
conditional 2016563015382 # 100.651 conditional branches per 1000 inst
indirect 53952584 # 0.003 indirect branches per 1000 inst
cpu-cycles 12290424378998 # 3.84 GHz
instructions 20036705544292 # 1.63 IPC
slots 24594741239610 #
retiring 6648865947598 # 27.0% (38.5%)
-- ucode 27142902 # 0.0%
-- fastpath 6648838804696 # 27.0%
frontend 6827133842947 # 27.8% (39.6%)
-- latency 5177461340538 # 21.1%
-- bandwidth 1649672502409 # 6.7%
backend 1625658350041 # 6.6% ( 9.4%) low
-- cpu 620818946652 # 2.5%
-- memory 1004839403389 # 4.1%
speculation 2155493716016 # 8.8% (12.5%) high
-- branch mispredict 2155483748047 # 8.8%
-- pipeline restart 9967969 # 0.0%
smt-contention 7337535654680 # 29.8% ( 0.0%)
cpu-cycles 12292939466217 # 3.84 GHz
instructions 20038933123611 # 1.63 IPC
instructions 6681724577089 # 0.036 l2 access per 1000 inst
l2 hit from l1 220989111 # 10.87% l2 miss
l2 miss from l1 16124182 #
l2 hit from l2 pf 10847456 #
l3 hit from l2 pf 5189423 #
l3 miss from l2 pf 5002627 #
instructions 6674737780939 # 0.013 float per 1000 inst
float 512 59 # 0.000 AVX-512 per 1000 inst
float 256 532 # 0.000 AVX-256 per 1000 inst
float 128 89238336 # 0.013 AVX-128 per 1000 inst
float MMX 0 # 0.000 MMX per 1000 inst
float scalar 0 # 0.000 scalar per 1000 inst
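
The two percentages on each AMD topdown line can be reproduced from the raw slot counts. The sketch below assumes the first figure is the share of all slots and the parenthesized figure is the share of slots left after subtracting smt-contention; that reading is inferred from the numbers, not taken from the tool's documentation. The function name `topdown_shares` is hypothetical.

```python
# Raw AMD topdown slot counts copied from the listing above.
slots = 24594741239610
smt_contention = 7337535654680
categories = {
    "retiring": 6648865947598,
    "frontend": 6827133842947,
    "backend": 1625658350041,
    "speculation": 2155493716016,
}


def topdown_shares(slots, smt_contention, categories):
    """Return {name: (percent of all slots, percent of non-SMT slots)}."""
    usable = slots - smt_contention  # assumption: contended slots are excluded
    return {
        name: (round(100 * count / slots, 1), round(100 * count / usable, 1))
        for name, count in categories.items()
    }


shares = topdown_shares(slots, smt_contention, categories)
for name, (raw, normalized) in shares.items():
    print(f"{name:12s} {raw:5.1f}% ({normalized:5.1f}%)")
```

Under this assumption the values match the listing for all four categories, which suggests the parenthesized column is the one to compare against a system that reports no smt-contention.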
Intel metrics
elapsed 287.052
on_cpu 0.945 # 15.12 / 16 cores
utime 4338.559
stime 0.480
nvcsw 1880 # 5.72%
nivcsw 31014 # 94.28%
inblock 8 # 0.03/sec
onblock 1328 # 4.63/sec
cpu-clock 4339072593247 # 4339.073 seconds
task-clock 4339083116128 # 4339.083 seconds
page faults 137059 # 31.587/sec
context switches 34159 # 7.872/sec
cpu migrations 284 # 0.065/sec
major page faults 0 # 0.000/sec
minor page faults 137059 # 31.587/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 2449320104575 # 122.253 branches per 1000 inst
branch misses 290139523295 # 11.85% branch miss
conditional 2449320117951 # 122.253 conditional branches per 1000 inst
indirect 366867368214 # 18.312 indirect branches per 1000 inst
slots 21837969535640 #
retiring 8918818055099 # 40.8% (40.8%)
-- ucode 736827206 # 0.0%
-- fastpath 8918081227893 # 40.8%
frontend 2363902682752 # 10.8% (10.8%)
-- latency 1643610505164 # 7.5%
-- bandwidth 720292177588 # 3.3%
backend 1573606838551 # 7.2% ( 7.2%) low
-- cpu 1403575395407 # 6.4%
-- memory 170031443144 # 0.8%
speculation 9117519044335 # 41.8% (41.8%) high
-- branch mispredict 9117374039133 # 41.8%
-- pipeline restart 145005202 # 0.0%
smt-contention 0 # 0.0% ( 0.0%)
cpu-cycles 11206702565317 # 2.45 GHz
instructions 15216889946882 # 1.36 IPC
l2 access 304845877 # 0.031 l2 access per 1000 inst
l2 miss 89427914 # 29.34% l2 miss
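
The derived figures in the comment columns of both listings follow from the raw counters. As a rough sketch, the helper below recomputes core utilization, IPC and branch miss rate; the function name `derived` and the dictionary keys are hypothetical, and the formulas are inferred from the values shown rather than from the tool's source.

```python
def derived(elapsed, utime, stime, cores, cycles, instructions,
            branches, branch_misses):
    """Recompute summary ratios from raw counters."""
    busy = (utime + stime) / elapsed  # average number of busy cores
    return {
        "cores_busy": round(busy, 2),
        "on_cpu": round(busy / cores, 3),
        "ipc": round(instructions / cycles, 2),
        "branch_miss_pct": round(100 * branch_misses / branches, 2),
    }


# Values copied from the AMD and Intel listings above.
amd = derived(200.507, 2949.022, 1.235, 16,
              12290424378998, 20036705544292,
              2449635795430, 279557151552)
intel = derived(287.052, 4338.559, 0.480, 16,
                11206702565317, 15216889946882,
                2449320104575, 290139523295)

for name, metrics in (("AMD", amd), ("Intel", intel)):
    print(name, metrics)
```

Both systems execute nearly the same number of branches and miss a similar fraction of them, so the difference in the topdown speculation share does not come from the miss rate alone.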
Process overview shows m-queens.bin dominating.
399 processes
48 m-queens.bin 47213.12 1.48
68 clinfo 17.20 5.64
38 vulkaninfo 0.95 1.31
6 glxinfo:gdrv0 0.12 0.04
6 glxinfo:gl0 0.12 0.04
4 vulkani:disk$0 0.10 0.13
6 php 0.06 0.07
2 glxinfo 0.06 0.02
2 glxinfo:cs0 0.06 0.02
2 glxinfo:disk$0 0.06 0.02
2 glxinfo:sh0 0.06 0.02
2 glxinfo:shlo0 0.06 0.02
2 llvmpipe-0 0.05 0.07
2 llvmpipe-1 0.05 0.07
2 llvmpipe-10 0.05 0.07
2 llvmpipe-11 0.05 0.07
2 llvmpipe-12 0.05 0.07
2 llvmpipe-13 0.05 0.07
2 llvmpipe-14 0.05 0.07
2 llvmpipe-15 0.05 0.07
2 llvmpipe-2 0.05 0.07
2 llvmpipe-3 0.05 0.07
2 llvmpipe-4 0.05 0.07
2 llvmpipe-5 0.05 0.07
2 llvmpipe-6 0.05 0.07
2 llvmpipe-7 0.05 0.07
2 llvmpipe-8 0.05 0.07
2 llvmpipe-9 0.05 0.07
6 clang 0.05 0.06
3 rocminfo 0.03 0.00
1 lspci 0.00 0.02
1 ps 0.00 0.01
82 sh 0.00 0.00
13 gcc 0.00 0.00
13 gsettings 0.00 0.00
8 stat 0.00 0.00
8 systemd-detect- 0.00 0.00
6 llvm-link 0.00 0.00
5 phoronix-test-s 0.00 0.00
3 m-queens 0.00 0.00
2 cc 0.00 0.00
2 gmain 0.00 0.00
2 lscpu 0.00 0.00
2 uname 0.00 0.00
2 which 0.00 0.00
2 xset 0.00 0.00
1 date 0.00 0.00
1 dconf worker 0.00 0.00
1 dirname 0.00 0.00
1 dmesg 0.00 0.00
1 dmidecode 0.00 0.00
1 grep 0.00 0.00
1 ifconfig 0.00 0.00
1 ip 0.00 0.00
1 lsmod 0.00 0.00
1 mktemp 0.00 0.00
1 qdbus 0.00 0.00
1 readlink 0.00 0.00
1 realpath 0.00 0.00
1 sed 0.00 0.00
1 sort 0.00 0.00
1 stty 0.00 0.00
1 systemctl 0.00 0.00
1 template.sh 0.00 0.00
1 wc 0.00 0.00
1 xrandr 0.00 0.00
0 processes running
47 maximum processes
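Computation structure is simple: a parent m-queens process plus 16 m-queens.bin tasks, one per CPU (0 to 15), all starting at 5.71 and finishing at about 67.43.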
115396) m-queens cpu=14 start=5.71 finish=67.43
115397) m-queens.bin cpu=1 start=5.71 finish=67.43
115398) m-queens.bin cpu=10 start=5.71 finish=67.43
115399) m-queens.bin cpu=3 start=5.71 finish=67.43
115400) m-queens.bin cpu=15 start=5.71 finish=67.43
115401) m-queens.bin cpu=6 start=5.71 finish=67.43
115402) m-queens.bin cpu=0 start=5.71 finish=67.43
115403) m-queens.bin cpu=5 start=5.71 finish=67.43
115404) m-queens.bin cpu=4 start=5.71 finish=67.43
115405) m-queens.bin cpu=8 start=5.71 finish=67.43
115406) m-queens.bin cpu=12 start=5.71 finish=67.43
115407) m-queens.bin cpu=2 start=5.71 finish=67.43
115408) m-queens.bin cpu=11 start=5.71 finish=67.42
115409) m-queens.bin cpu=7 start=5.71 finish=67.42
115410) m-queens.bin cpu=13 start=5.71 finish=67.42
115411) m-queens.bin cpu=9 start=5.71 finish=67.42
115412) m-queens.bin cpu=14 start=5.71 finish=67.42
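
The claim that the workers cover every CPU exactly once and run in lockstep can be checked mechanically. The following sketch parses the trace lines above with a regular expression; the line format is assumed from this listing only, and `parse_trace` is a hypothetical helper, not part of the profiling tool.

```python
import re

# Trace lines copied from the listing above.
TRACE = """\
115396) m-queens cpu=14 start=5.71 finish=67.43
115397) m-queens.bin cpu=1 start=5.71 finish=67.43
115398) m-queens.bin cpu=10 start=5.71 finish=67.43
115399) m-queens.bin cpu=3 start=5.71 finish=67.43
115400) m-queens.bin cpu=15 start=5.71 finish=67.43
115401) m-queens.bin cpu=6 start=5.71 finish=67.43
115402) m-queens.bin cpu=0 start=5.71 finish=67.43
115403) m-queens.bin cpu=5 start=5.71 finish=67.43
115404) m-queens.bin cpu=4 start=5.71 finish=67.43
115405) m-queens.bin cpu=8 start=5.71 finish=67.43
115406) m-queens.bin cpu=12 start=5.71 finish=67.43
115407) m-queens.bin cpu=2 start=5.71 finish=67.43
115408) m-queens.bin cpu=11 start=5.71 finish=67.42
115409) m-queens.bin cpu=7 start=5.71 finish=67.42
115410) m-queens.bin cpu=13 start=5.71 finish=67.42
115411) m-queens.bin cpu=9 start=5.71 finish=67.42
115412) m-queens.bin cpu=14 start=5.71 finish=67.42
"""

LINE = re.compile(r"(\d+)\) (\S+) cpu=(\d+) start=([\d.]+) finish=([\d.]+)")


def parse_trace(text):
    """Return (task id, name, cpu, start, finish) tuples for each trace line."""
    rows = []
    for match in LINE.finditer(text):
        task_id, name, cpu, start, finish = match.groups()
        rows.append((int(task_id), name, int(cpu), float(start), float(finish)))
    return rows


rows = parse_trace(TRACE)
workers = [row for row in rows if row[1] == "m-queens.bin"]
worker_cpus = sorted(row[2] for row in workers)
longest = max(round(row[4] - row[3], 2) for row in workers)
print(len(workers), worker_cpus == list(range(16)), longest)
```

A result of 16 workers on 16 distinct CPUs with near-identical lifetimes points to a single parallel region with no visible load imbalance in this trace.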
