A sudoku puzzle solver in TCL. This is a quick running, single-threaded program that runs in less than a minute.

Topdown profile has few data popints but overall a high retirement rate with some frontend stalls.

AMD metrics show very little cache activity beyond L1, little floating point and running on one core.
elapsed 38.166
on_cpu 0.040 # 0.63 / 16 cores
utime 22.743
stime 1.407
nvcsw 2922 # 84.65%
nivcsw 530 # 15.35%
inblock 152 # 3.98/sec
onblock 14176 # 371.43/sec
cpu-clock 24150225570 # 24.150 seconds
task-clock 24152726708 # 24.153 seconds
page faults 401138 # 16608.394/sec
context switches 3186 # 131.911/sec
cpu migrations 261 # 10.806/sec
major page faults 2 # 0.083/sec
minor page faults 401136 # 16608.311/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 75383328542 # 218.131 branches per 1000 inst
branch misses 389478134 # 0.52% branch miss
conditional 51967656139 # 150.375 conditional branches per 1000 inst
indirect 3230390546 # 9.348 indirect branches per 1000 inst
cpu-cycles 101190461945 # 0.16 GHz
instructions 339002025886 # 3.35 IPC high
slots 209160030252 #
retiring 110152621743 # 52.7% (52.7%)
-- ucode 127985441 # 0.1%
-- fastpath 110024636302 # 52.6%
frontend 77355240137 # 37.0% (37.0%)
-- latency 36416197752 # 17.4%
-- bandwidth 40939042385 # 19.6%
backend 13683290991 # 6.5% ( 6.5%) low
-- cpu 2035995902 # 1.0%
-- memory 11647295089 # 5.6%
speculation 7890471651 # 3.8% ( 3.8%)
-- branch mispredict 7494793595 # 3.6%
-- pipeline restart 395678056 # 0.2%
smt-contention 78120930 # 0.0% ( 0.0%)
cpu-cycles 101260142102 # 0.16 GHz
instructions 338687934281 # 3.34 IPC high
instructions 115602851732 # 12.126 l2 access per 1000 inst
l2 hit from l1 1343316541 # 4.72% l2 miss
l2 miss from l1 42538855 #
l2 hit from l2 pf 34831655 #
l3 hit from l2 pf 16515857 #
l3 miss from l2 pf 7142269 #
instructions 113219580291 # 3.214 float per 1000 inst
float 512 419 # 0.000 AVX-512 per 1000 inst
float 256 968 # 0.000 AVX-256 per 1000 inst
float 128 363932696 # 3.214 AVX-128 per 1000 inst
float MMX 0 # 0.000 MMX per 1000 inst
float scalar 0 # 0.000 scalar per 1000 inst
instructions 2661793 #
opcache 996496 # 374.370 opcache per 1000 inst
opcache miss 539348 # 54.1% opcache miss rate
l1 dTLB miss 5126 # 1.926 L1 dTLB per 1000 inst
l2 dTLB miss 1046 # 0.393 L2 dTLB per 1000 inst
instructions 2686411 #
icache 1319260 # 491.086 icache per 1000 inst
icache miss 112450 # 8.5% icache miss rate
l1 iTLB miss 12 # 0.004 L1 iTLB per 1000 inst
l2 iTLB miss 0 # 0.000 L2 iTLB per 1000 inst
tlb flush 19 # 0.007 TLB flush per 1000 inst
Intel metrics show more memory-bound stalls but still memory latency is mostly L1.
elapsed 41.931
on_cpu 0.042 # 0.67 / 16 cores
utime 27.211
stime 0.875
nvcsw 2756 # 84.46%
nivcsw 507 # 15.54%
inblock 3600 # 85.86/sec
onblock 2568 # 61.24/sec
cpu-clock 28048645629 # 28.049 seconds
task-clock 28052864188 # 28.053 seconds
page faults 395778 # 14108.292/sec
context switches 3005 # 107.119/sec
cpu migrations 284 # 10.124/sec
major page faults 25 # 0.891/sec
minor page faults 395753 # 14107.401/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 75156469514 # 217.736 branches per 1000 inst
branch misses 299780840 # 0.40% branch miss
conditional 75156490922 # 217.736 conditional branches per 1000 inst
indirect 3261592661 # 9.449 indirect branches per 1000 inst
slots 628764512336 #
retiring 316419294072 # 50.3% (50.3%)
-- ucode 15015144716 # 2.4%
-- fastpath 301404149356 # 47.9%
frontend 247522760902 # 39.4% (39.4%)
-- latency 85144205322 # 13.5%
-- bandwidth 162378555580 # 25.8%
backend 18617959489 # 3.0% ( 3.0%) low
-- cpu 12751496330 # 2.0%
-- memory 5866463159 # 0.9%
speculation 46502216731 # 7.4% ( 7.4%)
-- branch mispredict 40575038937 # 6.5%
-- pipeline restart 5927177794 # 0.9%
smt-contention 0 # 0.0% ( 0.0%)
cpu-cycles 105011813182 # 0.15 GHz
instructions 345057177086 # 3.29 IPC high
l2 access 3870769220 # 11.228 l2 access per 1000 inst
l2 miss 235887627 # 6.09% l2 miss
cpu-cycles 104834415246 # 7.6% memory latency
load stalls 7767265463 # 0.6% l1 bound
l1 miss 7116364268 # 6.2% l2 bound
l2 miss 570919695 # 0.3% l3 bound
l3 miss 265312061 # 0.3% dram bound
store_stalls 193339180 # 0.2% store bound
Process summary shows test overhead as almost as high as the workload.
651 processes
300 tclsh 20.13 0.01
68 clinfo 16.88 5.65
38 vulkaninfo 0.75 1.52
6 glxinfo:gdrv0 0.09 0.07
6 glxinfo:gl0 0.09 0.07
4 vulkani:disk$0 0.08 0.16
6 php 0.06 0.07
6 clang 0.06 0.06
2 glxinfo 0.05 0.03
2 glxinfo:cs0 0.05 0.03
2 glxinfo:disk$0 0.05 0.03
2 glxinfo:sh0 0.05 0.03
2 glxinfo:shlo0 0.05 0.03
2 llvmpipe-0 0.04 0.08
2 llvmpipe-1 0.04 0.08
2 llvmpipe-10 0.04 0.08
2 llvmpipe-11 0.04 0.08
2 llvmpipe-12 0.04 0.08
2 llvmpipe-13 0.04 0.08
2 llvmpipe-14 0.04 0.08
2 llvmpipe-15 0.04 0.08
2 llvmpipe-2 0.04 0.08
2 llvmpipe-3 0.04 0.08
2 llvmpipe-4 0.04 0.08
2 llvmpipe-5 0.04 0.08
2 llvmpipe-6 0.04 0.08
2 llvmpipe-7 0.04 0.08
2 llvmpipe-8 0.04 0.08
2 llvmpipe-9 0.04 0.08
3 rocminfo 0.03 0.00
3 sudokut-100-run 0.01 0.01
1 lspci 0.00 0.02
1 ps 0.00 0.01
81 sh 0.00 0.00
13 gsettings 0.00 0.00
12 gcc 0.00 0.00
8 stat 0.00 0.00
8 systemd-detect- 0.00 0.00
6 llvm-link 0.00 0.00
5 phoronix-test-s 0.00 0.00
3 sudokut 0.00 0.00
2 gmain 0.00 0.00
2 lscpu 0.00 0.00
2 uname 0.00 0.00
2 which 0.00 0.00
2 xset 0.00 0.00
1 cc 0.00 0.00
1 date 0.00 0.00
1 dconf worker 0.00 0.00
1 dirname 0.00 0.00
1 dmesg 0.00 0.00
1 dmidecode 0.00 0.00
1 grep 0.00 0.00
1 ifconfig 0.00 0.00
1 ip 0.00 0.00
1 lsmod 0.00 0.00
1 mktemp 0.00 0.00
1 qdbus 0.00 0.00
1 readlink 0.00 0.00
1 realpath 0.00 0.00
1 sed 0.00 0.00
1 sort 0.00 0.00
1 stty 0.00 0.00
1 systemctl 0.00 0.00
1 template.sh 0.00 0.00
1 wc 0.00 0.00
1 xrandr 0.00 0.00
0 processes running
47 maximum processes
Process summary is as follows with the “sudokut-100-run having not quite 100 children but each runs in 0.05 seconds or so…
330575) sudokut cpu=11 start=5.50 finish=13.08
330576) sudokut-100-run cpu=11 start=5.50 finish=13.08
330577) tclsh cpu=12 start=5.51 finish=5.59
330578) tclsh cpu=2 start=5.59 finish=5.71
330579) tclsh cpu=12 start=5.71 finish=5.79
330580) tclsh cpu=10 start=5.79 finish=5.98
330581) tclsh cpu=11 start=5.99 finish=6.15
330582) tclsh cpu=10 start=6.15 finish=6.44
330583) tclsh cpu=4 start=6.44 finish=6.56
330584) tclsh cpu=10 start=6.56 finish=6.62
330585) tclsh cpu=11 start=6.62 finish=6.71
330586) tclsh cpu=2 start=6.71 finish=6.77
330587) tclsh cpu=11 start=6.77 finish=6.84
330588) tclsh cpu=2 start=6.84 finish=6.94
330589) tclsh cpu=11 start=6.94 finish=7.06
330590) tclsh cpu=2 start=7.06 finish=7.12
330591) tclsh cpu=11 start=7.12 finish=7.22
330592) tclsh cpu=2 start=7.22 finish=7.31
330593) tclsh cpu=3 start=7.31 finish=7.34
330594) tclsh cpu=12 start=7.34 finish=7.42
330595) tclsh cpu=10 start=7.42 finish=7.48
330596) tclsh cpu=11 start=7.48 finish=7.53
330597) tclsh cpu=10 start=7.53 finish=7.64
330598) tclsh cpu=11 start=7.64 finish=7.68
330599) tclsh cpu=10 start=7.68 finish=7.74
330600) tclsh cpu=11 start=7.75 finish=7.81
330601) tclsh cpu=10 start=7.81 finish=7.91
330602) tclsh cpu=11 start=7.91 finish=8.00
330603) tclsh cpu=10 start=8.00 finish=8.12
330604) tclsh cpu=11 start=8.12 finish=8.17
330605) tclsh cpu=10 start=8.17 finish=8.21
330606) tclsh cpu=11 start=8.21 finish=8.29
330607) tclsh cpu=12 start=8.29 finish=8.37
330608) tclsh cpu=10 start=8.37 finish=8.41
330609) tclsh cpu=11 start=8.41 finish=8.46
330610) tclsh cpu=10 start=8.46 finish=8.52
330611) tclsh cpu=11 start=8.53 finish=8.57
330612) tclsh cpu=12 start=8.57 finish=8.62
330613) tclsh cpu=10 start=8.62 finish=8.71
330614) tclsh cpu=11 start=8.72 finish=8.81
330615) tclsh cpu=10 start=8.81 finish=8.86
330616) tclsh cpu=11 start=8.86 finish=8.91
330617) tclsh cpu=10 start=8.91 finish=8.96
330618) tclsh cpu=11 start=8.97 finish=9.07
330619) tclsh cpu=10 start=9.07 finish=9.16
330620) tclsh cpu=11 start=9.16 finish=9.20
330621) tclsh cpu=10 start=9.21 finish=9.27
330622) tclsh cpu=11 start=9.27 finish=9.32
330623) tclsh cpu=10 start=9.32 finish=9.37
330624) tclsh cpu=11 start=9.37 finish=9.43
330625) tclsh cpu=10 start=9.43 finish=9.48
330626) tclsh cpu=11 start=9.48 finish=9.65
330627) tclsh cpu=10 start=9.65 finish=9.71
330628) tclsh cpu=11 start=9.71 finish=9.76
330629) tclsh cpu=10 start=9.76 finish=9.82
330630) tclsh cpu=11 start=9.82 finish=9.87
330631) tclsh cpu=10 start=9.87 finish=9.94
330632) tclsh cpu=11 start=9.94 finish=10.04
330633) tclsh cpu=10 start=10.04 finish=10.10
330634) tclsh cpu=11 start=10.10 finish=10.19
330635) tclsh cpu=10 start=10.19 finish=10.26
330636) tclsh cpu=11 start=10.27 finish=10.33
330637) tclsh cpu=10 start=10.33 finish=10.39
330638) tclsh cpu=11 start=10.39 finish=10.46
330639) tclsh cpu=10 start=10.46 finish=10.51
330640) tclsh cpu=11 start=10.51 finish=10.59
330642) tclsh cpu=10 start=10.59 finish=10.64
330643) tclsh cpu=11 start=10.64 finish=10.73
330644) tclsh cpu=10 start=10.73 finish=10.79
330645) tclsh cpu=11 start=10.79 finish=10.89
330646) tclsh cpu=10 start=10.90 finish=10.96
330647) tclsh cpu=11 start=10.96 finish=11.10
330648) tclsh cpu=10 start=11.10 finish=11.16
330649) tclsh cpu=11 start=11.16 finish=11.21
330650) tclsh cpu=10 start=11.21 finish=11.36
330651) tclsh cpu=11 start=11.36 finish=11.41
330652) tclsh cpu=10 start=11.42 finish=11.50
330653) tclsh cpu=11 start=11.50 finish=11.57
330654) tclsh cpu=10 start=11.57 finish=11.62
330655) tclsh cpu=11 start=11.63 finish=11.70
330656) tclsh cpu=10 start=11.70 finish=11.74
330657) tclsh cpu=11 start=11.74 finish=11.82
330658) tclsh cpu=10 start=11.82 finish=11.91
330659) tclsh cpu=11 start=11.91 finish=11.95
330660) tclsh cpu=2 start=11.95 finish=11.99
330661) tclsh cpu=11 start=11.99 finish=12.04
330662) tclsh cpu=2 start=12.04 finish=12.08
330663) tclsh cpu=11 start=12.08 finish=12.15
330664) tclsh cpu=2 start=12.16 finish=12.22
330665) tclsh cpu=11 start=12.22 finish=12.26
330666) tclsh cpu=10 start=12.26 finish=12.31
330667) tclsh cpu=3 start=12.31 finish=12.40
330668) tclsh cpu=10 start=12.40 finish=12.44
330669) tclsh cpu=12 start=12.44 finish=12.59
330670) tclsh cpu=13 start=12.59 finish=12.64
330671) tclsh cpu=12 start=12.64 finish=12.71
330672) tclsh cpu=13 start=12.72 finish=12.76
330673) tclsh cpu=12 start=12.76 finish=12.80
330674) tclsh cpu=13 start=12.80 finish=12.86
330675) tclsh cpu=12 start=12.86 finish=12.91
330676) tclsh cpu=13 start=12.91 finish=12.96
330677) tclsh cpu=4 start=12.96 finish=13.08
