{"id":1250,"date":"2024-02-01T23:35:53","date_gmt":"2024-02-01T23:35:53","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=1250"},"modified":"2024-02-02T03:14:27","modified_gmt":"2024-02-02T03:14:27","slug":"m-queens","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/m-queens\/","title":{"rendered":"m-queens"},"content":{"rendered":"\n<p>An OpenMP program for solving the n-queens problem. This runs ~4x slower than n-queens, but it is not clear whether the board sizes are the same. Otherwise, it keeps all cores busy.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-8.png\" alt=\"\" class=\"wp-image-1278\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-8.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-8-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-8-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>Topdown profile shows similar levels of frontend stalls and retiring. 
The backend stalls are low and branch mispredicts are high.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-8.png\" alt=\"\" class=\"wp-image-1280\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-8.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-8-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-8-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics show little floating-point activity and few L2 accesses.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              200.507\non_cpu               0.920          # 14.71 \/ 16 cores\nutime                2949.022\nstime                1.235\nnvcsw                2102           # 6.67%\nnivcsw               29429          # 93.33%\ninblock              0              # 0.00\/sec\nonblock              12568          # 62.68\/sec\ncpu-clock            2950319583663  # 2950.320 seconds\ntask-clock           2950333312474  # 2950.333 seconds\npage faults          147869         # 50.119\/sec\ncontext switches     32365          # 10.970\/sec\ncpu migrations       287            # 0.097\/sec\nmajor page faults    5              # 0.002\/sec\nminor page faults    147864         # 50.118\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             2449635795430  # 122.266 branches per 1000 inst\nbranch misses        279557151552   # 11.41% branch miss\nconditional          2016563015382  # 100.651 conditional branches per 1000 inst\nindirect             53952584       # 0.003 indirect branches per 1000 inst\ncpu-cycles           12290424378998 # 3.84 GHz\ninstructions         20036705544292 # 1.63 
IPC\nslots                24594741239610 #\nretiring             6648865947598  # 27.0% (38.5%)\n-- ucode             27142902       #     0.0%\n-- fastpath          6648838804696  #    27.0%\nfrontend             6827133842947  # 27.8% (39.6%)\n-- latency           5177461340538  #    21.1%\n-- bandwidth         1649672502409  #     6.7%\nbackend              1625658350041  #  6.6% ( 9.4%) low\n-- cpu               620818946652   #     2.5%\n-- memory            1004839403389  #     4.1%\nspeculation          2155493716016  #  8.8% (12.5%) high\n-- branch mispredict 2155483748047  #     8.8%\n-- pipeline restart  9967969        #     0.0%\nsmt-contention       7337535654680  # 29.8% ( 0.0%)\ncpu-cycles           12292939466217 # 3.84 GHz\ninstructions         20038933123611 # 1.63 IPC\ninstructions         6681724577089  # 0.036 l2 access per 1000 inst\nl2 hit from l1       220989111      # 10.87% l2 miss\nl2 miss from l1      16124182       #\nl2 hit from l2 pf    10847456       #\nl3 hit from l2 pf    5189423        #\nl3 miss from l2 pf   5002627        #\ninstructions         6674737780939  # 0.013 float per 1000 inst\nfloat 512            59             # 0.000 AVX-512 per 1000 inst\nfloat 256            532            # 0.000 AVX-256 per 1000 inst\nfloat 128            89238336       # 0.013 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              287.052\non_cpu               0.945          # 15.12 \/ 16 cores\nutime                4338.559\nstime                0.480\nnvcsw                1880           # 5.72%\nnivcsw               31014          # 94.28%\ninblock              8              # 0.03\/sec\nonblock              1328           # 4.63\/sec\ncpu-clock            4339072593247  # 4339.073 seconds\ntask-clock           4339083116128  # 4339.083 
seconds\npage faults          137059         # 31.587\/sec\ncontext switches     34159          # 7.872\/sec\ncpu migrations       284            # 0.065\/sec\nmajor page faults    0              # 0.000\/sec\nminor page faults    137059         # 31.587\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             2449320104575  # 122.253 branches per 1000 inst\nbranch misses        290139523295   # 11.85% branch miss\nconditional          2449320117951  # 122.253 conditional branches per 1000 inst\nindirect             366867368214   # 18.312 indirect branches per 1000 inst\nslots                21837969535640 #\nretiring             8918818055099  # 40.8% (40.8%)\n-- ucode             736827206      #     0.0%\n-- fastpath          8918081227893  #    40.8%\nfrontend             2363902682752  # 10.8% (10.8%)\n-- latency           1643610505164  #     7.5%\n-- bandwidth         720292177588   #     3.3%\nbackend              1573606838551  #  7.2% ( 7.2%) low\n-- cpu               1403575395407  #     6.4%\n-- memory            170031443144   #     0.8%\nspeculation          9117519044335  # 41.8% (41.8%) high\n-- branch mispredict 9117374039133  #    41.8%\n-- pipeline restart  145005202      #     0.0%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           11206702565317 # 2.45 GHz\ninstructions         15216889946882 # 1.36 IPC\nl2 access            304845877      # 0.031 l2 access per 1000 inst\nl2 miss              89427914       # 29.34% l2 miss\n<\/code><\/pre>\n\n\n\n<p>Process overview shows m-queens.bin<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>399 processes\n\t 48 m-queens.bin         47213.12     1.48\n\t 68 clinfo                  17.20     5.64\n\t 38 vulkaninfo               0.95     1.31\n\t  6 glxinfo:gdrv0            0.12     0.04\n\t  6 glxinfo:gl0              0.12     0.04\n\t  4 vulkani:disk$0           0.10     0.13\n\t  6 php                      
0.06     0.07\n\t  2 glxinfo                  0.06     0.02\n\t  2 glxinfo:cs0              0.06     0.02\n\t  2 glxinfo:disk$0           0.06     0.02\n\t  2 glxinfo:sh0              0.06     0.02\n\t  2 glxinfo:shlo0            0.06     0.02\n\t  2 llvmpipe-0               0.05     0.07\n\t  2 llvmpipe-1               0.05     0.07\n\t  2 llvmpipe-10              0.05     0.07\n\t  2 llvmpipe-11              0.05     0.07\n\t  2 llvmpipe-12              0.05     0.07\n\t  2 llvmpipe-13              0.05     0.07\n\t  2 llvmpipe-14              0.05     0.07\n\t  2 llvmpipe-15              0.05     0.07\n\t  2 llvmpipe-2               0.05     0.07\n\t  2 llvmpipe-3               0.05     0.07\n\t  2 llvmpipe-4               0.05     0.07\n\t  2 llvmpipe-5               0.05     0.07\n\t  2 llvmpipe-6               0.05     0.07\n\t  2 llvmpipe-7               0.05     0.07\n\t  2 llvmpipe-8               0.05     0.07\n\t  2 llvmpipe-9               0.05     0.07\n\t  6 clang                    0.05     0.06\n\t  3 rocminfo                 0.03     0.00\n\t  1 lspci                    0.00     0.02\n\t  1 ps                       0.00     0.01\n\t 82 sh                       0.00     0.00\n\t 13 gcc                      0.00     0.00\n\t 13 gsettings                0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  3 m-queens                 0.00     0.00\n\t  2 cc                       0.00     0.00\n\t  2 gmain                    0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 uname                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dconf worker             0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 
dmidecode                0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n0 processes running\n47 maximum processes\n<\/code><\/pre>\n\n\n\n<p>Computation structure is simple<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      115396) m-queens         cpu=14 start=5.71  finish=67.43\n        115397) m-queens.bin     cpu=1 start=5.71  finish=67.43\n          115398) m-queens.bin     cpu=10 start=5.71  finish=67.43\n          115399) m-queens.bin     cpu=3 start=5.71  finish=67.43\n          115400) m-queens.bin     cpu=15 start=5.71  finish=67.43\n          115401) m-queens.bin     cpu=6 start=5.71  finish=67.43\n          115402) m-queens.bin     cpu=0 start=5.71  finish=67.43\n          115403) m-queens.bin     cpu=5 start=5.71  finish=67.43\n          115404) m-queens.bin     cpu=4 start=5.71  finish=67.43\n          115405) m-queens.bin     cpu=8 start=5.71  finish=67.43\n          115406) m-queens.bin     cpu=12 start=5.71  finish=67.43\n          115407) m-queens.bin     cpu=2 start=5.71  finish=67.43\n          115408) m-queens.bin     cpu=11 start=5.71  finish=67.42\n          115409) m-queens.bin     cpu=7 start=5.71  finish=67.42\n          115410) m-queens.bin     cpu=13 start=5.71  finish=67.42\n          115411) m-queens.bin     cpu=9 start=5.71  finish=67.42\n          115412) m-queens.bin     cpu=14 
start=5.71  finish=67.42\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>An OpenMP program for solving the n-queens problem. This runs ~4x slower than n-queens, but it is not clear whether the board sizes are the same. Otherwise, it keeps all cores busy. Topdown profile shows similar levels of frontend stalls and retiring. <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/m-queens\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-1250","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1250","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=1250"}],"version-history":[{"count":3,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1250\/revisions"}],"predecessor-version":[{"id":1281,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1250\/revisions\/1281"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=1250"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}