{"id":1955,"date":"2024-03-03T19:03:32","date_gmt":"2024-03-03T19:03:32","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=1955"},"modified":"2024-03-04T02:33:40","modified_gmt":"2024-03-04T02:33:40","slug":"neat","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/neat\/","title":{"rendered":"neat"},"content":{"rendered":"\n<p>Nebular Empirical Analysis Tool. There is one test, which runs in about 30 seconds. It looks like it bounces between running single-threaded and running on all cores. This fails to run on my Intel box, returning in ~10 seconds with a non-zero exit status. The cause is not clear from the test logs, though I also notice some out-of-memory (OOM) events in \/var\/log\/syslog.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-15.png\" alt=\"\" class=\"wp-image-1970\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-15.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-15-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-15-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>The topdown profile shows mostly backend stalls and few frontend stalls.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-16.png\" alt=\"\" class=\"wp-image-1972\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-16.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-16-1024x768.png 1024w, 
https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-16-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics agree with the topdown profile: backend stalls account for 36.4% of slots, against 12.5% for the frontend.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              112.938\non_cpu               0.461          # 7.37 \/ 16 cores\nutime                787.010\nstime                45.727\nnvcsw                79016          # 90.55%\nnivcsw               8251           # 9.45%\ninblock              33008          # 292.27\/sec\nonblock              13320          # 117.94\/sec\ncpu-clock            832926827655   # 832.927 seconds\ntask-clock           832962259010   # 832.962 seconds\npage faults          19058360       # 22880.220\/sec\ncontext switches     87655          # 105.233\/sec\ncpu migrations       333            # 0.400\/sec\nmajor page faults    296            # 0.355\/sec\nminor page faults    19058064       # 22879.865\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             836781073100   # 157.534 branches per 1000 inst\nbranch misses        7786272785     # 0.93% branch miss\nconditional          731949090194   # 137.798 conditional branches per 1000 inst\nindirect             25034090579    # 4.713 indirect branches per 1000 inst\ncpu-cycles           3294228431732  # 1.81 GHz\ninstructions         5301843834321  # 1.61 IPC\nslots                6593246373414  #\nretiring             1759327788324  # 26.7% (34.7%)\n-- ucode             694038870      #     0.0%\n-- fastpath          1758633749454  #    26.7%\nfrontend             827133220959   # 12.5% (16.3%)\n-- latency           467093377584   #     7.1%\n-- bandwidth         360039843375   #     5.5%\nbackend              2399471110050  # 36.4% (47.4%)\n-- cpu               1671050799446  #    25.3%\n-- memory            728420310604   #    11.0%\nspeculation          77017903018    #  
1.2% ( 1.5%)\n-- branch mispredict 75839031117    #     1.2%\n-- pipeline restart  1178871901     #     0.0%\nsmt-contention       1530270612228  # 23.2% ( 0.0%)\ncpu-cycles           3295802237397  # 1.82 GHz\ninstructions         5309014725494  # 1.61 IPC\ninstructions         1766553771091  # 34.525 l2 access per 1000 inst\nl2 hit from l1       32425961726    # 11.15% l2 miss\nl2 miss from l1      1878534951     #\nl2 hit from l2 pf    23639853312    #\nl3 hit from l2 pf    4162983178     #\nl3 miss from l2 pf   761717911      #\ninstructions         1766957621853  # 251.625 float per 1000 inst\nfloat 512            66             # 0.000 AVX-512 per 1000 inst\nfloat 256            424            # 0.000 AVX-256 per 1000 inst\nfloat 128            444610768510   # 251.625 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\ninstructions         5303189358253  #\nopcache              743657481583   # 140.228 opcache per 1000 inst\nopcache miss         24690402705    #  3.3% opcache miss rate\nl1 dTLB miss         72370956534    # 13.647 L1 dTLB per 1000 inst\nl2 dTLB miss         828723050      # 0.156 L2 dTLB per 1000 inst\ninstructions         5302796797585  #\nicache               45850945138    # 8.647 icache per 1000 inst\nicache miss          4587964436     # 10.0% icache miss rate\nl1 iTLB miss         355368054      # 0.067 L1 iTLB per 1000 inst\nl2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst\ntlb flush            19286          # 0.000 TLB flush per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics<\/p>\n\n\n\n<p>Process overview shows the neat process taking most of the time.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>399 processes\n\t 51 neat                 12603.04   643.04\n\t 68 clinfo                  18.20     4.57\n\t 38 vulkaninfo               1.12     1.50\n\t  4 vulkani:disk$0           0.12     0.15\n\t  6 glxinfo:gdrv0 
           0.10     0.08\n\t  6 glxinfo:gl0              0.10     0.08\n\t  6 php                      0.07     0.07\n\t  2 llvmpipe-0               0.06     0.08\n\t  2 llvmpipe-1               0.06     0.08\n\t  2 llvmpipe-10              0.06     0.08\n\t  2 llvmpipe-11              0.06     0.08\n\t  2 llvmpipe-12              0.06     0.08\n\t  2 llvmpipe-13              0.06     0.08\n\t  2 llvmpipe-14              0.06     0.08\n\t  2 llvmpipe-15              0.06     0.08\n\t  2 llvmpipe-2               0.06     0.08\n\t  2 llvmpipe-3               0.06     0.08\n\t  2 llvmpipe-4               0.06     0.08\n\t  2 llvmpipe-5               0.06     0.08\n\t  2 llvmpipe-6               0.06     0.08\n\t  2 llvmpipe-7               0.06     0.08\n\t  2 llvmpipe-8               0.06     0.08\n\t  2 llvmpipe-9               0.06     0.08\n\t  6 clang                    0.06     0.06\n\t  2 glxinfo                  0.05     0.04\n\t  2 glxinfo:cs0              0.05     0.04\n\t  2 glxinfo:disk$0           0.05     0.04\n\t  2 glxinfo:sh0              0.05     0.04\n\t  2 glxinfo:shlo0            0.05     0.04\n\t  3 rocminfo                 0.03     0.00\n\t  1 lspci                    0.00     0.02\n\t  1 ps                       0.00     0.01\n\t 82 sh                       0.00     0.00\n\t 13 gcc                      0.00     0.00\n\t 10 gsettings                0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  4 gmain                    0.00     0.00\n\t  2 cc                       0.00     0.00\n\t  2 dconf worker             0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 uname                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dirname                  0.00     
0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n0 processes running\n47 maximum processes\n<\/code><\/pre>\n\n\n\n<p>Computation blocks<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      911524) neat             cpu=8 start=5.71  finish=37.89\n        911525) neat             cpu=3 start=5.71  finish=36.73\n          911526) neat             cpu=4 start=5.82  finish=36.73\n          911527) neat             cpu=13 start=5.82  finish=36.73\n          911528) neat             cpu=6 start=5.82  finish=36.73\n          911529) neat             cpu=2 start=5.82  finish=36.73\n          911530) neat             cpu=1 start=5.82  finish=36.73\n          911531) neat             cpu=15 start=5.82  finish=36.73\n          911532) neat             cpu=0 start=5.82  finish=36.73\n          911533) neat             cpu=11 start=5.82  finish=36.73\n          911534) neat             cpu=14 start=5.82  finish=36.73\n          911535) neat             cpu=9 start=5.82  finish=36.73\n          911536) neat             cpu=10 start=5.82  finish=36.73\n          911537) neat             cpu=5 start=5.82  finish=36.73\n          911538) neat             cpu=12 start=5.82  finish=36.73\n          911539) neat             cpu=7 start=5.82  finish=36.73\n     
     911540) neat             cpu=8 start=5.82  finish=36.73\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Nebular Empirical Analysis Tool. There is one test, which runs in about 30 seconds. It looks like it bounces between running single-threaded and running on all cores. This fails to run on my Intel box, returning in ~10 seconds with a non-zero <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/neat\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-1955","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1955","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=1955"}],"version-history":[{"count":3,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1955\/revisions"}],"predecessor-version":[{"id":1974,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1955\/revisions\/1974"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=1955"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}