{"id":2041,"date":"2024-03-07T03:45:24","date_gmt":"2024-03-07T03:45:24","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=2041"},"modified":"2024-03-07T12:44:14","modified_gmt":"2024-03-07T12:44:14","slug":"ffte","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/ffte\/","title":{"rendered":"ffte"},"content":{"rendered":"\n<p>A package for computing discrete Fourier transforms. Overall, a quick-running test.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-25.png\" alt=\"\" class=\"wp-image-2045\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-25.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-25-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-25-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>The topdown profile has few data points but shows somewhat high backend stalls.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-25.png\" alt=\"\" class=\"wp-image-2047\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-25.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-25-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-25-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics confirm the backend stalls. 
The rest of the code has a lot of floating point and low frontend stalls.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              28.735\non_cpu               0.357          # 5.72 \/ 16 cores\nutime                162.839\nstime                1.511\nnvcsw                2031           # 48.45%\nnivcsw               2161           # 51.55%\ninblock              0              # 0.00\/sec\nonblock              12576          # 437.65\/sec\ncpu-clock            164440980158   # 164.441 seconds\ntask-clock           164446588922   # 164.447 seconds\npage faults          344700         # 2096.121\/sec\ncontext switches     4162           # 25.309\/sec\ncpu migrations       267            # 1.624\/sec\nmajor page faults    2              # 0.012\/sec\nminor page faults    344698         # 2096.109\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             22525014483    # 29.306 branches per 1000 inst\nbranch misses        142525357      # 0.63% branch miss\nconditional          21315112292    # 27.731 conditional branches per 1000 inst\nindirect             46909573       # 0.061 indirect branches per 1000 inst\ncpu-cycles           645340177349   # 1.47 GHz\ninstructions         766939873997   # 1.19 IPC\nslots                1295578399398  #\nretiring             253149782597   # 19.5% (23.4%)\n-- ucode             63790386       #     0.0%\n-- fastpath          253085992211   #    19.5%\nfrontend             18599786075    #  1.4% ( 1.7%) low\n-- latency           15794612688    #     1.2%\n-- bandwidth         2805173387     #     0.2%\nbackend              808271511988   # 62.4% (74.8%) high\n-- cpu               255543083595   #    19.7%\n-- memory            552728428393   #    42.7%\nspeculation          975977428      #  0.1% ( 0.1%) low\n-- branch mispredict 930800943      #     0.1%\n-- pipeline restart  45176485       #     0.0%\nsmt-contention       214580577475   # 16.6% ( 
0.0%)\ncpu-cycles           643126094843   # 1.49 GHz\ninstructions         769855395585   # 1.20 IPC\ninstructions         254848664760   # 26.837 l2 access per 1000 inst\nl2 hit from l1       5626317105     # 27.97% l2 miss\nl2 miss from l1      1342573404     #\nl2 hit from l2 pf    642580883      #\nl3 hit from l2 pf    236038076      #\nl3 miss from l2 pf   334478930      #\ninstructions         256599436858   # 606.108 float per 1000 inst\nfloat 512            56             # 0.000 AVX-512 per 1000 inst\nfloat 256            626            # 0.000 AVX-256 per 1000 inst\nfloat 128            155527078471   # 606.108 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\ninstructions         768061234133   #\nopcache              68672979636    # 89.411 opcache per 1000 inst\nopcache miss         1051822848     #  1.5% opcache miss rate\nl1 dTLB miss         1192723295     # 1.553 L1 dTLB per 1000 inst\nl2 dTLB miss         471669429      # 0.614 L2 dTLB per 1000 inst\ninstructions         768451506345   #\nicache               1985334320     # 2.584 icache per 1000 inst\nicache miss          244207249      # 12.3% icache miss rate\nl1 iTLB miss         9159987        # 0.012 L1 iTLB per 1000 inst\nl2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst\ntlb flush            16882          # 0.000 TLB flush per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics confirm the dram stalls.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              24.293\non_cpu               0.389          # 6.22 \/ 16 cores\nutime                149.928\nstime                1.105\nnvcsw                1443           # 48.36%\nnivcsw               1541           # 51.64%\ninblock              1584           # 65.20\/sec\nonblock              1128           # 46.43\/sec\ncpu-clock            151138846294   # 151.139 seconds\ntask-clock           151141771739   # 
151.142 seconds\npage faults          326444         # 2159.853\/sec\ncontext switches     2928           # 19.373\/sec\ncpu migrations       254            # 1.681\/sec\nmajor page faults    9              # 0.060\/sec\nminor page faults    326434         # 2159.787\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             22322682452    # 29.078 branches per 1000 inst\nbranch misses        65479584       # 0.29% branch miss\nconditional          22322694676    # 29.078 conditional branches per 1000 inst\nindirect             9267166486     # 12.072 indirect branches per 1000 inst\nslots                826186570826   #\nretiring             379324008162   # 45.9% (45.9%)\n-- ucode             4477105158     #     0.5%\n-- fastpath          374846903004   #    45.4%\nfrontend             83216234818    # 10.1% (10.1%)\n-- latency           70320222173    #     8.5%\n-- bandwidth         12896012645    #     1.6%\nbackend              358420851091   # 43.4% (43.4%)\n-- cpu               169928088458   #    20.6%\n-- memory            188492762633   #    22.8%\nspeculation          5807426002     #  0.7% ( 0.7%) low\n-- branch mispredict 5507934937     #     0.7%\n-- pipeline restart  299491065      #     0.0%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           272415717150   # 0.70 GHz\ninstructions         388269199233   # 1.43 IPC\nl2 access            8466469743     # 21.811 l2 access per 1000 inst\nl2 miss              2813444552     # 33.23% l2 miss\ncpu-cycles           272455895543   # 38.5% memory latency\nload stalls          92399258775    #  4.7% l1 bound\nl1 miss              79638938324    #  5.8% l2 bound\nl2 miss              63787703172    #  4.1% l3 bound\nl3 miss              52617857012    # 19.3% dram bound\nstore_stalls         12418643836    #  4.6% store bound\n<\/code><\/pre>\n\n\n\n<p>Process summary gives name of the benchmark thread as 
speed3d<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>402 processes\n\t 48 speed3d               2605.28     9.12\n\t 68 clinfo                  17.13     5.66\n\t 38 vulkaninfo               1.30     1.34\n\t  4 vulkani:disk$0           0.13     0.15\n\t  6 php                      0.08     0.05\n\t  2 llvmpipe-0               0.07     0.08\n\t  2 llvmpipe-1               0.07     0.08\n\t  2 llvmpipe-10              0.07     0.08\n\t  2 llvmpipe-11              0.07     0.08\n\t  2 llvmpipe-12              0.07     0.08\n\t  2 llvmpipe-13              0.07     0.08\n\t  2 llvmpipe-14              0.07     0.08\n\t  2 llvmpipe-15              0.07     0.08\n\t  2 llvmpipe-2               0.07     0.08\n\t  2 llvmpipe-3               0.07     0.08\n\t  2 llvmpipe-4               0.07     0.08\n\t  2 llvmpipe-5               0.07     0.08\n\t  2 llvmpipe-6               0.07     0.08\n\t  2 llvmpipe-7               0.07     0.08\n\t  2 llvmpipe-8               0.07     0.08\n\t  2 llvmpipe-9               0.07     0.08\n\t  6 glxinfo:gdrv0            0.06     0.13\n\t  6 glxinfo:gl0              0.06     0.13\n\t  6 clang                    0.04     0.08\n\t  2 glxinfo                  0.04     0.05\n\t  2 glxinfo:cs0              0.04     0.05\n\t  2 glxinfo:disk$0           0.04     0.05\n\t  2 glxinfo:sh0              0.04     0.05\n\t  2 glxinfo:shlo0            0.04     0.05\n\t  3 rocminfo                 0.00     0.03\n\t  1 lspci                    0.00     0.03\n\t 82 sh                       0.00     0.00\n\t 13 gcc                      0.00     0.00\n\t 13 gsettings                0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  6 ffte                     0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  2 cc                       0.00     0.00\n\t  2 gmain                    0.00     0.00\n\t  2 lscpu                    0.00     
0.00\n\t  2 uname                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dconf worker             0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 ps                       0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n0 processes running\n47 maximum processes\n<\/code><\/pre>\n\n\n\n<p>Computation blocks<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      1032593) ffte             cpu=9 start=5.82  finish=9.27 \n        1032594) ffte             cpu=3 start=5.82  finish=5.82 \n        1032595) speed3d          cpu=6 start=5.82  finish=9.26 \n          1032596) speed3d          cpu=7 start=5.82  finish=9.26 \n          1032597) speed3d          cpu=12 start=5.82  finish=9.26 \n          1032598) speed3d          cpu=10 start=5.82  finish=9.26 \n          1032599) speed3d          cpu=0 start=5.82  finish=9.26 \n          1032600) speed3d          cpu=13 start=5.82  finish=9.26 \n          1032601) speed3d          cpu=9 start=5.82  finish=9.26 \n          1032602) speed3d          cpu=11 start=5.82  finish=9.26 \n          1032603) speed3d          cpu=14 start=5.83  finish=9.26 \n          
1032604) speed3d          cpu=8 start=5.83  finish=9.26 \n          1032605) speed3d          cpu=4 start=5.83  finish=9.26 \n          1032606) speed3d          cpu=2 start=5.83  finish=9.26 \n          1032607) speed3d          cpu=5 start=5.83  finish=9.26 \n          1032608) speed3d          cpu=15 start=5.83  finish=9.26 \n          1032609) speed3d          cpu=1 start=5.83  finish=9.26 \n          1032610) speed3d          cpu=3 start=5.83  finish=9.26 \n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>A package computing discrete fourier transforms. Overall a quick running test. Topdown profile doesn&#8217;t have many data points but shows somewhat high backend stalls. AMD metrics confirm the backend stalls. The rest of the code has a lot of floating <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/ffte\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-2041","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2041","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=2041"}],"version-history":[{"count":2,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2041\/revisions"}],"predecessor-version":[{"id":2048,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2041\/revisions\/2048"}],"up":[{"embeddable":true,"href":"https:\/\/
mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=2041"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}