{"id":2194,"date":"2024-03-24T20:24:24","date_gmt":"2024-03-24T20:24:24","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=2194"},"modified":"2024-03-29T10:25:26","modified_gmt":"2024-03-29T10:25:26","slug":"quadray","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/quadray\/","title":{"rendered":"quadray"},"content":{"rendered":"\n<p>A real-time vector ray-tracing engine. There are eight subtests. These run on all cores.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-47.png\" alt=\"\" class=\"wp-image-2226\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-47.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-47-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-47-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>Topdown profile shows these mostly backend bound with few frontend stalls.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-49.png\" alt=\"\" class=\"wp-image-2228\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-49.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-49-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-49-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics show backend stalls split between CPU and memory. 
There is not much floating point code.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              625.743\non_cpu               0.714          # 11.42 \/ 16 cores\nutime                7119.296\nstime                28.262\nnvcsw                543310         # 80.96%\nnivcsw               127794         # 19.04%\ninblock              0              # 0.00\/sec\nonblock              14424          # 23.05\/sec\ncpu-clock            7147322701511  # 7147.323 seconds\ntask-clock           7147938611459  # 7147.939 seconds\npage faults          9605265        # 1343.781\/sec\ncontext switches     674000         # 94.293\/sec\ncpu migrations       2367           # 0.331\/sec\nmajor page faults    77             # 0.011\/sec\nminor page faults    9605188        # 1343.770\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             2213138560582  # 83.291 branches per 1000 inst\nbranch misses        17501902038    # 0.79% branch miss\nconditional          1933794071567  # 72.778 conditional branches per 1000 inst\nindirect             163873148      # 0.006 indirect branches per 1000 inst\ncpu-cycles           26735053193850 # 2.68 GHz\ninstructions         26540275610856 # 0.99 IPC\nslots                53478182239044 #\nretiring             9404982091633  # 17.6% (23.2%)\n-- ucode             23083359398    #     0.0%\n-- fastpath          9381898732235  #    17.5%\nfrontend             1432749254238  #  2.7% ( 3.5%) low\n-- latency           913772041278   #     1.7%\n-- bandwidth         518977212960   #     1.0%\nbackend              29314007637038 # 54.8% (72.2%) high\n-- cpu               15535825778777 #    29.1%\n-- memory            13778181858261 #    25.8%\nspeculation          433053049363   #  0.8% ( 1.1%)\n-- branch mispredict 301142558895   #     0.6%\n-- pipeline restart  131910490468   #     0.2%\nsmt-contention       12893301100630 # 24.1% ( 0.0%)\ncpu-cycles           
26684097340491 # 2.63 GHz\ninstructions         26508111828655 # 0.99 IPC\ninstructions         8838043367721  # 217.120 l2 access per 1000 inst\nl2 hit from l1       1452674025909  # 0.93% l2 miss\nl2 miss from l1      11743811973    #\nl2 hit from l2 pf    460145048721   #\nl3 hit from l2 pf    4891501169     #\nl3 miss from l2 pf   1201823009     #\ninstructions         8829963878485  # 31.739 float per 1000 inst\nfloat 512            85             # 0.000 AVX-512 per 1000 inst\nfloat 256            468            # 0.000 AVX-256 per 1000 inst\nfloat 128            280256805592   # 31.739 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\ninstructions         26550065692299 #\nopcache              4885638845761  # 184.016 opcache per 1000 inst\nopcache miss         188790824089   #  3.9% opcache miss rate\nl1 dTLB miss         218391826788   # 8.226 L1 dTLB per 1000 inst\nl2 dTLB miss         2754869850     # 0.104 L2 dTLB per 1000 inst\ninstructions         26547412969503 #\nicache               244980878191   # 9.228 icache per 1000 inst\nicache miss          21417678897    #  8.7% icache miss rate\nl1 iTLB miss         9824289        # 0.000 L1 iTLB per 1000 inst\nl2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst\ntlb flush            42902          # 0.000 TLB flush per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics show that most of the backend memory stalls are L1 or L2 bound.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              628.053\non_cpu               0.558          # 8.93 \/ 16 cores\nutime                5593.746\nstime                15.852\nnvcsw                225970         # 79.57%\nnivcsw               58003          # 20.43%\ninblock              3232           # 5.15\/sec\nonblock              2664           # 4.24\/sec\ncpu-clock            5608322435946  # 5608.322 seconds\ntask-clock           5608584641226  # 
5608.585 seconds\npage faults          9542595        # 1701.427\/sec\ncontext switches     286886         # 51.151\/sec\ncpu migrations       3881           # 0.692\/sec\nmajor page faults    104            # 0.019\/sec\nminor page faults    9542491        # 1701.408\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             1680443054208  # 85.552 branches per 1000 inst\nbranch misses        10375087163    # 0.62% branch miss\nconditional          1680443078208  # 85.552 conditional branches per 1000 inst\nindirect             515691696613   # 26.254 indirect branches per 1000 inst\nslots                21897839692094 #\nretiring             9731326497537  # 44.4% (44.4%)\n-- ucode             413075777081   #     1.9%\n-- fastpath          9318250720456  #    42.6%\nfrontend             4331544860416  # 19.8% (19.8%)\n-- latency           3653140251914  #    16.7%\n-- bandwidth         678404608502   #     3.1%\nbackend              7277613673818  # 33.2% (33.2%)\n-- cpu               2606213574374  #    11.9%\n-- memory            4671400099444  #    21.3%\nspeculation          604514458652   #  2.8% ( 2.8%)\n-- branch mispredict 537968863823   #     2.5%\n-- pipeline restart  66545594829    #     0.3%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           13966210798435 # 1.34 GHz\ninstructions         19034644351405 # 1.36 IPC\nl2 access            516811085855   # 52.304 l2 access per 1000 inst\nl2 miss              5176108418     # 1.00% l2 miss\ncpu-cycles           7239863197863  # 37.8% memory latency\nload stalls          2649725159323  # 16.7% l1 bound\nl1 miss              1442347735969  # 19.8% l2 bound\nl2 miss              9228044552     #  0.1% l3 bound\nl3 miss              4311222790     #  0.1% dram bound\nstore_stalls         85504284960    #  1.2% store bound\n<\/code><\/pre>\n\n\n\n<p>Process overview shows RooT.x64f64 as the primary 
process.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>796 processes\n\t408 RooT.x64f64          121183.97   394.41\n\t 68 clinfo                  15.21     7.64\n\t 38 vulkaninfo               1.33     1.15\n\t  6 php                      0.14     0.15\n\t  4 vulkani:disk$0           0.14     0.12\n\t  6 glxinfo:gdrv0            0.09     0.09\n\t  6 glxinfo:gl0              0.09     0.09\n\t  6 clang                    0.08     0.04\n\t  2 llvmpipe-0               0.07     0.06\n\t  2 llvmpipe-1               0.07     0.06\n\t  2 llvmpipe-10              0.07     0.06\n\t  2 llvmpipe-11              0.07     0.06\n\t  2 llvmpipe-12              0.07     0.06\n\t  2 llvmpipe-13              0.07     0.06\n\t  2 llvmpipe-14              0.07     0.06\n\t  2 llvmpipe-15              0.07     0.06\n\t  2 llvmpipe-2               0.07     0.06\n\t  2 llvmpipe-3               0.07     0.06\n\t  2 llvmpipe-4               0.07     0.06\n\t  2 llvmpipe-5               0.07     0.06\n\t  2 llvmpipe-6               0.07     0.06\n\t  2 llvmpipe-7               0.07     0.06\n\t  2 llvmpipe-8               0.07     0.06\n\t  2 llvmpipe-9               0.07     0.06\n\t  2 glxinfo                  0.05     0.03\n\t  2 glxinfo:cs0              0.05     0.03\n\t  2 glxinfo:disk$0           0.05     0.03\n\t  2 glxinfo:sh0              0.05     0.03\n\t  2 glxinfo:shlo0            0.05     0.03\n\t  3 rocminfo                 0.03     0.00\n\t  1 lspci                    0.01     0.01\n\t  1 ps                       0.00     0.01\n\t 97 sh                       0.00     0.00\n\t 24 quadray                  0.00     0.00\n\t 13 gcc                      0.00     0.00\n\t 13 gsettings                0.00     0.00\n\t  9 systemd-detect-          0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  3 gmain                    0.00     0.00\n\t  2 cc                       0.00     
0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 uname                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n0 processes running\n47 maximum processes\n<\/code><\/pre>\n\n\n\n<p>Computation blocks<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      22741) quadray          cpu=13 start=5.64  finish=25.78\n        22742) RooT.x64f64      cpu=9 start=5.64  finish=25.78\n          22743) RooT.x64f64      cpu=0 start=5.64  finish=25.77\n          22744) RooT.x64f64      cpu=1 start=5.64  finish=25.77\n          22745) RooT.x64f64      cpu=2 start=5.64  finish=25.77\n          22746) RooT.x64f64      cpu=3 start=5.64  finish=25.77\n          22747) RooT.x64f64      cpu=4 start=5.64  finish=25.77\n          22748) RooT.x64f64      cpu=5 start=5.64  finish=25.77\n          22749) RooT.x64f64      cpu=6 start=5.64  finish=25.77\n          22750) RooT.x64f64      cpu=7 start=5.64  finish=25.77\n          22751) RooT.x64f64      cpu=8 start=5.64  finish=25.77\n          22752) RooT.x64f64      cpu=9 start=5.64  finish=25.77\n          
22753) RooT.x64f64      cpu=10 start=5.64  finish=25.77\n          22754) RooT.x64f64      cpu=11 start=5.64  finish=25.77\n          22755) RooT.x64f64      cpu=12 start=5.64  finish=25.77\n          22756) RooT.x64f64      cpu=13 start=5.64  finish=25.77\n          22757) RooT.x64f64      cpu=14 start=5.64  finish=25.77\n          22758) RooT.x64f64      cpu=15 start=5.64  finish=25.77\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>A real-time vector ray-tracing engine. There are eight subtests. These run on all cores. Topdown profile shows these mostly backend bound with few frontend stalls. AMD metrics show backend stalls split between CPU and memory. There is not much floating <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/quadray\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-2194","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2194","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=2194"}],"version-history":[{"count":2,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2194\/revisions"}],"predecessor-version":[{"id":2229,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2194\/revisions\/2229"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"hr
ef":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=2194"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}