{"id":2241,"date":"2024-05-31T12:18:34","date_gmt":"2024-05-31T12:18:34","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=2241"},"modified":"2024-05-31T12:39:10","modified_gmt":"2024-05-31T12:39:10","slug":"smallpt","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/smallpt\/","title":{"rendered":"smallpt"},"content":{"rendered":"\n<p>A small C++ program for global illumination rendering. It appears to be multi-threaded and runs quickly.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/05\/systemtime.png\" alt=\"\" class=\"wp-image-2242\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/05\/systemtime.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/05\/systemtime-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/05\/systemtime-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>The topdown profile shows backend stalls as the largest issue, with a moderate retirement rate.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/05\/amdtopdown.png\" alt=\"\" class=\"wp-image-2243\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/05\/amdtopdown.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/05\/amdtopdown-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/05\/amdtopdown-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>The AMD metrics show this is floating-point code with a small amount of L2 access.  
Backend stalls are mostly CPU stalls.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              45.090\non_cpu               0.654          # 10.47 \/ 16 cores\nutime                471.119\nstime                0.870\nnvcsw                1678           # 25.47%\nnivcsw               4911           # 74.53%\ninblock              0              # 0.00\/sec\nonblock              62944          # 1395.95\/sec\ncpu-clock            472009776505   # 472.010 seconds\ntask-clock           472014224464   # 472.014 seconds\npage faults          163926         # 347.290\/sec\ncontext switches     6635           # 14.057\/sec\ncpu migrations       212            # 0.449\/sec\nmajor page faults    12             # 0.025\/sec\nminor page faults    163914         # 347.265\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             290450354693   # 101.312 branches per 1000 inst\nbranch misses        6280963989     # 2.16% branch miss\nconditional          224571701814   # 78.333 conditional branches per 1000 inst\nindirect             8558837143     # 2.985 indirect branches per 1000 inst\ncpu-cycles           1882296852708  # 2.61 GHz\ninstructions         2865046175872  # 1.52 IPC\nslots                3766723443972  #\nretiring             1031193614822  # 27.4% (41.8%)\n-- ucode             740624499      #     0.0%\n-- fastpath          1030452990323  #    27.4%\nfrontend             128086560200   #  3.4% ( 5.2%)\n-- latency           89260572702    #     2.4%\n-- bandwidth         38825987498    #     1.0%\nbackend              1182909332439  # 31.4% (47.9%)\n-- cpu               1030784053449  #    27.4%\n-- memory            152125278990   #     4.0%\nspeculation          124961204508   #  3.3% ( 5.1%)\n-- branch mispredict 122654034594   #     3.3%\n-- pipeline restart  2307169914     #     0.1%\nsmt-contention       1299568541868  # 34.5% ( 0.0%)\ncpu-cycles           1879246306828  # 2.61 
GHz\ninstructions         2869706329426  # 1.53 IPC\ninstructions         954570926607   # 0.574 l2 access per 1000 inst\nl2 hit from l1       383137104      # 4.29% l2 miss\nl2 miss from l1      12700086       #\nl2 hit from l2 pf    154085933      #\nl3 hit from l2 pf    5453940        #\nl3 miss from l2 pf   5369977        #\ninstructions         955485420525   # 391.055 float per 1000 inst\nfloat 512            77             # 0.000 AVX-512 per 1000 inst\nfloat 256            586            # 0.000 AVX-256 per 1000 inst\nfloat 128            373647513792   # 391.055 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\ninstructions         2866802424129  #\nopcache              402332512660   # 140.342 opcache per 1000 inst\nopcache miss         1292555935     #  0.3% opcache miss rate\nl1 dTLB miss         31443107       # 0.011 L1 dTLB per 1000 inst\nl2 dTLB miss         5381707        # 0.002 L2 dTLB per 1000 inst\ninstructions         2866800052557  #\nicache               2307657177     # 0.805 icache per 1000 inst\nicache miss          286395407      # 12.4% icache miss rate\nl1 iTLB miss         8608125        # 0.003 L1 iTLB per 1000 inst\nl2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst\ntlb flush            16974          # 0.000 TLB flush per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics confirm low L2 access and show higher level of branch misprediction<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              46.525\non_cpu               0.673          # 10.77 \/ 16 cores\nutime                500.506\nstime                0.384\nnvcsw                1211           # 19.40%\nnivcsw               5030           # 80.60%\ninblock              4656           # 100.08\/sec\nonblock              51592          # 1108.91\/sec\ncpu-clock            500904808159   # 500.905 seconds\ntask-clock           500907949143   # 500.908 
seconds\npage faults          99508          # 198.655\/sec\ncontext switches     6290           # 12.557\/sec\ncpu migrations       222            # 0.443\/sec\nmajor page faults    58             # 0.116\/sec\nminor page faults    99450          # 198.539\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             289123524356   # 101.078 branches per 1000 inst\nbranch misses        6697550758     # 2.32% branch miss\nconditional          289123534756   # 101.078 conditional branches per 1000 inst\nindirect             53838970043    # 18.822 indirect branches per 1000 inst\nslots                2855244531524  #\nretiring             1627251960026  # 57.0% (57.0%) high\n-- ucode             35202779377    #     1.2%\n-- fastpath          1592049180649  #    55.8%\nfrontend             521415453265   # 18.3% (18.3%)\n-- latency           452095257232   #    15.8%\n-- bandwidth         69320196033    #     2.4%\nbackend              297810108968   # 10.4% (10.4%) low\n-- cpu               214954934707   #     7.5%\n-- memory            82855174261    #     2.9%\nspeculation          409638370395   # 14.3% (14.3%) high\n-- branch mispredict 409107060915   #    14.3%\n-- pipeline restart  531309480      #     0.0%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           3296409027952  # 2.19 GHz\ninstructions         5741847939829  # 1.74 IPC\nl2 access            306096009      # 0.094 l2 access per 1000 inst\nl2 miss              67249322       # 21.97% l2 miss\ncpu-cycles           1860051893950  # 16.4% memory latency\nload stalls          304362400662   # 16.3% l1 bound\nl1 miss              798391811      #  0.0% l2 bound\nl2 miss              362458552      #  0.0% l3 bound\nl3 miss              166527645      #  0.0% dram bound\nstore_stalls         143931493      #  0.0% store bound\n<\/code><\/pre>\n\n\n\n<p>Process profile shows the smallpt-rendere process is primary 
process<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>384 processes\n\t 48 smallpt-rendere       7485.92     2.08\n\t 68 clinfo                  15.87     6.24\n\t 38 vulkaninfo               1.15     1.14\n\t  4 vulkani:disk$0           0.12     0.12\n\t  6 php                      0.06     0.06\n\t  2 llvmpipe-0               0.06     0.06\n\t  2 llvmpipe-1               0.06     0.06\n\t  2 llvmpipe-10              0.06     0.06\n\t  2 llvmpipe-11              0.06     0.06\n\t  2 llvmpipe-12              0.06     0.06\n\t  2 llvmpipe-13              0.06     0.06\n\t  2 llvmpipe-14              0.06     0.06\n\t  2 llvmpipe-15              0.06     0.06\n\t  2 llvmpipe-2               0.06     0.06\n\t  2 llvmpipe-3               0.06     0.06\n\t  2 llvmpipe-4               0.06     0.06\n\t  2 llvmpipe-5               0.06     0.06\n\t  2 llvmpipe-6               0.06     0.06\n\t  2 llvmpipe-7               0.06     0.06\n\t  2 llvmpipe-8               0.06     0.06\n\t  2 llvmpipe-9               0.06     0.06\n\t  6 clang                    0.05     0.07\n\t  3 rocminfo                 0.00     0.03\n\t  1 lspci                    0.00     0.02\n\t 84 sh                       0.00     0.00\n\t 13 gcc                      0.00     0.00\n\t 11 gsettings                0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 glxinfo                  0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  3 gmain                    0.00     0.00\n\t  3 smallpt                  0.00     0.00\n\t  2 cc                       0.00     0.00\n\t  2 dconf worker             0.00     0.00\n\t  2 grep                     0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 setterm                  0.00     0.00\n\t  2 uname                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  1 date                     0.00     
0.00\n\t  1 dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 ps                       0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n0 processes running\n47 maximum processes\n<\/code><\/pre>\n\n\n\n<p>Process tree shows following pattern for core computation blocks.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      230105) smallpt          cpu=1 start=5.51  finish=15.35\n        230106) smallpt-rendere  cpu=11 start=5.51  finish=15.35\n          230107) smallpt-rendere  cpu=6 start=5.52  finish=15.35\n          230108) smallpt-rendere  cpu=12 start=5.52  finish=15.35\n          230109) smallpt-rendere  cpu=8 start=5.52  finish=15.35\n          230110) smallpt-rendere  cpu=15 start=5.52  finish=15.35\n          230111) smallpt-rendere  cpu=5 start=5.52  finish=15.35\n          230112) smallpt-rendere  cpu=2 start=5.52  finish=15.35\n          230113) smallpt-rendere  cpu=9 start=5.52  finish=15.35\n          230114) smallpt-rendere  cpu=3 start=5.52  finish=15.35\n          230115) smallpt-rendere  cpu=13 start=5.52  finish=15.35\n          230116) smallpt-rendere  cpu=4 start=5.52  finish=15.35\n          230117) smallpt-rendere  cpu=14 start=5.52  finish=15.35\n          230118) smallpt-rendere  cpu=7 start=5.52  finish=15.35\n          230119) smallpt-rendere  cpu=10 start=5.53  finish=15.35\n          230120) 
smallpt-rendere  cpu=0 start=5.53  finish=15.35\n          230121) smallpt-rendere  cpu=1 start=5.53  finish=15.35\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>A small C++ code for illumination rendering. Looks to be multi-threaded and quickly running. Topdown profile shows backend stalls as largest issue with a moderate retirement rate. AMD metrics show this is floating point code with small amount of L2 <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/smallpt\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-2241","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2241","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=2241"}],"version-history":[{"count":2,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2241\/revisions"}],"predecessor-version":[{"id":2256,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2241\/revisions\/2256"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=2241"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}