{"id":2534,"date":"2024-08-08T12:38:14","date_gmt":"2024-08-08T12:38:14","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=2534"},"modified":"2024-08-08T22:06:46","modified_gmt":"2024-08-08T22:06:46","slug":"ttsiod-renderer","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/ttsiod-renderer\/","title":{"rendered":"ttsiod-renderer"},"content":{"rendered":"\n<p>A 3D software renderer that uses OpenMP and Intel Threading Building Blocks. This test has one workload. The workload is multi-threaded and runs quickly.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/08\/systemtime-1.png\" alt=\"\" class=\"wp-image-2541\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/08\/systemtime-1.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/08\/systemtime-1-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/08\/systemtime-1-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>Topdown metrics are dominated by backend stalls.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/08\/amdtopdown.png\" alt=\"\" class=\"wp-image-2540\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/08\/amdtopdown.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/08\/amdtopdown-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/08\/amdtopdown-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics confirm the backend stalls and show that the workload is more CPU-bound than memory-bound. 
This is floating point code.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              96.469\non_cpu               0.824          # 13.19 \/ 16 cores\nutime                1271.434\nstime                0.828\nnvcsw                31883          # 75.38%\nnivcsw               10411          # 24.62%\ninblock              8              # 0.08\/sec\nonblock              12632          # 130.94\/sec\ncpu-clock            1272444095522  # 1272.444 seconds\ntask-clock           1272459019517  # 1272.459 seconds\npage faults          145878         # 114.643\/sec\ncontext switches     42600          # 33.478\/sec\ncpu migrations       279            # 0.219\/sec\nmajor page faults    2              # 0.002\/sec\nminor page faults    145876         # 114.641\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             391349607640   # 89.894 branches per 1000 inst\nbranch misses        10754257318    # 2.75% branch miss\nconditional          305970757981   # 70.283 conditional branches per 1000 inst\nindirect             11845354997    # 2.721 indirect branches per 1000 inst\ncpu-cycles           5344543691186  # 3.46 GHz\ninstructions         4348276881357  # 0.81 IPC\nslots                10693312025760 #\nretiring             1604929468240  # 15.0% (22.3%)\n-- ucode             12995719283    #     0.1%\n-- fastpath          1591933748957  #    14.9%\nfrontend             574101577622   #  5.4% ( 8.0%)\n-- latency           236084046810   #     2.2%\n-- bandwidth         338017530812   #     3.2%\nbackend              4809901133590  # 45.0% (66.9%)\n-- cpu               2734706267287  #    25.6%\n-- memory            2075194866303  #    19.4%\nspeculation          199470645645   #  1.9% ( 2.8%)\n-- branch mispredict 198054692143   #     1.9%\n-- pipeline restart  1415953502     #     0.0%\nsmt-contention       3504894008324  # 32.8% ( 0.0%)\ncpu-cycles           5328975712732  # 3.46 
GHz\ninstructions         4357225211014  # 0.82 IPC\ninstructions         1451050978351  # 13.451 l2 access per 1000 inst\nl2 hit from l1       13882641022    # 41.79% l2 miss\nl2 miss from l1      4418378075     #\nl2 hit from l2 pf    1897404366     #\nl3 hit from l2 pf    3504862131     #\nl3 miss from l2 pf   233107773      #\ninstructions         1447220853101  # 320.239 float per 1000 inst\nfloat 512            44             # 0.000 AVX-512 per 1000 inst\nfloat 256            604            # 0.000 AVX-256 per 1000 inst\nfloat 128            463456838102   # 320.239 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\ninstructions         4350710944191  #\nopcache              553340397942   # 127.184 opcache per 1000 inst\nopcache miss         4162732235     #  0.8% opcache miss rate\nl1 dTLB miss         16224971580    # 3.729 L1 dTLB per 1000 inst\nl2 dTLB miss         585000071      # 0.134 L2 dTLB per 1000 inst\ninstructions         4350509639256  #\nicache               6002763036     # 1.380 icache per 1000 inst\nicache miss          932442359      # 15.5% icache miss rate\nl1 iTLB miss         3629974        # 0.001 L1 iTLB per 1000 inst\nl2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst\ntlb flush            17764          # 0.000 TLB flush per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics show most memory stalls at L2 level.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              100.551\non_cpu               0.827          # 13.24 \/ 16 cores\nutime                1329.997\nstime                0.842\nnvcsw                32996          # 67.94%\nnivcsw               15571          # 32.06%\ninblock              301688         # 3000.34\/sec\nonblock              1288           # 12.81\/sec\ncpu-clock            1330954352419  # 1330.954 seconds\ntask-clock           1330970775363  # 1330.971 seconds\npage faults      
    113745         # 85.460\/sec\ncontext switches     48876          # 36.722\/sec\ncpu migrations       286            # 0.215\/sec\nmajor page faults    1595           # 1.198\/sec\nminor page faults    112150         # 84.262\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             343392892437   # 81.445 branches per 1000 inst\nbranch misses        10999047114    # 3.20% branch miss\nconditional          343392903733   # 81.445 conditional branches per 1000 inst\nindirect             86915724488    # 20.614 indirect branches per 1000 inst\nslots                7225154861570  #\nretiring             2377199852848  # 32.9% (32.9%)\n-- ucode             222923249497   #     3.1%\n-- fastpath          2154276603351  #    29.8%\nfrontend             856658332219   # 11.9% (11.9%)\n-- latency           647601467795   #     9.0%\n-- bandwidth         209056864424   #     2.9%\nbackend              3421909787471  # 47.4% (47.4%)\n-- cpu               3113118947117  #    43.1%\n-- memory            308790840354   #     4.3%\nspeculation          602695009453   #  8.3% ( 8.3%)\n-- branch mispredict 591256229546   #     8.2%\n-- pipeline restart  11438779907    #     0.2%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           2317565854634  # 1.46 GHz\ninstructions         2312562929728  # 1.00 IPC\nl2 access            42519533920    # 18.604 l2 access per 1000 inst\nl2 miss              21623683450    # 50.86% l2 miss\ncpu-cycles           2291687404134  # 13.2% memory latency\nload stalls          257807824450   #  0.3% l1 bound\nl1 miss              251638988470   #  7.3% l2 bound\nl2 miss              83527035396    #  3.5% l3 bound\nl3 miss              2257341377     #  0.1% dram bound\nstore_stalls         44052125709    #  1.9% store bound\n<\/code><\/pre>\n\n\n\n<p>Process statistics show that nearly all of the time is spent in the renderer process.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>352 
processes\n\t 48 renderer             20199.68     2.56\n\t 36 clinfo                   4.11     2.07\n\t 38 vulkaninfo               1.14     1.15\n\t  4 vulkani:disk$0           0.12     0.13\n\t  2 llvmpipe-0               0.06     0.06\n\t  2 llvmpipe-1               0.06     0.06\n\t  2 llvmpipe-10              0.06     0.06\n\t  2 llvmpipe-11              0.06     0.06\n\t  2 llvmpipe-12              0.06     0.06\n\t  2 llvmpipe-13              0.06     0.06\n\t  2 llvmpipe-14              0.06     0.06\n\t  2 llvmpipe-15              0.06     0.06\n\t  2 llvmpipe-2               0.06     0.06\n\t  2 llvmpipe-3               0.06     0.06\n\t  2 llvmpipe-4               0.06     0.06\n\t  2 llvmpipe-5               0.06     0.06\n\t  2 llvmpipe-6               0.06     0.06\n\t  2 llvmpipe-7               0.06     0.06\n\t  2 llvmpipe-8               0.06     0.06\n\t  2 llvmpipe-9               0.06     0.06\n\t  6 clang                    0.05     0.07\n\t  6 php                      0.05     0.07\n\t  3 rocminfo                 0.03     0.00\n\t  1 lspci                    0.00     0.01\n\t  1 ps                       0.00     0.01\n\t 84 sh                       0.00     0.00\n\t 13 gcc                      0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  7 gsettings                0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 glxinfo                  0.00     0.00\n\t  5 gmain                    0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  4 dconf worker             0.00     0.00\n\t  3 ttsiod-renderer          0.00     0.00\n\t  2 cc                       0.00     0.00\n\t  2 grep                     0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 setterm                  0.00     0.00\n\t  2 uname                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 
dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n0 processes running\n47 maximum processes\n<\/code><\/pre>\n\n\n\n<p>The computation blocks are simple: a single renderer process whose threads all start and finish together.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      11887) ttsiod-renderer  cpu=11 start=5.14  finish=32.24\n        11888) renderer         cpu=14 start=5.14  finish=32.24\n          11889) renderer         cpu=6 start=5.21  finish=32.24\n          11890) renderer         cpu=15 start=5.21  finish=32.24\n          11891) renderer         cpu=2 start=5.21  finish=32.24\n          11892) renderer         cpu=4 start=5.21  finish=32.24\n          11893) renderer         cpu=1 start=5.21  finish=32.24\n          11894) renderer         cpu=13 start=5.21  finish=32.24\n          11895) renderer         cpu=11 start=5.21  finish=32.24\n          11896) renderer         cpu=0 start=5.21  finish=32.24\n          11897) renderer         cpu=3 start=5.21  finish=32.24\n          11898) renderer         cpu=8 start=5.21  finish=32.24\n          11899) renderer         cpu=7 start=5.21  finish=32.24\n          11900) renderer         cpu=12 start=5.21  finish=32.24\n          11901) renderer         cpu=9 start=5.21  finish=32.24\n          11902) renderer         cpu=10 start=5.21  finish=32.24\n          11903) renderer         cpu=5 start=5.21  
finish=32.24\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>A 3D software renderer that uses OpenMP and Intel Thread Building Blocks. This test has one workload. The workload is multi-threaded and runs quickly. Topdown metrics are dominated by backend stalls. AMD metrics confirm the topdown stalls and is more <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/ttsiod-renderer\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-2534","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2534","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=2534"}],"version-history":[{"count":3,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2534\/revisions"}],"predecessor-version":[{"id":2542,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2534\/revisions\/2542"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=2534"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}