{"id":606,"date":"2024-01-15T17:10:42","date_gmt":"2024-01-15T17:10:42","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=606"},"modified":"2024-01-15T18:03:48","modified_gmt":"2024-01-15T18:03:48","slug":"tesseract","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/tesseract\/","title":{"rendered":"tesseract"},"content":{"rendered":"\n<p>Tesseract is a GPU-focused game that runs a set of scenes at different resolutions and reports frames\/sec. As such, it is unclear how much this really tests the CPU, so I don&#8217;t expect to run more of these game applications. The progression below uses increasing resolutions and presumably longer waits for the GPU. Only a few processes are runnable at a time.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-23.png\" alt=\"\" class=\"wp-image-607\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-23.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-23-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-23-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>Running on a system with a discrete GPU and ROCm, we can also use the amdsmi library to monitor how busy the AMD graphics and memory are.  In yellow below we can see that the graphics is kept busier than the CPU; UMC (memory controller) activity is shown in blue.  
Unfortunately, this uses the amdgpu driver and only seems to work on platforms with ROCm.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-24.png\" alt=\"\" class=\"wp-image-612\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-24.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-24-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-24-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>Topdown metrics show a mix: frontend stalls are the largest share, followed by retiring and backend stalls.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-61.png\" alt=\"\" class=\"wp-image-608\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-61.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-61-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-61-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics show an average of only 0.60 of the 16 cores on CPU, so most of the application time is GPU time. 
L2 access is moderate (about 41 per 1000 instructions), with a fairly high miss rate (25.67%).<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              940.138\non_cpu               0.037          # 0.60 \/ 16 cores\nutime                448.137\nstime                114.663\nnvcsw                5006246        # 99.31%\nnivcsw               34548          # 0.69%\ninblock              760            # 0.81\/sec\nonblock              15352          # 16.33\/sec\ncpu-clock            555072782963   # 555.073 seconds\ntask-clock           557387113153   # 557.387 seconds\npage faults          1091959        # 1959.068\/sec\ncontext switches     5045214        # 9051.544\/sec\ncpu migrations       6429           # 11.534\/sec\nmajor page faults    783            # 1.405\/sec\nminor page faults    1091176        # 1957.663\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             432496189191   # 164.537 branches per 1000 inst\nbranch misses        12518545903    # 2.89% branch miss\nconditional          312883554453   # 119.032 conditional branches per 1000 inst\nindirect             16879982752    # 6.422 indirect branches per 1000 inst\ncpu-cycles           2297827692401  # 0.10 GHz\ninstructions         4134264086743  # 1.80 IPC\nslots                4592679099624  #\nretiring             1399510657504  # 30.5% (30.6%)\n-- ucode             5332102804     #     0.1%\n-- fastpath          1394178554700  #    30.4%\nfrontend             1631768830655  # 35.5% (35.7%)\n-- latency           1193385343362  #    26.0%\n-- bandwidth         438383487293   #     9.5%\nbackend              1228985655369  # 26.8% (26.9%)\n-- cpu               195338081902   #     4.3%\n-- memory            1033647573467  #    22.5%\nspeculation          306018898077   #  6.7% ( 6.7%)\n-- branch mispredict 301962227802   #     6.6%\n-- pipeline restart  4056670275     #     0.1%\nsmt-contention       26346755888    #  0.6% ( 0.0%)\ncpu-cycles           
2081086819075  # 0.10 GHz\ninstructions         3702405530816  # 1.78 IPC\ninstructions         1229550171162  # 41.165 l2 access per 1000 inst\nl2 hit from l1       47072177297    # 25.67% l2 miss\nl2 miss from l1      10767695677    #\nl2 hit from l2 pf    1316928266     #\nl3 hit from l2 pf    2055888447     #\nl3 miss from l2 pf   169576582      #\ninstructions         1224949261309  # 63.013 float per 1000 inst\nfloat 512            127            # 0.000 AVX-512 per 1000 inst\nfloat 256            158978         # 0.000 AVX-256 per 1000 inst\nfloat 128            77187093384    # 63.012 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics, gathered with a somewhat different iGPU that seems to give a ~2x slower frame rate.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              916.368\non_cpu               0.044          # 0.71 \/ 16 cores\nutime                560.936\nstime                85.333\nnvcsw                1113258        # 99.18%\nnivcsw               9227           # 0.82%\ninblock              16088          # 17.56\/sec\nonblock              9976           # 10.89\/sec\ncpu-clock            639362215876   # 639.362 seconds\ntask-clock           640526730187   # 640.527 seconds\npage faults          1119966        # 1748.508\/sec\ncontext switches     1126776        # 1759.140\/sec\ncpu migrations       8294           # 12.949\/sec\nmajor page faults    109            # 0.170\/sec\nminor page faults    1119857        # 1748.338\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             386646868699   # 170.775 branches per 1000 inst\nbranch misses        6406907616     # 1.66% branch miss\nconditional          386646903227   # 170.775 conditional branches per 1000 inst\nindirect             13807109676    # 6.098 indirect branches per 1000 
inst\nslots                7765648184300  #\nretiring             2874712201531  # 37.0% (37.0%)\n-- ucode             217635797671   #     2.8%\n-- fastpath          2657076403860  #    34.2%\nfrontend             1908382930340  # 24.6% (24.6%)\n-- latency           858073053677   #    11.0%\n-- bandwidth         1050309876663  #    13.5%\nbackend              1650754897442  # 21.3% (21.3%)\n-- cpu               768652337590   #     9.9%\n-- memory            882102559852   #    11.4%\nspeculation          1405928996655  # 18.1% (18.1%)\n-- branch mispredict 1385587146745  #    17.8%\n-- pipeline restart  20341849910    #     0.3%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           1319100239590  # 0.09 GHz\ninstructions         2254013924600  # 1.71 IPC\nl2 access            92921018270    # 41.629 l2 access per 1000 inst\nl2 miss              37922616224    # 40.81% l2 miss\n<\/code><\/pre>\n\n\n\n<p>Process summary shows both gdrv (driver?) and several client and disk processes.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>1005 processes\n\t 96 linux_64:gdrv0         433.33    94.69\n\t 24 linux_64_client        430.57    93.18\n\t 24 SDLTimer               430.53    93.16\n\t 24 linux_64_c:cs0         430.53    93.15\n\t 24 linux_6:disk$0         430.52    93.15\n\t 24 linux_64:shlo0         430.51    93.15\n\t 24 linux_64_c:sh1         430.51    93.14\n\t 24 linux_64_c:sh10        430.51    93.14\n\t 24 linux_64_c:sh2         430.51    93.14\n\t 24 linux_64_c:sh3         430.51    93.14\n\t 24 linux_64_c:sh4         430.51    93.14\n\t 24 linux_64_c:sh5         430.51    93.14\n\t 24 linux_64_c:sh7         430.51    93.14\n\t 24 linux_64_c:sh8         430.51    93.14\n\t 24 linux_64_c:sh0         430.50    93.14\n\t 24 linux_64_c:sh11        430.50    93.14\n\t 24 linux_64_c:sh6         430.50    93.14\n\t 24 linux_64_c:sh9         430.50    93.14\n\t 24 PulseHotplug           430.47    93.11\n\t 24 SDLAudioP2             430.46    
93.11\n\t 68 clinfo                  19.06     7.30\n\t 38 vulkaninfo               1.14     1.53\n\t  6 glxinfo:gdrv0            0.17     0.10\n\t  6 php                      0.13     0.26\n\t  4 vulkani:disk$0           0.12     0.17\n\t  2 glxinfo                  0.08     0.04\n\t  2 glxinfo:cs0              0.08     0.04\n\t  2 glxinfo:disk$0           0.08     0.04\n\t  6 clang                    0.07     0.05\n\t  2 glxinfo:sh0              0.07     0.04\n\t  2 glxinfo:shlo0            0.07     0.04\n\t  2 llvmpipe-0               0.06     0.09\n\t  2 llvmpipe-1               0.06     0.09\n\t  2 llvmpipe-10              0.06     0.09\n\t  2 llvmpipe-11              0.06     0.09\n\t  2 llvmpipe-12              0.06     0.09\n\t  2 llvmpipe-13              0.06     0.09\n\t  2 llvmpipe-14              0.06     0.09\n\t  2 llvmpipe-15              0.06     0.09\n\t  2 llvmpipe-2               0.06     0.09\n\t  2 llvmpipe-3               0.06     0.09\n\t  2 llvmpipe-4               0.06     0.09\n\t  2 llvmpipe-5               0.06     0.09\n\t  2 llvmpipe-6               0.06     0.09\n\t  2 llvmpipe-7               0.06     0.09\n\t  2 llvmpipe-8               0.06     0.09\n\t  2 llvmpipe-9               0.06     0.09\n\t  3 rocminfo                 0.00     0.03\n\t  1 lspci                    0.00     0.03\n\t  1 ps                       0.00     0.01\n\t 97 sh                       0.00     0.00\n\t 50 uname                    0.00     0.00\n\t 24 rm                       0.00     0.00\n\t 24 tesseract                0.00     0.00\n\t 14 gsettings                0.00     0.00\n\t 12 gcc                      0.00     0.00\n\t  9 systemd-detect-          0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  2 gmain                    0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xrandr   
                0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 cc                       0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n0 processes running\n47 maximum processes\n<\/code><\/pre>\n\n\n\n<p>The following are the processes for one iteration:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      290837) tesseract        cpu=11 start=5.72  finish=37.33\n        290838) rm               cpu=4 start=5.72  finish=5.72 \n        290839) linux_64_client  cpu=4 start=5.72  finish=37.31\n          290840) uname            cpu=1 start=5.72  finish=5.72 \n          290841) uname            cpu=10 start=5.72  finish=5.73 \n          290842) SDLTimer         cpu=4 start=5.74  finish=37.31\n          290843) linux_64_c:cs0   cpu=9 start=5.78  finish=37.31\n          290844) linux_6:disk$0   cpu=10 start=5.78  finish=37.31\n          290845) linux_64_c:sh0   cpu=14 start=5.78  finish=37.31\n          290846) linux_64:shlo0   cpu=15 start=5.78  finish=37.31\n          290847) linux_64:gdrv0   cpu=14 start=5.79  finish=5.80 \n          290848) linux_64:gdrv0   cpu=14 start=5.80  finish=5.80 \n          290850) PulseHotplug     cpu=15 start=5.80  finish=37.30\n          
290851) linux_64:gdrv0   cpu=4 start=5.80  finish=5.80 \n          290852) linux_64:gdrv0   cpu=7 start=5.81  finish=37.31\n          290853) linux_64_c:sh1   cpu=3 start=5.82  finish=37.31\n          290854) linux_64_c:sh2   cpu=12 start=5.82  finish=37.31\n          290855) linux_64_c:sh3   cpu=2 start=5.82  finish=37.31\n          290856) linux_64_c:sh4   cpu=6 start=5.82  finish=37.31\n          290857) SDLAudioP2       cpu=10 start=5.88  finish=37.30\n          290858) linux_64_c:sh5   cpu=12 start=5.91  finish=37.31\n          290859) linux_64_c:sh6   cpu=6 start=5.91  finish=37.31\n          290860) linux_64_c:sh7   cpu=5 start=5.91  finish=37.31\n          290861) linux_64_c:sh8   cpu=5 start=5.91  finish=37.31\n          290862) linux_64_c:sh9   cpu=15 start=5.91  finish=37.31\n          290863) linux_64_c:sh10  cpu=6 start=5.92  finish=37.31\n          290864) linux_64_c:sh11  cpu=1 start=5.92  finish=37.31\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Tesseract is a GPU-focused game that runs a set of scenes at different resolutions and reports frames\/sec. As such, it is unclear how much this really tests the CPU, so I don&#8217;t expect to run more of these game applications. 
The progression <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/tesseract\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-606","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/606","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=606"}],"version-history":[{"count":2,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/606\/revisions"}],"predecessor-version":[{"id":613,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/606\/revisions\/613"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=606"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}