{"id":698,"date":"2024-01-19T13:06:41","date_gmt":"2024-01-19T13:06:41","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=698"},"modified":"2024-01-19T13:06:42","modified_gmt":"2024-01-19T13:06:42","slug":"povray","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/povray\/","title":{"rendered":"povray"},"content":{"rendered":"\n<p>A quick-running ray tracer test.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-37.png\" alt=\"\" class=\"wp-image-699\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-37.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-37-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-37-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>Topdown metrics show a fairly high retirement rate, limited by backend stalls on both CPU and memory.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-75.png\" alt=\"\" class=\"wp-image-700\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-75.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-75-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-75-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics show the workload running on all cores, executing floating-point code with few L2 accesses or misses.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              
148.090\non_cpu               0.838          # 13.40 \/ 16 cores\nutime                1983.478\nstime                1.405\nnvcsw                20374          # 34.56%\nnivcsw               38574          # 65.44%\ninblock              0              # 0.00\/sec\nonblock              14056          # 94.92\/sec\ncpu-clock            1984937360494  # 1984.937 seconds\ntask-clock           1984951444204  # 1984.951 seconds\npage faults          198620         # 100.063\/sec\ncontext switches     59509          # 29.980\/sec\ncpu migrations       2780           # 1.401\/sec\nmajor page faults    2              # 0.001\/sec\nminor page faults    198618         # 100.062\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             1014202012293  # 73.753 branches per 1000 inst\nbranch misses        6191746506     # 0.61% branch miss\nconditional          609554742644   # 44.327 conditional branches per 1000 inst\nindirect             84296971221    # 6.130 indirect branches per 1000 inst\ncpu-cycles           7888007229708  # 3.32 GHz\ninstructions         13748191578115 # 1.74 IPC\nslots                15779680550700 #\nretiring             4822507950863  # 30.6% (48.6%)\n-- ucode             2839998364     #     0.0%\n-- fastpath          4819667952499  #    30.5%\nfrontend             584429036683   #  3.7% ( 5.9%)\n-- latency           422768735268   #     2.7%\n-- bandwidth         161660301415   #     1.0%\nbackend              4314856481849  # 27.3% (43.4%)\n-- cpu               2741261117778  #    17.4%\n-- memory            1573595364071  #    10.0%\nspeculation          209239720319   #  1.3% ( 2.1%)\n-- branch mispredict 165667444230   #     1.0%\n-- pipeline restart  43572276089    #     0.3%\nsmt-contention       5848634155814  # 37.1% ( 0.0%)\ncpu-cycles           7888366991559  # 3.33 GHz\ninstructions         13756479664683 # 1.74 IPC\ninstructions         4587432604742  # 18.648 l2 
access per 1000 inst\nl2 hit from l1       76718057387    # 0.50% l2 miss\nl2 miss from l1      223100129      #\nl2 hit from l2 pf    8622365520     #\nl3 hit from l2 pf    188302969      #\nl3 miss from l2 pf   15496089       #\ninstructions         4578337251665  # 214.073 float per 1000 inst\nfloat 512            51             # 0.000 AVX-512 per 1000 inst\nfloat 256            664            # 0.000 AVX-256 per 1000 inst\nfloat 128            980097688996   # 214.073 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         4              # 0.000 scalar per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics. The Intel version wasn&#8217;t as stable when running, perhaps because of differences between cores?<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              807.771\non_cpu               0.880          # 14.07 \/ 16 cores\nutime                11364.904\nstime                2.490\nnvcsw                80897          # 31.26%\nnivcsw               177859         # 68.74%\ninblock              1328           # 1.64\/sec\nonblock              7864           # 9.74\/sec\ncpu-clock            11367494151495 # 11367.494 seconds\ntask-clock           11367537895233 # 11367.538 seconds\npage faults          360565         # 31.719\/sec\ncontext switches     262583         # 23.099\/sec\ncpu migrations       7004           # 0.616\/sec\nmajor page faults    14             # 0.001\/sec\nminor page faults    360551         # 31.718\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             4663954542631  # 70.318 branches per 1000 inst\nbranch misses        23249701772    # 0.50% branch miss\nconditional          4663954574567  # 70.318 conditional branches per 1000 inst\nindirect             1515855054426  # 22.855 indirect branches per 1000 inst\nslots                11383013944694 #\nretiring             7724310148011  # 67.9% (67.9%)\n-- ucode  
           434825331904   #     3.8%\n-- fastpath          7289484816107  #    64.0%\nfrontend             2693413158175  # 23.7% (23.7%)\n-- latency           1649219459909  #    14.5%\n-- bandwidth         1044193698266  #     9.2%\nbackend              741447952721   #  6.5% ( 6.5%)\n-- cpu               415808685357   #     3.7%\n-- memory            325639267364   #     2.9%\nspeculation          241057697650   #  2.1% ( 2.1%)\n-- branch mispredict 230083087910   #     2.0%\n-- pipeline restart  10974609740    #     0.1%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           6366658262732  # 2.26 GHz\ninstructions         12862934995320 # 2.02 IPC\nl2 access            48169438096    # 6.398 l2 access per 1000 inst\nl2 miss              502292780      # 1.04% l2 miss\n<\/code><\/pre>\n\n\n\n<p>Process overview<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              807.771\non_cpu               0.880          # 14.07 \/ 16 cores\nutime                11364.904\nstime                2.490\nnvcsw                80897          # 31.26%\nnivcsw               177859         # 68.74%\ninblock              1328           # 1.64\/sec\nonblock              7864           # 9.74\/sec\ncpu-clock            11367494151495 # 11367.494 seconds\ntask-clock           11367537895233 # 11367.538 seconds\npage faults          360565         # 31.719\/sec\ncontext switches     262583         # 23.099\/sec\ncpu migrations       7004           # 0.616\/sec\nmajor page faults    14             # 0.001\/sec\nminor page faults    360551         # 31.718\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             4663954542631  # 70.318 branches per 1000 inst\nbranch misses        23249701772    # 0.50% branch miss\nconditional          4663954574567  # 70.318 conditional branches per 1000 inst\nindirect             1515855054426  # 22.855 indirect branches per 1000 inst\nslots            
    11383013944694 #\nretiring             7724310148011  # 67.9% (67.9%)\n-- ucode             434825331904   #     3.8%\n-- fastpath          7289484816107  #    64.0%\nfrontend             2693413158175  # 23.7% (23.7%)\n-- latency           1649219459909  #    14.5%\n-- bandwidth         1044193698266  #     9.2%\nbackend              741447952721   #  6.5% ( 6.5%)\n-- cpu               415808685357   #     3.7%\n-- memory            325639267364   #     2.9%\nspeculation          241057697650   #  2.1% ( 2.1%)\n-- branch mispredict 230083087910   #     2.0%\n-- pipeline restart  10974609740    #     0.1%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           6366658262732  # 2.26 GHz\ninstructions         12862934995320 # 2.02 IPC\nl2 access            48169438096    # 6.398 l2 access per 1000 inst\nl2 miss              502292780      # 1.04% l2 miss\n<\/code><\/pre>\n\n\n\n<p>Computation block<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      2580699) povray           cpu=8 start=53.46 finish=97.40\n        2580700) povray           cpu=9 start=53.46 finish=53.46\n        2580701) povray           cpu=8 start=53.46 finish=97.39\n          2580702) povray           cpu=3 start=53.46 finish=97.40\n          2580703) povray           cpu=13 start=53.46 finish=97.39\n            2580704) povray           cpu=3 start=53.47 finish=97.34\n              2580705) povray           cpu=15 start=53.72 finish=97.14\n                2580706) povray           cpu=9 start=53.72 finish=53.96\n                2580707) povray           cpu=0 start=53.96 finish=53.96\n              2580708) povray           cpu=5 start=53.97 finish=97.14\n                2580709) povray           cpu=9 start=53.97 finish=53.97\n                2580710) povray           cpu=10 start=53.97 finish=53.97\n                2580711) povray           cpu=9 start=54.02 finish=55.04\n                2580712) povray           cpu=10 start=54.02 finish=54.20\n                2580713) 
povray           cpu=11 start=54.02 finish=54.02\n                2580714) ?? cpu=0 start=54.03 finish=0.00 \n                2580715) povray           cpu=15 start=54.03 finish=54.03\n                2580716) povray           cpu=4 start=54.03 finish=54.03\n                2580717) povray           cpu=14 start=54.03 finish=54.03\n                2580718) povray           cpu=13 start=54.03 finish=54.03\n                2580719) povray           cpu=11 start=54.03 finish=54.03\n                2580720) povray           cpu=15 start=54.03 finish=54.03\n                2580721) povray           cpu=12 start=54.03 finish=54.03\n                2580722) povray           cpu=13 start=54.03 finish=54.03\n                2580723) povray           cpu=13 start=54.03 finish=54.03\n                2580724) povray           cpu=14 start=54.03 finish=54.03\n                2580725) povray           cpu=13 start=54.03 finish=54.03\n                2580726) povray           cpu=13 start=54.03 finish=54.03\n                2580727) povray           cpu=9 start=55.08 finish=55.09\n                2580728) povray           cpu=9 start=55.13 finish=95.89\n                2580729) povray           cpu=3 start=55.13 finish=95.84\n                2580730) povray           cpu=13 start=55.13 finish=96.33\n                2580731) povray           cpu=14 start=55.13 finish=96.38\n                2580732) povray           cpu=10 start=55.13 finish=96.65\n                2580733) povray           cpu=11 start=55.13 finish=96.66\n                2580734) povray           cpu=15 start=55.13 finish=96.56\n                2580735) povray           cpu=12 start=55.13 finish=96.40\n                2580736) povray           cpu=0 start=55.13 finish=95.82\n                2580737) povray           cpu=8 start=55.13 finish=95.96\n                2580738) povray           cpu=11 start=55.13 finish=95.81\n                2580739) povray           cpu=0 start=55.13 finish=96.84\n                
2580740) povray           cpu=9 start=55.13 finish=97.00\n                2580741) povray           cpu=15 start=55.13 finish=95.82\n                2580742) povray           cpu=1 start=55.13 finish=96.27\n                2580743) povray           cpu=2 start=55.13 finish=95.92\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>A quick-running ray tracer test. Topdown metrics show a fairly high retirement rate, limited by backend stalls on both CPU and memory. AMD metrics show the workload running on all cores, executing floating-point code with few L2 accesses or misses. <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/povray\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-698","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/698","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=698"}],"version-history":[{"count":1,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/698\/revisions"}],"predecessor-version":[{"id":701,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/698\/revisions\/701"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=698"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","template
d":true}]}}