{"id":663,"date":"2024-01-17T12:08:23","date_gmt":"2024-01-17T12:08:23","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=663"},"modified":"2024-01-18T02:09:16","modified_gmt":"2024-01-18T02:09:16","slug":"cloverleaf","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/cloverleaf\/","title":{"rendered":"cloverleaf"},"content":{"rendered":"\n<p>Cloverleaf is a hydrodynamics benchmark with three workloads. Almost all the time is spent in the second workload. The overall profile suggests a runnable process on every core.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-32.png\" alt=\"\" class=\"wp-image-668\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-32.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-32-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-32-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>Topdown metrics show a very memory-bound application with little time spent retiring instructions.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-69.png\" alt=\"\" class=\"wp-image-669\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-69.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-69-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-69-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics show floating 
point code without many branches. There is a reasonably high L2 miss rate.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              5873.670\non_cpu               0.972          # 15.56 \/ 16 cores\nutime                91203.508\nstime                189.167\nnvcsw                3950849        # 83.79%\nnivcsw               764581         # 16.21%\ninblock              8              # 0.00\/sec\nonblock              47808          # 8.14\/sec\ncpu-clock            91496573378200 # 91496.573 seconds\ntask-clock           91501724220020 # 91501.724 seconds\npage faults          11117501       # 121.500\/sec\ncontext switches     4744569        # 51.852\/sec\ncpu migrations       100280         # 1.096\/sec\nmajor page faults    55             # 0.001\/sec\nminor page faults    11117446       # 121.500\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             5634504972305  # 52.699 branches per 1000 inst\nbranch misses        34518004796    # 0.61% branch miss\nconditional          4723812723676  # 44.181 conditional branches per 1000 inst\nindirect             450208732      # 0.004 indirect branches per 1000 inst\ncpu-cycles           416297076847393 # 4.42 GHz\ninstructions         106923594819988 # 0.26 IPC\nslots                832408287294990 #\nretiring             36729774536367 #  4.4% ( 4.7%)\n-- ucode             16033010471    #     0.0%\n-- fastpath          36713741525896 #     4.4%\nfrontend             19057802930313 #  2.3% ( 2.4%)\n-- latency           13207251742008 #     1.6%\n-- bandwidth         5850551188305  #     0.7%\nbackend              733264248976694 # 88.1% (92.9%)\n-- cpu               125417997459295 #    15.1%\n-- memory            607846251517399 #    73.0%\nspeculation          665948773067   #  0.1% ( 0.1%)\n-- branch mispredict 591782708845   #     0.1%\n-- pipeline restart  74166064222    #     0.0%\nsmt-contention       42689832224123 #  5.1% ( 
0.0%)\ncpu-cycles           416281856017187 # 4.42 GHz\ninstructions         106924788382785 # 0.26 IPC\ninstructions         35640174860193 # 85.328 l2 access per 1000 inst\nl2 hit from l1       1723378795510  # 34.64% l2 miss\nl2 miss from l1      239934204030   #\nl2 hit from l2 pf    504278478852   #\nl3 hit from l2 pf    37875169844    #\nl3 miss from l2 pf   775570101725   #\ninstructions         35622630123443 # 271.330 float per 1000 inst\nfloat 512            60             # 0.000 AVX-512 per 1000 inst\nfloat 256            672            # 0.000 AVX-256 per 1000 inst\nfloat 128            9665485999618  # 271.330 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         1              # 0.000 scalar per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics similarly show a high L2 miss rate.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              9195.794\non_cpu               0.893          # 14.28 \/ 16 cores\nutime                131158.728\nstime                180.440\nnvcsw                4459732        # 79.97%\nnivcsw               1117243        # 20.03%\ninblock              7232           # 0.79\/sec\nonblock              53072          # 5.77\/sec\ncpu-clock            131303755757174 # 131303.756 seconds\ntask-clock           131309362105649 # 131309.362 seconds\npage faults          11289647       # 85.977\/sec\ncontext switches     5622700        # 42.820\/sec\ncpu migrations       690226         # 5.256\/sec\nmajor page faults    162            # 0.001\/sec\nminor page faults    11289485       # 85.976\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             6119594162100  # 52.521 branches per 1000 inst\nbranch misses        32286755507    # 0.53% branch miss\nconditional          6119594182260  # 52.521 conditional branches per 1000 inst\nindirect             2090955801758  # 17.945 indirect branches per 1000 
inst\nslots                474390606585344 #\nretiring             71789897080078 # 15.1% (15.1%)\n-- ucode             13179968617235 #     2.8%\n-- fastpath          58609928462843 #    12.4%\nfrontend             29418725753857 #  6.2% ( 6.2%)\n-- latency           18241180851442 #     3.8%\n-- bandwidth         11177544902415 #     2.4%\nbackend              378719773656819 # 79.8% (79.8%)\n-- cpu               62995505319387 #    13.3%\n-- memory            315724268337432 #    66.6%\nspeculation          4567039699865  #  1.0% ( 1.0%)\n-- branch mispredict 3065558847064  #     0.6%\n-- pipeline restart  1501480852801  #     0.3%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           345428388454584 # 2.51 GHz\ninstructions         122694459382065 # 0.36 IPC\nl2 access            3480548081213  # 62.611 l2 access per 1000 inst\nl2 miss              1791504627857  # 51.47% l2 miss\n<\/code><\/pre>\n\n\n\n<p>The process overview crashed partway through the second workload, so we don&#8217;t have a full account.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Cloverleaf is a hydrodynamics benchmark with three workloads. Almost all the time is spent in the second workload. The overall profile suggests a runnable process on every core. 
Topdown metrics show a very memory-bound application with little time in retiring <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/cloverleaf\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-663","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/663","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=663"}],"version-history":[{"count":2,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/663\/revisions"}],"predecessor-version":[{"id":670,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/663\/revisions\/670"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=663"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}