{"id":200,"date":"2024-01-04T01:22:00","date_gmt":"2024-01-04T01:22:00","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=200"},"modified":"2024-01-04T01:22:02","modified_gmt":"2024-01-04T01:22:02","slug":"openfoam","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/openfoam\/","title":{"rendered":"openfoam"},"content":{"rendered":"\n<p>OpenFoam CFD program has several different sized models.  I picked the second smallest but it would be interesting to see what happens as we scale to larger models.  After a startup period, the overall running time is dominated by backend activity with memory being ~2x that of cpu.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-10.png\" alt=\"\" class=\"wp-image-201\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-10.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-10-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-10-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>The AMD metrics are below.  We seem to use only about half of the cores (7.77 of 16). This is somewhat branchy code, and branch mis-predicts are slightly higher than normal.  
It is also floating point code.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              411.853\non_cpu               0.485          # 7.77 \/ 16 cores\nutime                3116.981\nstime                81.863\nnvcsw                55573          # 85.56%\nnivcsw               9381           # 14.44%\ninblock              274000         # 665.29\/sec\nonblock              771784         # 1873.93\/sec\ncpu-clock            3198390654945  # 3198.391 seconds\ntask-clock           3198522379496  # 3198.522 seconds\npage faults          28886234       # 9031.118\/sec\ncontext switches     66207          # 20.699\/sec\ncpu migrations       5928           # 1.853\/sec\nmajor page faults    5779           # 1.807\/sec\nminor page faults    28880455       # 9029.312\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             1953073862036  # 123.095 branches per 1000 inst\nbranch misses        36594044022    # 1.87% branch miss\nconditional          1498320940542  # 94.434 conditional branches per 1000 inst\nindirect             81449302162    # 5.133 indirect branches per 1000 inst\ncpu-cycles           14728766760104 # 2.24 GHz\ninstructions         15750869300593 # 1.07 IPC\nslots                29456694670128 #\nretiring             5325501761058  # 18.1% (18.1%)\n-- ucode             3259869641     #     0.0%\n-- fastpath          5322241891417  #    18.1%\nfrontend             2275208597330  #  7.7% ( 7.7%)\n-- latency           1559401842588  #     5.3%\n-- bandwidth         715806754742   #     2.4%\nbackend              20565968541167 # 69.8% (69.9%)\n-- cpu               3818610930354  #    13.0%\n-- memory            16747357610813 #    56.9%\nspeculation          1274324683572  #  4.3% ( 4.3%)\n-- branch mispredict 1237491238390  #     4.2%\n-- pipeline restart  36833445182    #     0.1%\nsmt-contention       15678849842    #  0.1% ( 0.0%)\ncpu-cycles           14762307677757 # 
2.25 GHz\ninstructions         15592330738951 # 1.06 IPC\ninstructions         5196681087432  # 43.719 l2 access per 1000 inst\nl2 hit from l1       130413856989   # 35.89% l2 miss\nl2 miss from l1      14515875294    #\nl2 hit from l2 pf    29748688315    #\nl3 hit from l2 pf    18730462115    #\nl3 miss from l2 pf   48299527102    #\ninstructions         5199921049632  # 247.898 float per 1000 inst\nfloat 512            196            # 0.000 AVX-512 per 1000 inst\nfloat 256            2623           # 0.000 AVX-256 per 1000 inst\nfloat 128            1289049778180  # 247.898 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         806            # 0.000 scalar per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>The corresponding Intel metrics<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              630.059\non_cpu               0.733          # 11.72 \/ 16 cores\nutime                7274.138\nstime                110.304\nnvcsw                75730          # 77.59%\nnivcsw               21868          # 22.41%\ninblock              225432         # 357.79\/sec\nonblock              779616         # 1237.37\/sec\ncpu-clock            7384424152013  # 7384.424 seconds\ntask-clock           7384521885456  # 7384.522 seconds\npage faults          27008298       # 3657.420\/sec\ncontext switches     99959          # 13.536\/sec\ncpu migrations       12795          # 1.733\/sec\nmajor page faults    7665           # 1.038\/sec\nminor page faults    27000632       # 3656.382\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             4920084801682  # 147.418 branches per 1000 inst\nbranch misses        42207040760    # 0.86% branch miss\nconditional          4920084833266  # 147.418 conditional branches per 1000 inst\nindirect             1009320119672  # 30.242 indirect branches per 1000 inst\nslots                48575631182720 #\nretiring            
 23167973123611 # 47.7% (47.7%)\n-- ucode             1531906094319  #     3.2%\n-- fastpath          21636067029292 #    44.5%\nfrontend             3810435128963  #  7.8% ( 7.8%)\n-- latency           1596180788057  #     3.3%\n-- bandwidth         2214254340906  #     4.6%\nbackend              18854956683996 # 38.8% (38.8%)\n-- cpu               5664129591282  #    11.7%\n-- memory            13190827092714 #    27.2%\nspeculation          3100904740821  #  6.4% ( 6.4%)\n-- branch mispredict 2878415958491  #     5.9%\n-- pipeline restart  222488782330   #     0.5%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           24196271950931 # 2.35 GHz\ninstructions         72440581264655 # 2.99 IPC\nl2 access            193922593948   # 8.019 l2 access per 1000 inst\nl2 miss              96198280028    # 49.61% l2 miss<\/code><\/pre>\n\n\n\n<p>A small number of processes dominate where the time is spent. It looks like we don&#8217;t get a full profile: this covers just the initial 50 seconds before a hang, so I need to get better structure after that.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>821 processes\n\t 24 snappyHexMesh         1019.83    10.26\n\t  2 cc1plus                  0.39     0.07\n\t 19 vulkaninfo               0.19     0.76\n\t  2 ld.bfd                   0.05     0.04\n\t  3 glxinfo:gdrv0            0.04     0.06\n\t  6 clang                    0.04     0.03\n\t  1 decomposePar             0.03     0.03\n\t  2 vulkani:disk$0           0.02     0.08\n\t  1 glxinfo                  0.02     0.02\n\t  1 glxinfo:cs0              0.02     0.02\n\t  1 glxinfo:disk$0           0.02     0.02\n\t  1 glxinfo:sh0              0.02     0.02\n\t  1 glxinfo:shlo0            0.02     0.02\n\t  1 blockMesh                0.02     0.00\n\t  1 llvmpipe-0               0.01     0.04\n\t  1 llvmpipe-1               0.01     0.04\n\t  1 llvmpipe-10              0.01     0.04\n\t  1 llvmpipe-11              0.01     0.04\n\t  1 llvmpipe-12    
          0.01     0.04\n\t  1 llvmpipe-13              0.01     0.04\n\t  1 llvmpipe-14              0.01     0.04\n\t  1 llvmpipe-15              0.01     0.04\n\t  1 llvmpipe-2               0.01     0.04\n\t  1 llvmpipe-3               0.01     0.04\n\t  1 llvmpipe-4               0.01     0.04\n\t  1 llvmpipe-5               0.01     0.04\n\t  1 llvmpipe-6               0.01     0.04\n\t  1 llvmpipe-7               0.01     0.04\n\t  1 llvmpipe-8               0.01     0.04\n\t  1 llvmpipe-9               0.01     0.04\n\t  6 make                     0.01     0.02\n\t271 sh                       0.00     0.00\n\t108 foamCleanPath            0.00     0.00\n\t 96 tr                       0.00     0.00\n\t 57 sed                      0.00     0.00\n\t 22 rm                       0.00     0.00\n\t 13 gcc                      0.00     0.00\n\t 12 foamEtcFile              0.00     0.00\n\t 10 grep                     0.00     0.00\n\t  9 gsettings                0.00     0.00\n\t  9 stty                     0.00     0.00\n\t  8 dirname                  0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  8 wmakeLnIncludeA          0.00     0.00\n\t  7 stat                     0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  6 openfoam                 0.00     0.00\n\t  5 find                     0.00     0.00\n\t  5 mkdir                    0.00     0.00\n\t  4 g++                      0.00     0.00\n\t  4 makeTargetDir            0.00     0.00\n\t  4 phoronix-test-s          0.00     0.00\n\t  4 wmake                    0.00     0.00\n\t  3 dconf worker             0.00     0.00\n\t  3 gmain                    0.00     0.00\n\t  2 as                       0.00     0.00\n\t  2 collect2                 0.00     0.00\n<\/code><\/pre>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>OpenFoam CFD program has several different sized models. 
I picked the second smallest but it would be interesting to see what happens as we scale to larger models. After a startup period, the overall running time is dominated by backend <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/openfoam\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-200","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/200","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=200"}],"version-history":[{"count":1,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/200\/revisions"}],"predecessor-version":[{"id":202,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/200\/revisions\/202"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=200"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}