OpenFOAM, a CFD program, comes with models of several different sizes. I picked the second smallest, but it would be interesting to see what happens as we scale to larger models. After a startup period, the overall running time is dominated by backend stalls, with memory-bound stalls outweighing cpu-bound stalls by roughly 2x on Intel (27.2% vs 11.7% of slots) and by more than 4x on AMD (56.9% vs 13.0%).

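The AMD metrics are below. We seem to use only about half of the cores (on_cpu is 0.485, or 7.77 of 16 cores). This is somewhat branchy code, and the branch mispredict rate (1.87%) is slightly higher than normal. It is also floating-point code: about 248 float operations per 1000 instructions, essentially all of them in the 128-bit category.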

elapsed              411.853
on_cpu               0.485          # 7.77 / 16 cores
utime                3116.981
stime                81.863
nvcsw                55573          # 85.56%
nivcsw               9381           # 14.44%
inblock              274000         # 665.29/sec
onblock              771784         # 1873.93/sec
cpu-clock            3198390654945  # 3198.391 seconds
task-clock           3198522379496  # 3198.522 seconds
page faults          28886234       # 9031.118/sec
context switches     66207          # 20.699/sec
cpu migrations       5928           # 1.853/sec
major page faults    5779           # 1.807/sec
minor page faults    28880455       # 9029.312/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             1953073862036  # 123.095 branches per 1000 inst
branch misses        36594044022    # 1.87% branch miss
conditional          1498320940542  # 94.434 conditional branches per 1000 inst
indirect             81449302162    # 5.133 indirect branches per 1000 inst
cpu-cycles           14728766760104 # 2.24 GHz
instructions         15750869300593 # 1.07 IPC
slots                29456694670128 #
retiring             5325501761058  # 18.1% (18.1%)
-- ucode             3259869641     #     0.0%
-- fastpath          5322241891417  #    18.1%
frontend             2275208597330  #  7.7% ( 7.7%)
-- latency           1559401842588  #     5.3%
-- bandwidth         715806754742   #     2.4%
backend              20565968541167 # 69.8% (69.9%)
-- cpu               3818610930354  #    13.0%
-- memory            16747357610813 #    56.9%
speculation          1274324683572  #  4.3% ( 4.3%)
-- branch mispredict 1237491238390  #     4.2%
-- pipeline restart  36833445182    #     0.1%
smt-contention       15678849842    #  0.1% ( 0.0%)
cpu-cycles           14762307677757 # 2.25 GHz
instructions         15592330738951 # 1.06 IPC
instructions         5196681087432  # 43.719 l2 access per 1000 inst
l2 hit from l1       130413856989   # 35.89% l2 miss
l2 miss from l1      14515875294    #
l2 hit from l2 pf    29748688315    #
l3 hit from l2 pf    18730462115    #
l3 miss from l2 pf   48299527102    #
instructions         5199921049632  # 247.898 float per 1000 inst
float 512            196            # 0.000 AVX-512 per 1000 inst
float 256            2623           # 0.000 AVX-256 per 1000 inst
float 128            1289049778180  # 247.898 AVX-128 per 1000 inst
float MMX            0              # 0.000 MMX per 1000 inst
float scalar         806            # 0.000 scalar per 1000 inst
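
The derived values in the listing above can be cross-checked from the raw counters. The following is a minimal sketch, assuming that on_cpu is (utime + stime) divided by elapsed time times the core count, that IPC is instructions divided by cpu-cycles, and that the branch miss rate is branch misses divided by branches. These formulas are inferred from the numbers, not taken from the measurement tool itself, and the function name is made up for illustration.

```python
# Raw values copied from the AMD listing above.
ELAPSED = 411.853            # wall-clock seconds
UTIME = 3116.981             # user CPU seconds
STIME = 81.863               # system CPU seconds
CORES = 16                   # from the "/ 16 cores" annotation

BRANCHES = 1953073862036
BRANCH_MISSES = 36594044022
CPU_CYCLES = 14728766760104
INSTRUCTIONS = 15750869300593


def derived_metrics():
    """Recompute a few derived metrics (assumed formulas, see lead-in)."""
    busy_cores = (UTIME + STIME) / ELAPSED      # average number of busy cores
    return {
        "busy_cores": busy_cores,
        "on_cpu": busy_cores / CORES,           # fraction of the machine in use
        "ipc": INSTRUCTIONS / CPU_CYCLES,
        "branch_miss_pct": 100.0 * BRANCH_MISSES / BRANCHES,
    }


if __name__ == "__main__":
    for name, value in derived_metrics().items():
        print(f"{name:16s} {value:.3f}")
```

Under these assumptions the recomputed values agree with the listing after rounding (7.77 busy cores, on_cpu 0.485, IPC 1.07, 1.87% branch miss). Per-1000-instruction rates are left out on purpose: the listing reports several different instruction counts, presumably from separate counter groups, so dividing by one particular count does not necessarily reproduce them.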

The corresponding Intel metrics:

elapsed              630.059
on_cpu               0.733          # 11.72 / 16 cores
utime                7274.138
stime                110.304
nvcsw                75730          # 77.59%
nivcsw               21868          # 22.41%
inblock              225432         # 357.79/sec
onblock              779616         # 1237.37/sec
cpu-clock            7384424152013  # 7384.424 seconds
task-clock           7384521885456  # 7384.522 seconds
page faults          27008298       # 3657.420/sec
context switches     99959          # 13.536/sec
cpu migrations       12795          # 1.733/sec
major page faults    7665           # 1.038/sec
minor page faults    27000632       # 3656.382/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             4920084801682  # 147.418 branches per 1000 inst
branch misses        42207040760    # 0.86% branch miss
conditional          4920084833266  # 147.418 conditional branches per 1000 inst
indirect             1009320119672  # 30.242 indirect branches per 1000 inst
slots                48575631182720 #
retiring             23167973123611 # 47.7% (47.7%)
-- ucode             1531906094319  #     3.2%
-- fastpath          21636067029292 #    44.5%
frontend             3810435128963  #  7.8% ( 7.8%)
-- latency           1596180788057  #     3.3%
-- bandwidth         2214254340906  #     4.6%
backend              18854956683996 # 38.8% (38.8%)
-- cpu               5664129591282  #    11.7%
-- memory            13190827092714 #    27.2%
speculation          3100904740821  #  6.4% ( 6.4%)
-- branch mispredict 2878415958491  #     5.9%
-- pipeline restart  222488782330   #     0.5%
smt-contention       0              #  0.0% ( 0.0%)
cpu-cycles           24196271950931 # 2.35 GHz
instructions         72440581264655 # 2.99 IPC
l2 access            193922593948   # 8.019 l2 access per 1000 inst
l2 miss              96198280028    # 49.61% l2 miss
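
To compare the two top-down breakdowns side by side, the sketch below recomputes each category as a percentage of slots and the ratio of memory-bound to cpu-bound backend stalls. It assumes each percentage in the listings is simply the category count divided by the slots count; the dictionary layout and function name are illustrative only.

```python
# Top-down counters copied from the AMD and Intel listings above.
TOPDOWN = {
    "AMD": {
        "slots": 29456694670128,
        "retiring": 5325501761058,
        "frontend": 2275208597330,
        "backend": 20565968541167,
        "backend_cpu": 3818610930354,
        "backend_memory": 16747357610813,
        "speculation": 1274324683572,
    },
    "Intel": {
        "slots": 48575631182720,
        "retiring": 23167973123611,
        "frontend": 3810435128963,
        "backend": 18854956683996,
        "backend_cpu": 5664129591282,
        "backend_memory": 13190827092714,
        "speculation": 3100904740821,
    },
}


def summarize(counters):
    """Return each category as a percentage of slots, plus memory/cpu ratio."""
    slots = counters["slots"]
    result = {
        name: 100.0 * value / slots
        for name, value in counters.items()
        if name != "slots"
    }
    # How many times larger the memory-bound share is than the cpu-bound share.
    result["memory_to_cpu"] = counters["backend_memory"] / counters["backend_cpu"]
    return result


if __name__ == "__main__":
    for machine, counters in TOPDOWN.items():
        summary = summarize(counters)
        print(machine)
        for name, value in summary.items():
            print(f"  {name:16s} {value:6.2f}")
```

With these inputs the backend share comes out at about 69.8% on AMD and 38.8% on Intel, and the memory-to-cpu ratio at about 4.4 on AMD and 2.3 on Intel.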

A small number of processes dominate where the time is spent. It looks like we don’t get a full profile: this covers only the initial 50 seconds before we hit a hang, so we need to get better structure for the part of the run after that.

821 processes
	 24 snappyHexMesh         1019.83    10.26
	  2 cc1plus                  0.39     0.07
	 19 vulkaninfo               0.19     0.76
	  2 ld.bfd                   0.05     0.04
	  3 glxinfo:gdrv0            0.04     0.06
	  6 clang                    0.04     0.03
	  1 decomposePar             0.03     0.03
	  2 vulkani:disk$0           0.02     0.08
	  1 glxinfo                  0.02     0.02
	  1 glxinfo:cs0              0.02     0.02
	  1 glxinfo:disk$0           0.02     0.02
	  1 glxinfo:sh0              0.02     0.02
	  1 glxinfo:shlo0            0.02     0.02
	  1 blockMesh                0.02     0.00
	  1 llvmpipe-0               0.01     0.04
	  1 llvmpipe-1               0.01     0.04
	  1 llvmpipe-10              0.01     0.04
	  1 llvmpipe-11              0.01     0.04
	  1 llvmpipe-12              0.01     0.04
	  1 llvmpipe-13              0.01     0.04
	  1 llvmpipe-14              0.01     0.04
	  1 llvmpipe-15              0.01     0.04
	  1 llvmpipe-2               0.01     0.04
	  1 llvmpipe-3               0.01     0.04
	  1 llvmpipe-4               0.01     0.04
	  1 llvmpipe-5               0.01     0.04
	  1 llvmpipe-6               0.01     0.04
	  1 llvmpipe-7               0.01     0.04
	  1 llvmpipe-8               0.01     0.04
	  1 llvmpipe-9               0.01     0.04
	  6 make                     0.01     0.02
	271 sh                       0.00     0.00
	108 foamCleanPath            0.00     0.00
	 96 tr                       0.00     0.00
	 57 sed                      0.00     0.00
	 22 rm                       0.00     0.00
	 13 gcc                      0.00     0.00
	 12 foamEtcFile              0.00     0.00
	 10 grep                     0.00     0.00
	  9 gsettings                0.00     0.00
	  9 stty                     0.00     0.00
	  8 dirname                  0.00     0.00
	  8 systemd-detect-          0.00     0.00
	  8 wmakeLnIncludeA          0.00     0.00
	  7 stat                     0.00     0.00
	  6 llvm-link                0.00     0.00
	  6 openfoam                 0.00     0.00
	  5 find                     0.00     0.00
	  5 mkdir                    0.00     0.00
	  4 g++                      0.00     0.00
	  4 makeTargetDir            0.00     0.00
	  4 phoronix-test-s          0.00     0.00
	  4 wmake                    0.00     0.00
	  3 dconf worker             0.00     0.00
	  3 gmain                    0.00     0.00
	  2 as                       0.00     0.00
	  2 collect2                 0.00     0.00
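
How strongly one process dominates can be quantified by parsing the table above. This is a rough sketch that assumes each row is a process count, a process name (which may contain spaces), and two numeric columns. The listing does not label the numeric columns, so the code refers to them only as `col1` and `col2`; the sample embeds just a few rows, and all names in the code are hypothetical.

```python
import re
from collections import namedtuple

# col1/col2 are the two unlabeled numeric columns of the process table.
Row = namedtuple("Row", "count name col1 col2")

# count, name (lazy, may contain spaces), then two decimal numbers.
ROW_RE = re.compile(r"^\s*(\d+)\s+(.+?)\s+(\d+\.\d+)\s+(\d+\.\d+)\s*$")

# A few rows copied from the table above.
SAMPLE = """\
 24 snappyHexMesh         1019.83    10.26
  2 cc1plus                  0.39     0.07
 19 vulkaninfo               0.19     0.76
  3 dconf worker             0.00     0.00
"""


def parse_rows(text):
    """Parse table rows, silently skipping lines that do not match."""
    rows = []
    for line in text.splitlines():
        match = ROW_RE.match(line)
        if match:
            count, name, col1, col2 = match.groups()
            rows.append(Row(int(count), name, float(col1), float(col2)))
    return rows


def top_share(rows):
    """Return the name with the largest col1 and its share of the col1 total."""
    total = sum(row.col1 for row in rows)
    top = max(rows, key=lambda row: row.col1)
    return top.name, (top.col1 / total if total else 0.0)


if __name__ == "__main__":
    name, share = top_share(parse_rows(SAMPLE))
    print(f"{name}: {100.0 * share:.2f}% of first numeric column")
```

On the sample rows, snappyHexMesh accounts for more than 99% of the first numeric column. Feeding in the full table changes the total only slightly, since every other row is below 0.4.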