Livermore OpenMP (CLOMP) benchmark, run with a single workload. It appears to be mostly single-threaded, with short multi-threaded sections.

The topdown profile is backend bound overall, with the short parallel sections less so.

AMD metrics show an average of only 3.5 of 16 cores in use. This is floating-point code (all of it 128-bit vector operations) with few branch misses (0.22%). Frontend stalls are low and backend stalls are high.

elapsed              546.949
on_cpu               0.218          # 3.49 / 16 cores
utime                1904.013
stime                2.173
nvcsw                16236          # 51.16%
nivcsw               15499          # 48.84%
inblock              0              # 0.00/sec
onblock              2056           # 3.76/sec
cpu-clock            1906902816556  # 1906.903 seconds
task-clock           1906923992083  # 1906.924 seconds
page faults          138839         # 72.808/sec
context switches     34302          # 17.988/sec
cpu migrations       339            # 0.178/sec
major page faults    0              # 0.000/sec
minor page faults    138839         # 72.808/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             773040432866   # 112.899 branches per 1000 inst
branch misses        1708677115     # 0.22% branch miss
conditional          770588283706   # 112.540 conditional branches per 1000 inst
indirect             374106320      # 0.055 indirect branches per 1000 inst
cpu-cycles           1692125537767  # 0.94 GHz
instructions         1370561071471  # 0.81 IPC
slots                3386944277190  #
retiring             409289835857   # 12.1% (18.2%)
-- ucode             463432023      #     0.0%
-- fastpath          408826403834   #    12.1%
frontend             52934629700    #  1.6% ( 2.4%) low
-- latency           26013781506    #     0.8%
-- bandwidth         26920848194    #     0.8%
backend              1774101880707  # 52.4% (79.0%) high
-- cpu               635790079324   #    18.8%
-- memory            1138311801383  #    33.6%
speculation          8381426298     #  0.2% ( 0.4%) low
-- branch mispredict 7898633078     #     0.2%
-- pipeline restart  482793220      #     0.0%
smt-contention       1142234822831  # 33.7% ( 0.0%)
cpu-cycles           1697637019615  # 0.95 GHz
instructions         1372189486239  # 0.81 IPC
instructions         456767926414   # 142.516 l2 access per 1000 inst
l2 hit from l1       26648010394    # 44.94% l2 miss
l2 miss from l1      1513928287     #
l2 hit from l2 pf    10710793952    #
l3 hit from l2 pf    25090146143    #
l3 miss from l2 pf   2647865317     #
instructions         457188883424   # 329.247 float per 1000 inst
float 512            72             # 0.000 AVX-512 per 1000 inst
float 256            344            # 0.000 AVX-256 per 1000 inst
float 128            150528050110   # 329.247 AVX-128 per 1000 inst
float MMX            0              # 0.000 MMX per 1000 inst
float scalar         0              # 0.000 scalar per 1000 inst
instructions         2390641        #
opcache              897366         # 375.366 opcache per 1000 inst
opcache miss         478333         # 53.3% opcache miss rate
l1 dTLB miss         5470           # 2.288 L1 dTLB per 1000 inst
l2 dTLB miss         1094           # 0.458 L2 dTLB per 1000 inst
instructions         2418972        #
icache               1193224        # 493.277 icache per 1000 inst
icache miss          111159         #  9.3% icache miss rate
l1 iTLB miss         7              # 0.003 L1 iTLB per 1000 inst
l2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst
tlb flush            19             # 0.008 TLB flush per 1000 inst
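
The derived figures in the AMD listing can be reproduced from the raw counters. The sketch below is an assumption about how the tool computes them, not taken from its source: on_cpu is taken as (utime + stime) / elapsed / cores, and the parenthesised topdown percentages are taken as each category's share of the slots left after smt-contention is removed. The variable names are illustrative only.

```python
# Raw values copied from the AMD listing above.
elapsed, utime, stime, ncores = 546.949, 1904.013, 2.173, 16

# Assumed formula: average number of busy cores, then the fraction of all cores.
busy_cores = (utime + stime) / elapsed
on_cpu = busy_cores / ncores
print(f"on_cpu {on_cpu:.3f}  # {busy_cores:.2f} / {ncores} cores")

slots = 3386944277190
smt_contention = 1142234822831
topdown = {
    "retiring": 409289835857,
    "frontend": 52934629700,
    "backend": 1774101880707,
    "speculation": 8381426298,
}

# First column: share of all slots.
raw_pct = {k: 100.0 * v / slots for k, v in topdown.items()}
# Parenthesised column (assumed): share of slots not lost to SMT contention.
adj_pct = {k: 100.0 * v / (slots - smt_contention) for k, v in topdown.items()}

for name in topdown:
    print(f"{name:12s} {raw_pct[name]:5.1f}% ({adj_pct[name]:4.1f}%)")
```

With these inputs the arithmetic matches the printed values (for example backend at 52.4% of all slots and 79.0% once SMT contention is excluded), which is consistent with the assumed formulas but does not confirm them.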

The Intel run is much quicker (115 s elapsed versus 547 s). It looks like the AMD run needed multiple runs to bring the result variation within tolerance.

elapsed              115.355
on_cpu               0.318          # 5.08 / 16 cores
utime                585.447
stime                0.776
nvcsw                4673           # 48.25%
nivcsw               5012           # 51.75%
inblock              616            # 5.34/sec
onblock              1416           # 12.28/sec
cpu-clock            586340601288   # 586.341 seconds
task-clock           586347075893   # 586.347 seconds
page faults          152353         # 259.834/sec
context switches     10093          # 17.213/sec
cpu migrations       421            # 0.718/sec
major page faults    2              # 0.003/sec
minor page faults    152351         # 259.831/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             157824839681   # 114.070 branches per 1000 inst
branch misses        328274560      # 0.21% branch miss
conditional          157824853057   # 114.070 conditional branches per 1000 inst
indirect             47055770829    # 34.010 indirect branches per 1000 inst
slots                10662556995632 #
retiring             2005207474450  # 18.8% (18.8%)
-- ucode             38968677224    #     0.4%
-- fastpath          1966238797226  #    18.4%
frontend             1142542081923  # 10.7% (10.7%)
-- latency           1027167896973  #     9.6%
-- bandwidth         115374184950   #     1.1%
backend              7477821619753  # 70.1% (70.1%) high
-- cpu               4816740097479  #    45.2%
-- memory            2661081522274  #    25.0%
speculation          51623307331    #  0.5% ( 0.5%) low
-- branch mispredict 48674846500    #     0.5%
-- pipeline restart  2948460831     #     0.0%
smt-contention       0              #  0.0% ( 0.0%)
cpu-cycles           1389467375584  # 0.74 GHz
instructions         1100892371408  # 0.79 IPC
l2 access            104540813869   # 108.996 l2 access per 1000 inst
l2 miss              55139555141    # 52.74% l2 miss
cpu-cycles           1612435730781  # 32.1% memory latency
load stalls          517933116525   # 17.9% l1 bound
l1 miss              229865098888   #  4.7% l2 bound
l2 miss              154331027827   #  5.8% l3 bound
l3 miss              60457495909    #  3.7% dram bound
store_stalls         315269774      #  0.0% store bound
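
The memory-latency breakdown at the end of the Intel listing can be recomputed from the stall counters. The sketch below assumes each "miss" counter is a count of stall cycles with a miss outstanding at that level, so the time bound at one level is the difference between adjacent counters divided by cpu-cycles. That interpretation is an assumption, and the names are illustrative.

```python
# Raw values copied from the last counter group of the Intel listing above.
cycles = 1612435730781
load_stalls = 517933116525
l1_miss = 229865098888
l2_miss = 154331027827
l3_miss = 60457495909
store_stalls = 315269774


def pct(count):
    """Express a stall-cycle count as a percentage of cpu-cycles."""
    return 100.0 * count / cycles


# Assumed model: stalls attributed to a level are the stalls seen at that
# level minus those that also missed in it and went further out.
breakdown = {
    "memory latency": pct(load_stalls),
    "l1 bound": pct(load_stalls - l1_miss),
    "l2 bound": pct(l1_miss - l2_miss),
    "l3 bound": pct(l2_miss - l3_miss),
    "dram bound": pct(l3_miss),
    "store bound": pct(store_stalls),
}

for name, value in breakdown.items():
    print(f"{value:5.1f}% {name}")
```

Under this model the four load levels sum to the memory-latency figure, and the largest single share is attributed to L1 (17.9% of cycles) rather than DRAM (3.7%).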

The process overview shows most of the time spent in clomp_build.

293 processes
	 48 clomp_build           6106.56     6.88
	 38 vulkaninfo               1.31     0.95
	  6 glxinfo:gdrv0            0.14     0.10
	  4 vulkani:disk$0           0.13     0.10
	  6 php                      0.07     0.05
	  2 llvmpipe-0               0.07     0.05
	  2 llvmpipe-1               0.07     0.05
	  2 llvmpipe-10              0.07     0.05
	  2 llvmpipe-11              0.07     0.05
	  2 llvmpipe-12              0.07     0.05
	  2 llvmpipe-13              0.07     0.05
	  2 llvmpipe-14              0.07     0.05
	  2 llvmpipe-15              0.07     0.05
	  2 llvmpipe-2               0.07     0.05
	  2 llvmpipe-3               0.07     0.05
	  2 llvmpipe-4               0.07     0.05
	  2 llvmpipe-5               0.07     0.05
	  2 llvmpipe-6               0.07     0.05
	  2 llvmpipe-7               0.07     0.05
	  2 llvmpipe-8               0.07     0.05
	  2 llvmpipe-9               0.07     0.05
	  2 glxinfo                  0.06     0.04
	  2 glxinfo:cs0              0.06     0.04
	  2 glxinfo:disk$0           0.06     0.04
	  2 glxinfo:sh0              0.06     0.04
	  2 glxinfo:shlo0            0.06     0.04
	  1 lspci                    0.01     0.01
	  1 ps                       0.00     0.01
	 66 sh                       0.00     0.00
	 13 gcc                      0.00     0.00
	  8 gsettings                0.00     0.00
	  8 stat                     0.00     0.00
	  8 systemd-detect-          0.00     0.00
	  5 gmain                    0.00     0.00
	  5 phoronix-test-s          0.00     0.00
	  3 clomp                    0.00     0.00
	  3 dconf worker             0.00     0.00
	  2 cc                       0.00     0.00
	  2 lscpu                    0.00     0.00
	  2 uname                    0.00     0.00
	  2 which                    0.00     0.00
	  2 xset                     0.00     0.00
	  1 date                     0.00     0.00
	  1 dirname                  0.00     0.00
	  1 dmesg                    0.00     0.00
	  1 dmidecode                0.00     0.00
	  1 grep                     0.00     0.00
	  1 ifconfig                 0.00     0.00
	  1 ip                       0.00     0.00
	  1 lsmod                    0.00     0.00
	  1 mktemp                   0.00     0.00
	  1 readlink                 0.00     0.00
	  1 realpath                 0.00     0.00
	  1 sed                      0.00     0.00
	  1 sort                     0.00     0.00
	  1 stty                     0.00     0.00
	  1 systemctl                0.00     0.00
	  1 template.sh              0.00     0.00
	  1 wc                       0.00     0.00
	  1 xrandr                   0.00     0.00
0 processes running
47 maximum processes

Computation blocks

      83118) clomp            cpu=3 start=4.90  finish=36.81
        83119) clomp_build      cpu=13 start=4.90  finish=36.81
          83120) clomp_build      cpu=15 start=4.90  finish=36.81
          83121) clomp_build      cpu=0 start=4.90  finish=36.81
          83122) clomp_build      cpu=14 start=4.90  finish=36.81
          83123) clomp_build      cpu=4 start=4.90  finish=36.81
          83124) clomp_build      cpu=9 start=4.90  finish=36.81
          83125) clomp_build      cpu=10 start=4.90  finish=36.81
          83126) clomp_build      cpu=3 start=4.90  finish=36.81
          83127) clomp_build      cpu=5 start=4.90  finish=36.81
          83128) clomp_build      cpu=1 start=4.90  finish=36.81
          83129) clomp_build      cpu=7 start=4.90  finish=36.81
          83130) clomp_build      cpu=8 start=4.90  finish=36.81
          83131) clomp_build      cpu=12 start=4.90  finish=36.81
          83132) clomp_build      cpu=6 start=4.90  finish=36.81
          83133) clomp_build      cpu=2 start=4.90  finish=36.81
          83134) clomp_build      cpu=11 start=4.90  finish=36.81