An OpenMP implementation that performs imaging tests. The workload has seven tests, and this is a case where Intel does better on two of them and worse on the other five.

The overall metrics show a fairly high retirement rate that is limited, to a degree that depends on the test, by backend CPU stalls. Backend memory stalls are less of an issue (4.4% of slots on AMD and 5.0% on Intel).

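AMD metrics show a reasonable amount of floating-point code (about 297 floating-point operations per 1000 instructions, essentially all 128-bit) and relatively few branches (about 102 per 1000 instructions).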

elapsed              1644.398
on_cpu               0.678          # 10.85 / 16 cores
utime                17308.374
stime                533.184
nvcsw                210800         # 47.67%
nivcsw               231435         # 52.33%
inblock              0              # 0.00/sec
onblock              16448          # 10.00/sec
cpu-clock            17840403316898 # 17840.403 seconds
task-clock           17840962416578 # 17840.962 seconds
page faults          252232595      # 14137.836/sec
context switches     450222         # 25.235/sec
cpu migrations       1361           # 0.076/sec
major page faults    7744           # 0.434/sec
minor page faults    252224851      # 14137.402/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             12792964509200 # 102.333 branches per 1000 inst
branch misses        150169852450   # 1.17% branch miss
conditional          9860363999725  # 78.875 conditional branches per 1000 inst
indirect             568740984297   # 4.549 indirect branches per 1000 inst
cpu-cycles           61383485116311 # 2.76 GHz
instructions         104201051892088 # 1.70 IPC
slots                122743660326036 #
retiring             37571862517062 # 30.6% (49.6%)
-- ucode             28373669760    #     0.0%
-- fastpath          37543488847302 #    30.6%
frontend             4144904451957  #  3.4% ( 5.5%)
-- latency           3064929197046  #     2.5%
-- bandwidth         1079975254911  #     0.9%
backend              32869577134221 # 26.8% (43.4%)
-- cpu               27447483662627 #    22.4%
-- memory            5422093471594  #     4.4%
speculation          1234412203853  #  1.0% ( 1.6%)
-- branch mispredict 1212655803309  #     1.0%
-- pipeline restart  21756400544    #     0.0%
smt-contention       46922793804825 # 38.2% ( 0.0%)
cpu-cycles           61361382873177 # 2.76 GHz
instructions         104032576646593 # 1.70 IPC
instructions         34674101421249 # 5.491 l2 access per 1000 inst
l2 hit from l1       115093256997   # 24.08% l2 miss
l2 miss from l1      20719036688    #
l2 hit from l2 pf    50157748862    #
l3 hit from l2 pf    15765270503    #
l3 miss from l2 pf   9361897652     #
instructions         34673981783248 # 297.334 float per 1000 inst
float 512            72             # 0.000 AVX-512 per 1000 inst
float 256            366            # 0.000 AVX-256 per 1000 inst
float 128            10309748438666 # 297.334 AVX-128 per 1000 inst
float MMX            0              # 0.000 MMX per 1000 inst
float scalar         0              # 0.000 scalar per 1000 inst
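
The annotations after the `#` in the listing above are simple ratios of the raw counters. As a rough sketch (the helper names are hypothetical and not part of the measurement tool, and the core count of 16 is taken from the `/ 16 cores` annotation), a few of them can be recomputed like this:

```python
# Raw values copied from the AMD listing above.
ELAPSED = 1644.398            # wall-clock seconds
UTIME = 17308.374             # user CPU seconds
STIME = 533.184               # system CPU seconds
CORES = 16                    # from the "/ 16 cores" annotation

CPU_CYCLES = 61383485116311
INSTRUCTIONS = 104201051892088
BRANCHES = 12792964509200
BRANCH_MISSES = 150169852450


def busy_cores(utime, stime, elapsed):
    """Average number of cores kept busy over the run."""
    return (utime + stime) / elapsed


def on_cpu(utime, stime, elapsed, cores):
    """Fraction of the total core capacity that was used."""
    return busy_cores(utime, stime, elapsed) / cores


def ipc(instructions, cycles):
    """Instructions retired per CPU cycle."""
    return instructions / cycles


def branch_miss_pct(misses, branches):
    """Mispredicted branches as a percentage of all branches."""
    return 100.0 * misses / branches


busy = busy_cores(UTIME, STIME, ELAPSED)
utilization = on_cpu(UTIME, STIME, ELAPSED, CORES)
amd_ipc = ipc(INSTRUCTIONS, CPU_CYCLES)
miss_pct = branch_miss_pct(BRANCH_MISSES, BRANCHES)

print(f"busy cores   {busy:.2f} of {CORES}")
print(f"on_cpu       {utilization:.3f}")
print(f"IPC          {amd_ipc:.2f}")
print(f"branch miss  {miss_pct:.2f}%")
```

Only ratios whose numerator and denominator both appear in the listing are recomputed here. Some annotations, such as the per-1000-instruction rates, may be based on instruction counts from a different counter group, so they are left out.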

Intel metrics

elapsed              2541.963
on_cpu               0.664          # 10.62 / 16 cores
utime                26094.262
stime                908.881
nvcsw                393269         # 53.97%
nivcsw               335418         # 46.03%
inblock              194600         # 76.55/sec
onblock              5536           # 2.18/sec
cpu-clock            27000229963741 # 27000.230 seconds
task-clock           27000708582783 # 27000.709 seconds
page faults          560961765      # 20775.816/sec
context switches     741142         # 27.449/sec
cpu migrations       93087          # 3.448/sec
major page faults    11609          # 0.430/sec
minor page faults    560950155      # 20775.386/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             20986332656634 # 124.239 branches per 1000 inst
branch misses        60224093511    # 0.29% branch miss
conditional          20986332688570 # 124.239 conditional branches per 1000 inst
indirect             4793102645941  # 28.375 indirect branches per 1000 inst
slots                91805315111198 #
retiring             58807660233086 # 64.1% (64.1%)
-- ucode             6235444695905  #     6.8%
-- fastpath          52572215537181 #    57.3%
frontend             12103639365990 # 13.2% (13.2%)
-- latency           10394033274777 #    11.3%
-- bandwidth         1709606091213  #     1.9%
backend              17605664593201 # 19.2% (19.2%)
-- cpu               13055563780626 #    14.2%
-- memory            4550100812575  #     5.0%
speculation          3311541093858  #  3.6% ( 3.6%)
-- branch mispredict 3145341811481  #     3.4%
-- pipeline restart  166199282377   #     0.2%
smt-contention       0              #  0.0% ( 0.0%)
cpu-cycles           53809972855771 # 2.22 GHz
instructions         104620904496198 # 1.94 IPC
l2 access            405500583142   # 7.270 l2 access per 1000 inst
l2 miss              186008133845   # 45.87% l2 miss
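
To compare the two top-down breakdowns, it helps to see how the two percentage columns relate. The sketch below assumes the first column is each category divided by total slots and the parenthesized column is the same category after removing smt-contention slots from the total. That assumption reproduces the printed values but is not confirmed by the tool's documentation, and the function name is hypothetical.

```python
def topdown_percentages(slots, categories, excluded=0):
    """Return each category as a percentage of (slots - excluded)."""
    usable = slots - excluded
    return {name: 100.0 * count / usable for name, count in categories.items()}


# Raw slot counts copied from the AMD listing.
amd_slots = 122743660326036
amd_smt = 46922793804825
amd = {
    "retiring": 37571862517062,
    "frontend": 4144904451957,
    "backend": 32869577134221,
    "speculation": 1234412203853,
}

# Raw slot counts copied from the Intel listing (smt-contention is 0 there).
intel_slots = 91805315111198
intel = {
    "retiring": 58807660233086,
    "frontend": 12103639365990,
    "backend": 17605664593201,
    "speculation": 3311541093858,
}

amd_raw = topdown_percentages(amd_slots, amd)
amd_no_smt = topdown_percentages(amd_slots, amd, excluded=amd_smt)
intel_raw = topdown_percentages(intel_slots, intel)

print(f"{'category':<12} {'AMD':>7} {'AMD no-SMT':>11} {'Intel':>7}")
for name in amd:
    print(f"{name:<12} {amd_raw[name]:6.1f}% {amd_no_smt[name]:10.1f}% "
          f"{intel_raw[name]:6.1f}%")
```

With smt-contention excluded, the AMD and Intel columns are on a comparable footing: both show retiring as the largest category and backend as the main limiter.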

The process structure shows that the gm process is the workhorse.

666 processes
	291 gm                   228017.81  6656.12
	 68 clinfo                  16.54     5.99
	 38 vulkaninfo               0.95     1.33
	  6 glxinfo:gdrv0            0.16     0.07
	  6 php                      0.15     0.22
	  4 vulkani:disk$0           0.10     0.14
	  2 glxinfo                  0.07     0.03
	  2 glxinfo:cs0              0.07     0.03
	  2 glxinfo:disk$0           0.07     0.03
	  2 glxinfo:sh0              0.07     0.03
	  2 glxinfo:shlo0            0.07     0.03
	  6 clang                    0.06     0.05
	  2 llvmpipe-0               0.05     0.07
	  2 llvmpipe-1               0.05     0.07
	  2 llvmpipe-10              0.05     0.07
	  2 llvmpipe-11              0.05     0.07
	  2 llvmpipe-12              0.05     0.07
	  2 llvmpipe-13              0.05     0.07
	  2 llvmpipe-14              0.05     0.07
	  2 llvmpipe-15              0.05     0.07
	  2 llvmpipe-2               0.05     0.07
	  2 llvmpipe-3               0.05     0.07
	  2 llvmpipe-4               0.05     0.07
	  2 llvmpipe-5               0.05     0.07
	  2 llvmpipe-6               0.05     0.07
	  2 llvmpipe-7               0.05     0.07
	  2 llvmpipe-8               0.05     0.07
	  2 llvmpipe-9               0.05     0.07
	  3 rocminfo                 0.00     0.03
	  1 lspci                    0.00     0.02
	 94 sh                       0.00     0.00
	 21 graphics-magick          0.00     0.00
	 13 gcc                      0.00     0.00
	  9 gsettings                0.00     0.00
	  8 stat                     0.00     0.00
	  8 systemd-detect-          0.00     0.00
	  6 llvm-link                0.00     0.00
	  5 phoronix-test-s          0.00     0.00
	  4 gmain                    0.00     0.00
	  3 dconf worker             0.00     0.00
	  2 cc                       0.00     0.00
	  2 lscpu                    0.00     0.00
	  2 uname                    0.00     0.00
	  2 which                    0.00     0.00
	  2 xset                     0.00     0.00
	  1 date                     0.00     0.00
	  1 dirname                  0.00     0.00
	  1 dmesg                    0.00     0.00
	  1 dmidecode                0.00     0.00
	  1 grep                     0.00     0.00
	  1 ifconfig                 0.00     0.00
	  1 ip                       0.00     0.00
	  1 lsmod                    0.00     0.00
	  1 mktemp                   0.00     0.00
	  1 ps                       0.00     0.00
	  1 qdbus                    0.00     0.00
	  1 readlink                 0.00     0.00
	  1 realpath                 0.00     0.00
	  1 sed                      0.00     0.00
	  1 sort                     0.00     0.00
	  1 stty                     0.00     0.00
	  1 systemctl                0.00     0.00
	  1 template.sh              0.00     0.00
	  1 wc                       0.00     0.00
	  1 xrandr                   0.00     0.00
0 processes running
47 maximum processes

We seem to start one gm thread on each core: the 16 gm threads below are spread across CPUs 0 through 15.

      98515) graphics-magick  cpu=14 start=6.18  finish=66.22
        98516) gm               cpu=7 start=6.18  finish=66.22
          98517) gm               cpu=15 start=6.19  finish=66.22
          98518) gm               cpu=2 start=6.19  finish=66.22
          98519) gm               cpu=12 start=6.19  finish=66.22
          98520) gm               cpu=5 start=6.19  finish=66.22
          98521) gm               cpu=4 start=6.19  finish=66.22
          98522) gm               cpu=1 start=6.19  finish=66.22
          98523) gm               cpu=0 start=6.19  finish=66.22
          98524) gm               cpu=14 start=6.19  finish=66.22
          98525) gm               cpu=10 start=6.19  finish=66.22
          98526) gm               cpu=11 start=6.19  finish=66.22
          98527) gm               cpu=8 start=6.19  finish=66.22
          98528) gm               cpu=3 start=6.19  finish=66.22
          98529) gm               cpu=13 start=6.19  finish=66.22
          98530) gm               cpu=9 start=6.19  finish=66.22
          98531) gm               cpu=6 start=6.19  finish=66.22
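
The one-thread-per-core observation can be checked mechanically. The following is a minimal sketch that parses the listing above with a regular expression written for this specific line layout (the pattern and variable names are illustrative assumptions, not part of the tracing tool) and verifies that the gm threads cover every CPU exactly once.

```python
import re

# Thread listing copied from above, with the tree indentation removed.
LISTING = """\
98515) graphics-magick  cpu=14 start=6.18  finish=66.22
98516) gm               cpu=7 start=6.18  finish=66.22
98517) gm               cpu=15 start=6.19  finish=66.22
98518) gm               cpu=2 start=6.19  finish=66.22
98519) gm               cpu=12 start=6.19  finish=66.22
98520) gm               cpu=5 start=6.19  finish=66.22
98521) gm               cpu=4 start=6.19  finish=66.22
98522) gm               cpu=1 start=6.19  finish=66.22
98523) gm               cpu=0 start=6.19  finish=66.22
98524) gm               cpu=14 start=6.19  finish=66.22
98525) gm               cpu=10 start=6.19  finish=66.22
98526) gm               cpu=11 start=6.19  finish=66.22
98527) gm               cpu=8 start=6.19  finish=66.22
98528) gm               cpu=3 start=6.19  finish=66.22
98529) gm               cpu=13 start=6.19  finish=66.22
98530) gm               cpu=9 start=6.19  finish=66.22
98531) gm               cpu=6 start=6.19  finish=66.22
"""

# One entry per line: "<pid>) <name> cpu=<n> start=<t> finish=<t>"
LINE_RE = re.compile(
    r"(?P<pid>\d+)\)\s+(?P<name>\S+)\s+cpu=(?P<cpu>\d+)"
    r"\s+start=(?P<start>[\d.]+)\s+finish=(?P<finish>[\d.]+)"
)

entries = [m.groupdict() for m in map(LINE_RE.search, LISTING.splitlines()) if m]

# Keep only the gm threads; the graphics-magick wrapper is the parent.
gm_cpus = sorted(int(e["cpu"]) for e in entries if e["name"] == "gm")

one_per_core = gm_cpus == list(range(len(gm_cpus)))
print(f"{len(gm_cpus)} gm threads on {len(set(gm_cpus))} distinct CPUs")
print("one thread per core:", one_per_core)
```

The `cpu=` field is a single value per thread in this listing, so the check says nothing about whether threads stayed on that CPU for the whole run.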