A quick-running ray tracer test.

Topdown metrics show a fairly high retirement rate, limited by backend stalls on both the CPU and memory sides.
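In topdown analysis, each issue slot is charged to one level-1 category: retiring, frontend, backend, or bad speculation. Each percentage is therefore that category's count divided by `slots`. The sketch below reproduces the level-1 percentages for the AMD run in this note. It assumes the parenthesized figures divide by the slots left over after removing `smt-contention`. That assumption is inferred only because it reproduces the printed values, not from the tool's documentation. `topdown_shares` is a hypothetical helper.

```python
# Level-1 topdown breakdown: each category is a share of the pipeline issue slots.
# Counts are copied from the AMD run in this note. ASSUMPTION: the parenthesized
# percentages exclude slots lost to SMT contention (inferred from the numbers).

SLOTS = 15779680550700
SMT_CONTENTION = 5848634155814

LEVEL1 = {
    "retiring": 4822507950863,
    "frontend": 584429036683,
    "backend": 4314856481849,
    "speculation": 209239720319,
}


def topdown_shares(counts, slots, smt_contention):
    """Return {category: (percent of all slots, percent excluding SMT contention)}."""
    usable = slots - smt_contention
    return {
        name: (round(100 * n / slots, 1), round(100 * n / usable, 1))
        for name, n in counts.items()
    }


if __name__ == "__main__":
    for name, (pct, pct_no_smt) in topdown_shares(LEVEL1, SLOTS, SMT_CONTENTION).items():
        print(f"{name:12s} {pct:5.1f}% ({pct_no_smt:4.1f}%)")
```

With these counts, the sketch prints the same pairs as the AMD output, for example retiring at 30.6% (48.6%) and backend at 27.3% (43.4%).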

AMD metrics show the workload running on all cores, executing floating-point (AVX-128) code, with relatively few L2 accesses or misses.

elapsed              148.090
on_cpu               0.838          # 13.40 / 16 cores
utime                1983.478
stime                1.405
nvcsw                20374          # 34.56%
nivcsw               38574          # 65.44%
inblock              0              # 0.00/sec
onblock              14056          # 94.92/sec
cpu-clock            1984937360494  # 1984.937 seconds
task-clock           1984951444204  # 1984.951 seconds
page faults          198620         # 100.063/sec
context switches     59509          # 29.980/sec
cpu migrations       2780           # 1.401/sec
major page faults    2              # 0.001/sec
minor page faults    198618         # 100.062/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             1014202012293  # 73.753 branches per 1000 inst
branch misses        6191746506     # 0.61% branch miss
conditional          609554742644   # 44.327 conditional branches per 1000 inst
indirect             84296971221    # 6.130 indirect branches per 1000 inst
cpu-cycles           7888007229708  # 3.32 GHz
instructions         13748191578115 # 1.74 IPC
slots                15779680550700 #
retiring             4822507950863  # 30.6% (48.6%)
-- ucode             2839998364     #     0.0%
-- fastpath          4819667952499  #    30.5%
frontend             584429036683   #  3.7% ( 5.9%)
-- latency           422768735268   #     2.7%
-- bandwidth         161660301415   #     1.0%
backend              4314856481849  # 27.3% (43.4%)
-- cpu               2741261117778  #    17.4%
-- memory            1573595364071  #    10.0%
speculation          209239720319   #  1.3% ( 2.1%)
-- branch mispredict 165667444230   #     1.0%
-- pipeline restart  43572276089    #     0.3%
smt-contention       5848634155814  # 37.1% ( 0.0%)
cpu-cycles           7888366991559  # 3.33 GHz
instructions         13756479664683 # 1.74 IPC
instructions         4587432604742  # 18.648 l2 access per 1000 inst
l2 hit from l1       76718057387    # 0.50% l2 miss
l2 miss from l1      223100129      #
l2 hit from l2 pf    8622365520     #
l3 hit from l2 pf    188302969      #
l3 miss from l2 pf   15496089       #
instructions         4578337251665  # 214.073 float per 1000 inst
float 512            51             # 0.000 AVX-512 per 1000 inst
float 256            664            # 0.000 AVX-256 per 1000 inst
float 128            980097688996   # 214.073 AVX-128 per 1000 inst
float MMX            0              # 0.000 MMX per 1000 inst
float scalar         4              # 0.000 scalar per 1000 inst
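The `#` column holds values derived from the raw counts. The sketch below recomputes several of them for the AMD run. It makes two assumptions. First, `on_cpu` is CPU time (utime + stime) divided by elapsed time times 16 cores. Second, each per-1000-instruction ratio uses the `instructions` line from its own counter group. For example, the AVX-128 ratio only reproduces when using the `instructions` line directly above the float rows. The helper names are hypothetical and not part of the tool that printed the output.

```python
# Recompute some of the "#" annotations printed for the AMD run.
# ASSUMPTION: formulas are inferred from the numbers, not from the tool's docs.

ELAPSED = 148.090                # wall-clock seconds
UTIME, STIME = 1983.478, 1.405   # user and system CPU seconds
CORES = 16
NVCSW, NIVCSW = 20374, 38574     # voluntary / involuntary context switches
CYCLES = 7888007229708
INSTRUCTIONS = 13748191578115
BRANCHES, BRANCH_MISSES = 1014202012293, 6191746506
FLOAT_GROUP_INSTRUCTIONS = 4578337251665  # "instructions" line in the float group
FLOAT_128 = 980097688996


def on_cpu(utime, stime, elapsed, cores):
    """Average fraction of all cores kept busy: CPU seconds / (elapsed * cores)."""
    return (utime + stime) / (elapsed * cores)


def pct(part, whole):
    """Percentage of part in whole."""
    return 100 * part / whole


def per_1000(events, instructions):
    """Events per 1000 instructions, given the instruction count of the same group."""
    return 1000 * events / instructions


if __name__ == "__main__":
    print(f"on_cpu        {on_cpu(UTIME, STIME, ELAPSED, CORES):.3f}")
    print(f"voluntary     {pct(NVCSW, NVCSW + NIVCSW):.2f}%")
    print(f"IPC           {INSTRUCTIONS / CYCLES:.2f}")
    print(f"branch miss   {pct(BRANCH_MISSES, BRANCHES):.2f}%")
    print(f"AVX-128/1000  {per_1000(FLOAT_128, FLOAT_GROUP_INSTRUCTIONS):.3f}")
```

The printed values match the annotations above: 0.838, 34.56%, 1.74, 0.61% and 214.073.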

Intel metrics. The Intel version was less stable while running; perhaps there are differences between its cores?

elapsed              807.771
on_cpu               0.880          # 14.07 / 16 cores
utime                11364.904
stime                2.490
nvcsw                80897          # 31.26%
nivcsw               177859         # 68.74%
inblock              1328           # 1.64/sec
onblock              7864           # 9.74/sec
cpu-clock            11367494151495 # 11367.494 seconds
task-clock           11367537895233 # 11367.538 seconds
page faults          360565         # 31.719/sec
context switches     262583         # 23.099/sec
cpu migrations       7004           # 0.616/sec
major page faults    14             # 0.001/sec
minor page faults    360551         # 31.718/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             4663954542631  # 70.318 branches per 1000 inst
branch misses        23249701772    # 0.50% branch miss
conditional          4663954574567  # 70.318 conditional branches per 1000 inst
indirect             1515855054426  # 22.855 indirect branches per 1000 inst
slots                11383013944694 #
retiring             7724310148011  # 67.9% (67.9%)
-- ucode             434825331904   #     3.8%
-- fastpath          7289484816107  #    64.0%
frontend             2693413158175  # 23.7% (23.7%)
-- latency           1649219459909  #    14.5%
-- bandwidth         1044193698266  #     9.2%
backend              741447952721   #  6.5% ( 6.5%)
-- cpu               415808685357   #     3.7%
-- memory            325639267364   #     2.9%
speculation          241057697650   #  2.1% ( 2.1%)
-- branch mispredict 230083087910   #     2.0%
-- pipeline restart  10974609740    #     0.1%
smt-contention       0              #  0.0% ( 0.0%)
cpu-cycles           6366658262732  # 2.26 GHz
instructions         12862934995320 # 2.02 IPC
l2 access            48169438096    # 6.398 l2 access per 1000 inst
l2 miss              502292780      # 1.04% l2 miss
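The sketch below compares a few headline numbers between the two runs, using the counts exactly as printed. It derives IPC, average core utilization, the ratio of wall-clock times, and the Intel L2 miss rate. This is only arithmetic on the copied values. The counters may have been collected differently on each machine, so treat cross-machine ratios with care. `RUNS`, `ipc`, `on_cpu` and `l2_miss_pct` are hypothetical names.

```python
# Compare a few headline numbers from the AMD and Intel runs in this note.
# All values are copied from the printed outputs; nothing is re-measured.

RUNS = {
    "AMD": {
        "elapsed": 148.090, "utime": 1983.478, "stime": 1.405,
        "instructions": 13748191578115, "cycles": 7888007229708,
    },
    "Intel": {
        "elapsed": 807.771, "utime": 11364.904, "stime": 2.490,
        "instructions": 12862934995320, "cycles": 6366658262732,
    },
}
CORES = 16


def ipc(run):
    """Instructions retired per cycle."""
    return run["instructions"] / run["cycles"]


def on_cpu(run, cores=CORES):
    """Average fraction of all cores busy over the run."""
    return (run["utime"] + run["stime"]) / (run["elapsed"] * cores)


def l2_miss_pct(accesses, misses):
    """L2 misses as a percentage of L2 accesses."""
    return 100 * misses / accesses


if __name__ == "__main__":
    for name, run in RUNS.items():
        print(f"{name:6s} elapsed={run['elapsed']:8.3f}s IPC={ipc(run):.2f} on_cpu={on_cpu(run):.3f}")
    slowdown = RUNS["Intel"]["elapsed"] / RUNS["AMD"]["elapsed"]
    print(f"Intel wall time / AMD wall time = {slowdown:.2f}")
    print(f"Intel L2 miss rate = {l2_miss_pct(48169438096, 502292780):.2f}%")
```

From these numbers, the Intel run took about 5.45 times as long in wall-clock time. Its printed IPC (2.02) is higher than the AMD run's (1.74).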

Process overview

Computation block

      2580699) povray           cpu=8 start=53.46 finish=97.40
        2580700) povray           cpu=9 start=53.46 finish=53.46
        2580701) povray           cpu=8 start=53.46 finish=97.39
          2580702) povray           cpu=3 start=53.46 finish=97.40
          2580703) povray           cpu=13 start=53.46 finish=97.39
            2580704) povray           cpu=3 start=53.47 finish=97.34
              2580705) povray           cpu=15 start=53.72 finish=97.14
                2580706) povray           cpu=9 start=53.72 finish=53.96
                2580707) povray           cpu=0 start=53.96 finish=53.96
              2580708) povray           cpu=5 start=53.97 finish=97.14
                2580709) povray           cpu=9 start=53.97 finish=53.97
                2580710) povray           cpu=10 start=53.97 finish=53.97
                2580711) povray           cpu=9 start=54.02 finish=55.04
                2580712) povray           cpu=10 start=54.02 finish=54.20
                2580713) povray           cpu=11 start=54.02 finish=54.02
                2580714) ?? cpu=0 start=54.03 finish=0.00 
                2580715) povray           cpu=15 start=54.03 finish=54.03
                2580716) povray           cpu=4 start=54.03 finish=54.03
                2580717) povray           cpu=14 start=54.03 finish=54.03
                2580718) povray           cpu=13 start=54.03 finish=54.03
                2580719) povray           cpu=11 start=54.03 finish=54.03
                2580720) povray           cpu=15 start=54.03 finish=54.03
                2580721) povray           cpu=12 start=54.03 finish=54.03
                2580722) povray           cpu=13 start=54.03 finish=54.03
                2580723) povray           cpu=13 start=54.03 finish=54.03
                2580724) povray           cpu=14 start=54.03 finish=54.03
                2580725) povray           cpu=13 start=54.03 finish=54.03
                2580726) povray           cpu=13 start=54.03 finish=54.03
                2580727) povray           cpu=9 start=55.08 finish=55.09
                2580728) povray           cpu=9 start=55.13 finish=95.89
                2580729) povray           cpu=3 start=55.13 finish=95.84
                2580730) povray           cpu=13 start=55.13 finish=96.33
                2580731) povray           cpu=14 start=55.13 finish=96.38
                2580732) povray           cpu=10 start=55.13 finish=96.65
                2580733) povray           cpu=11 start=55.13 finish=96.66
                2580734) povray           cpu=15 start=55.13 finish=96.56
                2580735) povray           cpu=12 start=55.13 finish=96.40
                2580736) povray           cpu=0 start=55.13 finish=95.82
                2580737) povray           cpu=8 start=55.13 finish=95.96
                2580738) povray           cpu=11 start=55.13 finish=95.81
                2580739) povray           cpu=0 start=55.13 finish=96.84
                2580740) povray           cpu=9 start=55.13 finish=97.00
                2580741) povray           cpu=15 start=55.13 finish=95.82
                2580742) povray           cpu=1 start=55.13 finish=96.27
                2580743) povray           cpu=2 start=55.13 finish=95.92
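The tree has three parts. First, a chain of long-lived `povray` tasks from 53.46 s to about 97.4 s. Second, many threads that last about a second or less. Third, 16 worker threads, matching the 16 cores, that start at 55.13 s and finish between about 95.8 s and 97.0 s.

The sketch below parses lines in this format and picks out the long-running tasks. It assumes the following:
- Each nesting level adds two spaces of indentation.
- `start` and `finish` are in seconds.
- `finish=0.00` means the end time was not recorded, as on the `??` entry.

`Task`, `parse_tree` and `long_running` are hypothetical helpers, not part of the tool that produced the listing.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Matches lines such as "  2580728) povray   cpu=9 start=55.13 finish=95.89".
LINE_RE = re.compile(
    r"^(?P<indent>\s*)(?P<pid>\d+)\)\s+(?P<name>\S+)\s+"
    r"cpu=(?P<cpu>\d+)\s+start=(?P<start>[\d.]+)\s+finish=(?P<finish>[\d.]+)"
)


@dataclass
class Task:
    pid: int
    name: str
    depth: int               # nesting level in the tree (0 = first line)
    cpu: int                 # value of the cpu= field as printed
    start: float             # ASSUMPTION: seconds since tracing began
    finish: Optional[float]  # None when the listing shows finish=0.00

    @property
    def duration(self) -> Optional[float]:
        return None if self.finish is None else self.finish - self.start


def parse_tree(text: str, indent_step: int = 2) -> List[Task]:
    """Parse the indented task listing into Task records, in listing order."""
    tasks: List[Task] = []
    base_indent = None
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip blank or unrecognised lines
        indent = len(m["indent"])
        if base_indent is None:
            base_indent = indent
        finish = float(m["finish"])
        tasks.append(Task(
            pid=int(m["pid"]),
            name=m["name"],
            depth=(indent - base_indent) // indent_step,
            cpu=int(m["cpu"]),
            start=float(m["start"]),
            finish=None if finish == 0.0 else finish,
        ))
    return tasks


def long_running(tasks: List[Task], min_seconds: float = 10.0) -> List[Task]:
    """Tasks with a recorded lifetime of at least min_seconds."""
    return [t for t in tasks if t.duration is not None and t.duration >= min_seconds]


SAMPLE = """
      2580699) povray           cpu=8 start=53.46 finish=97.40
        2580700) povray           cpu=9 start=53.46 finish=53.46
                2580714) ?? cpu=0 start=54.03 finish=0.00
                2580728) povray           cpu=9 start=55.13 finish=95.89
                2580743) povray           cpu=2 start=55.13 finish=95.92
"""

if __name__ == "__main__":
    for t in long_running(parse_tree(SAMPLE)):
        print(f"{t.pid} cpu={t.cpu:2d} {t.duration:6.2f}s")
```

Running this on the full listing above, rather than the five-line `SAMPLE`, should select the long-lived parent tasks and the 16 worker threads. The short-lived helpers fall below the 10-second threshold.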