The workload builds the gem5 architectural simulator. Much of the build looks parallel, followed by a more serial link step.

The topdown profile shows backend stalls rising and falling relative to frontend stalls over the run, with the retirement rate fairly constant. The link steps are also more scattered.

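AMD metrics show an average of about 80% of the cores busy (on_cpu 0.798, or 12.77 of 16). There is not much floating point (about 16 per 1000 instructions) and a relatively low retirement rate (13.7% of slots, 0.84 IPC). Overall, frontend stalls (35.4%) exceed backend stalls (29.8%).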

elapsed              1810.174
on_cpu               0.798          # 12.77 / 16 cores
utime                21039.238
stime                2080.631
nvcsw                5803731        # 49.87%
nivcsw               5834033        # 50.13%
inblock              264736         # 146.25/sec
onblock              28346440       # 15659.51/sec
cpu-clock            23120522443545 # 23120.522 seconds
task-clock           23121037448160 # 23121.037 seconds
page faults          570590844      # 24678.427/sec
context switches     11599518       # 501.687/sec
cpu migrations       85722          # 3.708/sec
major page faults    355            # 0.015/sec
minor page faults    570590489      # 24678.412/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             16761584492164 # 212.300 branches per 1000 inst
branch misses        551611408858   # 3.29% branch miss
conditional          12834111922087 # 162.555 conditional branches per 1000 inst
indirect             447266406613   # 5.665 indirect branches per 1000 inst
cpu-cycles           93170670452725 # 3.21 GHz
instructions         78575451599875 # 0.84 IPC
slots                187059658915890 #
retiring             25539899492009 # 13.7% (16.7%)
-- ucode             28169462556    #     0.0%
-- fastpath          25511730029453 #    13.6%
frontend             66132587341647 # 35.4% (43.3%)
-- latency           50402032103478 #    26.9%
-- bandwidth         15730555238169 #     8.4%
backend              55784381168682 # 29.8% (36.5%)
-- cpu               5570341440845  #     3.0%
-- memory            50214039727837 #    26.8%
speculation          5229248647360  #  2.8% ( 3.4%)
-- branch mispredict 5186046814231  #     2.8%
-- pipeline restart  43201833129    #     0.0%
smt-contention       34373330798460 # 18.4% ( 0.0%)
cpu-cycles           93195909000520 # 3.22 GHz
instructions         78560812172363 # 0.84 IPC
instructions         26242593467181 # 56.818 l2 access per 1000 inst
l2 hit from l1       1312292335999  # 22.59% l2 miss
l2 miss from l1      243869038203   #
l2 hit from l2 pf    85842273432    #
l3 hit from l2 pf    33296301776    #
l3 miss from l2 pf   59615007749    #
instructions         26232003919217 # 16.024 float per 1000 inst
float 512            18926          # 0.000 AVX-512 per 1000 inst
float 256            15702223       # 0.001 AVX-256 per 1000 inst
float 128            420327679598   # 16.023 AVX-128 per 1000 inst
float MMX            0              # 0.000 MMX per 1000 inst
float scalar         96             # 0.000 scalar per 1000 inst
instructions         2688226        #
opcache              992870         # 369.340 opcache per 1000 inst
opcache miss         533920         # 53.8% opcache miss rate
l1 dTLB miss         6899           # 2.566 L1 dTLB per 1000 inst
l2 dTLB miss         1234           # 0.459 L2 dTLB per 1000 inst
instructions         2731843        #
icache               1332619        # 487.810 icache per 1000 inst
icache miss          114026         #  8.6% icache miss rate
l1 iTLB miss         11             # 0.004 L1 iTLB per 1000 inst
l2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst
tlb flush            20             # 0.007 TLB flush per 1000 inst
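
The derived AMD figures can be reproduced from the raw counters. The sketch below is a guess at how the tool arrives at them, inferred only from the numbers matching: on_cpu as (utime + stime) / (elapsed * cores), and the parenthesized topdown percentages as each category divided by slots with smt-contention excluded. The variable and function names are illustrative, not the tool's own.

```python
# Values copied from the AMD metrics above.
elapsed = 1810.174   # wall-clock seconds
utime = 21039.238    # user CPU seconds
stime = 2080.631     # system CPU seconds
cores = 16

# Assumed formula: fraction of total core-time spent on CPU.
on_cpu = (utime + stime) / (elapsed * cores)
busy_cores = on_cpu * cores

slots = 187059658915890
smt_contention = 34373330798460
topdown = {
    "retiring": 25539899492009,
    "frontend": 66132587341647,
    "backend": 55784381168682,
    "speculation": 5229248647360,
}


def pct(count, total):
    """Return count as a percentage of total."""
    return 100.0 * count / total


print(f"on_cpu {on_cpu:.3f} ({busy_cores:.2f} / {cores} cores)")
for name, count in topdown.items():
    raw = pct(count, slots)                        # share of all slots
    adjusted = pct(count, slots - smt_contention)  # SMT-contended slots excluded
    print(f"{name:12s} {raw:5.1f}% ({adjusted:5.1f}%)")
```

With smt-contention at 18.4% of slots, the adjusted column is the one that is comparable with the Intel run, where smt-contention is reported as 0.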


Intel metrics

elapsed              2076.244
on_cpu               0.816          # 13.06 / 16 cores
utime                25333.567
stime                1777.001
nvcsw                5656806        # 49.63%
nivcsw               5740028        # 50.37%
inblock              5287080        # 2546.46/sec
onblock              28341616       # 13650.43/sec
cpu-clock            27109637330759 # 27109.637 seconds
task-clock           27110152714913 # 27110.153 seconds
page faults          570476396      # 21042.906/sec
context switches     11358674       # 418.982/sec
cpu migrations       77819          # 2.870/sec
major page faults    3385           # 0.125/sec
minor page faults    570473011      # 21042.781/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             16554533948074 # 210.331 branches per 1000 inst
branch misses        420682840418   # 2.54% branch miss
conditional          16554535761194 # 210.331 conditional branches per 1000 inst
indirect             3208214057581  # 40.761 indirect branches per 1000 inst
slots                131943668586950 #
retiring             41217833930568 # 31.2% (31.2%)
-- ucode             3112135628076  #     2.4%
-- fastpath          38105698302492 #    28.9%
frontend             48463557156999 # 36.7% (36.7%)
-- latency           27839065050420 #    21.1%
-- bandwidth         20624492106579 #    15.6%
backend              25399276106394 # 19.3% (19.3%)
-- cpu               7245422419378  #     5.5%
-- memory            18153853687016 #    13.8%
speculation          17330208537534 # 13.1% (13.1%) high
-- branch mispredict 16790007180191 #    12.7%
-- pipeline restart  540201357343   #     0.4%
smt-contention       0              #  0.0% ( 0.0%)
cpu-cycles           56299806242194 # 1.71 GHz
instructions         58419307240398 # 1.04 IPC
l2 access            2518816460984  # 58.099 l2 access per 1000 inst
l2 miss              761491659232   # 30.23% l2 miss
cpu-cycles           41773327892901 # 39.7% memory latency
load stalls          16220631545169 #  6.6% l1 bound
l1 miss              13482977988921 #  9.4% l2 bound
l2 miss              9548023073896  #  4.6% l3 bound
l3 miss              7615533491900  # 18.2% dram bound
store_stalls         357499861823   #  0.9% store bound
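
To compare the two runs, the headline ratios can be recomputed from the raw counts. This is a minimal sketch with illustrative names. It only divides counters that are reported next to each other in the same group, because the dumps repeat the instructions counter per group with different values, so per-1000-instruction rates are not recomputed here.

```python
# Counter values copied from the AMD and Intel metrics above.
amd = {
    "cycles": 93170670452725,
    "instructions": 78575451599875,
    "branches": 16761584492164,
    "branch_misses": 551611408858,
}
intel = {
    "cycles": 56299806242194,
    "instructions": 58419307240398,
    "branches": 16554533948074,
    "branch_misses": 420682840418,
    "l2_access": 2518816460984,
    "l2_miss": 761491659232,
}


def ipc(counters):
    """Instructions retired per CPU cycle."""
    return counters["instructions"] / counters["cycles"]


def branch_miss_pct(counters):
    """Mispredicted branches as a percentage of all branches."""
    return 100.0 * counters["branch_misses"] / counters["branches"]


for label, counters in (("AMD", amd), ("Intel", intel)):
    print(f"{label:5s} IPC {ipc(counters):.2f}  "
          f"branch miss {branch_miss_pct(counters):.2f}%")

# L2 miss rate for the Intel run (misses over accesses).
l2_miss_pct = 100.0 * intel["l2_miss"] / intel["l2_access"]
print(f"Intel L2 miss {l2_miss_pct:.2f}%")
```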

The process overview shows a C++ build with most time spent in cc1plus, the GCC C++ compiler proper.

56641 processes
	6774 cc1plus              18279.62  1401.27
	 54 scons                 2290.93   186.88
	7803 as                    1291.39   202.47
	2730 gem5py_m5              761.97    38.80
	 51 ld                     261.27    23.85
	2781 gem5py                 104.47     9.99
	 68 clinfo                  17.85     6.34
	1029 cc1                      1.66     0.19
	  3 gzip                     1.00     0.09
	 38 vulkaninfo               0.76     1.52
	  6 php                      0.18     0.43
	  6 glxinfo:gdrv0            0.13     0.05
	  6 glxinfo:gl0              0.13     0.05
	  4 vulkani:disk$0           0.08     0.16
	  4 python3                  0.08     0.03
	  2 glxinfo                  0.08     0.02
	  2 glxinfo:cs0              0.08     0.02
	  2 glxinfo:disk$0           0.08     0.02
	  2 glxinfo:sh0              0.08     0.02
	  2 glxinfo:shlo0            0.08     0.02
	  6 clang                    0.07     0.05
	  3 tar                      0.04     1.03
	  2 llvmpipe-0               0.04     0.08
	  2 llvmpipe-1               0.04     0.08
	  2 llvmpipe-10              0.04     0.08
	  2 llvmpipe-11              0.04     0.08
	  2 llvmpipe-12              0.04     0.08
	  2 llvmpipe-13              0.04     0.08
	  2 llvmpipe-14              0.04     0.08
	  2 llvmpipe-15              0.04     0.08
	  2 llvmpipe-2               0.04     0.08
	  2 llvmpipe-3               0.04     0.08
	  2 llvmpipe-4               0.04     0.08
	  2 llvmpipe-5               0.04     0.08
	  2 llvmpipe-6               0.04     0.08
	  2 llvmpipe-7               0.04     0.08
	  2 llvmpipe-8               0.04     0.08
	  2 llvmpipe-9               0.04     0.08
	  3 rocminfo                 0.03     0.00
	  7 rm                       0.02     2.94
	 21 ar                       0.02     0.07
	6833 g++                      0.00     0.06
	 21 ranlib                   0.00     0.06
	  1 lspci                    0.00     0.02
	  1 ps                       0.00     0.01
	26956 sh                       0.00     0.00
	1078 gcc                      0.00     0.00
	 66 python3-config           0.00     0.00
	 51 collect2                 0.00     0.00
	 28 sed                      0.00     0.00
	 13 dirname                  0.00     0.00
	 13 readlink                 0.00     0.00
	  9 m4                       0.00     0.00
	  9 native-elf-form          0.00     0.00
	  8 stat                     0.00     0.00
	  8 systemd-detect-          0.00     0.00
	  8 which                    0.00     0.00
	  6 gsettings                0.00     0.00
	  6 llvm-link                0.00     0.00
	  6 pkg-config               0.00     0.00
	  6 swig                     0.00     0.00
	  5 cc                       0.00     0.00
	  5 dconf worker             0.00     0.00
	  5 gmain                    0.00     0.00
	  5 phoronix-test-s          0.00     0.00
	  4 bash                     0.00     0.00
	  4 date                     0.00     0.00
	  4 mktemp                   0.00     0.00
	  3 awk                      0.00     0.00
	  3 basename                 0.00     0.00
	  3 build-gem5               0.00     0.00
	  3 conftest_2176b2          0.00     0.00
	  3 conftest_283931          0.00     0.00
	  3 readelf                  0.00     0.00
	  3 touch                    0.00     0.00
	  2 lscpu                    0.00     0.00
	  2 uname                    0.00     0.00
	  2 xset                     0.00     0.00
	  1 dmesg                    0.00     0.00
	  1 dmidecode                0.00     0.00
	  1 grep                     0.00     0.00
	  1 ifconfig                 0.00     0.00
	  1 ip                       0.00     0.00
	  1 lsmod                    0.00     0.00
	  1 python                   0.00     0.00
	  1 qdbus                    0.00     0.00
	  1 realpath                 0.00     0.00
	  1 sort                     0.00     0.00
	  1 stty                     0.00     0.00
	  1 systemctl                0.00     0.00
	  1 template.sh              0.00     0.00
	  1 wc                       0.00     0.00
	  1 xrandr                   0.00     0.00
1 processes running
108 maximum processes
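
The claim that cc1plus dominates can be checked from the table. The sketch below assumes the two numeric columns are user and system CPU seconds per command (the table has no header, so this is a guess) and uses only the six largest rows, so the share is relative to those rows rather than to the whole build.

```python
# (count, command, col1, col2) copied from the six largest rows above.
# Assumption: col1 is user CPU seconds and col2 is system CPU seconds.
rows = [
    (6774, "cc1plus", 18279.62, 1401.27),
    (54, "scons", 2290.93, 186.88),
    (7803, "as", 1291.39, 202.47),
    (2730, "gem5py_m5", 761.97, 38.80),
    (51, "ld", 261.27, 23.85),
    (2781, "gem5py", 104.47, 9.99),
]

# Total CPU seconds per command under the column assumption.
cpu_seconds = {cmd: user + system for _, cmd, user, system in rows}
total = sum(cpu_seconds.values())

for cmd, seconds in sorted(cpu_seconds.items(), key=lambda kv: -kv[1]):
    print(f"{cmd:10s} {seconds:10.2f} s  {100.0 * seconds / total:5.1f}%")

cc1plus_share = cpu_seconds["cc1plus"] / total
```

Under that assumption cc1plus accounts for roughly four fifths of the CPU time in these rows, with scons and the assembler next.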