This benchmark tests memory and cache bandwidth. It has three subtests: read, write and read/modify/write. From the profile below these look to be single-threaded (about 0.96 of 16 cores busy), with each run taking a similar time of about 125 seconds.

The topdown profile shows that read is memory-bound, write is memory-bound but at a lower level, and read/modify/write has the highest retirement rate.

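The AMD metrics show a lot of floating point (a float rate of about 376 per 1000 instructions, essentially all 128-bit) and a relatively low L2 access rate (about 6.8 per 1000 instructions).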
elapsed 1178.850
on_cpu 0.060 # 0.96 / 16 cores
utime 1126.046
stime 0.815
nvcsw 2097 # 32.65%
nivcsw 4325 # 67.35%
inblock 16 # 0.01/sec
onblock 12792 # 10.85/sec
cpu-clock 1127005636652 # 1127.006 seconds
task-clock 1127015577254 # 1127.016 seconds
page faults 188978 # 167.680/sec
context switches 12122 # 10.756/sec
cpu migrations 292 # 0.259/sec
major page faults 3 # 0.003/sec
minor page faults 188975 # 167.677/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 3634315461911 # 196.499 branches per 1000 inst
branch misses 9696274332 # 0.27% branch miss
conditional 3632170862135 # 196.383 conditional branches per 1000 inst
indirect 371552984 # 0.020 indirect branches per 1000 inst
cpu-cycles 5264528460862 # 0.28 GHz
instructions 18466319602366 # 3.51 IPC high
slots 10530906718080 #
retiring 4944646133622 # 47.0% (47.0%)
-- ucode 122706597 # 0.0%
-- fastpath 4944523427025 # 47.0%
frontend 1370144825102 # 13.0% (13.0%)
-- latency 295123694280 # 2.8%
-- bandwidth 1075021130822 # 10.2%
backend 4093208221590 # 38.9% (38.9%)
-- cpu 3679471645191 # 34.9%
-- memory 413736576399 # 3.9%
speculation 122376475801 # 1.2% ( 1.2%)
-- branch mispredict 122366430335 # 1.2%
-- pipeline restart 10045466 # 0.0%
smt-contention 530236733 # 0.0% ( 0.0%)
cpu-cycles 5263576000968 # 0.28 GHz
instructions 18472923856823 # 3.51 IPC high
instructions 6157833251749 # 6.829 l2 access per 1000 inst
l2 hit from l1 38763136163 # 0.06% l2 miss
l2 miss from l1 13623829 #
l2 hit from l2 pf 3276978159 #
l3 hit from l2 pf 5015543 #
l3 miss from l2 pf 5931174 #
instructions 6159376479220 # 375.514 float per 1000 inst
float 512 46 # 0.000 AVX-512 per 1000 inst
float 256 610 # 0.000 AVX-256 per 1000 inst
float 128 2312930817690 # 375.514 AVX-128 per 1000 inst
float MMX 0 # 0.000 MMX per 1000 inst
float scalar 0 # 0.000 scalar per 1000 inst
instructions 18504565007273 #
opcache 2049762220432 # 110.771 opcache per 1000 inst
opcache miss 1017908492 # 0.0% opcache miss rate
l1 dTLB miss 28190064 # 0.002 L1 dTLB per 1000 inst
l2 dTLB miss 5684855 # 0.000 L2 dTLB per 1000 inst
instructions 18499508052063 #
icache 2286810824 # 0.124 icache per 1000 inst
icache miss 211834745 # 9.3% icache miss rate
l1 iTLB miss 8797810 # 0.000 L1 iTLB per 1000 inst
l2 iTLB miss 0 # 0.000 L2 iTLB per 1000 inst
tlb flush 17235 # 0.000 TLB flush per 1000 inst
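The derived columns in the AMD listing above can be cross-checked from the raw counts. The following is a minimal sketch, not the profiling tool's own code: it assumes IPC is instructions divided by cycles and that each topdown percentage is a category's count divided by total slots. It also assumes, because the arithmetic reproduces the printed value, that the 0.28 GHz figure is cycles averaged over elapsed time and all 16 cores; dividing by task-clock time instead gives an estimate of the clock on the one busy core.

```python
# Raw values copied from the AMD run above.
cycles = 5_264_528_460_862
instructions = 18_466_319_602_366
slots = 10_530_906_718_080
elapsed_seconds = 1178.850
task_clock_seconds = 1127.016
cores = 16

topdown = {
    "retiring": 4_944_646_133_622,
    "frontend": 1_370_144_825_102,
    "backend": 4_093_208_221_590,
    "speculation": 122_376_475_801,
    "smt-contention": 530_236_733,
}


def percent_of_slots(count, total_slots):
    """Share of pipeline slots attributed to one topdown category."""
    return 100.0 * count / total_slots


ipc = instructions / cycles
shares = {name: percent_of_slots(count, slots) for name, count in topdown.items()}

# Assumption: the listing's GHz column averages over all cores and elapsed time.
ghz_all_cores = cycles / elapsed_seconds / cores / 1e9
# Assumption: cycles per second of task-clock approximates the busy core's clock.
ghz_busy_core = cycles / task_clock_seconds / 1e9

print(f"IPC {ipc:.2f}")
for name, share in shares.items():
    print(f"{name:15s} {share:5.1f}%")
print(f"GHz averaged over {cores} cores: {ghz_all_cores:.2f}")
print(f"GHz on the busy core (estimate): {ghz_busy_core:.2f}")
```

The five categories sum to almost exactly the slot count, which is a useful sanity check that the counters were collected consistently.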
The Intel metrics break memory stalls down by level of the memory hierarchy, with L1-bound being the largest share at 8.3%.
elapsed 1178.774
on_cpu 0.060 # 0.96 / 16 cores
utime 1126.308
stime 0.563
nvcsw 2104 # 29.23%
nivcsw 5094 # 70.77%
inblock 1136 # 0.96/sec
onblock 1504 # 1.28/sec
cpu-clock 1126963839893 # 1126.964 seconds
task-clock 1126971932472 # 1126.972 seconds
page faults 184328 # 163.560/sec
context switches 12897 # 11.444/sec
cpu migrations 1050 # 0.932/sec
major page faults 0 # 0.000/sec
minor page faults 184328 # 163.560/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 3606911258653 # 199.754 branches per 1000 inst
branch misses 7048723296 # 0.20% branch miss
conditional 3606911271773 # 199.754 conditional branches per 1000 inst
indirect 633890851 # 0.035 indirect branches per 1000 inst
slots 25618559493746 #
retiring 14217732274034 # 55.5% (55.5%) high
-- ucode 484349856141 # 1.9%
-- fastpath 13733382417893 # 53.6%
frontend 4621490630497 # 18.0% (18.0%)
-- latency 881073797992 # 3.4%
-- bandwidth 3740416832505 # 14.6%
backend 6967643203188 # 27.2% (27.2%)
-- cpu 5626301638829 # 22.0%
-- memory 1341341564359 # 5.2%
speculation 125507624550 # 0.5% ( 0.5%) low
-- branch mispredict 125315315281 # 0.5%
-- pipeline restart 192309269 # 0.0%
smt-contention 0 # 0.0% ( 0.0%)
cpu-cycles 4270504875023 # 0.23 GHz
instructions 18137358870259 # 4.25 IPC high
l2 access 59921390459 # 3.304 l2 access per 1000 inst
l2 miss 207094023 # 0.35% l2 miss
cpu-cycles 4270576848122 # 11.3% memory latency
load stalls 430798387441 # 8.3% l1 bound
l1 miss 76131137261 # 1.8% l2 bound
l2 miss 433393833 # 0.0% l3 bound
l3 miss 222225184 # 0.0% dram bound
store_stalls 52809562442 # 1.2% store bound
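The per-level percentages in the Intel memory breakdown above can be reproduced from the stall counters. This sketch is an inference from the numbers rather than the tool's documented formula: it assumes each level's share is the stall cycles at that level minus the stall cycles that missed to the next level, divided by total cycles, and that "memory latency" is load stalls plus store stalls. The variable names are illustrative.

```python
# Raw values copied from the Intel memory breakdown above.
cycles = 4_270_576_848_122
load_stalls = 430_798_387_441
l1_miss_stalls = 76_131_137_261
l2_miss_stalls = 433_393_833
l3_miss_stalls = 222_225_184
store_stalls = 52_809_562_442


def percent_of_cycles(count):
    """Express a stall-cycle count as a percentage of all cycles."""
    return 100.0 * count / cycles


# Assumption: stalls satisfied at a level are the stalls that reached it
# minus the stalls that went on to miss in it.
bound = {
    "l1": percent_of_cycles(load_stalls - l1_miss_stalls),
    "l2": percent_of_cycles(l1_miss_stalls - l2_miss_stalls),
    "l3": percent_of_cycles(l2_miss_stalls - l3_miss_stalls),
    "dram": percent_of_cycles(l3_miss_stalls),
    "store": percent_of_cycles(store_stalls),
}
memory_latency = percent_of_cycles(load_stalls + store_stalls)

print(f"memory latency {memory_latency:.1f}%")
for level, share in bound.items():
    print(f"{level:6s} bound {share:.1f}%")
```

Under these assumptions the results match the listing to one decimal place, with nearly all of the memory stall time resolved in L1 and L2 and almost none reaching L3 or DRAM.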
The process summary shows that execution is mostly in the cachebench executable.
370 processes
18 cachebench 1125.03 0.02
68 clinfo 18.19 5.98
38 vulkaninfo 1.15 1.15
6 glxinfo:gdrv0 0.16 0.06
6 glxinfo:gl0 0.16 0.06
4 vulkani:disk$0 0.13 0.13
6 php 0.08 0.19
2 glxinfo 0.08 0.02
2 glxinfo:cs0 0.08 0.02
2 glxinfo:disk$0 0.08 0.02
2 glxinfo:sh0 0.08 0.02
2 glxinfo:shlo0 0.08 0.02
2 llvmpipe-0 0.07 0.07
2 llvmpipe-1 0.07 0.07
2 llvmpipe-2 0.07 0.07
2 llvmpipe-3 0.07 0.07
2 llvmpipe-4 0.07 0.07
2 llvmpipe-5 0.07 0.07
2 llvmpipe-6 0.07 0.07
2 llvmpipe-7 0.07 0.07
2 llvmpipe-8 0.07 0.07
2 llvmpipe-10 0.06 0.07
2 llvmpipe-11 0.06 0.07
2 llvmpipe-12 0.06 0.07
2 llvmpipe-13 0.06 0.07
2 llvmpipe-14 0.06 0.07
2 llvmpipe-15 0.06 0.07
2 llvmpipe-9 0.06 0.07
6 clang 0.06 0.06
3 rocminfo 0.03 0.00
1 lspci 0.00 0.02
1 ps 0.00 0.01
86 sh 0.00 0.00
13 gcc 0.00 0.00
10 gsettings 0.00 0.00
8 stat 0.00 0.00
8 systemd-detect- 0.00 0.00
6 llvm-link 0.00 0.00
5 gmain 0.00 0.00
5 phoronix-test-s 0.00 0.00
2 cc 0.00 0.00
2 lscpu 0.00 0.00
2 uname 0.00 0.00
2 which 0.00 0.00
2 xset 0.00 0.00
1 date 0.00 0.00
1 dconf worker 0.00 0.00
1 dirname 0.00 0.00
1 dmesg 0.00 0.00
1 dmidecode 0.00 0.00
1 grep 0.00 0.00
1 ifconfig 0.00 0.00
1 ip 0.00 0.00
1 lsmod 0.00 0.00
1 mktemp 0.00 0.00
1 qdbus 0.00 0.00
1 readlink 0.00 0.00
1 realpath 0.00 0.00
1 sed 0.00 0.00
1 sort 0.00 0.00
1 stty 0.00 0.00
1 systemctl 0.00 0.00
1 template.sh 0.00 0.00
1 wc 0.00 0.00
1 xrandr 0.00 0.00
0 processes running
47 maximum processes
The computation blocks are straightforward: nine cachebench runs of about 125 seconds each, one after another.
7982) cachebench cpu=14 start=5.68 finish=130.71
7983) cachebench cpu=14 start=5.69 finish=130.71
7987) cachebench cpu=8 start=134.71 finish=259.74
7988) cachebench cpu=1 start=134.72 finish=259.74
7990) cachebench cpu=0 start=263.74 finish=388.77
7991) cachebench cpu=9 start=263.75 finish=388.77
7993) sh cpu=10 start=388.77 finish=388.77
7994) sh cpu=3 start=388.77 finish=388.77
7995) cachebench cpu=3 start=399.06 finish=524.08
7996) cachebench cpu=12 start=399.06 finish=524.08
7997) cachebench cpu=10 start=528.09 finish=653.11
7998) cachebench cpu=3 start=528.09 finish=653.11
8002) cachebench cpu=2 start=657.12 finish=782.14
8003) cachebench cpu=11 start=657.12 finish=782.14
8004) sh cpu=2 start=782.14 finish=782.15
8005) sh cpu=11 start=782.15 finish=782.15
8006) cachebench cpu=2 start=792.33 finish=917.35
8007) cachebench cpu=3 start=792.33 finish=917.35
8009) cachebench cpu=2 start=921.36 finish=1046.38
8010) cachebench cpu=3 start=921.36 finish=1046.38
8011) cachebench cpu=10 start=1050.39 finish=1175.41
8012) cachebench cpu=3 start=1050.39 finish=1175.41
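The listing above can be checked mechanically to confirm that the runs take similar times. This is a minimal sketch that assumes the line format shown (pid, name, `cpu=`, `start=`, `finish=`); the regular expression and the helper name `parse_blocks` are illustrative and not part of any tool. Treating each distinct finish time as one run is also an assumption, based on the cachebench lines appearing in pairs.

```python
import re

LISTING = """\
7982) cachebench cpu=14 start=5.68 finish=130.71
7983) cachebench cpu=14 start=5.69 finish=130.71
7987) cachebench cpu=8 start=134.71 finish=259.74
7988) cachebench cpu=1 start=134.72 finish=259.74
7990) cachebench cpu=0 start=263.74 finish=388.77
7991) cachebench cpu=9 start=263.75 finish=388.77
7993) sh cpu=10 start=388.77 finish=388.77
7994) sh cpu=3 start=388.77 finish=388.77
7995) cachebench cpu=3 start=399.06 finish=524.08
7996) cachebench cpu=12 start=399.06 finish=524.08
7997) cachebench cpu=10 start=528.09 finish=653.11
7998) cachebench cpu=3 start=528.09 finish=653.11
8002) cachebench cpu=2 start=657.12 finish=782.14
8003) cachebench cpu=11 start=657.12 finish=782.14
8004) sh cpu=2 start=782.14 finish=782.15
8005) sh cpu=11 start=782.15 finish=782.15
8006) cachebench cpu=2 start=792.33 finish=917.35
8007) cachebench cpu=3 start=792.33 finish=917.35
8009) cachebench cpu=2 start=921.36 finish=1046.38
8010) cachebench cpu=3 start=921.36 finish=1046.38
8011) cachebench cpu=10 start=1050.39 finish=1175.41
8012) cachebench cpu=3 start=1050.39 finish=1175.41
"""

LINE_RE = re.compile(
    r"^(\d+)\)\s+(\S+)\s+cpu=(\d+)\s+start=([\d.]+)\s+finish=([\d.]+)$"
)


def parse_blocks(text):
    """Turn each listing line into a dict; lines that do not match are skipped."""
    blocks = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            pid, name, cpu, start, finish = match.groups()
            blocks.append({
                "pid": int(pid),
                "name": name,
                "cpu": int(cpu),
                "start": float(start),
                "finish": float(finish),
            })
    return blocks


blocks = parse_blocks(LISTING)
cachebench = [b for b in blocks if b["name"] == "cachebench"]
durations = [round(b["finish"] - b["start"], 2) for b in cachebench]
# Assumption: paired entries that finish together belong to the same run.
run_finish_times = sorted({b["finish"] for b in cachebench})

print(f"{len(run_finish_times)} runs, "
      f"durations {min(durations)} to {max(durations)} seconds")
```

Every cachebench entry lasts within a few hundredths of a second of the others, which suggests each run is governed by a fixed time budget rather than a fixed amount of work.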
This is a benchmark where breaking out the subtests individually might show different sets of counter values.
