A multi-threaded DGEMM benchmark that reports a single score as a GFLOP/s rate. As the name implies, it runs on all cores. There are many separate runs underneath, and it is not clear what size GEMMs are being tested, though the process is invoked with "./mtdgemm 3072 4".
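As a rough sanity check on problem size, the sketch below assumes (this is not confirmed by the output) that the first argument, 3072, is the dimension N of square double-precision matrices. Under that assumption it estimates the floating-point work per GEMM with the usual 2*N^3 approximation and the memory needed for the three matrices A, B and C. The function names are illustrative only.

```python
# Assumption: "3072" in "./mtdgemm 3072 4" is the square matrix dimension N.
N = 3072
BYTES_PER_DOUBLE = 8


def gemm_flops(n: int) -> int:
    """Approximate flop count for one n x n DGEMM (2 * n^3)."""
    return 2 * n ** 3


def gemm_footprint_bytes(n: int) -> int:
    """Bytes needed to hold three n x n double-precision matrices."""
    return 3 * n * n * BYTES_PER_DOUBLE


flops = gemm_flops(N)
footprint = gemm_footprint_bytes(N)
print(f"flops per GEMM  : {flops / 1e9:.1f} GFLOP")
print(f"matrix footprint: {footprint / 2**20:.0f} MiB")
```

If the assumption holds, the three matrices occupy about 216 MiB, which is far larger than the caches on most 16-thread desktop parts and would be consistent with the memory-bound profile below.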

The topdown profile shows an almost entirely memory-bound program.

AMD metrics confirm floating-point work (all of it 128-bit vector operations), a very low retirement rate (2.6% of slots) and high backend memory stalls (84.3% of slots).
elapsed 236.762
on_cpu 0.814 # 13.02 / 16 cores
utime 3082.168
stime 1.545
nvcsw 2176 # 6.99%
nivcsw 28943 # 93.01%
inblock 0 # 0.00/sec
onblock 12568 # 53.08/sec
cpu-clock 3084227120940 # 3084.227 seconds
task-clock 3084245086573 # 3084.245 seconds
page faults 313289 # 101.577/sec
context switches 32134 # 10.419/sec
cpu migrations 344 # 0.112/sec
major page faults 2 # 0.001/sec
minor page faults 313287 # 101.577/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 351139025627 # 143.215 branches per 1000 inst
branch misses 320886189 # 0.09% branch miss
conditional 349858597521 # 142.692 conditional branches per 1000 inst
indirect 53554288 # 0.022 indirect branches per 1000 inst
cpu-cycles 13433029028472 # 3.62 GHz
instructions 2454061328682 # 0.18 IPC low
slots 26852113381386 #
retiring 702637987249 # 2.6% ( 2.8%) low
-- ucode 48563383 # 0.0%
-- fastpath 702589423866 # 2.6%
frontend 164843205984 # 0.6% ( 0.6%) low
-- latency 130868974098 # 0.5%
-- bandwidth 33974231886 # 0.1%
backend 24525712173385 # 91.3% (96.2%) high
-- cpu 1884899227506 # 7.0%
-- memory 22640812945879 # 84.3%
speculation 91876384132 # 0.3% ( 0.4%) low
-- branch mispredict 12856371210 # 0.0%
-- pipeline restart 79020012922 # 0.3%
smt-contention 1367029251075 # 5.1% ( 0.0%)
cpu-cycles 13306883314532 # 3.60 GHz
instructions 2453753385490 # 0.18 IPC low
instructions 817406031890 # 448.524 l2 access per 1000 inst
l2 hit from l1 231006146862 # 36.78% l2 miss
l2 miss from l1 79974047304 #
l2 hit from l2 pf 80741481294 #
l3 hit from l2 pf 43129765140 #
l3 miss from l2 pf 11748482860 #
instructions 816967728215 # 141.991 float per 1000 inst
float 512 68 # 0.000 AVX-512 per 1000 inst
float 256 660 # 0.000 AVX-256 per 1000 inst
float 128 116002191550 # 141.991 AVX-128 per 1000 inst
float MMX 0 # 0.000 MMX per 1000 inst
float scalar 0 # 0.000 scalar per 1000 inst
instructions 2667710 #
opcache 992627 # 372.090 opcache per 1000 inst
opcache miss 530181 # 53.4% opcache miss rate
l1 dTLB miss 5737 # 2.151 L1 dTLB per 1000 inst
l2 dTLB miss 1200 # 0.450 L2 dTLB per 1000 inst
instructions 2679046 #
icache 1285981 # 480.015 icache per 1000 inst
icache miss 108585 # 8.4% icache miss rate
l1 iTLB miss 17 # 0.006 L1 iTLB per 1000 inst
l2 iTLB miss 0 # 0.000 L2 iTLB per 1000 inst
tlb flush 20 # 0.007 TLB flush per 1000 inst
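The derived figures in the AMD output above can be reproduced from the raw counters. The following is a minimal sketch with the values copied by hand from that output. The helper names `busy_cores` and `share` are hypothetical, and the formulas are inferred from the numbers rather than taken from the profiling tool's source.

```python
# Raw values copied from the AMD run above.
elapsed = 236.762
utime, stime = 3082.168, 1.545
cores = 16
slots = 26852113381386
topdown = {
    "retiring": 702637987249,
    "frontend": 164843205984,
    "backend": 24525712173385,
    "speculation": 91876384132,
    "smt-contention": 1367029251075,
}
backend_memory = 22640812945879
cycles = 13433029028472
instructions = 2454061328682


def busy_cores(utime: float, stime: float, elapsed: float) -> float:
    """Average number of cores kept busy: CPU time divided by wall time."""
    return (utime + stime) / elapsed


def share(count: int, total: int) -> float:
    """Percentage of `total` accounted for by `count`."""
    return 100.0 * count / total


busy = busy_cores(utime, stime, elapsed)
print(f"on_cpu {busy / cores:.3f} ({busy:.2f} / {cores} cores)")
for name, count in topdown.items():
    print(f"{name:15s} {share(count, slots):5.1f}%")
print(f"backend memory  {share(backend_memory, slots):5.1f}%")
print(f"IPC             {instructions / cycles:.2f}")
```

The five top-level categories sum to within a small fraction of a percent of the slot count, so the percentages can be read as shares of total issue slots.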
Intel metrics show the memory stalls as being mostly DRAM bound (42.4%), with L3 (35.0%) also contributing.
elapsed 492.592
on_cpu 0.833 # 13.33 / 16 cores
utime 6565.321
stime 1.594
nvcsw 2251 # 4.08%
nivcsw 52966 # 95.92%
inblock 952 # 1.93/sec
onblock 1376 # 2.79/sec
cpu-clock 6567284228049 # 6567.284 seconds
task-clock 6567309903417 # 6567.310 seconds
page faults 414015 # 63.042/sec
context switches 57503 # 8.756/sec
cpu migrations 454 # 0.069/sec
major page faults 6 # 0.001/sec
minor page faults 414009 # 63.041/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 583567246701 # 166.739 branches per 1000 inst
branch misses 213359226 # 0.04% branch miss
conditional 583567261165 # 166.739 conditional branches per 1000 inst
indirect 290449592779 # 82.988 indirect branches per 1000 inst
slots 17661678302474 #
retiring 932467763006 # 5.3% ( 5.3%) low
-- ucode 22575755190 # 0.1%
-- fastpath 909892007816 # 5.2%
frontend 814357561668 # 4.6% ( 4.6%) low
-- latency 751489672407 # 4.3%
-- bandwidth 62867889261 # 0.4%
backend 15954992298439 # 90.3% (90.3%) high
-- cpu 1423967208298 # 8.1%
-- memory 14531025090141 # 82.3%
speculation 22870782998 # 0.1% ( 0.1%) low
-- branch mispredict 9848240863 # 0.1%
-- pipeline restart 13022542135 # 0.1%
smt-contention 0 # 0.0% ( 0.0%)
cpu-cycles 30406891078350 # 1.27 GHz
instructions 5395127285156 # 0.18 IPC low
l2 access 1894511816359 # 361.071 l2 access per 1000 inst
l2 miss 1005343361487 # 53.07% l2 miss
cpu-cycles 5951947247518 # 91.0% memory latency
load stalls 5414382577033 # 2.8% l1 bound
l1 miss 5248001289215 # 10.8% l2 bound
l2 miss 4606656595480 # 35.0% l3 bound
l3 miss 2523570196776 # 42.4% dram bound
store_stalls 449624468 # 0.0% store bound
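The per-level percentages in the Intel output above appear to follow from differences between successive stall counters, each divided by the cycle count of that counter group. The sketch below reproduces them under that reading. It is an inference from the printed numbers, not the tool's documented formula, and `memory_breakdown` is a hypothetical helper.

```python
# Raw values copied from the last counter group of the Intel run above.
cycles = 5951947247518
load_stalls = 5414382577033
l1_miss = 5248001289215
l2_miss = 4606656595480
l3_miss = 2523570196776


def memory_breakdown(cycles, load_stalls, l1_miss, l2_miss, l3_miss):
    """Attribute stall cycles to the cache level where the load was served.

    Each counter is read as "cycles stalled with at least this miss pending",
    so the difference between adjacent levels is the time bound on that level.
    """
    def pct(n):
        return 100.0 * n / cycles

    return {
        "memory latency": pct(load_stalls),
        "l1 bound": pct(load_stalls - l1_miss),
        "l2 bound": pct(l1_miss - l2_miss),
        "l3 bound": pct(l2_miss - l3_miss),
        "dram bound": pct(l3_miss),
    }


breakdown = memory_breakdown(cycles, load_stalls, l1_miss, l2_miss, l3_miss)
for level, value in breakdown.items():
    print(f"{level:15s} {value:5.1f}%")
```

Read this way, the four per-level shares add up to the memory latency figure, with DRAM and L3 together accounting for most of it.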
The process overview shows only a few mtdgemm invocations: 4 mt-dgemm and 64 mtdgemm entries among 416 processes.
416 processes
64 mtdgemm 66452.80 15.04
68 clinfo 16.86 5.99
38 vulkaninfo 1.14 1.15
4 vulkani:disk$0 0.12 0.13
6 php 0.09 0.10
6 glxinfo:gdrv0 0.08 0.10
6 glxinfo:gl0 0.08 0.10
6 clang 0.07 0.05
2 llvmpipe-0 0.06 0.07
2 llvmpipe-1 0.06 0.07
2 llvmpipe-10 0.06 0.07
2 llvmpipe-11 0.06 0.07
2 llvmpipe-12 0.06 0.07
2 llvmpipe-13 0.06 0.07
2 llvmpipe-14 0.06 0.07
2 llvmpipe-15 0.06 0.07
2 llvmpipe-2 0.06 0.07
2 llvmpipe-3 0.06 0.07
2 llvmpipe-4 0.06 0.07
2 llvmpipe-5 0.06 0.07
2 llvmpipe-6 0.06 0.07
2 llvmpipe-7 0.06 0.07
2 llvmpipe-8 0.06 0.07
2 llvmpipe-9 0.06 0.06
2 glxinfo 0.04 0.04
2 glxinfo:cs0 0.04 0.04
2 glxinfo:disk$0 0.04 0.04
2 glxinfo:sh0 0.04 0.04
2 glxinfo:shlo0 0.04 0.04
1 lspci 0.01 0.01
3 rocminfo 0.00 0.01
1 ps 0.00 0.01
82 sh 0.00 0.00
13 gcc 0.00 0.00
12 gsettings 0.00 0.00
8 stat 0.00 0.00
8 systemd-detect- 0.00 0.00
6 llvm-link 0.00 0.00
5 phoronix-test-s 0.00 0.00
4 mt-dgemm 0.00 0.00
3 gmain 0.00 0.00
2 cc 0.00 0.00
2 lscpu 0.00 0.00
2 uname 0.00 0.00
2 which 0.00 0.00
2 xset 0.00 0.00
1 date 0.00 0.00
1 dconf worker 0.00 0.00
1 dirname 0.00 0.00
1 dmesg 0.00 0.00
1 dmidecode 0.00 0.00
1 grep 0.00 0.00
1 ifconfig 0.00 0.00
1 ip 0.00 0.00
1 lsmod 0.00 0.00
1 mktemp 0.00 0.00
1 qdbus 0.00 0.00
1 readlink 0.00 0.00
1 realpath 0.00 0.00
1 sed 0.00 0.00
1 sort 0.00 0.00
1 stty 0.00 0.00
1 systemctl 0.00 0.00
1 template.sh 0.00 0.00
1 wc 0.00 0.00
1 xrandr 0.00 0.00
0 processes running
47 maximum processes
Computation blocks
408399) mt-dgemm cpu=3 start=5.61 finish=77.72
408400) mtdgemm cpu=0 start=5.61 finish=77.72
408401) mtdgemm cpu=8 start=5.61 finish=77.72
408402) mtdgemm cpu=9 start=5.61 finish=77.72
408403) mtdgemm cpu=1 start=5.61 finish=77.72
408404) mtdgemm cpu=2 start=5.61 finish=77.72
408405) mtdgemm cpu=10 start=5.61 finish=77.72
408406) mtdgemm cpu=11 start=5.61 finish=77.72
408407) mtdgemm cpu=3 start=5.61 finish=77.72
408408) mtdgemm cpu=4 start=5.61 finish=77.72
408409) mtdgemm cpu=12 start=5.61 finish=77.72
408410) mtdgemm cpu=13 start=5.61 finish=77.72
408411) mtdgemm cpu=5 start=5.61 finish=77.72
408412) mtdgemm cpu=6 start=5.61 finish=77.72
408413) mtdgemm cpu=14 start=5.61 finish=77.72
408414) mtdgemm cpu=7 start=5.61 finish=77.72
408415) mtdgemm cpu=15 start=5.61 finish=77.72
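A listing like the one above can be summarized with a short script. This is a minimal sketch that assumes every line has the form `id) name cpu=N start=S finish=F`. Only three of the lines are embedded as sample input, and `parse_blocks` is a hypothetical helper rather than part of the profiling tool.

```python
import re

# Three sample lines copied from the listing above.
SAMPLE = """\
408399) mt-dgemm cpu=3 start=5.61 finish=77.72
408400) mtdgemm cpu=0 start=5.61 finish=77.72
408401) mtdgemm cpu=8 start=5.61 finish=77.72
"""

LINE_RE = re.compile(
    r"(\d+)\)\s+(\S+)\s+cpu=(\d+)\s+start=([\d.]+)\s+finish=([\d.]+)"
)


def parse_blocks(text: str) -> list[dict]:
    """Parse computation-block lines into dictionaries, skipping non-matches."""
    blocks = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            blocks.append({
                "id": int(m.group(1)),
                "name": m.group(2),
                "cpu": int(m.group(3)),
                "start": float(m.group(4)),
                "finish": float(m.group(5)),
            })
    return blocks


blocks = parse_blocks(SAMPLE)
cpus = sorted({b["cpu"] for b in blocks})
longest = max(b["finish"] - b["start"] for b in blocks)
print(f"{len(blocks)} blocks on CPUs {cpus}, longest {longest:.2f} s")
```

Run over the full listing, the same approach would show how many distinct CPUs the block covers and whether all entries start and finish together.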
