A test for scientific/numerical computing with FFT, Jacobi Successive Over-relaxation, Monte Carlo, Sparse Matrix Multiply and dense LU factorization. This is a single-threaded program with array-type functions that likely benefit from -O3 optimizations. It is also a case where, relatively speaking, my Intel CPU performs better.

The topdown profile shows differences among the six tests, one of which has a particularly high branch misprediction rate.

AMD metrics show very low frontend stalls and not many L2 accesses.
elapsed 120.143
on_cpu 0.041 # 0.65 / 16 cores
utime 77.062
stime 0.901
nvcsw 2073 # 75.85%
nivcsw 660 # 24.15%
inblock 0 # 0.00/sec
onblock 12952 # 107.81/sec
cpu-clock 77989027760 # 77.989 seconds
task-clock 77992605087 # 77.993 seconds
page faults 171397 # 2197.606/sec
context switches 3130 # 40.132/sec
cpu migrations 277 # 3.552/sec
major page faults 2 # 0.026/sec
minor page faults 171395 # 2197.580/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 41435159232 # 45.318 branches per 1000 inst
branch misses 1064480394 # 2.57% branch miss
conditional 36540908643 # 39.965 conditional branches per 1000 inst
indirect 45509669 # 0.050 indirect branches per 1000 inst
cpu-cycles 374643884210 # 0.19 GHz
instructions 948713151358 # 2.53 IPC
slots 752276514300 #
retiring 330387842212 # 43.9% (43.9%)
-- ucode 10231549 # 0.0%
-- fastpath 330377610663 # 43.9%
frontend 34320778963 # 4.6% ( 4.6%) low
-- latency 22696737306 # 3.0%
-- bandwidth 11624041657 # 1.5%
backend 334210686266 # 44.4% (44.4%)
-- cpu 155565850726 # 20.7%
-- memory 178644835540 # 23.7%
speculation 53315171232 # 7.1% ( 7.1%)
-- branch mispredict 52405351963 # 7.0%
-- pipeline restart 909819269 # 0.1%
smt-contention 41777915 # 0.0% ( 0.0%)
cpu-cycles 356773196761 # 0.18 GHz
instructions 911718531981 # 2.56 IPC
instructions 304843400647 # 17.199 l2 access per 1000 inst
l2 hit from l1 2559865446 # 36.41% l2 miss
l2 miss from l1 112336155 #
l2 hit from l2 pf 886371975 #
l3 hit from l2 pf 1712944459 #
l3 miss from l2 pf 83870847 #
instructions 304871254285 # 138.593 float per 1000 inst
float 512 63 # 0.000 AVX-512 per 1000 inst
float 256 600 # 0.000 AVX-256 per 1000 inst
float 128 42252886621 # 138.593 AVX-128 per 1000 inst
float MMX 0 # 0.000 MMX per 1000 inst
float scalar 0 # 0.000 scalar per 1000 inst
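
Several of the derived columns in the AMD listing can be rechecked from the raw counters. The Python sketch below recomputes a few of them from values copied out of the listing. The on_cpu formula (user plus system time over elapsed time, then divided by the 16 cores) is an assumption that happens to reproduce the reported figures, not a documented definition, and the variable names are illustrative only.

```python
# Raw values copied from the AMD listing above.
elapsed, utime, stime = 120.143, 77.062, 0.901
cores = 16
cycles = 374_643_884_210
instructions = 948_713_151_358
branches = 41_435_159_232
branch_misses = 1_064_480_394
# The float counters are listed next to their own instruction count.
float_group_instructions = 304_871_254_285
float_128 = 42_252_886_621


def derived_amd_metrics():
    """Recompute derived ratios; the on_cpu formula is an assumption."""
    busy = (utime + stime) / elapsed  # average number of busy cores
    return {
        "busy_cores": round(busy, 2),
        "on_cpu": round(busy / cores, 3),
        "ipc": round(instructions / cycles, 2),
        "branch_miss_pct": round(100 * branch_misses / branches, 2),
        "float128_per_1000_inst": round(
            1000 * float_128 / float_group_instructions, 3
        ),
    }


if __name__ == "__main__":
    for name, value in derived_amd_metrics().items():
        print(f"{name:24s} {value}")
```

The listing repeats the instructions counter with different values, which suggests the counters were collected in separate groups. Ratios such as "per 1000 inst" therefore only reproduce when both numbers come from the same group.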
Intel metrics
elapsed 111.250
on_cpu 0.049 # 0.79 / 16 cores
utime 86.983
stime 0.416
nvcsw 1833 # 82.68%
nivcsw 384 # 17.32%
inblock 0 # 0.00/sec
onblock 1704 # 15.32/sec
cpu-clock 87416742309 # 87.417 seconds
task-clock 87419658637 # 87.420 seconds
page faults 160844 # 1839.907/sec
context switches 2602 # 29.764/sec
cpu migrations 282 # 3.226/sec
major page faults 0 # 0.000/sec
minor page faults 160844 # 1839.907/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 49111974552 # 45.159 branches per 1000 inst
branch misses 1006651173 # 2.05% branch miss
conditional 49111986488 # 45.159 conditional branches per 1000 inst
indirect 66878428 # 0.061 indirect branches per 1000 inst
slots 2066702758058 #
retiring 1113671660547 # 53.9% (53.9%)
-- ucode 19320553958 # 0.9%
-- fastpath 1094351106589 # 53.0%
frontend 57972748845 # 2.8% ( 2.8%) low
-- latency 27368677487 # 1.3%
-- bandwidth 30604071358 # 1.5%
backend 687206848332 # 33.3% (33.3%)
-- cpu 510136569456 # 24.7%
-- memory 177070278876 # 8.6%
speculation 280845134496 # 13.6% (13.6%) high
-- branch mispredict 265676095678 # 12.9%
-- pipeline restart 15169038818 # 0.7%
smt-contention 0 # 0.0% ( 0.0%)
cpu-cycles 371966209251 # 0.19 GHz
instructions 1199189287856 # 3.22 IPC high
l2 access 19221667822 # 16.042 l2 access per 1000 inst
l2 miss 10810784391 # 56.24% l2 miss
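
To compare the two topdown breakdowns side by side, the level-1 categories can be expressed as a share of total slots. The sketch below does that with the slot counts copied from the AMD and Intel listings; the dictionary layout and the function name topdown_shares are illustrative choices, not part of the profiling tool.

```python
# Level-1 topdown slot counts copied from the AMD and Intel listings.
TOPDOWN = {
    "AMD": {
        "slots": 752_276_514_300,
        "retiring": 330_387_842_212,
        "frontend": 34_320_778_963,
        "backend": 334_210_686_266,
        "speculation": 53_315_171_232,
    },
    "Intel": {
        "slots": 2_066_702_758_058,
        "retiring": 1_113_671_660_547,
        "frontend": 57_972_748_845,
        "backend": 687_206_848_332,
        "speculation": 280_845_134_496,
    },
}


def topdown_shares(counts):
    """Return each category as a percentage of total slots (one decimal)."""
    slots = counts["slots"]
    return {
        name: round(100 * value / slots, 1)
        for name, value in counts.items()
        if name != "slots"
    }


if __name__ == "__main__":
    for platform, counts in TOPDOWN.items():
        print(platform, topdown_shares(counts))
```

Read this way, the Intel run retires a larger share of its slots (53.9% versus 43.9%) but loses almost twice the share to speculation (13.6% versus 7.1%), while the AMD run loses more to the backend, mostly in the memory category (23.7% versus 8.6%).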
Process summary shows six invocations of scimark2 and otherwise test overhead.
354 processes
6 scimark2 79.83 0.02
68 clinfo 19.51 6.99
38 vulkaninfo 1.52 1.14
4 vulkani:disk$0 0.16 0.12
6 glxinfo:gdrv0 0.15 0.05
6 glxinfo:gl0 0.15 0.05
2 llvmpipe-0 0.08 0.06
2 llvmpipe-1 0.08 0.06
2 llvmpipe-10 0.08 0.06
2 llvmpipe-11 0.08 0.06
2 llvmpipe-12 0.08 0.06
2 llvmpipe-13 0.08 0.06
2 llvmpipe-14 0.08 0.06
2 llvmpipe-15 0.08 0.06
2 llvmpipe-2 0.08 0.06
2 llvmpipe-3 0.08 0.06
2 llvmpipe-4 0.08 0.06
2 llvmpipe-5 0.08 0.06
2 llvmpipe-6 0.08 0.06
2 llvmpipe-7 0.08 0.06
2 llvmpipe-8 0.08 0.06
2 llvmpipe-9 0.08 0.06
6 php 0.07 0.09
2 glxinfo 0.07 0.03
2 glxinfo:cs0 0.07 0.03
2 glxinfo:disk$0 0.07 0.03
2 glxinfo:sh0 0.07 0.03
2 glxinfo:shlo0 0.07 0.03
6 clang 0.06 0.06
1 lspci 0.01 0.02
3 rocminfo 0.00 0.03
82 sh 0.00 0.00
13 gcc 0.00 0.00
8 gsettings 0.00 0.00
8 stat 0.00 0.00
8 systemd-detect- 0.00 0.00
6 llvm-link 0.00 0.00
5 gmain 0.00 0.00
5 phoronix-test-s 0.00 0.00
3 dconf worker 0.00 0.00
2 cc 0.00 0.00
2 lscpu 0.00 0.00
2 uname 0.00 0.00
2 which 0.00 0.00
2 xset 0.00 0.00
1 date 0.00 0.00
1 dirname 0.00 0.00
1 dmesg 0.00 0.00
1 dmidecode 0.00 0.00
1 grep 0.00 0.00
1 ifconfig 0.00 0.00
1 ip 0.00 0.00
1 lsmod 0.00 0.00
1 mktemp 0.00 0.00
1 ps 0.00 0.00
1 qdbus 0.00 0.00
1 readlink 0.00 0.00
1 realpath 0.00 0.00
1 sed 0.00 0.00
1 sort 0.00 0.00
1 stty 0.00 0.00
1 systemctl 0.00 0.00
1 template.sh 0.00 0.00
1 wc 0.00 0.00
1 xrandr 0.00 0.00
0 processes running
47 maximum processes
The computation block
1189178) scimark2 cpu=7 start=5.83 finish=35.09
1189179) scimark2 cpu=1 start=5.84 finish=35.09
1189181) scimark2 cpu=6 start=39.09 finish=64.41
1189182) scimark2 cpu=7 start=39.09 finish=64.41
1189184) scimark2 cpu=6 start=68.42 finish=93.77
1189185) scimark2 cpu=7 start=68.42 finish=93.77
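
The start and finish times above give the run length of each scimark2 process. The sketch below parses the lines and computes the durations; the regular expression assumes exactly the line layout shown here, and the names BLOCK, LINE and run_durations are illustrative.

```python
import re

# The computation block, copied verbatim from the listing above.
BLOCK = """\
1189178) scimark2 cpu=7 start=5.83 finish=35.09
1189179) scimark2 cpu=1 start=5.84 finish=35.09
1189181) scimark2 cpu=6 start=39.09 finish=64.41
1189182) scimark2 cpu=7 start=39.09 finish=64.41
1189184) scimark2 cpu=6 start=68.42 finish=93.77
1189185) scimark2 cpu=7 start=68.42 finish=93.77
"""

# Assumed layout: "<pid>) <name> cpu=<n> start=<seconds> finish=<seconds>"
LINE = re.compile(r"(\d+)\) (\S+) cpu=(\d+) start=([\d.]+) finish=([\d.]+)")


def run_durations(text):
    """Return (pid, cpu, duration in seconds) for each process line."""
    rows = []
    for match in LINE.finditer(text):
        pid, _name, cpu, start, finish = match.groups()
        duration = round(float(finish) - float(start), 2)
        rows.append((int(pid), int(cpu), duration))
    return rows


if __name__ == "__main__":
    for pid, cpu, duration in run_durations(BLOCK):
        print(f"pid {pid} on cpu {cpu}: {duration:.2f} s")
```

The six processes form three pairs with matching finish times. The first pair runs for about 29.3 seconds and the later two pairs for about 25.3 seconds each, with roughly four seconds between one pair finishing and the next starting.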
