hmmer is scientific code that searches sequence data using profile hidden Markov models. There is a single test, and the goal is to minimize elapsed time. This is parallel code that keeps about half of the cores busy. The Intel results suggest it runs one thread per core rather than using hyperthreads.

Topdown profile shows a moderate retirement rate with some frontend stalls and not as many backend stalls.

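AMD metrics show heavily floating-point code (587.5 float per 1000 instructions, essentially all counted as AVX-128) with a low level of L2 access (about 10 per 1000 instructions). Backend stalls are CPU-centric (21.0% CPU versus 2.0% memory), so this would be a good code for drilling down into CPU-centric bottlenecks. It is also a good candidate for trying at least AVX-256.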
elapsed 331.598
on_cpu 0.472 # 7.56 / 16 cores
utime 2446.019
stime 60.105
nvcsw 1974074 # 98.50%
nivcsw 30023 # 1.50%
inblock 2888 # 8.71/sec
onblock 13016 # 39.25/sec
cpu-clock 2512599080302 # 2512.599 seconds
task-clock 2513608145421 # 2513.608 seconds
page faults 551481 # 219.398/sec
context switches 2005436 # 797.832/sec
cpu migrations 130384 # 51.871/sec
major page faults 101 # 0.040/sec
minor page faults 551380 # 219.358/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 2258521454096 # 92.354 branches per 1000 inst
branch misses 28635942877 # 1.27% branch miss
conditional 2126819155402 # 86.969 conditional branches per 1000 inst
indirect 17843603406 # 0.730 indirect branches per 1000 inst
cpu-cycles 10482667685704 # 1.97 GHz
instructions 24364658068770 # 2.32 IPC
slots 21107611910478 #
retiring 7659667626542 # 36.3% (44.4%)
-- ucode 2206940738 # 0.0%
-- fastpath 7657460685804 # 36.3%
frontend 3554660839585 # 16.8% (20.6%)
-- latency 1513553670744 # 7.2%
-- bandwidth 2041107168841 # 9.7%
backend 4871025342216 # 23.1% (28.3%)
-- cpu 4439682897969 # 21.0%
-- memory 431342444247 # 2.0%
speculation 1151927521901 # 5.5% ( 6.7%)
-- branch mispredict 1148950630289 # 5.4%
-- pipeline restart 2976891612 # 0.0%
smt-contention 3870104848807 # 18.3% ( 0.0%)
cpu-cycles 10487395728592 # 1.97 GHz
instructions 24380098876766 # 2.32 IPC
instructions 8155654202288 # 10.013 l2 access per 1000 inst
l2 hit from l1 68286282185 # 1.92% l2 miss
l2 miss from l1 1012878111 #
l2 hit from l2 pf 12822770451 #
l3 hit from l2 pf 467457741 #
l3 miss from l2 pf 88265283 #
instructions 8149997728883 # 587.540 float per 1000 inst
float 512 81 # 0.000 AVX-512 per 1000 inst
float 256 584 # 0.000 AVX-256 per 1000 inst
float 128 4788447283765 # 587.540 AVX-128 per 1000 inst
float MMX 0 # 0.000 MMX per 1000 inst
float scalar 2 # 0.000 scalar per 1000 inst
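The parenthesized topdown percentages above appear to be the same slot counts renormalized after removing the smt-contention slots. The sketch below recomputes both columns from the AMD counters under that assumption; the function name `topdown_shares` is hypothetical and is not part of the profiling tool.

```python
# Slot counts copied from the AMD topdown output above.
slots = 21107611910478
smt_contention = 3870104848807
level1 = {
    "retiring": 7659667626542,
    "frontend": 3554660839585,
    "backend": 4871025342216,
    "speculation": 1151927521901,
}


def topdown_shares(level1, slots, smt_contention):
    """Return {category: (pct of all slots, pct of slots excluding SMT contention)}.

    Assumption: the second column is count / (slots - smt_contention).
    """
    usable = slots - smt_contention
    return {
        name: (100.0 * count / slots, 100.0 * count / usable)
        for name, count in level1.items()
    }


shares = topdown_shares(level1, slots, smt_contention)
for name, (raw, adjusted) in shares.items():
    print(f"{name:12s} {raw:5.1f}% ({adjusted:4.1f}%)")
```

With these inputs the recomputed values match the printed ones (for example retiring at 36.3% and 44.4%), which supports the reading that roughly 18% of slots on the AMD system are lost to the sibling hardware thread rather than to stalls in this code.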
Intel metrics show the workload mostly running on 12 cores (on_cpu 11.54 / 16).
elapsed 459.336
on_cpu 0.721 # 11.54 / 16 cores
utime 5237.164
stime 65.096
nvcsw 3057554 # 94.06%
nivcsw 193158 # 5.94%
inblock 320 # 0.70/sec
onblock 1768 # 3.85/sec
cpu-clock 5308742804653 # 5308.743 seconds
task-clock 5309513369462 # 5309.513 seconds
page faults 750501 # 141.350/sec
context switches 3252164 # 612.516/sec
cpu migrations 639780 # 120.497/sec
major page faults 18 # 0.003/sec
minor page faults 750483 # 141.347/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 3382433495485 # 92.205 branches per 1000 inst
branch misses 40757068576 # 1.20% branch miss
conditional 3382447770653 # 92.206 conditional branches per 1000 inst
indirect 1567243251234 # 42.723 indirect branches per 1000 inst
slots 30187099759652 #
retiring 15109005184623 # 50.1% (50.1%)
-- ucode 126581066445 # 0.4%
-- fastpath 14982424118178 # 49.6%
frontend 4109048417357 # 13.6% (13.6%)
-- latency 1786065710888 # 5.9%
-- bandwidth 2322982706469 # 7.7%
backend 7816602085255 # 25.9% (25.9%)
-- cpu 5358445823079 # 17.8%
-- memory 2458156262176 # 8.1%
speculation 3111458169602 # 10.3% (10.3%)
-- branch mispredict 3096804788571 # 10.3%
-- pipeline restart 14653381031 # 0.0%
smt-contention 0 # 0.0% ( 0.0%)
cpu-cycles 17094061098071 # 2.32 GHz
instructions 41367833831551 # 2.42 IPC
l2 access 50706897733 # 3.145 l2 access per 1000 inst
l2 miss 4410666064 # 8.70% l2 miss
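The core-occupancy and IPC figures in both dumps can be rechecked from the raw counters. This is a minimal sketch that assumes on_cpu is (utime + stime) / elapsed divided by the 16 logical CPUs, and that IPC is instructions / cpu-cycles. Those formulas are inferred from the printed values, and the helper names `on_cpu` and `ipc` are hypothetical.

```python
NCPUS = 16  # taken from the "/ 16 cores" annotation in the output

# Values copied from the AMD and Intel dumps above.
runs = {
    "AMD": dict(elapsed=331.598, utime=2446.019, stime=60.105,
                cycles=10482667685704, instructions=24364658068770),
    "Intel": dict(elapsed=459.336, utime=5237.164, stime=65.096,
                  cycles=17094061098071, instructions=41367833831551),
}


def on_cpu(utime, stime, elapsed, ncpus):
    """Return (average busy CPUs, fraction of all CPUs busy)."""
    busy = (utime + stime) / elapsed
    return busy, busy / ncpus


def ipc(instructions, cycles):
    """Instructions retired per CPU cycle."""
    return instructions / cycles


results = {}
for name, r in runs.items():
    busy, frac = on_cpu(r["utime"], r["stime"], r["elapsed"], NCPUS)
    results[name] = (busy, frac, ipc(r["instructions"], r["cycles"]))
    print(f"{name}: {busy:.2f} CPUs busy ({frac:.3f}), "
          f"IPC {results[name][2]:.2f}")
```

Under these assumptions the AMD run averages 7.56 busy CPUs and the Intel run 11.54, which is where the "half the cores" and "12 cores" observations come from.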
The process tree shows a huge number of threads (297,144 hmmsearch entries out of 297,507 processes), though interestingly they don't show up as runnable processes above.
297507 processes
297144 hmmsearch 15169403.33 214883.04
68 clinfo 16.53 6.32
18 mpirun 7.41 66.93
38 vulkaninfo 0.95 1.23
6 php 0.44 4.27
6 glxinfo:gdrv0 0.15 0.06
4 vulkani:disk$0 0.10 0.13
2 glxinfo 0.07 0.02
2 glxinfo:cs0 0.07 0.02
2 glxinfo:disk$0 0.07 0.02
2 glxinfo:sh0 0.07 0.02
2 glxinfo:shlo0 0.07 0.02
6 clang 0.06 0.06
2 llvmpipe-0 0.05 0.07
2 llvmpipe-1 0.05 0.07
2 llvmpipe-10 0.05 0.07
2 llvmpipe-11 0.05 0.07
2 llvmpipe-12 0.05 0.07
2 llvmpipe-13 0.05 0.07
2 llvmpipe-14 0.05 0.07
2 llvmpipe-15 0.05 0.07
2 llvmpipe-2 0.05 0.07
2 llvmpipe-3 0.05 0.07
2 llvmpipe-4 0.05 0.07
2 llvmpipe-5 0.05 0.07
2 llvmpipe-6 0.05 0.07
2 llvmpipe-7 0.05 0.07
2 llvmpipe-8 0.05 0.07
2 llvmpipe-9 0.05 0.07
1 lspci 0.01 0.02
3 rocminfo 0.00 0.03
1 ps 0.00 0.01
82 sh 0.00 0.00
13 gcc 0.00 0.00
11 gsettings 0.00 0.00
8 stat 0.00 0.00
8 systemd-detect- 0.00 0.00
6 llvm-link 0.00 0.00
5 phoronix-test-s 0.00 0.00
4 gmain 0.00 0.00
3 hmmer 0.00 0.00
2 cc 0.00 0.00
2 lscpu 0.00 0.00
2 uname 0.00 0.00
2 which 0.00 0.00
2 xset 0.00 0.00
1 date 0.00 0.00
1 dconf worker 0.00 0.00
1 dirname 0.00 0.00
1 dmesg 0.00 0.00
1 dmidecode 0.00 0.00
1 grep 0.00 0.00
1 ifconfig 0.00 0.00
1 ip 0.00 0.00
1 lsmod 0.00 0.00
1 mktemp 0.00 0.00
1 qdbus 0.00 0.00
1 readlink 0.00 0.00
1 realpath 0.00 0.00
1 sed 0.00 0.00
1 sort 0.00 0.00
1 stty 0.00 0.00
1 systemctl 0.00 0.00
1 template.sh 0.00 0.00
1 wc 0.00 0.00
1 xrandr 0.00 0.00
0 processes running
47 maximum processes
The computation blocks show the starts of these short-lived threads; most start and finish within the same 0.01 second:
1600545) hmmer cpu=0 start=5.74 finish=110.22
1600546) mpirun cpu=7 start=5.74 finish=110.19
1600549) mpirun cpu=12 start=6.33 finish=110.19
1600550) mpirun cpu=13 start=6.33 finish=6.33
1600551) mpirun cpu=10 start=6.35 finish=110.19
1600552) mpirun cpu=12 start=6.83 finish=110.19
1600553) mpirun cpu=1 start=6.83 finish=110.19
1600554) hmmsearch cpu=9 start=6.86 finish=109.62
1600560) hmmsearch cpu=3 start=6.89 finish=6.89
1600561) hmmsearch cpu=1 start=6.89 finish=6.89
1600568) hmmsearch cpu=14 start=6.90 finish=6.90
1600569) hmmsearch cpu=9 start=6.90 finish=6.90
1600588) hmmsearch cpu=9 start=6.92 finish=6.92
1600589) hmmsearch cpu=2 start=6.92 finish=6.92
1600608) hmmsearch cpu=4 start=6.94 finish=6.94
1600609) hmmsearch cpu=15 start=6.94 finish=6.94
1600624) hmmsearch cpu=13 start=6.96 finish=6.96
1600625) hmmsearch cpu=15 start=6.96 finish=6.96
1600634) hmmsearch cpu=13 start=6.97 finish=6.97
1600635) hmmsearch cpu=15 start=6.97 finish=6.97
1600648) hmmsearch cpu=15 start=6.99 finish=6.99
1600649) hmmsearch cpu=13 start=6.99 finish=6.99
1600656) hmmsearch cpu=15 start=7.00 finish=7.00
1600657) hmmsearch cpu=13 start=7.00 finish=7.00
1600682) hmmsearch cpu=6 start=7.02 finish=7.02
1600683) hmmsearch cpu=5 start=7.02 finish=7.02
1600702) hmmsearch cpu=0 start=7.04 finish=7.04
1600703) hmmsearch cpu=5 start=7.04 finish=7.04
1600722) hmmsearch cpu=14 start=7.07 finish=7.07
1600723) hmmsearch cpu=0 start=7.07 finish=7.07
1600738) hmmsearch cpu=9 start=7.09 finish=7.09
1600739) hmmsearch cpu=6 start=7.09 finish=7.09
1600748) hmmsearch cpu=12 start=7.11 finish=7.11
1600749) hmmsearch cpu=11 start=7.11 finish=7.11
1600762) hmmsearch cpu=14 start=7.12 finish=7.12
1600763) hmmsearch cpu=11 start=7.12 finish=7.12
1600774) hmmsearch cpu=6 start=7.14 finish=7.14
1600775) hmmsearch cpu=11 start=7.14 finish=7.14
1600788) hmmsearch cpu=13 start=7.15 finish=7.15
1600789) hmmsearch cpu=3 start=7.15 finish=7.15
1600822) hmmsearch cpu=11 start=7.20 finish=7.20
1600823) hmmsearch cpu=4 start=7.20 finish=7.20
1600836) hmmsearch cpu=12 start=7.21 finish=7.21
1600837) hmmsearch cpu=3 start=7.21 finish=7.21
1600848) hmmsearch cpu=13 start=7.22 finish=7.22
1600849) hmmsearch cpu=4 start=7.22 finish=7.22
1600858) hmmsearch cpu=12 start=7.24 finish=7.24
1600859) hmmsearch cpu=1 start=7.24 finish=7.24
1600876) hmmsearch cpu=13 start=7.26 finish=7.26
1600877) hmmsearch cpu=4 start=7.26 finish=7.26
1600892) hmmsearch cpu=5 start=7.28 finish=7.28
1600893) hmmsearch cpu=4 start=7.28 finish=7.28
1600902) hmmsearch cpu=12 start=7.29 finish=7.29
1600903) hmmsearch cpu=5 start=7.29 finish=7.29
1600924) hmmsearch cpu=13 start=7.32 finish=7.32
1600925) hmmsearch cpu=12 start=7.32 finish=7.32
1600934) hmmsearch cpu=5 start=7.34 finish=7.34
1600935) hmmsearch cpu=12 start=7.34 finish=7.34
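One way to quantify the short-lived threads is to parse the computation-block lines and count entries whose start and finish times are equal at the printed resolution. The sketch below is illustrative only: the regular expression is fitted to the line format shown above, the 0.01-second threshold is an assumption, and `parse_blocks` and `classify` are hypothetical helper names.

```python
import re
from collections import Counter

# Matches lines such as "1600560) hmmsearch cpu=3 start=6.89 finish=6.89".
LINE_RE = re.compile(
    r"^\s*(?P<pid>\d+)\)\s+(?P<name>\S+)\s+cpu=(?P<cpu>\d+)"
    r"\s+start=(?P<start>[\d.]+)\s+finish=(?P<finish>[\d.]+)"
)


def parse_blocks(text):
    """Return a list of (pid, name, cpu, start, finish) tuples."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            rows.append((int(m["pid"]), m["name"], int(m["cpu"]),
                         float(m["start"]), float(m["finish"])))
    return rows


def classify(rows, threshold=0.01):
    """Count short-lived and long-lived entries per command name."""
    short, long_lived = Counter(), Counter()
    for _pid, name, _cpu, start, finish in rows:
        if finish - start < threshold:
            short[name] += 1
        else:
            long_lived[name] += 1
    return short, long_lived


# A few lines copied from the listing above.
sample = """\
1600545) hmmer cpu=0 start=5.74 finish=110.22
1600550) mpirun cpu=13 start=6.33 finish=6.33
1600554) hmmsearch cpu=9 start=6.86 finish=109.62
1600560) hmmsearch cpu=3 start=6.89 finish=6.89
1600561) hmmsearch cpu=1 start=6.89 finish=6.89
"""

short, long_lived = classify(parse_blocks(sample))
print("short-lived:", dict(short))
print("long-lived:", dict(long_lived))
```

Run against the full listing, a count like this would show how many hmmsearch threads exist only briefly compared with the single long-running hmmsearch process per run.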
