hmmer is scientific code that searches with profile hidden Markov models. There is one test, and the goal is to minimize time. This is parallel code that runs on about half the cores. The Intel results suggest it runs on the physical cores but not on hyperthreads.

The topdown profile shows a moderate retirement rate (36.3% of slots on AMD, 50.1% on Intel), with some frontend stalls (16.8% / 13.6%) and backend stalls that are only modestly higher (23.1% / 25.9%).
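To make these percentages easy to check, here is a minimal Python sketch that recomputes the AMD topdown shares from the raw slot counts in the dump below. The reading of the parenthesized column (each category divided by slots minus smt-contention) is an inference that happens to reproduce the printed values, not something taken from the tool's documentation.

```python
# AMD topdown counters copied from the dump in this note.
AMD = {
    "slots": 21107611910478,
    "retiring": 7659667626542,
    "frontend": 3554660839585,
    "backend": 4871025342216,
    "speculation": 1151927521901,
    "smt-contention": 3870104848807,
}

CATEGORIES = ("retiring", "frontend", "backend", "speculation")


def topdown(counts):
    """Return {category: (percent of all slots, percent of non-SMT slots)}."""
    slots = counts["slots"]
    # Assumption: the parenthesized column excludes slots lost to SMT contention.
    own_slots = slots - counts.get("smt-contention", 0)
    return {
        name: (100.0 * counts[name] / slots, 100.0 * counts[name] / own_slots)
        for name in CATEGORIES
    }


if __name__ == "__main__":
    for name, (all_pct, own_pct) in topdown(AMD).items():
        print(f"{name:12s} {all_pct:5.1f}% ({own_pct:4.1f}%)")
```

This reproduces the printed 36.3% (44.4%) retiring, 16.8% (20.6%) frontend, 23.1% (28.3%) backend and 5.5% (6.7%) speculation. On Intel smt-contention is 0, which is why both columns match there.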

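The AMD metrics show heavily vectorized floating-point code (587.5 AVX-128 operations per 1000 instructions, essentially no AVX-256 or AVX-512) with a low rate of L2 access (about 10 per 1000 instructions). Backend stalls are CPU-centric (21.0% of slots for cpu versus 2.0% for memory). That makes this a good code for drilling further into CPU-centric bottlenecks, and a good candidate for trying at least AVX-256.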
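A small Python sketch of the arithmetic behind these two claims, using AMD counter values copied from the dump below: vector operations per 1000 instructions, and how the backend stall slots split between the cpu and memory sub-categories. The helper names are made up for illustration.

```python
# Counter values copied from the AMD dump in this note.
FLOAT_INSTRUCTIONS = 8149997728883  # instruction count reported with the float group
FLOAT_COUNTS = {
    "AVX-512": 81,
    "AVX-256": 584,
    "AVX-128": 4788447283765,
    "MMX": 0,
    "scalar": 2,
}
BACKEND = {"cpu": 4439682897969, "memory": 431342444247}


def per_1000(count, instructions):
    """Events per 1000 instructions."""
    return 1000.0 * count / instructions


def backend_split(backend):
    """Percentage of backend-stall slots attributed to each sub-category."""
    total = sum(backend.values())
    return {name: 100.0 * value / total for name, value in backend.items()}


if __name__ == "__main__":
    for name, count in FLOAT_COUNTS.items():
        print(f"{name:8s} {per_1000(count, FLOAT_INSTRUCTIONS):10.3f} per 1000 inst")
    for name, pct in backend_split(BACKEND).items():
        print(f"backend {name:6s} {pct:5.1f}%")
```

This gives about 587.54 AVX-128 operations per 1000 instructions against effectively zero wider vector operations, and roughly 91% of backend stall slots in the cpu category, which is why wider vectors and core-side bottlenecks look like the places to dig.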

elapsed              331.598
on_cpu               0.472          # 7.56 / 16 cores
utime                2446.019
stime                60.105
nvcsw                1974074        # 98.50%
nivcsw               30023          # 1.50%
inblock              2888           # 8.71/sec
onblock              13016          # 39.25/sec
cpu-clock            2512599080302  # 2512.599 seconds
task-clock           2513608145421  # 2513.608 seconds
page faults          551481         # 219.398/sec
context switches     2005436        # 797.832/sec
cpu migrations       130384         # 51.871/sec
major page faults    101            # 0.040/sec
minor page faults    551380         # 219.358/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             2258521454096  # 92.354 branches per 1000 inst
branch misses        28635942877    # 1.27% branch miss
conditional          2126819155402  # 86.969 conditional branches per 1000 inst
indirect             17843603406    # 0.730 indirect branches per 1000 inst
cpu-cycles           10482667685704 # 1.97 GHz
instructions         24364658068770 # 2.32 IPC
slots                21107611910478 #
retiring             7659667626542  # 36.3% (44.4%)
-- ucode             2206940738     #     0.0%
-- fastpath          7657460685804  #    36.3%
frontend             3554660839585  # 16.8% (20.6%)
-- latency           1513553670744  #     7.2%
-- bandwidth         2041107168841  #     9.7%
backend              4871025342216  # 23.1% (28.3%)
-- cpu               4439682897969  #    21.0%
-- memory            431342444247   #     2.0%
speculation          1151927521901  #  5.5% ( 6.7%)
-- branch mispredict 1148950630289  #     5.4%
-- pipeline restart  2976891612     #     0.0%
smt-contention       3870104848807  # 18.3% ( 0.0%)
cpu-cycles           10487395728592 # 1.97 GHz
instructions         24380098876766 # 2.32 IPC
instructions         8155654202288  # 10.013 l2 access per 1000 inst
l2 hit from l1       68286282185    # 1.92% l2 miss
l2 miss from l1      1012878111     #
l2 hit from l2 pf    12822770451    #
l3 hit from l2 pf    467457741      #
l3 miss from l2 pf   88265283       #
instructions         8149997728883  # 587.540 float per 1000 inst
float 512            81             # 0.000 AVX-512 per 1000 inst
float 256            584            # 0.000 AVX-256 per 1000 inst
float 128            4788447283765  # 587.540 AVX-128 per 1000 inst
float MMX            0              # 0.000 MMX per 1000 inst
float scalar         2              # 0.000 scalar per 1000 inst

The Intel metrics show it mostly running on about 12 cores (11.54 of 16).
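The on_cpu column appears to be total CPU time (utime + stime) divided by elapsed time and by the 16-core count. That formula is inferred from the numbers in the two dumps, not from the tool's documentation; a minimal Python sketch:

```python
NCORES = 16  # core count used by the on_cpu column in both dumps

RUNS = {
    # name: (elapsed s, utime s, stime s), copied from the dumps
    "AMD": (331.598, 2446.019, 60.105),
    "Intel": (459.336, 5237.164, 65.096),
}


def cores_busy(elapsed, utime, stime):
    """Average number of cores kept busy: total CPU time over wall time."""
    return (utime + stime) / elapsed


if __name__ == "__main__":
    for name, run in RUNS.items():
        busy = cores_busy(*run)
        print(f"{name:6s} {busy:6.2f} cores  on_cpu={busy / NCORES:.3f}")
```

This gives about 7.56 busy cores on AMD (roughly half of 16) and 11.54 on Intel, matching the printed on_cpu comments.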

elapsed              459.336
on_cpu               0.721          # 11.54 / 16 cores
utime                5237.164
stime                65.096
nvcsw                3057554        # 94.06%
nivcsw               193158         # 5.94%
inblock              320            # 0.70/sec
onblock              1768           # 3.85/sec
cpu-clock            5308742804653  # 5308.743 seconds
task-clock           5309513369462  # 5309.513 seconds
page faults          750501         # 141.350/sec
context switches     3252164        # 612.516/sec
cpu migrations       639780         # 120.497/sec
major page faults    18             # 0.003/sec
minor page faults    750483         # 141.347/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             3382433495485  # 92.205 branches per 1000 inst
branch misses        40757068576    # 1.20% branch miss
conditional          3382447770653  # 92.206 conditional branches per 1000 inst
indirect             1567243251234  # 42.723 indirect branches per 1000 inst
slots                30187099759652 #
retiring             15109005184623 # 50.1% (50.1%)
-- ucode             126581066445   #     0.4%
-- fastpath          14982424118178 #    49.6%
frontend             4109048417357  # 13.6% (13.6%)
-- latency           1786065710888  #     5.9%
-- bandwidth         2322982706469  #     7.7%
backend              7816602085255  # 25.9% (25.9%)
-- cpu               5358445823079  #    17.8%
-- memory            2458156262176  #     8.1%
speculation          3111458169602  # 10.3% (10.3%)
-- branch mispredict 3096804788571  #    10.3%
-- pipeline restart  14653381031    #     0.0%
smt-contention       0              #  0.0% ( 0.0%)
cpu-cycles           17094061098071 # 2.32 GHz
instructions         41367833831551 # 2.42 IPC
l2 access            50706897733    # 3.145 l2 access per 1000 inst
l2 miss              4410666064     # 8.70% l2 miss

The process tree shows a huge number of threads, though interestingly they don't show up in the runnable process count above.

297507 processes
	297144 hmmsearch            15169403.33 214883.04
	 68 clinfo                  16.53     6.32
	 18 mpirun                   7.41    66.93
	 38 vulkaninfo               0.95     1.23
	  6 php                      0.44     4.27
	  6 glxinfo:gdrv0            0.15     0.06
	  4 vulkani:disk$0           0.10     0.13
	  2 glxinfo                  0.07     0.02
	  2 glxinfo:cs0              0.07     0.02
	  2 glxinfo:disk$0           0.07     0.02
	  2 glxinfo:sh0              0.07     0.02
	  2 glxinfo:shlo0            0.07     0.02
	  6 clang                    0.06     0.06
	  2 llvmpipe-0               0.05     0.07
	  2 llvmpipe-1               0.05     0.07
	  2 llvmpipe-10              0.05     0.07
	  2 llvmpipe-11              0.05     0.07
	  2 llvmpipe-12              0.05     0.07
	  2 llvmpipe-13              0.05     0.07
	  2 llvmpipe-14              0.05     0.07
	  2 llvmpipe-15              0.05     0.07
	  2 llvmpipe-2               0.05     0.07
	  2 llvmpipe-3               0.05     0.07
	  2 llvmpipe-4               0.05     0.07
	  2 llvmpipe-5               0.05     0.07
	  2 llvmpipe-6               0.05     0.07
	  2 llvmpipe-7               0.05     0.07
	  2 llvmpipe-8               0.05     0.07
	  2 llvmpipe-9               0.05     0.07
	  1 lspci                    0.01     0.02
	  3 rocminfo                 0.00     0.03
	  1 ps                       0.00     0.01
	 82 sh                       0.00     0.00
	 13 gcc                      0.00     0.00
	 11 gsettings                0.00     0.00
	  8 stat                     0.00     0.00
	  8 systemd-detect-          0.00     0.00
	  6 llvm-link                0.00     0.00
	  5 phoronix-test-s          0.00     0.00
	  4 gmain                    0.00     0.00
	  3 hmmer                    0.00     0.00
	  2 cc                       0.00     0.00
	  2 lscpu                    0.00     0.00
	  2 uname                    0.00     0.00
	  2 which                    0.00     0.00
	  2 xset                     0.00     0.00
	  1 date                     0.00     0.00
	  1 dconf worker             0.00     0.00
	  1 dirname                  0.00     0.00
	  1 dmesg                    0.00     0.00
	  1 dmidecode                0.00     0.00
	  1 grep                     0.00     0.00
	  1 ifconfig                 0.00     0.00
	  1 ip                       0.00     0.00
	  1 lsmod                    0.00     0.00
	  1 mktemp                   0.00     0.00
	  1 qdbus                    0.00     0.00
	  1 readlink                 0.00     0.00
	  1 realpath                 0.00     0.00
	  1 sed                      0.00     0.00
	  1 sort                     0.00     0.00
	  1 stty                     0.00     0.00
	  1 systemctl                0.00     0.00
	  1 template.sh              0.00     0.00
	  1 wc                       0.00     0.00
	  1 xrandr                   0.00     0.00
0 processes running
47 maximum processes
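To put a number on "huge", a hypothetical Python parser for the indented "count name value value" rows of the summary above can total the counts per name. The two numeric columns are unlabeled in the source, so the sketch keeps them as raw floats without interpreting them.

```python
import re

# Hypothetical row pattern: count, name (may contain spaces), two unlabeled numbers.
ROW = re.compile(r"^\s*(\d+)\s+(.+?)\s+([\d.]+)\s+([\d.]+)\s*$")

SAMPLE = """297507 processes
\t297144 hmmsearch            15169403.33 214883.04
\t 68 clinfo                  16.53     6.32
\t 18 mpirun                   7.41    66.93
\t  1 dconf worker             0.00     0.00
0 processes running
"""


def parse_rows(text):
    """Return (count, name, col3, col4) for each process row; other lines are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append((int(m.group(1)), m.group(2), float(m.group(3)), float(m.group(4))))
    return rows


def share_of(rows, name):
    """Return (count for name, total count, percent) over the parsed rows."""
    total = sum(row[0] for row in rows)
    named = sum(row[0] for row in rows if row[1] == name)
    return named, total, 100.0 * named / total


if __name__ == "__main__":
    print(share_of(parse_rows(SAMPLE), "hmmsearch"))
```

On the full table, the hmmsearch row alone accounts for 297,144 of the 297,507 processes in the header, about 99.9%.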

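The computation timeline shows these quick threads starting in pairs, mostly 10 to 20 ms apart, each finishing within the same 10 ms timestamp it started in.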

      1600545) hmmer            cpu=0 start=5.74  finish=110.22
        1600546) mpirun           cpu=7 start=5.74  finish=110.19
          1600549) mpirun           cpu=12 start=6.33  finish=110.19
          1600550) mpirun           cpu=13 start=6.33  finish=6.33 
          1600551) mpirun           cpu=10 start=6.35  finish=110.19
          1600552) mpirun           cpu=12 start=6.83  finish=110.19
          1600553) mpirun           cpu=1 start=6.83  finish=110.19
          1600554) hmmsearch        cpu=9 start=6.86  finish=109.62
            1600560) hmmsearch        cpu=3 start=6.89  finish=6.89 
            1600561) hmmsearch        cpu=1 start=6.89  finish=6.89 
            1600568) hmmsearch        cpu=14 start=6.90  finish=6.90 
            1600569) hmmsearch        cpu=9 start=6.90  finish=6.90 
            1600588) hmmsearch        cpu=9 start=6.92  finish=6.92 
            1600589) hmmsearch        cpu=2 start=6.92  finish=6.92 
            1600608) hmmsearch        cpu=4 start=6.94  finish=6.94 
            1600609) hmmsearch        cpu=15 start=6.94  finish=6.94 
            1600624) hmmsearch        cpu=13 start=6.96  finish=6.96 
            1600625) hmmsearch        cpu=15 start=6.96  finish=6.96 
            1600634) hmmsearch        cpu=13 start=6.97  finish=6.97 
            1600635) hmmsearch        cpu=15 start=6.97  finish=6.97 
            1600648) hmmsearch        cpu=15 start=6.99  finish=6.99 
            1600649) hmmsearch        cpu=13 start=6.99  finish=6.99 
            1600656) hmmsearch        cpu=15 start=7.00  finish=7.00 
            1600657) hmmsearch        cpu=13 start=7.00  finish=7.00 
            1600682) hmmsearch        cpu=6 start=7.02  finish=7.02 
            1600683) hmmsearch        cpu=5 start=7.02  finish=7.02 
            1600702) hmmsearch        cpu=0 start=7.04  finish=7.04 
            1600703) hmmsearch        cpu=5 start=7.04  finish=7.04 
            1600722) hmmsearch        cpu=14 start=7.07  finish=7.07 
            1600723) hmmsearch        cpu=0 start=7.07  finish=7.07 
            1600738) hmmsearch        cpu=9 start=7.09  finish=7.09 
            1600739) hmmsearch        cpu=6 start=7.09  finish=7.09 
            1600748) hmmsearch        cpu=12 start=7.11  finish=7.11 
            1600749) hmmsearch        cpu=11 start=7.11  finish=7.11 
            1600762) hmmsearch        cpu=14 start=7.12  finish=7.12 
            1600763) hmmsearch        cpu=11 start=7.12  finish=7.12 
            1600774) hmmsearch        cpu=6 start=7.14  finish=7.14 
            1600775) hmmsearch        cpu=11 start=7.14  finish=7.14 
            1600788) hmmsearch        cpu=13 start=7.15  finish=7.15 
            1600789) hmmsearch        cpu=3 start=7.15  finish=7.15 
            1600822) hmmsearch        cpu=11 start=7.20  finish=7.20 
            1600823) hmmsearch        cpu=4 start=7.20  finish=7.20 
            1600836) hmmsearch        cpu=12 start=7.21  finish=7.21 
            1600837) hmmsearch        cpu=3 start=7.21  finish=7.21 
            1600848) hmmsearch        cpu=13 start=7.22  finish=7.22 
            1600849) hmmsearch        cpu=4 start=7.22  finish=7.22 
            1600858) hmmsearch        cpu=12 start=7.24  finish=7.24 
            1600859) hmmsearch        cpu=1 start=7.24  finish=7.24 
            1600876) hmmsearch        cpu=13 start=7.26  finish=7.26 
            1600877) hmmsearch        cpu=4 start=7.26  finish=7.26 
            1600892) hmmsearch        cpu=5 start=7.28  finish=7.28 
            1600893) hmmsearch        cpu=4 start=7.28  finish=7.28 
            1600902) hmmsearch        cpu=12 start=7.29  finish=7.29 
            1600903) hmmsearch        cpu=5 start=7.29  finish=7.29 
            1600924) hmmsearch        cpu=13 start=7.32  finish=7.32 
            1600925) hmmsearch        cpu=12 start=7.32  finish=7.32 
            1600934) hmmsearch        cpu=5 start=7.34  finish=7.34 
            1600935) hmmsearch        cpu=12 start=7.34  finish=7.34
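One way to quantify "quick threads" from a dump like this is to parse each row and keep those whose finish time equals their start time at the printed 10 ms resolution. A minimal Python sketch, assuming the row layout shown above; the parser and its names are hypothetical.

```python
import re

# Hypothetical pattern for rows like "1600560) hmmsearch  cpu=3 start=6.89  finish=6.89".
ROW = re.compile(r"^\s*(\d+)\)\s+(\S+)\s+cpu=(\d+)\s+start=([\d.]+)\s+finish=([\d.]+)")

SAMPLE = """
      1600545) hmmer            cpu=0 start=5.74  finish=110.22
        1600554) hmmsearch        cpu=9 start=6.86  finish=109.62
          1600560) hmmsearch        cpu=3 start=6.89  finish=6.89
          1600561) hmmsearch        cpu=1 start=6.89  finish=6.89
          1600568) hmmsearch        cpu=14 start=6.90  finish=6.90
          1600569) hmmsearch        cpu=9 start=6.90  finish=6.90
          1600588) hmmsearch        cpu=9 start=6.92  finish=6.92
"""


def parse_timeline(text):
    """Return (pid, name, cpu, start, finish) tuples for each timeline row."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            pid, name, cpu, start, finish = m.groups()
            rows.append((int(pid), name, int(cpu), float(start), float(finish)))
    return rows


def short_lived(rows, name, resolution=0.01):
    """Rows for `name` whose lifetime is below the printed timestamp resolution."""
    return [r for r in rows if r[1] == name and r[4] - r[3] < resolution]


def start_gaps(rows):
    """Gaps in seconds between successive distinct start times."""
    starts = sorted({r[3] for r in rows})
    return [round(b - a, 2) for a, b in zip(starts, starts[1:])]


if __name__ == "__main__":
    quick = short_lived(parse_timeline(SAMPLE), "hmmsearch")
    print(len(quick), start_gaps(quick))
```

Applied to the full excerpt above, every hmmsearch child listed under 1600554 has start equal to finish, so each lives less than 10 ms, while 1600554 itself runs from 6.86 s to 109.62 s.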