Botan is a cryptography library. The benchmark runs six workloads, each exercising a different algorithm, and all of them appear to be single-threaded.

The topdown profile varies among the workloads: retirement rates are in the 70% range for two workloads, while backend stalls limit the others. Frontend stalls look low except in the second workload and for a brief time at the start of each run.

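AMD metrics confirm this is single-threaded (about 0.84 of 16 cores busy) with almost no L2 access (0.116 accesses per 1000 instructions). Backend stalls are split between CPU (16.2%) and memory (23.4%). There is a light amount of floating point, almost entirely 128-bit.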

elapsed              669.399
on_cpu               0.053          # 0.84 / 16 cores
utime                562.839
stime                0.988
nvcsw                2105           # 44.72%
nivcsw               2602           # 55.28%
inblock              0              # 0.00/sec
onblock              16152          # 24.13/sec
cpu-clock            563924791974   # 563.925 seconds
task-clock           563932739166   # 563.933 seconds
page faults          166765         # 295.718/sec
context switches     7838           # 13.899/sec
cpu migrations       367            # 0.651/sec
major page faults    2              # 0.004/sec
minor page faults    166763         # 295.714/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             139814477227   # 18.870 branches per 1000 inst
branch misses        203665490      # 0.15% branch miss
conditional          97115995872    # 13.107 conditional branches per 1000 inst
indirect             13663343420    # 1.844 indirect branches per 1000 inst
cpu-cycles           2610654455476  # 0.24 GHz
instructions         7394582566327  # 2.83 IPC
slots                5229705578412  #
retiring             2826810653088  # 54.1% (54.1%) high
-- ucode             8993481439     #     0.2%
-- fastpath          2817817171649  #    53.9%
frontend             325117530676   #  6.2% ( 6.2%)
-- latency           280235511756   #     5.4%
-- bandwidth         44882018920    #     0.9%
backend              2072076508020  # 39.6% (39.6%)
-- cpu               847158851484   #    16.2%
-- memory            1224917656536  #    23.4%
speculation          5536343464     #  0.1% ( 0.1%) low
-- branch mispredict 4117725817     #     0.1%
-- pipeline restart  1418617647     #     0.0%
smt-contention       164221719      #  0.0% ( 0.0%)
cpu-cycles           3878464228704  # 0.25 GHz
instructions         10719706960041 # 2.76 IPC
instructions         3578967705036  # 0.116 l2 access per 1000 inst
l2 hit from l1       394836268      # 6.20% l2 miss
l2 miss from l1      15707758       #
l2 hit from l2 pf    11116615       #
l3 hit from l2 pf    4545213        #
l3 miss from l2 pf   5561337        #
instructions         3572012090278  # 98.084 float per 1000 inst
float 512            86             # 0.000 AVX-512 per 1000 inst
float 256            664            # 0.000 AVX-256 per 1000 inst
float 128            350357518379   # 98.084 AVX-128 per 1000 inst
float MMX            0              # 0.000 MMX per 1000 inst
float scalar         0              # 0.000 scalar per 1000 inst
instructions         2688067        #
opcache              984278         # 366.166 opcache per 1000 inst
opcache miss         525590         # 53.4% opcache miss rate
l1 dTLB miss         6562           # 2.441 L1 dTLB per 1000 inst
l2 dTLB miss         1164           # 0.433 L2 dTLB per 1000 inst
instructions         2703815        #
icache               1321789        # 488.861 icache per 1000 inst
icache miss          112783         #  8.5% icache miss rate
l1 iTLB miss         10             # 0.004 L1 iTLB per 1000 inst
l2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst
tlb flush            19             # 0.007 TLB flush per 1000 inst
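
The derived figures in the dump above can be reproduced from the raw counters. The following is a minimal sketch, not the profiling tool's own code: the formulas (busy cores as user plus system time over elapsed time, IPC as instructions over cycles, each topdown category as a share of slots) are assumptions that happen to match the printed values, and the function names are hypothetical.

```python
# Raw values copied from the AMD dump above.
elapsed, utime, stime = 669.399, 562.839, 0.988
cores = 16  # taken from the "/ 16 cores" annotation

cpu_cycles = 2610654455476
instructions = 7394582566327
slots = 5229705578412
topdown = {
    "retiring": 2826810653088,
    "frontend": 325117530676,
    "backend": 2072076508020,
    "speculation": 5536343464,
}


def busy_cores(utime, stime, elapsed):
    """Average number of cores kept busy (assumed formula)."""
    return (utime + stime) / elapsed


def share(count, total):
    """Express count as a percentage of total."""
    return 100.0 * count / total


cores_busy = busy_cores(utime, stime, elapsed)
on_cpu = cores_busy / cores
ipc = instructions / cpu_cycles
percentages = {name: share(count, slots) for name, count in topdown.items()}

print(f"busy cores {cores_busy:.2f}, on_cpu {on_cpu:.3f}, IPC {ipc:.2f}")
for name, pct in percentages.items():
    print(f"{name:12s} {pct:5.1f}%")
```

A busy-core figure below 1.0 on a 16-core machine is what supports the single-threaded reading.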

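Intel metrics confirm memory access is minimal: about 0.054 L2 accesses per 1000 instructions, with backend stalls dominated by the CPU (42.4%) rather than memory (3.7%).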

elapsed              660.876
on_cpu               0.053          # 0.84 / 16 cores
utime                556.986
stime                0.629
nvcsw                2047           # 42.81%
nivcsw               2735           # 57.19%
inblock              10496          # 15.88/sec
onblock              4800           # 7.26/sec
cpu-clock            557684948455   # 557.685 seconds
task-clock           557691700152   # 557.692 seconds
page faults          156021         # 279.762/sec
context switches     7880           # 14.130/sec
cpu migrations       392            # 0.703/sec
major page faults    52             # 0.093/sec
minor page faults    155969         # 279.669/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             122073618847   # 19.314 branches per 1000 inst
branch misses        175355007      # 0.14% branch miss
conditional          122073632095   # 19.314 conditional branches per 1000 inst
indirect             12056510177    # 1.908 indirect branches per 1000 inst
slots                12677475659618 #
retiring             6680431127271  # 52.7% (52.7%)
-- ucode             450464298592   #     3.6%
-- fastpath          6229966828679  #    49.1%
frontend             121467626633   #  1.0% ( 1.0%) low
-- latency           37175507861    #     0.3%
-- bandwidth         84292118772    #     0.7%
backend              5843301211811  # 46.1% (46.1%)
-- cpu               5372679687169  #    42.4%
-- memory            470621524642   #     3.7%
speculation          27230529716    #  0.2% ( 0.2%) low
-- branch mispredict 27046269202    #     0.2%
-- pipeline restart  184260514      #     0.0%
smt-contention       0              #  0.0% ( 0.0%)
cpu-cycles           2113556449501  # 0.20 GHz
instructions         6324122251604  # 2.99 IPC
l2 access            340667126      # 0.054 l2 access per 1000 inst
l2 miss              103730719      # 30.45% l2 miss
cpu-cycles           2113192432479  #  4.9% memory latency
load stalls          102748763011   #  4.8% l1 bound
l1 miss              1018223889     #  0.0% l2 bound
l2 miss              480434465      #  0.0% l3 bound
l3 miss              314612452      #  0.0% dram bound
store_stalls         122368374      #  0.0% store bound
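
To see why the L2 miss percentage is not a concern here, it helps to put it next to the access rate. This sketch recomputes three figures from the Intel dump above. The formulas are assumptions that reproduce the printed values (in particular, that the "memory latency" figure is load stalls divided by cycles), and `per_kilo_inst` is a hypothetical helper name.

```python
# Raw values copied from the Intel dump above.
instructions = 6324122251604
l2_access = 340667126
l2_miss = 103730719
cycles = 2113192432479      # cpu-cycles from the memory-latency group
load_stalls = 102748763011


def per_kilo_inst(count, instructions):
    """Events per 1000 retired instructions."""
    return 1000.0 * count / instructions


l2_per_kinst = per_kilo_inst(l2_access, instructions)
l2_miss_rate = 100.0 * l2_miss / l2_access
load_stall_pct = 100.0 * load_stalls / cycles  # assumed definition

print(f"L2 accesses per 1000 inst: {l2_per_kinst:.3f}")
print(f"L2 miss rate:              {l2_miss_rate:.2f}%")
print(f"load stalls / cycles:      {load_stall_pct:.1f}%")
```

The miss rate is about 30%, but it applies to roughly one L2 access per 18,000 instructions, so its effect on overall stalls stays small.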

The process summary shows that botan is the primary process.

394 processes
	 36 botan                  561.84     0.00
	 68 clinfo                  20.17     3.25
	 38 vulkaninfo               0.76     1.52
	  6 glxinfo:gdrv0            0.15     0.04
	  6 glxinfo:gl0              0.15     0.04
	  6 php                      0.08     0.21
	  4 vulkani:disk$0           0.08     0.16
	  2 glxinfo                  0.07     0.02
	  2 glxinfo:cs0              0.07     0.02
	  2 glxinfo:disk$0           0.07     0.02
	  2 glxinfo:sh0              0.07     0.02
	  2 glxinfo:shlo0            0.07     0.02
	  6 clang                    0.04     0.08
	  2 llvmpipe-0               0.04     0.08
	  2 llvmpipe-1               0.04     0.08
	  2 llvmpipe-10              0.04     0.08
	  2 llvmpipe-11              0.04     0.08
	  2 llvmpipe-12              0.04     0.08
	  2 llvmpipe-13              0.04     0.08
	  2 llvmpipe-14              0.04     0.08
	  2 llvmpipe-15              0.04     0.08
	  2 llvmpipe-2               0.04     0.08
	  2 llvmpipe-3               0.04     0.08
	  2 llvmpipe-4               0.04     0.08
	  2 llvmpipe-5               0.04     0.08
	  2 llvmpipe-6               0.04     0.08
	  2 llvmpipe-7               0.04     0.08
	  2 llvmpipe-8               0.04     0.08
	  2 llvmpipe-9               0.04     0.08
	  3 rocminfo                 0.03     0.00
	  1 lspci                    0.00     0.02
	  1 ps                       0.00     0.01
	 92 sh                       0.00     0.00
	 13 gcc                      0.00     0.00
	 12 gsettings                0.00     0.00
	  8 stat                     0.00     0.00
	  8 systemd-detect-          0.00     0.00
	  6 llvm-link                0.00     0.00
	  5 phoronix-test-s          0.00     0.00
	  3 gmain                    0.00     0.00
	  2 cc                       0.00     0.00
	  2 lscpu                    0.00     0.00
	  2 uname                    0.00     0.00
	  2 which                    0.00     0.00
	  2 xset                     0.00     0.00
	  1 date                     0.00     0.00
	  1 dconf worker             0.00     0.00
	  1 dirname                  0.00     0.00
	  1 dmesg                    0.00     0.00
	  1 dmidecode                0.00     0.00
	  1 grep                     0.00     0.00
	  1 ifconfig                 0.00     0.00
	  1 ip                       0.00     0.00
	  1 lsmod                    0.00     0.00
	  1 mktemp                   0.00     0.00
	  1 qdbus                    0.00     0.00
	  1 readlink                 0.00     0.00
	  1 realpath                 0.00     0.00
	  1 sed                      0.00     0.00
	  1 sort                     0.00     0.00
	  1 stty                     0.00     0.00
	  1 systemctl                0.00     0.00
	  1 template.sh              0.00     0.00
	  1 wc                       0.00     0.00
	  1 xrandr                   0.00     0.00
0 processes running
47 maximum processes

The computation structure is simple: each run is a botan parent with a single child, and the runs execute one after another.

  723566) botan            cpu=1 start=5.75  finish=35.85
    723567) botan            cpu=6 start=5.75  finish=35.85
  723569) botan            cpu=6 start=39.85 finish=69.95
    723570) botan            cpu=7 start=39.86 finish=69.95
  723571) botan            cpu=13 start=73.96 finish=104.06
    723572) botan            cpu=14 start=73.96 finish=104.06
  723573) sh               cpu=14 start=104.06 finish=104.06
    723574) sh               cpu=7 start=104.06 finish=104.06
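
The run lengths and the gaps between runs can be read off the tree above. This is a minimal sketch under the assumption that every tree line has the form `pid) name cpu=N start=S finish=F` and that each nesting level is indented by two spaces. The regular expression and the `parse` helper are illustrative, not part of the tool that produced the listing.

```python
import re

# The botan lines from the process tree above.
TREE = """\
  723566) botan            cpu=1 start=5.75  finish=35.85
    723567) botan            cpu=6 start=5.75  finish=35.85
  723569) botan            cpu=6 start=39.85 finish=69.95
    723570) botan            cpu=7 start=39.86 finish=69.95
  723571) botan            cpu=13 start=73.96 finish=104.06
    723572) botan            cpu=14 start=73.96 finish=104.06
"""

LINE = re.compile(
    r"^(\s*)(\d+)\)\s+(\S+)\s+cpu=(\d+)\s+start=([\d.]+)\s+finish=([\d.]+)"
)


def parse(text):
    """Turn each tree line into a dict; depth is derived from indentation."""
    rows = []
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            indent, pid, name, cpu, start, finish = m.groups()
            rows.append({
                "depth": len(indent) // 2,
                "pid": int(pid),
                "name": name,
                "cpu": int(cpu),
                "start": float(start),
                "finish": float(finish),
            })
    return rows


rows = parse(TREE)
parents = [r for r in rows if r["depth"] == 1]
durations = [round(r["finish"] - r["start"], 2) for r in parents]
gaps = [round(b["start"] - a["finish"], 2) for a, b in zip(parents, parents[1:])]

print("run durations (s):", durations)
print("gaps between runs (s):", gaps)
```

For the lines shown, each run lasts about 30.1 seconds and consecutive runs are separated by about 4 seconds, which is consistent with the runs executing sequentially.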