Blosc is a data store library for C that compresses binary data. This benchmark runs 18 different workloads with a variety of buffer sizes and algorithms. The workloads run moderately quickly with a variable number of threads, though single-threaded runs are the most common.

The topdown profile also shows the metrics smeared across the run, with some segments of almost 90% frontend or backend stalls and a variable retirement rate. There also seem to be occasional stripes where the retirement rate drops. Some of this would likely be easier to see with fewer than the 18 test cases.

AMD metrics provide a composite of the numbers above. On average between two and three cores are kept busy (2.58 of 16). There is a high rate of page faults (about 47,000 per second). The average retirement rate is low (15.4% of slots), and backend memory stalls are the largest culprit (48.4%). There is some floating-point code, almost entirely 128-bit, and some L2 access.

elapsed              987.760
on_cpu               0.162          # 2.58 / 16 cores
utime                2314.586
stime                238.219
nvcsw                4504420        # 99.74%
nivcsw               11524          # 0.26%
inblock              0              # 0.00/sec
onblock              25688          # 26.01/sec
cpu-clock            2556766690317  # 2556.767 seconds
task-clock           2558049210877  # 2558.049 seconds
page faults          120445947      # 47085.078/sec
context switches     4520560        # 1767.190/sec
cpu migrations       79117          # 30.929/sec
major page faults    2              # 0.001/sec
minor page faults    120445945      # 47085.077/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             1709968636406  # 153.759 branches per 1000 inst
branch misses        22626348307    # 1.32% branch miss
conditional          1548644588767  # 139.253 conditional branches per 1000 inst
indirect             1238978398     # 0.111 indirect branches per 1000 inst
cpu-cycles           10248138945269 # 0.70 GHz
instructions         9924240533970  # 0.97 IPC
slots                20670495263052 #
retiring             3187699511326  # 15.4% (16.1%)
-- ucode             1905874547     #     0.0%
-- fastpath          3185793636779  #    15.4%
frontend             4160988424228  # 20.1% (21.0%)
-- latency           1279637073474  #     6.2%
-- bandwidth         2881351350754  #    13.9%
backend              12378049471540 # 59.9% (62.5%)
-- cpu               2368053514869  #    11.5%
-- memory            10009995956671 #    48.4%
speculation          82492660206    #  0.4% ( 0.4%) low
-- branch mispredict 81067482527    #     0.4%
-- pipeline restart  1425177679     #     0.0%
smt-contention       861159583090   #  4.2% ( 0.0%)
cpu-cycles           10255907734990 # 0.71 GHz
instructions         9936437247179  # 0.97 IPC
instructions         3320711390252  # 85.963 l2 access per 1000 inst
l2 hit from l1       180588408594   # 17.83% l2 miss
l2 miss from l1      9329067623     #
l2 hit from l2 pf    63289259950    #
l3 hit from l2 pf    14279437545    #
l3 miss from l2 pf   27299697284    #
instructions         3308296933757  # 89.615 float per 1000 inst
float 512            128            # 0.000 AVX-512 per 1000 inst
float 256            420            # 0.000 AVX-256 per 1000 inst
float 128            296472821770   # 89.615 AVX-128 per 1000 inst
float MMX            0              # 0.000 MMX per 1000 inst
float scalar         0              # 0.000 scalar per 1000 inst
instructions         2690265        #
opcache              1005701        # 373.830 opcache per 1000 inst
opcache miss         537919         # 53.5% opcache miss rate
l1 dTLB miss         5199           # 1.933 L1 dTLB per 1000 inst
l2 dTLB miss         1138           # 0.423 L2 dTLB per 1000 inst
instructions         2718101        #
icache               1334928        # 491.125 icache per 1000 inst
icache miss          112953         #  8.5% icache miss rate
l1 iTLB miss         9              # 0.003 L1 iTLB per 1000 inst
l2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst
tlb flush            19             # 0.007 TLB flush per 1000 inst
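
The topdown percentages and the on_cpu figure in the AMD listing above can be reproduced from the raw counters. The following is a minimal sketch; the formulas are assumptions inferred from the printed numbers rather than taken from the profiling tool's source. In particular, the parenthesized column is assumed to be each category divided by the slots left after removing SMT contention, and the busy-core figure is assumed to be (utime + stime) / elapsed.

```python
# Counter values copied from the AMD listing above.
slots = 20670495263052
topdown = {
    "retiring": 3187699511326,
    "frontend": 4160988424228,
    "backend": 12378049471540,
    "speculation": 82492660206,
    "smt-contention": 861159583090,
}

def share(count, total):
    """Percentage of total, rounded to one decimal like the listing."""
    return round(100.0 * count / total, 1)

# First percentage column: each category over all slots.
of_all_slots = {name: share(n, slots) for name, n in topdown.items()}

# Parenthesized column (assumed): same counts over slots minus SMT contention.
own_slots = slots - topdown["smt-contention"]
of_own_slots = {
    name: share(n, own_slots)
    for name, n in topdown.items()
    if name != "smt-contention"
}

# Busy cores (assumed): user plus system CPU seconds over elapsed seconds.
elapsed, utime, stime, ncores = 987.760, 2314.586, 238.219, 16
busy_cores = (utime + stime) / elapsed

print(of_all_slots)
print(of_own_slots)
print(f"{busy_cores:.2f} / {ncores} cores, on_cpu {busy_cores / ncores:.3f}")
```

Under these assumptions the results agree with the listing: 59.9% (62.5%) backend, 15.4% (16.1%) retiring, and 2.58 of 16 cores busy.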

Intel metrics give more clues about the memory stalls: the L2-bound (16.6%) and DRAM-bound (15.7%) categories are the largest contributors, followed by store-bound (11.8%) and L3-bound (7.7%).

elapsed              1185.087
on_cpu               0.224          # 3.59 / 16 cores
utime                4036.438
stime                217.599
nvcsw                3545389        # 97.41%
nivcsw               94223          # 2.59%
inblock              3232           # 2.73/sec
onblock              13408          # 11.31/sec
cpu-clock            4235117712182  # 4235.118 seconds
task-clock           4238670444667  # 4238.670 seconds
page faults          113600453      # 26800.964/sec
context switches     3645262        # 860.001/sec
cpu migrations       236530         # 55.803/sec
major page faults    9              # 0.002/sec
minor page faults    113600444      # 26800.962/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             1565019616139  # 149.293 branches per 1000 inst
branch misses        2447114186     # 0.16% branch miss
conditional          1565019865131  # 149.293 conditional branches per 1000 inst
indirect             205635023039   # 19.616 indirect branches per 1000 inst
slots                21419426156342 #
retiring             7365886196434  # 34.4% (34.4%)
-- ucode             335345158142   #     1.6%
-- fastpath          7030541038292  #    32.8%
frontend             1845138431656  #  8.6% ( 8.6%)
-- latency           807731404302   #     3.8%
-- bandwidth         1037407027354  #     4.8%
backend              11990871392273 # 56.0% (56.0%)
-- cpu               1992105550930  #     9.3%
-- memory            9998765841343  #    46.7%
speculation          328808935666   #  1.5% ( 1.5%)
-- branch mispredict 240387547131   #     1.1%
-- pipeline restart  88421388535    #     0.4%
smt-contention       0              #  0.0% ( 0.0%)
cpu-cycles           7975989603086  # 0.42 GHz
instructions         15222097361603 # 1.91 IPC
l2 access            466905689925   # 60.108 l2 access per 1000 inst
l2 miss              209008506469   # 44.76% l2 miss
cpu-cycles           4128004411313  # 51.2% memory latency
load stalls          1623882878339  #  0.0% l1 bound
l1 miss              1652161569527  # 16.6% l2 bound
l2 miss              966736683768   #  7.7% l3 bound
l3 miss              649440877367   # 15.7% dram bound
store_stalls         488044403173   # 11.8% store bound
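
The memory-bound breakdown at the end of the Intel listing above appears to be derived by differencing the stall counters at successive cache levels. The sketch below is a reconstruction under that assumption (the variable names are illustrative, and the formulas are inferred from the printed percentages, not taken from the tool): cycles stalled with an L1 miss but no L2 miss are charged to L2, and so on down to DRAM.

```python
# Counter values copied from the Intel listing above.
cycles = 4128004411313
load_stalls = 1623882878339
l1_miss_stalls = 1652161569527
l2_miss_stalls = 966736683768
l3_miss_stalls = 649440877367
store_stalls = 488044403173

def pct(n):
    """Percentage of cycles, rounded to one decimal like the listing."""
    return round(100.0 * n / cycles, 1)

# Each level is charged the stall cycles not explained by a miss further down.
bound = {
    "l1": pct(max(0, load_stalls - l1_miss_stalls)),  # clamps a negative difference
    "l2": pct(l1_miss_stalls - l2_miss_stalls),
    "l3": pct(l2_miss_stalls - l3_miss_stalls),
    "dram": pct(l3_miss_stalls),
    "store": pct(store_stalls),
}

# Headline figure (assumed): load plus store stall cycles over all cycles.
memory_latency = pct(load_stalls + store_stalls)

print(bound)
print(memory_latency)
```

With these formulas the sketch matches every percentage in the listing, including the 0.0% L1-bound value, which falls out of the load-stall counter being slightly smaller than the L1-miss stall counter.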

The process overview shows that b2bench is the primary driver, accounting for 7344 of the 7780 processes and nearly all of the accumulated time.

7780 processes
	7344 b2bench              170576.84 19068.19
	 68 clinfo                  17.20     5.32
	 38 vulkaninfo               0.95     1.33
	  6 php                      0.16     0.42
	  6 glxinfo:gdrv0            0.12     0.06
	  6 glxinfo:gl0              0.12     0.06
	  4 vulkani:disk$0           0.10     0.14
	  2 glxinfo                  0.06     0.02
	  2 glxinfo:cs0              0.06     0.02
	  2 glxinfo:disk$0           0.06     0.02
	  2 glxinfo:sh0              0.06     0.02
	  2 glxinfo:shlo0            0.06     0.02
	  6 clang                    0.05     0.07
	  2 llvmpipe-0               0.05     0.07
	  2 llvmpipe-1               0.05     0.07
	  2 llvmpipe-10              0.05     0.07
	  2 llvmpipe-11              0.05     0.07
	  2 llvmpipe-12              0.05     0.07
	  2 llvmpipe-13              0.05     0.07
	  2 llvmpipe-14              0.05     0.07
	  2 llvmpipe-15              0.05     0.07
	  2 llvmpipe-2               0.05     0.07
	  2 llvmpipe-3               0.05     0.07
	  2 llvmpipe-4               0.05     0.07
	  2 llvmpipe-5               0.05     0.07
	  2 llvmpipe-6               0.05     0.07
	  2 llvmpipe-7               0.05     0.07
	  2 llvmpipe-8               0.05     0.07
	  2 llvmpipe-9               0.05     0.07
	  3 rocminfo                 0.03     0.00
	  1 lspci                    0.00     0.02
	  1 ps                       0.00     0.01
	116 sh                       0.00     0.00
	 54 blosc                    0.00     0.00
	 13 gcc                      0.00     0.00
	  9 gsettings                0.00     0.00
	  8 stat                     0.00     0.00
	  8 systemd-detect-          0.00     0.00
	  6 llvm-link                0.00     0.00
	  5 gmain                    0.00     0.00
	  5 phoronix-test-s          0.00     0.00
	  2 cc                       0.00     0.00
	  2 dconf worker             0.00     0.00
	  2 lscpu                    0.00     0.00
	  2 uname                    0.00     0.00
	  2 which                    0.00     0.00
	  2 xset                     0.00     0.00
	  1 date                     0.00     0.00
	  1 dirname                  0.00     0.00
	  1 dmesg                    0.00     0.00
	  1 dmidecode                0.00     0.00
	  1 grep                     0.00     0.00
	  1 ifconfig                 0.00     0.00
	  1 ip                       0.00     0.00
	  1 lsmod                    0.00     0.00
	  1 mktemp                   0.00     0.00
	  1 qdbus                    0.00     0.00
	  1 readlink                 0.00     0.00
	  1 realpath                 0.00     0.00
	  1 sed                      0.00     0.00
	  1 sort                     0.00     0.00
	  1 stty                     0.00     0.00
	  1 systemctl                0.00     0.00
	  1 template.sh              0.00     0.00
	  1 wc                       0.00     0.00
	  1 xrandr                   0.00     0.00
0 processes running
47 maximum processes
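
To quantify how dominant b2bench is, the overview above can be parsed row by row. This is a small sketch using a handful of rows copied from the listing; the regular expression and the helper name `parse_rows` are illustrative, and the two numeric columns are kept as opaque values because the listing does not label their units.

```python
import re

# A few rows copied from the process overview above.
SAMPLE = """\
\t7344 b2bench              170576.84 19068.19
\t 68 clinfo                  17.20     5.32
\t  2 dconf worker             0.00     0.00
\t116 sh                       0.00     0.00
"""

# count, name (may contain spaces), then two unlabeled numeric columns.
ROW = re.compile(r"^\s*(\d+)\s+(.+?)\s+(\d+\.\d+)\s+(\d+\.\d+)\s*$")

def parse_rows(text):
    """Return (name, count, col1, col2) tuples for every matching row."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append((m.group(2), int(m.group(1)),
                         float(m.group(3)), float(m.group(4))))
    return rows

rows = parse_rows(SAMPLE)
total = 7780  # "7780 processes" header line of the overview
by_name = {name: count for name, count, _, _ in rows}
share = 100.0 * by_name["b2bench"] / total
print(f"b2bench: {share:.1f}% of {total} processes")
```

The lazy name group is what lets multi-word names such as "dconf worker" parse correctly while still anchoring on the two trailing numbers.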

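The computation structure shows many threads being started. In the run below, b2bench launches a new batch of worker threads roughly every 0.35 seconds, and the batch grows by one thread each time, from 2 up to 16.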

      710701) blosc            cpu=8 start=5.71  finish=12.38
        710702) b2bench          cpu=0 start=5.71  finish=12.37
          710703) b2bench          cpu=3 start=6.80  finish=7.34 
          710704) b2bench          cpu=4 start=6.80  finish=7.34 
          710705) b2bench          cpu=13 start=7.34  finish=7.79 
          710706) b2bench          cpu=7 start=7.35  finish=7.79 
          710707) b2bench          cpu=14 start=7.35  finish=7.79 
          710708) b2bench          cpu=11 start=7.79  finish=8.17 
          710709) b2bench          cpu=4 start=7.79  finish=8.17 
          710710) b2bench          cpu=8 start=7.79  finish=8.16 
          710711) b2bench          cpu=1 start=7.79  finish=8.17 
          710714) b2bench          cpu=5 start=8.17  finish=8.53 
          710715) b2bench          cpu=6 start=8.17  finish=8.53 
          710716) b2bench          cpu=15 start=8.17  finish=8.53 
          710717) b2bench          cpu=3 start=8.17  finish=8.53 
          710718) b2bench          cpu=4 start=8.17  finish=8.53 
          710719) b2bench          cpu=9 start=8.53  finish=8.89 
          710720) b2bench          cpu=7 start=8.53  finish=8.89 
          710721) b2bench          cpu=4 start=8.53  finish=8.89 
          710722) b2bench          cpu=5 start=8.53  finish=8.89 
          710723) b2bench          cpu=8 start=8.53  finish=8.89 
          710724) b2bench          cpu=6 start=8.53  finish=8.89 
          710725) b2bench          cpu=11 start=8.89  finish=9.24 
          710726) b2bench          cpu=4 start=8.89  finish=9.24 
          710727) b2bench          cpu=8 start=8.89  finish=9.24 
          710728) b2bench          cpu=12 start=8.89  finish=9.24 
          710729) b2bench          cpu=6 start=8.89  finish=9.24 
          710730) b2bench          cpu=1 start=8.89  finish=9.24 
          710731) b2bench          cpu=5 start=8.89  finish=9.24 
          710732) b2bench          cpu=11 start=9.24  finish=9.56 
          710733) b2bench          cpu=10 start=9.24  finish=9.56 
          710734) b2bench          cpu=13 start=9.24  finish=9.56 
          710735) b2bench          cpu=6 start=9.24  finish=9.56 
          710736) b2bench          cpu=7 start=9.24  finish=9.56 
          710737) b2bench          cpu=1 start=9.24  finish=9.56 
          710738) b2bench          cpu=8 start=9.24  finish=9.56 
          710739) b2bench          cpu=12 start=9.24  finish=9.56 
          710740) b2bench          cpu=0 start=9.56  finish=9.92 
          710741) b2bench          cpu=5 start=9.56  finish=9.92 
          710742) b2bench          cpu=12 start=9.56  finish=9.92 
          710743) b2bench          cpu=9 start=9.56  finish=9.92 
          710744) b2bench          cpu=10 start=9.56  finish=9.92 
          710745) b2bench          cpu=6 start=9.56  finish=9.91 
          710746) b2bench          cpu=10 start=9.56  finish=9.91 
          710747) b2bench          cpu=15 start=9.56  finish=9.91 
          710748) b2bench          cpu=2 start=9.56  finish=9.92 
          710749) b2bench          cpu=15 start=9.92  finish=10.27
          710750) b2bench          cpu=6 start=9.92  finish=10.27
          710751) b2bench          cpu=12 start=9.92  finish=10.27
          710752) b2bench          cpu=5 start=9.92  finish=10.27
          710753) b2bench          cpu=8 start=9.92  finish=10.27
          710754) b2bench          cpu=1 start=9.92  finish=10.27
          710755) b2bench          cpu=0 start=9.92  finish=10.27
          710756) b2bench          cpu=10 start=9.92  finish=10.27
          710757) b2bench          cpu=3 start=9.92  finish=10.27
          710758) b2bench          cpu=4 start=9.92  finish=10.27
          710759) b2bench          cpu=15 start=10.27 finish=10.62
          710760) b2bench          cpu=3 start=10.27 finish=10.61
          710761) b2bench          cpu=4 start=10.27 finish=10.61
          710762) b2bench          cpu=12 start=10.27 finish=10.61
          710763) b2bench          cpu=10 start=10.27 finish=10.62
          710764) b2bench          cpu=9 start=10.27 finish=10.62
          710765) b2bench          cpu=0 start=10.27 finish=10.61
          710766) b2bench          cpu=11 start=10.28 finish=10.61
          710767) b2bench          cpu=14 start=10.28 finish=10.62
          710768) b2bench          cpu=5 start=10.28 finish=10.61
          710769) b2bench          cpu=1 start=10.28 finish=10.62
          710770) b2bench          cpu=1 start=10.62 finish=10.96
          710771) b2bench          cpu=0 start=10.62 finish=10.96
          710772) b2bench          cpu=1 start=10.62 finish=10.96
          710773) b2bench          cpu=10 start=10.62 finish=10.96
          710774) b2bench          cpu=15 start=10.62 finish=10.96
          710775) b2bench          cpu=3 start=10.62 finish=10.96
          710776) b2bench          cpu=4 start=10.62 finish=10.96
          710777) b2bench          cpu=7 start=10.62 finish=10.96
          710778) b2bench          cpu=11 start=10.62 finish=10.96
          710779) b2bench          cpu=6 start=10.62 finish=10.96
          710780) b2bench          cpu=8 start=10.62 finish=10.96
          710781) b2bench          cpu=12 start=10.62 finish=10.96
          710782) b2bench          cpu=1 start=10.96 finish=11.32
          710783) b2bench          cpu=9 start=10.96 finish=11.32
          710784) b2bench          cpu=10 start=10.96 finish=11.32
          710785) b2bench          cpu=13 start=10.96 finish=11.32
          710786) b2bench          cpu=3 start=10.96 finish=11.32
          710787) b2bench          cpu=12 start=10.96 finish=11.32
          710788) b2bench          cpu=11 start=10.96 finish=11.32
          710789) b2bench          cpu=5 start=10.96 finish=11.32
          710790) b2bench          cpu=8 start=10.96 finish=11.32
          710791) b2bench          cpu=15 start=10.96 finish=11.32
          710792) b2bench          cpu=0 start=10.96 finish=11.32
          710793) b2bench          cpu=4 start=10.96 finish=11.32
          710794) b2bench          cpu=14 start=10.96 finish=11.32
          710795) b2bench          cpu=11 start=11.32 finish=11.67
          710796) b2bench          cpu=8 start=11.32 finish=11.66
          710797) b2bench          cpu=6 start=11.32 finish=11.67
          710798) b2bench          cpu=2 start=11.32 finish=11.67
          710799) b2bench          cpu=5 start=11.32 finish=11.67
          710800) b2bench          cpu=1 start=11.32 finish=11.67
          710801) b2bench          cpu=15 start=11.32 finish=11.67
          710802) b2bench          cpu=12 start=11.32 finish=11.67
          710803) b2bench          cpu=9 start=11.32 finish=11.67
          710804) b2bench          cpu=3 start=11.32 finish=11.67
          710805) b2bench          cpu=14 start=11.32 finish=11.67
          710806) b2bench          cpu=4 start=11.32 finish=11.67
          710807) b2bench          cpu=10 start=11.32 finish=11.67
          710808) b2bench          cpu=7 start=11.32 finish=11.67
          710809) b2bench          cpu=13 start=11.67 finish=12.02
          710810) b2bench          cpu=3 start=11.67 finish=12.02
          710811) b2bench          cpu=15 start=11.67 finish=12.02
          710812) b2bench          cpu=0 start=11.67 finish=12.02
          710813) b2bench          cpu=7 start=11.67 finish=12.02
          710814) b2bench          cpu=14 start=11.67 finish=12.02
          710815) b2bench          cpu=5 start=11.67 finish=12.02
          710816) b2bench          cpu=12 start=11.67 finish=12.02
          710817) b2bench          cpu=9 start=11.67 finish=12.02
          710818) b2bench          cpu=11 start=11.67 finish=12.01
          710819) b2bench          cpu=1 start=11.67 finish=12.02
          710820) b2bench          cpu=2 start=11.67 finish=12.01
          710821) b2bench          cpu=6 start=11.67 finish=12.02
          710822) b2bench          cpu=8 start=11.67 finish=12.02
          710823) b2bench          cpu=4 start=11.67 finish=12.02
          710824) b2bench          cpu=1 start=12.02 finish=12.37
          710825) b2bench          cpu=11 start=12.02 finish=12.37
          710826) b2bench          cpu=0 start=12.02 finish=12.37
          710827) b2bench          cpu=6 start=12.02 finish=12.37
          710828) b2bench          cpu=5 start=12.02 finish=12.37
          710829) b2bench          cpu=15 start=12.02 finish=12.37
          710830) b2bench          cpu=14 start=12.02 finish=12.37
          710831) b2bench          cpu=8 start=12.02 finish=12.37
          710832) b2bench          cpu=12 start=12.02 finish=12.37
          710833) b2bench          cpu=10 start=12.02 finish=12.37
          710834) b2bench          cpu=13 start=12.02 finish=12.37
          710835) b2bench          cpu=7 start=12.02 finish=12.37
          710836) b2bench          cpu=3 start=12.02 finish=12.37
          710837) b2bench          cpu=1 start=12.02 finish=12.37
          710838) b2bench          cpu=2 start=12.02 finish=12.37
          710839) b2bench          cpu=4 start=12.02 finish=12.37
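
The batch structure in the tree above can be extracted by grouping threads on their start times. The following is a minimal sketch over the first few thread lines; the 0.1 second grouping tolerance and the function name `batch_sizes` are assumptions chosen for illustration, since start times within a batch differ by at most 0.01 seconds in the listing.

```python
import re

# A few lines copied from the process tree above (first three batches).
SAMPLE = """\
710703) b2bench          cpu=3 start=6.80  finish=7.34
710704) b2bench          cpu=4 start=6.80  finish=7.34
710705) b2bench          cpu=13 start=7.34  finish=7.79
710706) b2bench          cpu=7 start=7.35  finish=7.79
710707) b2bench          cpu=14 start=7.35  finish=7.79
710708) b2bench          cpu=11 start=7.79  finish=8.17
710709) b2bench          cpu=4 start=7.79  finish=8.17
710710) b2bench          cpu=8 start=7.79  finish=8.16
710711) b2bench          cpu=1 start=7.79  finish=8.17
"""

LINE = re.compile(
    r"(\d+)\)\s+(\S+)\s+cpu=(\d+)\s+start=([\d.]+)\s+finish=([\d.]+)"
)

def batch_sizes(text, gap=0.1):
    """Count threads per batch; a batch is a run of starts within `gap`
    seconds of the first start in that batch."""
    starts = sorted(float(m.group(4)) for m in LINE.finditer(text))
    batches = []
    for s in starts:
        if batches and s - batches[-1][0] <= gap:
            batches[-1].append(s)
        else:
            batches.append([s])
    return [len(b) for b in batches]

print(batch_sizes(SAMPLE))  # [2, 3, 4]
```

Run over the full tree, the same grouping would show the batch size stepping up by one per batch, which is consistent with the benchmark sweeping its thread count.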