blosc is a data store library for C that compresses binary data. This benchmark runs 18 different workloads with a variety of buffer sizes and algorithms. The workloads run moderately quickly with a variable number of threads, though single-threaded runs are the most common.

The topdown profile shows metrics smeared across the run, with some segments at almost 90% frontend or backend stalls and a variable retirement rate. There also seem to be occasional stripes of lower retirement rates. Some of this would likely be easier to see with fewer than the 18 test cases.

AMD metrics provide a composite of the numbers above. On average between two and three cores are kept busy (2.58 of 16). There is a high rate of page faults. The average retirement rate is low (15.4% of slots) and backend memory stalls are the largest culprit. There is some floating point code and some L2 access.
elapsed 987.760
on_cpu 0.162 # 2.58 / 16 cores
utime 2314.586
stime 238.219
nvcsw 4504420 # 99.74%
nivcsw 11524 # 0.26%
inblock 0 # 0.00/sec
onblock 25688 # 26.01/sec
cpu-clock 2556766690317 # 2556.767 seconds
task-clock 2558049210877 # 2558.049 seconds
page faults 120445947 # 47085.078/sec
context switches 4520560 # 1767.190/sec
cpu migrations 79117 # 30.929/sec
major page faults 2 # 0.001/sec
minor page faults 120445945 # 47085.077/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 1709968636406 # 153.759 branches per 1000 inst
branch misses 22626348307 # 1.32% branch miss
conditional 1548644588767 # 139.253 conditional branches per 1000 inst
indirect 1238978398 # 0.111 indirect branches per 1000 inst
cpu-cycles 10248138945269 # 0.70 GHz
instructions 9924240533970 # 0.97 IPC
slots 20670495263052 #
retiring 3187699511326 # 15.4% (16.1%)
-- ucode 1905874547 # 0.0%
-- fastpath 3185793636779 # 15.4%
frontend 4160988424228 # 20.1% (21.0%)
-- latency 1279637073474 # 6.2%
-- bandwidth 2881351350754 # 13.9%
backend 12378049471540 # 59.9% (62.5%)
-- cpu 2368053514869 # 11.5%
-- memory 10009995956671 # 48.4%
speculation 82492660206 # 0.4% ( 0.4%) low
-- branch mispredict 81067482527 # 0.4%
-- pipeline restart 1425177679 # 0.0%
smt-contention 861159583090 # 4.2% ( 0.0%)
cpu-cycles 10255907734990 # 0.71 GHz
instructions 9936437247179 # 0.97 IPC
instructions 3320711390252 # 85.963 l2 access per 1000 inst
l2 hit from l1 180588408594 # 17.83% l2 miss
l2 miss from l1 9329067623 #
l2 hit from l2 pf 63289259950 #
l3 hit from l2 pf 14279437545 #
l3 miss from l2 pf 27299697284 #
instructions 3308296933757 # 89.615 float per 1000 inst
float 512 128 # 0.000 AVX-512 per 1000 inst
float 256 420 # 0.000 AVX-256 per 1000 inst
float 128 296472821770 # 89.615 AVX-128 per 1000 inst
float MMX 0 # 0.000 MMX per 1000 inst
float scalar 0 # 0.000 scalar per 1000 inst
instructions 2690265 #
opcache 1005701 # 373.830 opcache per 1000 inst
opcache miss 537919 # 53.5% opcache miss rate
l1 dTLB miss 5199 # 1.933 L1 dTLB per 1000 inst
l2 dTLB miss 1138 # 0.423 L2 dTLB per 1000 inst
instructions 2718101 #
icache 1334928 # 491.125 icache per 1000 inst
icache miss 112953 # 8.5% icache miss rate
l1 iTLB miss 9 # 0.003 L1 iTLB per 1000 inst
l2 iTLB miss 0 # 0.000 L2 iTLB per 1000 inst
tlb flush 19 # 0.007 TLB flush per 1000 inst
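The derived columns in the AMD dump can be recomputed from the raw counters, which helps when checking how each figure is defined. The sketch below is a minimal example and makes two assumptions that are not stated in the dump itself: that the per-second rates are divided by task-clock rather than elapsed time, and that the parenthesized topdown figures renormalize over the four main categories with smt-contention left out. Both assumptions reproduce the printed values.

```python
# Raw counters copied from the AMD dump.
elapsed = 987.760
utime = 2314.586
stime = 238.219
ncores = 16  # from the "/ 16 cores" annotation

task_clock_sec = 2558049210877 / 1e9  # task-clock is reported in nanoseconds
page_faults = 120445947

cycles = 10248138945269
instructions = 9924240533970

slots = 20670495263052
topdown = {
    "retiring": 3187699511326,
    "frontend": 4160988424228,
    "backend": 12378049471540,
    "speculation": 82492660206,
}

# Utilization: CPU seconds consumed per second of wall time.
busy_cores = (utime + stime) / elapsed
on_cpu = busy_cores / ncores

# Instructions per cycle.
ipc = instructions / cycles

# Assumption: rates are per second of task-clock, not per elapsed second.
faults_per_sec = page_faults / task_clock_sec

# Share of all issue slots (first percentage column).
raw_pct = {name: 100.0 * value / slots for name, value in topdown.items()}

# Assumption: the parenthesized column divides by the sum of the four
# categories, which excludes the slots attributed to smt-contention.
four_way_total = sum(topdown.values())
norm_pct = {name: 100.0 * value / four_way_total for name, value in topdown.items()}

print(f"busy cores {busy_cores:.2f}, on_cpu {on_cpu:.3f}, IPC {ipc:.2f}")
print(f"page faults/sec {faults_per_sec:.3f}")
for name in topdown:
    print(f"{name:12s} {raw_pct[name]:5.1f}% ({norm_pct[name]:5.1f}%)")
```

Note that the page fault rate only matches the dump when divided by task-clock; dividing by elapsed time would give a figure roughly 2.6 times larger.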
Intel metrics give more clues: L2-bound (16.6%) and DRAM-bound (15.7%) stalls are the largest contributors to the memory stalls, followed by store-bound (11.8%) and L3-bound (7.7%).
elapsed 1185.087
on_cpu 0.224 # 3.59 / 16 cores
utime 4036.438
stime 217.599
nvcsw 3545389 # 97.41%
nivcsw 94223 # 2.59%
inblock 3232 # 2.73/sec
onblock 13408 # 11.31/sec
cpu-clock 4235117712182 # 4235.118 seconds
task-clock 4238670444667 # 4238.670 seconds
page faults 113600453 # 26800.964/sec
context switches 3645262 # 860.001/sec
cpu migrations 236530 # 55.803/sec
major page faults 9 # 0.002/sec
minor page faults 113600444 # 26800.962/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 1565019616139 # 149.293 branches per 1000 inst
branch misses 2447114186 # 0.16% branch miss
conditional 1565019865131 # 149.293 conditional branches per 1000 inst
indirect 205635023039 # 19.616 indirect branches per 1000 inst
slots 21419426156342 #
retiring 7365886196434 # 34.4% (34.4%)
-- ucode 335345158142 # 1.6%
-- fastpath 7030541038292 # 32.8%
frontend 1845138431656 # 8.6% ( 8.6%)
-- latency 807731404302 # 3.8%
-- bandwidth 1037407027354 # 4.8%
backend 11990871392273 # 56.0% (56.0%)
-- cpu 1992105550930 # 9.3%
-- memory 9998765841343 # 46.7%
speculation 328808935666 # 1.5% ( 1.5%)
-- branch mispredict 240387547131 # 1.1%
-- pipeline restart 88421388535 # 0.4%
smt-contention 0 # 0.0% ( 0.0%)
cpu-cycles 7975989603086 # 0.42 GHz
instructions 15222097361603 # 1.91 IPC
l2 access 466905689925 # 60.108 l2 access per 1000 inst
l2 miss 209008506469 # 44.76% l2 miss
cpu-cycles 4128004411313 # 51.2% memory latency
load stalls 1623882878339 # 0.0% l1 bound
l1 miss 1652161569527 # 16.6% l2 bound
l2 miss 966736683768 # 7.7% l3 bound
l3 miss 649440877367 # 15.7% dram bound
store_stalls 488044403173 # 11.8% store bound
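The memory-bound percentages at the end of the Intel dump can be reproduced by differencing the stall counters level by level. This is a sketch under assumptions: it treats the "l1 miss", "l2 miss" and "l3 miss" lines of this group as stall-cycle counts with a miss outstanding at that level, and it clamps a negative difference to zero. The variable names are chosen here for illustration and are not taken from the tool.

```python
# Counters copied from the last group of the Intel dump.
cycles = 4128004411313          # cpu-cycles line of this counter group
load_stalls = 1623882878339
l1_miss_stalls = 1652161569527  # assumed: stall cycles with an L1 miss pending
l2_miss_stalls = 966736683768   # assumed: stall cycles with an L2 miss pending
l3_miss_stalls = 649440877367   # assumed: stall cycles with an L3 miss pending
store_stalls = 488044403173


def pct(count):
    """Express a cycle count as a percentage of this group's cycles."""
    return 100.0 * count / cycles


# Each level is charged the stalls that were resolved at that level,
# i.e. stalls that missed the level above but not this one.
bound = {
    "l1": max(0.0, pct(load_stalls - l1_miss_stalls)),  # clamped at zero
    "l2": pct(l1_miss_stalls - l2_miss_stalls),
    "l3": pct(l2_miss_stalls - l3_miss_stalls),
    "dram": pct(l3_miss_stalls),
    "store": pct(store_stalls),
}

# The "memory latency" figure matches load stalls plus store stalls.
memory_latency = pct(load_stalls + store_stalls)

for level, value in sorted(bound.items(), key=lambda kv: -kv[1]):
    print(f"{level:6s} {value:5.1f}%")
print(f"memory latency {memory_latency:.1f}%")
```

Sorting by value puts the largest contributors first, which makes it easy to compare the levels against each other.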
The process overview shows that b2bench is the primary driver.
7780 processes
7344 b2bench 170576.84 19068.19
68 clinfo 17.20 5.32
38 vulkaninfo 0.95 1.33
6 php 0.16 0.42
6 glxinfo:gdrv0 0.12 0.06
6 glxinfo:gl0 0.12 0.06
4 vulkani:disk$0 0.10 0.14
2 glxinfo 0.06 0.02
2 glxinfo:cs0 0.06 0.02
2 glxinfo:disk$0 0.06 0.02
2 glxinfo:sh0 0.06 0.02
2 glxinfo:shlo0 0.06 0.02
6 clang 0.05 0.07
2 llvmpipe-0 0.05 0.07
2 llvmpipe-1 0.05 0.07
2 llvmpipe-10 0.05 0.07
2 llvmpipe-11 0.05 0.07
2 llvmpipe-12 0.05 0.07
2 llvmpipe-13 0.05 0.07
2 llvmpipe-14 0.05 0.07
2 llvmpipe-15 0.05 0.07
2 llvmpipe-2 0.05 0.07
2 llvmpipe-3 0.05 0.07
2 llvmpipe-4 0.05 0.07
2 llvmpipe-5 0.05 0.07
2 llvmpipe-6 0.05 0.07
2 llvmpipe-7 0.05 0.07
2 llvmpipe-8 0.05 0.07
2 llvmpipe-9 0.05 0.07
3 rocminfo 0.03 0.00
1 lspci 0.00 0.02
1 ps 0.00 0.01
116 sh 0.00 0.00
54 blosc 0.00 0.00
13 gcc 0.00 0.00
9 gsettings 0.00 0.00
8 stat 0.00 0.00
8 systemd-detect- 0.00 0.00
6 llvm-link 0.00 0.00
5 gmain 0.00 0.00
5 phoronix-test-s 0.00 0.00
2 cc 0.00 0.00
2 dconf worker 0.00 0.00
2 lscpu 0.00 0.00
2 uname 0.00 0.00
2 which 0.00 0.00
2 xset 0.00 0.00
1 date 0.00 0.00
1 dirname 0.00 0.00
1 dmesg 0.00 0.00
1 dmidecode 0.00 0.00
1 grep 0.00 0.00
1 ifconfig 0.00 0.00
1 ip 0.00 0.00
1 lsmod 0.00 0.00
1 mktemp 0.00 0.00
1 qdbus 0.00 0.00
1 readlink 0.00 0.00
1 realpath 0.00 0.00
1 sed 0.00 0.00
1 sort 0.00 0.00
1 stty 0.00 0.00
1 systemctl 0.00 0.00
1 template.sh 0.00 0.00
1 wc 0.00 0.00
1 xrandr 0.00 0.00
0 processes running
47 maximum processes
The computation structure shows many short-lived threads, started in successive batches.
710701) blosc cpu=8 start=5.71 finish=12.38
710702) b2bench cpu=0 start=5.71 finish=12.37
710703) b2bench cpu=3 start=6.80 finish=7.34
710704) b2bench cpu=4 start=6.80 finish=7.34
710705) b2bench cpu=13 start=7.34 finish=7.79
710706) b2bench cpu=7 start=7.35 finish=7.79
710707) b2bench cpu=14 start=7.35 finish=7.79
710708) b2bench cpu=11 start=7.79 finish=8.17
710709) b2bench cpu=4 start=7.79 finish=8.17
710710) b2bench cpu=8 start=7.79 finish=8.16
710711) b2bench cpu=1 start=7.79 finish=8.17
710714) b2bench cpu=5 start=8.17 finish=8.53
710715) b2bench cpu=6 start=8.17 finish=8.53
710716) b2bench cpu=15 start=8.17 finish=8.53
710717) b2bench cpu=3 start=8.17 finish=8.53
710718) b2bench cpu=4 start=8.17 finish=8.53
710719) b2bench cpu=9 start=8.53 finish=8.89
710720) b2bench cpu=7 start=8.53 finish=8.89
710721) b2bench cpu=4 start=8.53 finish=8.89
710722) b2bench cpu=5 start=8.53 finish=8.89
710723) b2bench cpu=8 start=8.53 finish=8.89
710724) b2bench cpu=6 start=8.53 finish=8.89
710725) b2bench cpu=11 start=8.89 finish=9.24
710726) b2bench cpu=4 start=8.89 finish=9.24
710727) b2bench cpu=8 start=8.89 finish=9.24
710728) b2bench cpu=12 start=8.89 finish=9.24
710729) b2bench cpu=6 start=8.89 finish=9.24
710730) b2bench cpu=1 start=8.89 finish=9.24
710731) b2bench cpu=5 start=8.89 finish=9.24
710732) b2bench cpu=11 start=9.24 finish=9.56
710733) b2bench cpu=10 start=9.24 finish=9.56
710734) b2bench cpu=13 start=9.24 finish=9.56
710735) b2bench cpu=6 start=9.24 finish=9.56
710736) b2bench cpu=7 start=9.24 finish=9.56
710737) b2bench cpu=1 start=9.24 finish=9.56
710738) b2bench cpu=8 start=9.24 finish=9.56
710739) b2bench cpu=12 start=9.24 finish=9.56
710740) b2bench cpu=0 start=9.56 finish=9.92
710741) b2bench cpu=5 start=9.56 finish=9.92
710742) b2bench cpu=12 start=9.56 finish=9.92
710743) b2bench cpu=9 start=9.56 finish=9.92
710744) b2bench cpu=10 start=9.56 finish=9.92
710745) b2bench cpu=6 start=9.56 finish=9.91
710746) b2bench cpu=10 start=9.56 finish=9.91
710747) b2bench cpu=15 start=9.56 finish=9.91
710748) b2bench cpu=2 start=9.56 finish=9.92
710749) b2bench cpu=15 start=9.92 finish=10.27
710750) b2bench cpu=6 start=9.92 finish=10.27
710751) b2bench cpu=12 start=9.92 finish=10.27
710752) b2bench cpu=5 start=9.92 finish=10.27
710753) b2bench cpu=8 start=9.92 finish=10.27
710754) b2bench cpu=1 start=9.92 finish=10.27
710755) b2bench cpu=0 start=9.92 finish=10.27
710756) b2bench cpu=10 start=9.92 finish=10.27
710757) b2bench cpu=3 start=9.92 finish=10.27
710758) b2bench cpu=4 start=9.92 finish=10.27
710759) b2bench cpu=15 start=10.27 finish=10.62
710760) b2bench cpu=3 start=10.27 finish=10.61
710761) b2bench cpu=4 start=10.27 finish=10.61
710762) b2bench cpu=12 start=10.27 finish=10.61
710763) b2bench cpu=10 start=10.27 finish=10.62
710764) b2bench cpu=9 start=10.27 finish=10.62
710765) b2bench cpu=0 start=10.27 finish=10.61
710766) b2bench cpu=11 start=10.28 finish=10.61
710767) b2bench cpu=14 start=10.28 finish=10.62
710768) b2bench cpu=5 start=10.28 finish=10.61
710769) b2bench cpu=1 start=10.28 finish=10.62
710770) b2bench cpu=1 start=10.62 finish=10.96
710771) b2bench cpu=0 start=10.62 finish=10.96
710772) b2bench cpu=1 start=10.62 finish=10.96
710773) b2bench cpu=10 start=10.62 finish=10.96
710774) b2bench cpu=15 start=10.62 finish=10.96
710775) b2bench cpu=3 start=10.62 finish=10.96
710776) b2bench cpu=4 start=10.62 finish=10.96
710777) b2bench cpu=7 start=10.62 finish=10.96
710778) b2bench cpu=11 start=10.62 finish=10.96
710779) b2bench cpu=6 start=10.62 finish=10.96
710780) b2bench cpu=8 start=10.62 finish=10.96
710781) b2bench cpu=12 start=10.62 finish=10.96
710782) b2bench cpu=1 start=10.96 finish=11.32
710783) b2bench cpu=9 start=10.96 finish=11.32
710784) b2bench cpu=10 start=10.96 finish=11.32
710785) b2bench cpu=13 start=10.96 finish=11.32
710786) b2bench cpu=3 start=10.96 finish=11.32
710787) b2bench cpu=12 start=10.96 finish=11.32
710788) b2bench cpu=11 start=10.96 finish=11.32
710789) b2bench cpu=5 start=10.96 finish=11.32
710790) b2bench cpu=8 start=10.96 finish=11.32
710791) b2bench cpu=15 start=10.96 finish=11.32
710792) b2bench cpu=0 start=10.96 finish=11.32
710793) b2bench cpu=4 start=10.96 finish=11.32
710794) b2bench cpu=14 start=10.96 finish=11.32
710795) b2bench cpu=11 start=11.32 finish=11.67
710796) b2bench cpu=8 start=11.32 finish=11.66
710797) b2bench cpu=6 start=11.32 finish=11.67
710798) b2bench cpu=2 start=11.32 finish=11.67
710799) b2bench cpu=5 start=11.32 finish=11.67
710800) b2bench cpu=1 start=11.32 finish=11.67
710801) b2bench cpu=15 start=11.32 finish=11.67
710802) b2bench cpu=12 start=11.32 finish=11.67
710803) b2bench cpu=9 start=11.32 finish=11.67
710804) b2bench cpu=3 start=11.32 finish=11.67
710805) b2bench cpu=14 start=11.32 finish=11.67
710806) b2bench cpu=4 start=11.32 finish=11.67
710807) b2bench cpu=10 start=11.32 finish=11.67
710808) b2bench cpu=7 start=11.32 finish=11.67
710809) b2bench cpu=13 start=11.67 finish=12.02
710810) b2bench cpu=3 start=11.67 finish=12.02
710811) b2bench cpu=15 start=11.67 finish=12.02
710812) b2bench cpu=0 start=11.67 finish=12.02
710813) b2bench cpu=7 start=11.67 finish=12.02
710814) b2bench cpu=14 start=11.67 finish=12.02
710815) b2bench cpu=5 start=11.67 finish=12.02
710816) b2bench cpu=12 start=11.67 finish=12.02
710817) b2bench cpu=9 start=11.67 finish=12.02
710818) b2bench cpu=11 start=11.67 finish=12.01
710819) b2bench cpu=1 start=11.67 finish=12.02
710820) b2bench cpu=2 start=11.67 finish=12.01
710821) b2bench cpu=6 start=11.67 finish=12.02
710822) b2bench cpu=8 start=11.67 finish=12.02
710823) b2bench cpu=4 start=11.67 finish=12.02
710824) b2bench cpu=1 start=12.02 finish=12.37
710825) b2bench cpu=11 start=12.02 finish=12.37
710826) b2bench cpu=0 start=12.02 finish=12.37
710827) b2bench cpu=6 start=12.02 finish=12.37
710828) b2bench cpu=5 start=12.02 finish=12.37
710829) b2bench cpu=15 start=12.02 finish=12.37
710830) b2bench cpu=14 start=12.02 finish=12.37
710831) b2bench cpu=8 start=12.02 finish=12.37
710832) b2bench cpu=12 start=12.02 finish=12.37
710833) b2bench cpu=10 start=12.02 finish=12.37
710834) b2bench cpu=13 start=12.02 finish=12.37
710835) b2bench cpu=7 start=12.02 finish=12.37
710836) b2bench cpu=3 start=12.02 finish=12.37
710837) b2bench cpu=1 start=12.02 finish=12.37
710838) b2bench cpu=2 start=12.02 finish=12.37
710839) b2bench cpu=4 start=12.02 finish=12.37
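One way to read the listing above is to group threads that start at nearly the same time and count each group. The sketch below is a minimal example on the first few b2bench worker lines (the two entries starting at 5.71 are left out); the 0.05 second tolerance and the helper names `parse` and `waves` are assumptions made for illustration.

```python
import re

# A few lines copied from the thread listing.
SAMPLE = """\
710703) b2bench cpu=3 start=6.80 finish=7.34
710704) b2bench cpu=4 start=6.80 finish=7.34
710705) b2bench cpu=13 start=7.34 finish=7.79
710706) b2bench cpu=7 start=7.35 finish=7.79
710707) b2bench cpu=14 start=7.35 finish=7.79
710708) b2bench cpu=11 start=7.79 finish=8.17
710709) b2bench cpu=4 start=7.79 finish=8.17
710710) b2bench cpu=8 start=7.79 finish=8.16
710711) b2bench cpu=1 start=7.79 finish=8.17
710714) b2bench cpu=5 start=8.17 finish=8.53
710715) b2bench cpu=6 start=8.17 finish=8.53
710716) b2bench cpu=15 start=8.17 finish=8.53
710717) b2bench cpu=3 start=8.17 finish=8.53
710718) b2bench cpu=4 start=8.17 finish=8.53
"""

LINE = re.compile(r"(\d+)\) (\S+) cpu=(\d+) start=([\d.]+) finish=([\d.]+)")


def parse(text):
    """Return (pid, name, cpu, start, finish) tuples for matching lines."""
    rows = []
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            pid, name, cpu, start, finish = m.groups()
            rows.append((int(pid), name, int(cpu), float(start), float(finish)))
    return rows


def waves(rows, tol=0.05):
    """Group rows whose start is within tol seconds of the group's first start."""
    groups = []
    for row in sorted(rows, key=lambda r: r[3]):
        if groups and row[3] - groups[-1][0][3] <= tol:
            groups[-1].append(row)
        else:
            groups.append([row])
    return groups


wave_sizes = [len(group) for group in waves(parse(SAMPLE))]
print(wave_sizes)  # [2, 3, 4, 5]
```

On this sample each batch has one more thread than the previous one, which is consistent with the benchmark stepping through increasing thread counts.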
