This benchmark runs the SQLite database with five workloads that vary the number of threads from 1 to 16 in powers of 2. The number of runnable processes only reaches five in the run below.

The topdown profile shows a workload dominated by frontend stalls (75.4% of slots on AMD, 47.1% on Intel), with a low retirement rate (13.4% on AMD, 21.1% on Intel).

AMD metrics show less than one core's worth of on-CPU time on average (0.64 of 16 cores). The L2 access and L2 miss rates are moderately high, but memory stalls are low. There is little floating-point code.
elapsed 374.638
on_cpu 0.040 # 0.64 / 16 cores
utime 27.590
stime 212.440
nvcsw 6203372 # 83.77%
nivcsw 1202158 # 16.23%
inblock 0 # 0.00/sec
onblock 33573736 # 89616.45/sec
cpu-clock 234773842345 # 234.774 seconds
task-clock 237874612156 # 237.875 seconds
page faults 322667 # 1356.458/sec
context switches 7406386 # 31135.672/sec
cpu migrations 373657 # 1570.815/sec
major page faults 15 # 0.063/sec
minor page faults 322652 # 1356.395/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 73382349376 # 204.985 branches per 1000 inst
branch misses 9183182583 # 12.51% branch miss
conditional 39856833656 # 111.335 conditional branches per 1000 inst
indirect 735923949 # 2.056 indirect branches per 1000 inst
cpu-cycles 503897140569 # 0.08 GHz
instructions 362772973215 # 0.72 IPC
slots 985859494440 #
retiring 132024456157 # 13.4% (13.6%) low
-- ucode 675731578 # 0.1%
-- fastpath 131348724579 # 13.3%
frontend 743024582218 # 75.4% (76.3%) high
-- latency 632192389680 # 64.1%
-- bandwidth 110832192538 # 11.2%
backend 85070860456 # 8.6% ( 8.7%) low
-- cpu 24113210052 # 2.4%
-- memory 60957650404 # 6.2%
speculation 13412065688 # 1.4% ( 1.4%)
-- branch mispredict 13392597513 # 1.4%
-- pipeline restart 19468175 # 0.0%
smt-contention 12224270726 # 1.2% ( 0.0%)
cpu-cycles 503154312781 # 0.08 GHz
instructions 362440359916 # 0.72 IPC
instructions 117775660458 # 112.538 l2 access per 1000 inst
l2 hit from l1 12244170605 # 32.61% l2 miss
l2 miss from l1 3748255784 #
l2 hit from l2 pf 436472980 #
l3 hit from l2 pf 549331690 #
l3 miss from l2 pf 24246945 #
instructions 117838963317 # 11.287 float per 1000 inst
float 512 339 # 0.000 AVX-512 per 1000 inst
float 256 572 # 0.000 AVX-256 per 1000 inst
float 128 1330103949 # 11.287 AVX-128 per 1000 inst
float MMX 0 # 0.000 MMX per 1000 inst
float scalar 5 # 0.000 scalar per 1000 inst
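The derived figures in the AMD dump above can be rechecked from the raw counters. The following is a minimal sketch, assuming that on_cpu is (utime + stime) / elapsed divided by the core count, that IPC is instructions / cpu-cycles, and that each topdown share is the category count divided by slots. These formulas reproduce the printed values but are inferred from the numbers, not taken from the tool's source.

```python
# Raw counters copied from the AMD dump above.
elapsed = 374.638                 # seconds of wall-clock time
utime, stime = 27.590, 212.440    # user and system CPU seconds
cores = 16
cycles = 503897140569
instructions = 362772973215
slots = 985859494440
topdown = {
    "retiring": 132024456157,
    "frontend": 743024582218,
    "backend": 85070860456,
    "speculation": 13412065688,
}

# Assumed formulas (inferred from the printed values, not from the tool).
cores_busy = (utime + stime) / elapsed      # average number of busy cores
on_cpu = cores_busy / cores                 # fraction of the whole machine
ipc = instructions / cycles
shares = {name: 100.0 * count / slots for name, count in topdown.items()}

print(f"cores busy {cores_busy:.2f}, on_cpu {on_cpu:.3f}, IPC {ipc:.2f}")
for name, pct in shares.items():
    print(f"{name:12s} {pct:5.1f}%")
```

Swapping in the Intel counters gives the same kind of cross-check for the second dump.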
Intel metrics
elapsed 948.364
on_cpu 0.038 # 0.61 / 16 cores
utime 84.072
stime 496.014
nvcsw 6863881 # 89.83%
nivcsw 776946 # 10.17%
inblock 0 # 0.00/sec
onblock 33562496 # 35389.88/sec
cpu-clock 565310791203 # 565.311 seconds
task-clock 572699669326 # 572.700 seconds
page faults 312550 # 545.749/sec
context switches 7644694 # 13348.522/sec
cpu migrations 1472792 # 2571.666/sec
major page faults 14 # 0.024/sec
minor page faults 312536 # 545.724/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 61483148417 # 177.932 branches per 1000 inst
branch misses 1319964232 # 2.15% branch miss
conditional 61483187617 # 177.933 conditional branches per 1000 inst
indirect 11339675159 # 32.817 indirect branches per 1000 inst
slots 1072578969920 #
retiring 226365385224 # 21.1% (21.1%)
-- ucode 37992131912 # 3.5%
-- fastpath 188373253312 # 17.6%
frontend 505370446250 # 47.1% (47.1%) high
-- latency 364473530709 # 34.0%
-- bandwidth 140896915541 # 13.1%
backend 267722977027 # 25.0% (25.0%)
-- cpu 129424001415 # 12.1%
-- memory 138298975612 # 12.9%
speculation 101927658915 # 9.5% ( 9.5%)
-- branch mispredict 96582518582 # 9.0%
-- pipeline restart 5345140333 # 0.5%
smt-contention 0 # 0.0% ( 0.0%)
cpu-cycles 364913872444 # 0.02 GHz
instructions 390403751528 # 1.07 IPC
l2 access 20396866754 # 92.720 l2 access per 1000 inst
l2 miss 5816501509 # 28.52% l2 miss
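One quick way to compare the two dumps above is to look at how much of the CPU time is system time and how many of the context switches are voluntary. The sketch below uses only the utime, stime, nvcsw and nivcsw values printed above; the dictionary layout and the helper name `summarize` are made up for illustration.

```python
# Values copied from the AMD and Intel dumps above.
runs = {
    "AMD":   {"utime": 27.590, "stime": 212.440,
              "nvcsw": 6203372, "nivcsw": 1202158},
    "Intel": {"utime": 84.072, "stime": 496.014,
              "nvcsw": 6863881, "nivcsw": 776946},
}

def summarize(run):
    """Return (system share of CPU time, voluntary share of switches) in %."""
    cpu_time = run["utime"] + run["stime"]
    switches = run["nvcsw"] + run["nivcsw"]
    return (100.0 * run["stime"] / cpu_time,
            100.0 * run["nvcsw"] / switches)

summary = {name: summarize(run) for name, run in runs.items()}
for name, (sys_pct, vol_pct) in summary.items():
    print(f"{name:6s} system {sys_pct:5.1f}%  voluntary switches {vol_pct:6.2f}%")
```

On both machines most of the CPU time is system time and most context switches are voluntary, which is consistent with a workload that spends much of its time blocking in the kernel.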
The process overview shows that the test overhead (clinfo, vulkaninfo and similar probes) uses almost as much user time as the sqlite3 workload itself, though the workload has far more system time. It would be interesting to drill deeper to see where that system time goes.
1205 processes
372 sqlite3 25.15 187.90
68 clinfo 17.18 8.65
38 vulkaninfo 1.71 0.95
4 vulkani:disk$0 0.18 0.10
6 glxinfo:gdrv0 0.16 0.03
6 glxinfo:gl0 0.16 0.03
6 php 0.13 0.12
2 llvmpipe-0 0.09 0.05
2 llvmpipe-1 0.09 0.05
2 llvmpipe-10 0.09 0.05
2 llvmpipe-11 0.09 0.05
2 llvmpipe-12 0.09 0.05
2 llvmpipe-13 0.09 0.05
2 llvmpipe-14 0.09 0.05
2 llvmpipe-15 0.09 0.05
2 llvmpipe-2 0.09 0.05
2 llvmpipe-3 0.09 0.05
2 llvmpipe-4 0.09 0.05
2 llvmpipe-5 0.09 0.05
2 llvmpipe-6 0.09 0.05
2 llvmpipe-7 0.09 0.05
2 llvmpipe-8 0.09 0.05
2 llvmpipe-9 0.09 0.05
2 glxinfo 0.09 0.01
2 glxinfo:cs0 0.08 0.01
2 glxinfo:disk$0 0.08 0.01
2 glxinfo:sh0 0.08 0.01
2 glxinfo:shlo0 0.08 0.01
6 clang 0.07 0.05
3 rocminfo 0.03 0.00
1 lspci 0.01 0.02
1 ps 0.00 0.01
292 cat 0.00 0.00
111 sh 0.00 0.00
108 sqlite-benchmar 0.00 0.00
20 bash 0.00 0.00
20 rm 0.00 0.00
15 seq 0.00 0.00
13 gcc 0.00 0.00
9 gsettings 0.00 0.00
9 stat 0.00 0.00
8 systemd-detect- 0.00 0.00
6 llvm-link 0.00 0.00
5 phoronix-test-s 0.00 0.00
4 gmain 0.00 0.00
3 dconf worker 0.00 0.00
2 cc 0.00 0.00
2 lscpu 0.00 0.00
2 uname 0.00 0.00
2 which 0.00 0.00
2 xset 0.00 0.00
1 date 0.00 0.00
1 dirname 0.00 0.00
1 dmesg 0.00 0.00
1 dmidecode 0.00 0.00
1 grep 0.00 0.00
1 ifconfig 0.00 0.00
1 ip 0.00 0.00
1 lsmod 0.00 0.00
1 mktemp 0.00 0.00
1 qdbus 0.00 0.00
1 readlink 0.00 0.00
1 realpath 0.00 0.00
1 sed 0.00 0.00
1 sort 0.00 0.00
1 stty 0.00 0.00
1 systemctl 0.00 0.00
1 template.sh 0.00 0.00
1 wc 0.00 0.00
1 xrandr 0.00 0.00
0 processes running
58 maximum processes
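The process table above can be split into workload and overhead with a few lines of parsing. This is a rough sketch, assuming the columns are process count, command name, user time and system time (the listing has no header, so the column meaning is inferred from the surrounding text). Only a handful of rows are pasted in as a sample, so the totals cover just those rows.

```python
# A few rows from the process overview above; assumed columns are
# count, command name, user time, system time.
SAMPLE = """\
372 sqlite3 25.15 187.90
68 clinfo 17.18 8.65
38 vulkaninfo 1.71 0.95
6 php 0.13 0.12
3 dconf worker 0.00 0.00
"""

def parse_row(line):
    # Split the two times off the right so names with spaces survive.
    head, user, system = line.rsplit(None, 2)
    count, name = head.split(None, 1)
    return name, int(count), float(user), float(system)

rows = [parse_row(line) for line in SAMPLE.splitlines() if line.strip()]

# Treat sqlite3 as the workload and everything else as test overhead.
workload_user = sum(r[2] for r in rows if r[0] == "sqlite3")
workload_sys = sum(r[3] for r in rows if r[0] == "sqlite3")
overhead_user = sum(r[2] for r in rows if r[0] != "sqlite3")
overhead_sys = sum(r[3] for r in rows if r[0] != "sqlite3")

print(f"workload user {workload_user:.2f} system {workload_sys:.2f}")
print(f"overhead user {overhead_user:.2f} system {overhead_sys:.2f}")
```

Feeding in the whole table instead of the sample would give the full overhead totals.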
The computation blocks are as follows:
20404) sqlite-benchmar cpu=8 start=5.57 finish=13.19
20405) cat cpu=1 start=5.58 finish=5.58
20406) seq cpu=2 start=5.58 finish=5.58
20407) sqlite-benchmar cpu=4 start=5.58 finish=13.19
20408) sqlite3 cpu=13 start=5.58 finish=5.59
20409) cat cpu=6 start=5.59 finish=7.45
20410) sqlite3 cpu=10 start=5.59 finish=8.07
20411) cat cpu=13 start=8.07 finish=10.01
20412) sqlite3 cpu=12 start=8.07 finish=10.63
20413) cat cpu=13 start=10.63 finish=12.57
20414) sqlite3 cpu=14 start=10.63 finish=13.19
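The start and finish columns in the listing above give the lifetime of each process. A small sketch for extracting them follows; the regular expression is an assumption about the line layout based only on the rows shown here, and the first field is taken to be a process id.

```python
import re

# Rows copied from the computation block listing above.
BLOCKS = """\
20404) sqlite-benchmar cpu=8 start=5.57 finish=13.19
20408) sqlite3 cpu=13 start=5.58 finish=5.59
20410) sqlite3 cpu=10 start=5.59 finish=8.07
20412) sqlite3 cpu=12 start=8.07 finish=10.63
20414) sqlite3 cpu=14 start=10.63 finish=13.19
"""

# Assumed layout: "<pid>) <name> cpu=<n> start=<sec> finish=<sec>".
ROW = re.compile(
    r"(\d+)\)\s+(\S+)\s+cpu=(\d+)\s+start=([\d.]+)\s+finish=([\d.]+)"
)

durations = []
for line in BLOCKS.splitlines():
    match = ROW.match(line)
    if match is None:
        continue  # skip anything that does not look like a block row
    pid, name, cpu, start, finish = match.groups()
    durations.append((int(pid), name, float(finish) - float(start)))

for pid, name, seconds in durations:
    print(f"{pid} {name:16s} {seconds:6.2f} s")
```

In this sample the three longer sqlite3 invocations each run for roughly 2.5 seconds, back to back, inside the roughly 7.6-second lifetime of the sqlite-benchmar wrapper.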
