A benchmark of PostgreSQL operations. There are many different test configurations; I picked the two closest to what Phoronix used in their benchmark article: a scaling factor of 100 with 1000 clients, running the read-only operation followed by read/write. Most of the time there are only a small number of runnable processes, though up to 800 appear at one point.

The topdown overview shows a frontend-bound process. The timing looks slightly different from the previous case and stretches out longer, so additional runs may be needed to get stable results.

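The AMD metrics show an average on-cpu of only 2.5 of the 16 cores. There is some floating point code (about 81 128-bit operations per 1000 instructions) and many branches (about 206 per 1000 instructions). Frontend latency appears to dominate at 41.8% of slots, so this would be a good benchmark for digging deeper into frontend latency.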

elapsed              863.433
on_cpu               0.157          # 2.52 / 16 cores
utime                459.890
stime                1713.587
nvcsw                12451489       # 8.45%
nivcsw               134887770      # 91.55%
inblock              8              # 0.01/sec
onblock              16504          # 19.11/sec
cpu-clock            11390349332903 # 11390.349 seconds
task-clock           11403070781759 # 11403.071 seconds
page faults          91224025       # 7999.953/sec
context switches     397659971      # 34873.060/sec
cpu migrations       38233072       # 3352.875/sec
major page faults    799            # 0.070/sec
minor page faults    91223226       # 7999.882/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             5355467484433  # 205.853 branches per 1000 inst
branch misses        292739782623   # 5.47% branch miss
conditional          3407698242743  # 130.985 conditional branches per 1000 inst
indirect             117052096702   # 4.499 indirect branches per 1000 inst
cpu-cycles           45652716363402 # 3.31 GHz
instructions         27213399658895 # 0.60 IPC
slots                90040636490742 #
retiring             9410107725001  # 10.5% (11.7%)
-- ucode             44883771195    #     0.0%
-- fastpath          9365223953806  #    10.4%
frontend             45468404154306 # 50.5% (56.6%)
-- latency           37635521319684 #    41.8%
-- bandwidth         7832882834622  #     8.7%
backend              24909401939477 # 27.7% (31.0%)
-- cpu               2220898318464  #     2.5%
-- memory            22688503621013 #    25.2%
speculation          605948968091   #  0.7% ( 0.8%)
-- branch mispredict 602767658468   #     0.7%
-- pipeline restart  3181309623     #     0.0%
smt-contention       9643199770382  # 10.7% ( 0.0%)
cpu-cycles           45879696338268 # 3.32 GHz
instructions         26657314107701 # 0.58 IPC
instructions         8759583463590  # 79.491 l2 access per 1000 inst
l2 hit from l1       611600487555   # 22.86% l2 miss
l2 miss from l1      107286230319   #
l2 hit from l2 pf    32790526002    #
l3 hit from l2 pf    23400169289    #
l3 miss from l2 pf   28515861814    #
instructions         8759544138410  # 80.986 float per 1000 inst
float 512            70             # 0.000 AVX-512 per 1000 inst
float 256            586            # 0.000 AVX-256 per 1000 inst
float 128            709400728133   # 80.986 AVX-128 per 1000 inst
float MMX            0              # 0.000 MMX per 1000 inst
float scalar         0              # 0.000 scalar per 1000 inst
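A minimal Python sketch that recomputes the derived AMD columns from the raw counters above. It assumes on_cpu is (utime + stime) / elapsed divided by the 16 cores, and that the parenthesized topdown percentages are each category's share of slots after the smt-contention slots are removed. Both assumptions are inferred from the printed values, not taken from the tool's source.

```python
# Raw counters copied from the AMD run above.
elapsed = 863.433   # wall-clock seconds
utime = 459.890     # user CPU seconds
stime = 1713.587    # system CPU seconds
cores = 16

slots = 90040636490742
topdown = {
    "retiring": 9410107725001,
    "frontend": 45468404154306,
    "backend": 24909401939477,
    "speculation": 605948968091,
}
smt_contention = 9643199770382


def on_cpu(utime, stime, elapsed, cores):
    """Return (average busy cores, fraction of all cores busy).

    Assumption: total CPU time divided by wall-clock time."""
    busy = (utime + stime) / elapsed
    return busy, busy / cores


def topdown_shares(counts, slots, smt):
    """Map each category to (% of all slots, % of slots excluding SMT contention)."""
    non_smt = slots - smt
    return {
        name: (100.0 * n / slots, 100.0 * n / non_smt)
        for name, n in counts.items()
    }


busy, frac = on_cpu(utime, stime, elapsed, cores)
print(f"on_cpu {frac:.3f}  # {busy:.2f} / {cores} cores")
for name, (raw, adj) in topdown_shares(topdown, slots, smt_contention).items():
    print(f"{name:12s} {raw:5.1f}% ({adj:5.1f}%)")
print(f"smt-contention {100.0 * smt_contention / slots:.1f}%")
```

With these inputs the sketch reproduces the 0.157 / 2.52 on_cpu line and, for example, the 50.5% (56.6%) frontend figure, which supports the inferred formulas.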

Intel metrics

elapsed              2291.272
on_cpu               0.139          # 2.22 / 16 cores
utime                1836.851
stime                3255.948
nvcsw                7011767        # 1.36%
nivcsw               507821164      # 98.64%
inblock              7648           # 3.34/sec
onblock              5576           # 2.43/sec
cpu-clock            27441549967843 # 27441.550 seconds
task-clock           27449294125546 # 27449.294 seconds
page faults          237069157      # 8636.621/sec
context switches     1078148357     # 39277.817/sec
cpu migrations       23548431       # 857.888/sec
major page faults    2234           # 0.081/sec
minor page faults    237066923      # 8636.540/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             9692774844607  # 191.453 branches per 1000 inst
branch misses        57955491662    # 0.60% branch miss
conditional          9692775358495  # 191.453 conditional branches per 1000 inst
indirect             2345563682863  # 46.330 indirect branches per 1000 inst
slots                124996051720466 #
retiring             32187185782664 # 25.8% (25.8%)
-- ucode             4181245792684  #     3.3%
-- fastpath          28005939989980 #    22.4%
frontend             62982740884535 # 50.4% (50.4%)
-- latency           43903284077682 #    35.1%
-- bandwidth         19079456806853 #    15.3%
backend              26797147178785 # 21.4% (21.4%)
-- cpu               4765353392012  #     3.8%
-- memory            22031793786773 #    17.6%
speculation          3176735760047  #  2.5% ( 2.5%)
-- branch mispredict 2355089638929  #     1.9%
-- pipeline restart  821646121118   #     0.7%
smt-contention       0              #  0.0% ( 0.0%)
cpu-cycles           59299293721852 # 1.62 GHz
instructions         45253336959405 # 0.76 IPC
l2 access            2944393364322  # 93.519 l2 access per 1000 inst
l2 miss              859430772105   # 29.19% l2 miss
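Both listings print their derived rates the same way. Working backward from the printed values, event counts such as context switches appear to be divided by task-clock seconds, onblock by elapsed time, and the nvcsw/nivcsw percentages are each kind's share of their sum. The Python sketch below recomputes these for both runs under those inferred assumptions; it makes explicit that involuntary switches dominate in both runs (about 92% on AMD and 99% on Intel).

```python
from dataclasses import dataclass


@dataclass
class Run:
    name: str
    elapsed: float         # wall-clock seconds
    task_clock_ns: int     # task-clock counter, in nanoseconds
    nvcsw: int             # voluntary context switches
    nivcsw: int            # involuntary context switches
    context_switches: int  # context switches event count
    onblock: int           # onblock count as printed in the listing


def summarize(run: Run) -> dict:
    task_sec = run.task_clock_ns / 1e9
    total = run.nvcsw + run.nivcsw
    return {
        "voluntary_pct": 100.0 * run.nvcsw / total,
        "involuntary_pct": 100.0 * run.nivcsw / total,
        # Assumption: event rates are per task-clock second.
        "csw_per_sec": run.context_switches / task_sec,
        # Assumption: block I/O rates are per elapsed (wall-clock) second.
        "onblock_per_sec": run.onblock / run.elapsed,
    }


AMD = Run("AMD", 863.433, 11403070781759, 12451489, 134887770, 397659971, 16504)
INTEL = Run("Intel", 2291.272, 27449294125546, 7011767, 507821164, 1078148357, 5576)

for run in (AMD, INTEL):
    s = summarize(run)
    print(f"{run.name:6s} voluntary {s['voluntary_pct']:5.2f}%  "
          f"involuntary {s['involuntary_pct']:5.2f}%  "
          f"{s['csw_per_sec']:10.3f} switches/sec  "
          f"{s['onblock_per_sec']:6.2f} onblock/sec")
```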

The process overview shows the many processes launched: the pgbench clients that drive the benchmark, along with the postgres server processes. It would also be useful to separate out the topdown metrics for each of the two.

6595 processes
	120 pgbench               7466.61 25775.90
	6086 postgres              6313.98  2805.83
	 68 clinfo                  18.19     5.99
	 38 vulkaninfo               1.32     0.96
	  6 glxinfo:gdrv0            0.15     0.08
	  4 vulkani:disk$0           0.14     0.10
	  6 php                      0.07     0.17
	  2 llvmpipe-0               0.07     0.05
	  2 llvmpipe-1               0.07     0.05
	  2 llvmpipe-10              0.07     0.05
	  2 llvmpipe-11              0.07     0.05
	  2 llvmpipe-12              0.07     0.05
	  2 llvmpipe-13              0.07     0.05
	  2 llvmpipe-14              0.07     0.05
	  2 llvmpipe-15              0.07     0.05
	  2 llvmpipe-2               0.07     0.05
	  2 llvmpipe-3               0.07     0.05
	  2 llvmpipe-4               0.07     0.05
	  2 llvmpipe-5               0.07     0.05
	  2 llvmpipe-6               0.07     0.05
	  2 llvmpipe-7               0.07     0.05
	  2 llvmpipe-8               0.07     0.05
	  2 llvmpipe-9               0.07     0.05
	  2 glxinfo                  0.07     0.04
	  2 glxinfo:cs0              0.07     0.04
	  2 glxinfo:disk$0           0.07     0.04
	  2 glxinfo:sh0              0.07     0.04
	  2 glxinfo:shlo0            0.07     0.04
	  6 clang                    0.04     0.08
	  3 rocminfo                 0.03     0.00
	  1 lspci                    0.00     0.03
	  1 ps                       0.00     0.01
	 90 sh                       0.00     0.00
	 13 gcc                      0.00     0.00
	 12 pg_ctl                   0.00     0.00
	 10 gsettings                0.00     0.00
	  8 stat                     0.00     0.00
	  8 systemd-detect-          0.00     0.00
	  6 bc                       0.00     0.00
	  6 createdb                 0.00     0.00
	  6 llvm-link                0.00     0.00
	  6 sleep                    0.00     0.00
	  5 phoronix-test-s          0.00     0.00
	  4 dropdb                   0.00     0.00
	  4 gmain                    0.00     0.00
	  2 cc                       0.00     0.00
	  2 dconf worker             0.00     0.00
	  2 lscpu                    0.00     0.00
	  2 uname                    0.00     0.00
	  2 which                    0.00     0.00
	  2 xset                     0.00     0.00
	  1 date                     0.00     0.00
	  1 dirname                  0.00     0.00
	  1 dmesg                    0.00     0.00
	  1 dmidecode                0.00     0.00
	  1 grep                     0.00     0.00
	  1 ifconfig                 0.00     0.00
	  1 ip                       0.00     0.00
	  1 lsmod                    0.00     0.00
	  1 mktemp                   0.00     0.00
	  1 qdbus                    0.00     0.00
	  1 readlink                 0.00     0.00
	  1 realpath                 0.00     0.00
	  1 sed                      0.00     0.00
	  1 sort                     0.00     0.00
	  1 stty                     0.00     0.00
	  1 systemctl                0.00     0.00
	  1 template.sh              0.00     0.00
	  1 wc                       0.00     0.00
	  1 xrandr                   0.00     0.00
5 processes running
1037 maximum processes
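Before splitting topdown metrics per program, a simpler first step is to see how the listed time divides between commands. This is a minimal Python sketch, assuming rows in the format above: a count, a command name, then two numeric columns that the listing does not label. It sums both columns per command and reports each command's share; it only parses this table and does not produce topdown metrics. Command names can contain spaces (for example "dconf worker"), so the parser splits the count off the left and the two numbers off the right.

```python
# A few rows copied from the process listing above.
SAMPLE = """
    120 pgbench               7466.61 25775.90
   6086 postgres              6313.98  2805.83
     68 clinfo                  18.19     5.99
      2 dconf worker             0.00     0.00
"""


def parse_process_table(text):
    """Parse 'count name col1 col2' rows into (name, count, col1, col2)."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        count, rest = line.split(None, 1)        # count from the left
        name, col1, col2 = rest.rsplit(None, 2)  # two numbers from the right
        rows.append((name, int(count), float(col1), float(col2)))
    return rows


def shares(rows):
    """Percent of combined time (both columns summed) per command.

    Assumes each command name appears once, as in the listing."""
    total = sum(c1 + c2 for _, _, c1, c2 in rows)
    return {name: 100.0 * (c1 + c2) / total for name, _, c1, c2 in rows}


if __name__ == "__main__":
    pct = shares(parse_process_table(SAMPLE))
    for name, p in sorted(pct.items(), key=lambda kv: -kv[1]):
        print(f"{name:15s} {p:6.2f}%")
```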