An open-source in-memory data structure store. The test profile tries to run up to 15 benchmark cases, but on AMD the following fail with errors saying they don't run:
- pts/redis-1.4.0: Test: GET - Parallel Connections: 50
- pts/redis-1.4.0: Test: SET - Parallel Connections: 50
- pts/redis-1.4.0: Test: GET - Parallel Connections: 500
- pts/redis-1.4.0: Test: LPOP - Parallel Connections: 50
- pts/redis-1.4.0: Test: SADD - Parallel Connections: 50
- pts/redis-1.4.0: Test: SET - Parallel Connections: 500
- pts/redis-1.4.0: Test: LPOP - Parallel Connections: 500
- pts/redis-1.4.0: Test: LPUSH - Parallel Connections: 50
- pts/redis-1.4.0: Test: SADD - Parallel Connections: 500
- pts/redis-1.4.0: Test: LPUSH - Parallel Connections: 500
On Intel the entire process crashes via the out-of-memory killer, taking the controlling terminal down with it. So this is a good benchmark to drill into on a system with enough memory/swap to see what the actual demands are. These tests run up to 500 parallel connections, though it is interesting that the maximum number of cores used is closer to 10.
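As a first step toward measuring that demand, here is a minimal sketch (my own addition, not part of the PTS profile; the one-second interval and the redis-server comm name are assumptions) that samples the virtual and resident size of every redis-server process from /proc, so memory growth can be watched right up to the OOM kill:

#!/usr/bin/env python3
# Sample VmSize/VmRSS of every redis-server process once a second.
# Run alongside the benchmark; stop with Ctrl-C.
import glob
import time

def redis_memory_kb():
    sizes = []
    for comm_path in glob.glob('/proc/[0-9]*/comm'):
        try:
            with open(comm_path) as f:
                if f.read().strip() != 'redis-server':
                    continue
            vmsize = vmrss = 0
            with open(comm_path.rsplit('/', 1)[0] + '/status') as f:
                for line in f:
                    if line.startswith('VmSize:'):
                        vmsize = int(line.split()[1])  # kB
                    elif line.startswith('VmRSS:'):
                        vmrss = int(line.split()[1])   # kB
            sizes.append((vmsize, vmrss))
        except OSError:
            continue  # process exited between listing and reading
    return sizes

while True:
    for vmsize, vmrss in redis_memory_kb():
        print(f'vmsize={vmsize} kB rss={vmrss} kB')
    time.sleep(1)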

The topdown profile shows the highest share in frontend stalls. Curiously, the backend share is lower, so is it going directly to memory?

The AMD metrics suggest this isn't a compute benchmark, with only 0.15 cores used on average. Even the L2 miss rate isn't as high as I expected. There are, however, many page faults and a very high number of branches.
elapsed 2259.922
on_cpu 0.010 # 0.15 / 16 cores
utime 268.567
stime 78.932
nvcsw 162118 # 92.74%
nivcsw 12692 # 7.26%
inblock 0 # 0.00/sec
oublock 16216 # 7.18/sec
cpu-clock 10122237671248 # 10122.238 seconds
task-clock 10122558791444 # 10122.559 seconds
page faults 193723516 # 19137.801/sec
context switches 355145 # 35.085/sec
cpu migrations 10120 # 1.000/sec
major page faults 2 # 0.000/sec
minor page faults 193723514 # 19137.801/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 77198022469055 # 373.947 branches per 1000 inst
branch misses 46781194640 # 0.06% branch miss
conditional 75399169886602 # 365.233 conditional branches per 1000 inst
indirect 266512608553 # 1.291 indirect branches per 1000 inst
cpu-cycles 70964703660624 # 1.33 GHz
instructions 314901062257201 # 4.44 IPC high
slots 141976486254930 #
retiring 68709077340669 # 48.4% (49.6%)
-- ucode 24903181428 # 0.0%
-- fastpath 68684174159241 # 48.4%
frontend 10851835723585 # 7.6% ( 7.8%)
-- latency 6110657230428 # 4.3%
-- bandwidth 4741178493157 # 3.3%
backend 58757120586209 # 41.4% (42.4%)
-- cpu 12458026094976 # 8.8%
-- memory 46299094491233 # 32.6%
speculation 247109410646 # 0.2% ( 0.2%) low
-- branch mispredict 244955058544 # 0.2%
-- pipeline restart 2154352102 # 0.0%
smt-contention 3411236551070 # 2.4% ( 0.0%)
cpu-cycles 939713831540 # 0.15 GHz
instructions 1079058399556 # 1.15 IPC
instructions 360259392636 # 44.335 l2 access per 1000 inst
l2 hit from l1 13251426351 # 12.57% l2 miss
l2 miss from l1 534270581 #
l2 hit from l2 pf 1247616791 #
l3 hit from l2 pf 84447007 #
l3 miss from l2 pf 1388497154 #
instructions 359931132234 # 19.095 float per 1000 inst
float 512 111 # 0.000 AVX-512 per 1000 inst
float 256 624 # 0.000 AVX-256 per 1000 inst
float 128 6872979115 # 19.095 AVX-128 per 1000 inst
float MMX 0 # 0.000 MMX per 1000 inst
float scalar 0 # 0.000 scalar per 1000 inst
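For reference, here is how the derived annotations in these listings fall out of the raw counters. This is just the arithmetic with values copied from above; note the per-second rates divide by task-clock time, while the cores-used figure comes from the rusage times over elapsed time:

# Reproducing the derived figures in the AMD listing from the raw counters.
elapsed = 2259.922                 # wall-clock seconds
utime, stime = 268.567, 78.932     # rusage CPU seconds
task_sec = 10122558791444 / 1e9    # task-clock, nanoseconds -> seconds

cores = (utime + stime) / elapsed         # ~0.15 cores used
print(f'on_cpu  {cores / 16:.3f}')        # 0.010 on a 16-core system

ipc = 314901062257201 / 70964703660624    # instructions / cpu-cycles
print(f'IPC     {ipc:.2f}')               # 4.44

miss = 46781194640 / 77198022469055       # branch misses / branches
print(f'br miss {100 * miss:.2f}%')       # 0.06%

faults = 193723514 / task_sec             # minor page faults
print(f'faults  {faults:.1f}/sec')        # 19137.8/sec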
This is not running reliably on Intel. In particular, it crashes the controlling terminal. Looking at syslog I see out-of-memory (OOM) killer errors. Initially it got through many of the cases at 1000 simultaneous connections, but eventually it crashed even when configured for 50 requests. This system has 16 GB of memory. Below are partial metrics from a run where I was able to collect the topdown profile (it crashed during the IPC collection, so I didn't get that):
slots 272335914937490 #
retiring 142603219927690 # 52.4% (52.4%)
-- ucode 4271167488209 # 1.6%
-- fastpath 138332052439481 # 50.8%
frontend 66038620923890 # 24.2% (24.2%)
-- latency 7911569467471 # 2.9%
-- bandwidth 58127051456419 # 21.3%
backend 61775042796876 # 22.7% (22.7%)
-- cpu 55815826753567 # 20.5%
-- memory 5959216043309 # 2.2%
speculation 1837174312495 # 0.7% ( 0.7%) low
-- branch mispredict 1244841407692 # 0.5%
-- pipeline restart 592332904803 # 0.2%
smt-contention 0 # 0.0% ( 0.0%)
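The topdown percentages in both listings are each category's slot count divided by total slots; the parenthesized figures appear to be the same shares renormalized with the smt-contention slots removed, which is why the two match exactly here on Intel where smt-contention is zero. A quick check against the AMD numbers:

# Topdown level-1 shares: category / slots, and (in parentheses)
# category / (slots - smt_contention). Values from the AMD listing.
slots = 141976486254930
smt = 3411236551070
categories = {
    'retiring':    68709077340669,
    'frontend':    10851835723585,
    'backend':     58757120586209,
    'speculation':   247109410646,
}
for name, count in categories.items():
    print(f'{name:12s} {100 * count / slots:5.1f}% '
          f'({100 * count / (slots - smt):5.1f}%)')
# retiring 48.4% (49.6%), frontend 7.6% (7.8%), backend 41.4% (42.4%)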
The process profile also has fewer processes than I expected:
796 processes
30 bio_aof_fsync 50.40 137.14
30 bio_close_file 50.40 137.14
30 bio_lazy_free 50.40 137.14
30 io_thd_1 50.40 137.14
30 io_thd_2 50.40 137.14
30 io_thd_3 50.40 137.14
30 io_thd_4 50.40 137.14
30 io_thd_5 50.40 137.14
30 io_thd_6 50.40 137.14
30 io_thd_7 50.40 137.14
30 redis-server 50.40 137.14
68 clinfo 16.20 6.66
38 vulkaninfo 1.13 1.14
6 glxinfo:gdrv0 0.15 0.00
6 glxinfo:gl0 0.15 0.00
4 vulkani:disk$0 0.12 0.12
6 php 0.11 0.19
2 glxinfo 0.08 0.00
2 glxinfo:cs0 0.07 0.00
2 glxinfo:disk$0 0.07 0.00
2 glxinfo:sh0 0.07 0.00
2 glxinfo:shlo0 0.07 0.00
6 clang 0.06 0.06
2 llvmpipe-0 0.06 0.06
2 llvmpipe-1 0.06 0.06
2 llvmpipe-10 0.06 0.06
2 llvmpipe-11 0.06 0.06
2 llvmpipe-12 0.06 0.06
2 llvmpipe-13 0.06 0.06
2 llvmpipe-14 0.06 0.06
2 llvmpipe-15 0.06 0.06
2 llvmpipe-2 0.06 0.06
2 llvmpipe-3 0.06 0.06
2 llvmpipe-4 0.06 0.06
2 llvmpipe-5 0.06 0.06
2 llvmpipe-6 0.06 0.06
2 llvmpipe-7 0.06 0.06
2 llvmpipe-8 0.06 0.06
2 llvmpipe-9 0.06 0.06
3 rocminfo 0.03 0.00
30 redis-benchmark 0.00 0.35
1 lspci 0.00 0.02
1 ps 0.00 0.01
80 sh 0.00 0.00
31 sed 0.00 0.00
30 redis 0.00 0.00
30 sleep 0.00 0.00
13 gcc 0.00 0.00
9 gsettings 0.00 0.00
8 stat 0.00 0.00
8 systemd-detect- 0.00 0.00
6 llvm-link 0.00 0.00
5 gmain 0.00 0.00
5 phoronix-test-s 0.00 0.00
2 cc 0.00 0.00
2 dconf worker 0.00 0.00
2 lscpu 0.00 0.00
2 uname 0.00 0.00
2 which 0.00 0.00
2 xset 0.00 0.00
1 date 0.00 0.00
1 dirname 0.00 0.00
1 dmesg 0.00 0.00
1 dmidecode 0.00 0.00
1 grep 0.00 0.00
1 ifconfig 0.00 0.00
1 ip 0.00 0.00
1 lsmod 0.00 0.00
1 mktemp 0.00 0.00
1 qdbus 0.00 0.00
1 readlink 0.00 0.00
1 realpath 0.00 0.00
1 sort 0.00 0.00
1 stty 0.00 0.00
1 systemctl 0.00 0.00
1 template.sh 0.00 0.00
1 wc 0.00 0.00
1 xrandr 0.00 0.00
0 processes running
47 maximum processes
The computation blocks also look straightforward:
963332) redis cpu=5 start=5.22 finish=11.35
963333) redis-server cpu=2 start=5.22 finish=11.41
963335) bio_close_file cpu=8 start=5.22 finish=11.41
963336) bio_aof_fsync cpu=7 start=5.22 finish=11.41
963337) bio_lazy_free cpu=4 start=5.22 finish=11.41
963338) io_thd_1 cpu=10 start=5.22 finish=11.41
963339) io_thd_2 cpu=6 start=5.22 finish=11.41
963340) io_thd_3 cpu=5 start=5.22 finish=11.41
963341) io_thd_4 cpu=11 start=5.22 finish=11.41
963342) io_thd_5 cpu=0 start=5.22 finish=11.41
963343) io_thd_6 cpu=12 start=5.22 finish=11.41
963344) io_thd_7 cpu=13 start=5.22 finish=11.41
963334) sleep cpu=11 start=5.22 finish=11.22
963345) redis-benchmark cpu=6 start=11.22 finish=11.34
963346) sed cpu=1 start=11.35 finish=11.35
In summary, this seems like a workload that deserves further characterization with a different analysis. I have some of that in the exit lines for the various processes, and a good follow-up would be to see what to decorate the tree with, e.g.:
- virtual memory size
- resident size
Here, for example, is a block after I added a -M option to proctree to print the virtual memory size:
963597) redis cpu=10 start=222.03 finish=228.57 vmsize=2896k
963598) redis-server cpu=3 start=222.03 finish=228.61 vmsize=1923576k
963600) bio_close_file cpu=14 start=222.04 finish=228.61 vmsize=1923576k
963601) bio_aof_fsync cpu=7 start=222.04 finish=228.61 vmsize=1923576k
963602) bio_lazy_free cpu=0 start=222.04 finish=228.61 vmsize=1923576k
963603) io_thd_1 cpu=12 start=222.04 finish=228.61 vmsize=1923576k
963604) io_thd_2 cpu=10 start=222.04 finish=228.61 vmsize=1923576k
963605) io_thd_3 cpu=13 start=222.04 finish=228.61 vmsize=1923576k
963606) io_thd_4 cpu=6 start=222.04 finish=228.61 vmsize=1923576k
963607) io_thd_5 cpu=15 start=222.04 finish=228.61 vmsize=1923576k
963608) io_thd_6 cpu=4 start=222.04 finish=228.61 vmsize=1923576k
963609) io_thd_7 cpu=8 start=222.04 finish=228.61 vmsize=1923576k
963599) sleep cpu=13 start=222.03 finish=228.03 vmsize=8376k
963610) redis-benchmark cpu=12 start=228.03 finish=228.56 vmsize=51856k
963611) sed cpu=13 start=228.57 finish=228.57 vmsize=9300k
I would probably want to get additional information from the /proc/[pid]/statm file, which the man page describes as providing:
/proc/[pid]/statm
       Provides information about memory usage, measured in pages.
       The columns are:

           size       (1) total program size
                          (same as VmSize in /proc/[pid]/status)
           resident   (2) resident set size
                          (inaccurate; same as VmRSS in /proc/[pid]/status)
           shared     (3) number of resident shared pages
                          (i.e., backed by a file)
                          (inaccurate; same as RssFile+RssShmem in
                          /proc/[pid]/status)
           text       (4) text (code)
           lib        (5) library (unused since Linux 2.6; always 0)
           data       (6) data + stack
           dt         (7) dirty pages (unused since Linux 2.6; always 0)
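A small sketch of how proctree could read this; the field names follow the man page excerpt, and the kB conversion is my choice (the kernel reports these values in pages):

# Read /proc/[pid]/statm and convert pages to kB for the tree decoration.
import os
import resource

FIELDS = ('size', 'resident', 'shared', 'text', 'lib', 'data', 'dt')

def statm_kb(pid):
    page_kb = resource.getpagesize() // 1024
    with open(f'/proc/{pid}/statm') as f:
        values = (int(v) * page_kb for v in f.read().split())
    return dict(zip(FIELDS, values))

print(statm_kb(os.getpid()))  # 'size' matches the vmsize=...k values above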
