{"id":918,"date":"2024-01-26T01:31:09","date_gmt":"2024-01-26T01:31:09","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=918"},"modified":"2024-01-27T16:11:02","modified_gmt":"2024-01-27T16:11:02","slug":"redis","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/redis\/","title":{"rendered":"redis"},"content":{"rendered":"\n<p>Redis is an open-source, in-memory data structure store.  The test profile tries to run up to 15 benchmarks, but on AMD the following fail with errors saying they don&#8217;t run:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>- pts\/redis-1.4.0: Test: GET - Parallel Connections: 50\n- pts\/redis-1.4.0: Test: SET - Parallel Connections: 50\n- pts\/redis-1.4.0: Test: GET - Parallel Connections: 500\n- pts\/redis-1.4.0: Test: LPOP - Parallel Connections: 50\n- pts\/redis-1.4.0: Test: SADD - Parallel Connections: 50\n- pts\/redis-1.4.0: Test: SET - Parallel Connections: 500\n- pts\/redis-1.4.0: Test: LPOP - Parallel Connections: 500\n- pts\/redis-1.4.0: Test: LPUSH - Parallel Connections: 50\n- pts\/redis-1.4.0: Test: SADD - Parallel Connections: 500\n- pts\/redis-1.4.0: Test: LPUSH - Parallel Connections: 500<\/code><\/pre>\n\n\n\n<p>On Intel the entire process is killed by the out-of-memory (OOM) killer, which takes the controlling terminal down with it as well. So this is a good benchmark to drill into on a system with enough memory\/swap and see what the actual demands are.  
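<\/p>\n\n\n\n<p>One way to see those demands directly is to sample the resident-set size of the redis-server processes from \/proc while a test runs. A rough sketch; the helper names here are mine, not part of proctree or the Phoronix tooling:<\/p>\n\n\n\n

```python
import pathlib

def parse_vm_kib(status_text, field):
    """Pull a Vm* value (reported in kB) out of /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith(field + ":"):
            return int(line.split()[1])
    return None

def total_rss_kib(comm="redis-server"):
    """Sum VmRSS across every process whose comm matches."""
    proc = pathlib.Path("/proc")
    if not proc.is_dir():        # not on Linux; nothing to sample
        return 0
    total = 0
    for p in proc.iterdir():
        if not p.name.isdigit():
            continue
        try:
            if (p / "comm").read_text().strip() != comm:
                continue
            total += parse_vm_kib((p / "status").read_text(), "VmRSS") or 0
        except OSError:
            pass                 # process exited while we were scanning
    return total
```

<p>Polling total_rss_kib() once a second during a run would show how close the benchmark gets to physical memory before the OOM killer fires.<\/p>\n\n\n\n<p>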
These tests run with up to 1000 parallel connections, though it is interesting to see that the maximum number of cores used is closer to 10.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-70.png\" alt=\"\" class=\"wp-image-976\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-70.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-70-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-70-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>The topdown profile shows the largest amounts in frontend stalls. Curiously, backend is lower, so perhaps accesses are going directly to memory?<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-108.png\" alt=\"\" class=\"wp-image-977\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-108.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-108-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-108-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics suggest this isn&#8217;t a compute benchmark, with only 0.15 cores used on average. Even the L2 miss rate isn&#8217;t as high as I expected.  
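<\/p>\n\n\n\n<p>The 0.15-core figure is simple arithmetic over the counters in the dump below: CPU seconds (utime + stime) divided by elapsed wall-clock seconds, and then by the core count for the on_cpu fraction. A quick check, with the numbers copied from the AMD run:<\/p>\n\n\n\n

```python
# Numbers copied from the AMD metrics dump below.
elapsed = 2259.922               # wall-clock seconds
utime, stime = 268.567, 78.932   # user and system CPU seconds

cores_used = (utime + stime) / elapsed   # average cores kept busy
on_cpu = cores_used / 16                 # fraction of the 16-core machine

print(f"{cores_used:.2f} cores, on_cpu {on_cpu:.3f}")  # 0.15 cores, on_cpu 0.010
```

<p>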
There are, however, many page faults and a very high number of branches.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              2259.922\non_cpu               0.010          # 0.15 \/ 16 cores\nutime                268.567\nstime                78.932\nnvcsw                162118         # 92.74%\nnivcsw               12692          # 7.26%\ninblock              0              # 0.00\/sec\nonblock              16216          # 7.18\/sec\ncpu-clock            10122237671248 # 10122.238 seconds\ntask-clock           10122558791444 # 10122.559 seconds\npage faults          193723516      # 19137.801\/sec\ncontext switches     355145         # 35.085\/sec\ncpu migrations       10120          # 1.000\/sec\nmajor page faults    2              # 0.000\/sec\nminor page faults    193723514      # 19137.801\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             77198022469055 # 373.947 branches per 1000 inst\nbranch misses        46781194640    # 0.06% branch miss\nconditional          75399169886602 # 365.233 conditional branches per 1000 inst\nindirect             266512608553   # 1.291 indirect branches per 1000 inst\ncpu-cycles           70964703660624 # 1.33 GHz\ninstructions         314901062257201 # 4.44 IPC high\nslots                141976486254930 #\nretiring             68709077340669 # 48.4% (49.6%)\n-- ucode             24903181428    #     0.0%\n-- fastpath          68684174159241 #    48.4%\nfrontend             10851835723585 #  7.6% ( 7.8%)\n-- latency           6110657230428  #     4.3%\n-- bandwidth         4741178493157  #     3.3%\nbackend              58757120586209 # 41.4% (42.4%)\n-- cpu               12458026094976 #     8.8%\n-- memory            46299094491233 #    32.6%\nspeculation          247109410646   #  0.2% ( 0.2%) low\n-- branch mispredict 244955058544   #     0.2%\n-- pipeline restart  2154352102     #     0.0%\nsmt-contention       3411236551070  #  2.4% ( 0.0%)\ncpu-cycles           939713831540   # 0.15 GHz\ninstructions         1079058399556  # 1.15 IPC\ninstructions         360259392636   # 44.335 l2 access per 1000 inst\nl2 hit from l1       13251426351    # 12.57% l2 miss\nl2 miss from l1      534270581      #\nl2 hit from l2 pf    1247616791     #\nl3 hit from l2 pf    84447007       #\nl3 miss from l2 pf   1388497154     #\ninstructions         359931132234   # 19.095 float per 1000 inst\nfloat 512            111            # 0.000 AVX-512 per 1000 inst\nfloat 256            624            # 0.000 AVX-256 per 1000 inst\nfloat 128            6872979115     # 19.095 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>This does not run reliably on Intel; in particular, it crashes the controlling terminal. Looking at syslog I see OOM (out-of-memory) kill messages. Initially it got through many of the cases at 1000 simultaneous connections, but eventually it crashed even when configured for 50. This system has 16GB of memory.  
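<\/p>\n\n\n\n<p>As a reading aid for these topdown dumps: each level-1 percentage is simply that category&#8217;s slot count divided by the total slots. Recomputing from the AMD numbers above:<\/p>\n\n\n\n

```python
# Slot counts copied from the AMD topdown dump above.
slots = 141976486254930
categories = {
    "retiring":       68709077340669,
    "frontend":       10851835723585,
    "backend":        58757120586209,
    "speculation":      247109410646,
    "smt-contention":  3411236551070,
}

pct = {name: round(100 * n / slots, 1) for name, n in categories.items()}
# pct == {'retiring': 48.4, 'frontend': 7.6, 'backend': 41.4,
#         'speculation': 0.2, 'smt-contention': 2.4}, summing to ~100%
```

<p>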
Below are partial metrics from a run where I was able to collect topdown (it crashed during the IPC collection, so I didn&#8217;t get that):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>slots                272335914937490 #\nretiring             142603219927690 # 52.4% (52.4%)\n-- ucode             4271167488209  #     1.6%\n-- fastpath          138332052439481 #    50.8%\nfrontend             66038620923890 # 24.2% (24.2%)\n-- latency           7911569467471  #     2.9%\n-- bandwidth         58127051456419 #    21.3%\nbackend              61775042796876 # 22.7% (22.7%)\n-- cpu               55815826753567 #    20.5%\n-- memory            5959216043309  #     2.2%\nspeculation          1837174312495  #  0.7% ( 0.7%) low\n-- branch mispredict 1244841407692  #     0.5%\n-- pipeline restart  592332904803   #     0.2%\nsmt-contention       0              #  0.0% ( 0.0%)\n<\/code><\/pre>\n\n\n\n<p>The process profile also has fewer processes than I expected:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>796 processes\n\t 30 bio_aof_fsync           50.40   137.14\n\t 30 bio_close_file          50.40   137.14\n\t 30 bio_lazy_free           50.40   137.14\n\t 30 io_thd_1                50.40   137.14\n\t 30 io_thd_2                50.40   137.14\n\t 30 io_thd_3                50.40   137.14\n\t 30 io_thd_4                50.40   137.14\n\t 30 io_thd_5                50.40   137.14\n\t 30 io_thd_6                50.40   137.14\n\t 30 io_thd_7                50.40   137.14\n\t 30 redis-server            50.40   137.14\n\t 68 clinfo                  16.20     6.66\n\t 38 vulkaninfo               1.13     1.14\n\t  6 glxinfo:gdrv0            0.15     0.00\n\t  6 glxinfo:gl0              0.15     0.00\n\t  4 vulkani:disk$0           0.12     0.12\n\t  6 php                      0.11     0.19\n\t  2 glxinfo                  0.08     0.00\n\t  2 glxinfo:cs0              0.07     0.00\n\t  2 glxinfo:disk$0           0.07     0.00\n\t  2 glxinfo:sh0              0.07     0.00\n\t  2 glxinfo:shlo0            0.07     0.00\n\t  6 clang                    0.06     0.06\n\t  2 llvmpipe-0               0.06     0.06\n\t  2 llvmpipe-1               0.06     0.06\n\t  2 llvmpipe-10              0.06     0.06\n\t  2 llvmpipe-11              0.06     0.06\n\t  2 llvmpipe-12              0.06     0.06\n\t  2 llvmpipe-13              0.06     0.06\n\t  2 llvmpipe-14              0.06     0.06\n\t  2 llvmpipe-15              0.06     0.06\n\t  2 llvmpipe-2               0.06     0.06\n\t  2 llvmpipe-3               0.06     0.06\n\t  2 llvmpipe-4               0.06     0.06\n\t  2 llvmpipe-5               0.06     0.06\n\t  2 llvmpipe-6               0.06     0.06\n\t  2 llvmpipe-7               0.06     0.06\n\t  2 llvmpipe-8               0.06     0.06\n\t  2 llvmpipe-9               0.06     0.06\n\t  3 rocminfo                 0.03     0.00\n\t 30 redis-benchmark          0.00     0.35\n\t  1 lspci                    0.00     0.02\n\t  1 ps                       0.00     0.01\n\t 80 sh                       0.00     0.00\n\t 31 sed                      0.00     0.00\n\t 30 redis                    0.00     0.00\n\t 30 sleep                    0.00     0.00\n\t 13 gcc                      0.00     0.00\n\t  9 gsettings                0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 gmain                    0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  2 cc                       0.00     0.00\n\t  2 dconf worker             0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 uname                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n0 processes running\n47 maximum processes\n<\/code><\/pre>\n\n\n\n<p>The computation blocks also look straightforward:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      963332) redis            cpu=5 start=5.22  finish=11.35\n        963333) redis-server     cpu=2 start=5.22  finish=11.41\n          963335) bio_close_file   cpu=8 start=5.22  finish=11.41\n          963336) bio_aof_fsync    cpu=7 start=5.22  finish=11.41\n          963337) bio_lazy_free    cpu=4 start=5.22  finish=11.41\n          963338) io_thd_1         cpu=10 start=5.22  finish=11.41\n          963339) io_thd_2         cpu=6 start=5.22  finish=11.41\n          963340) io_thd_3         cpu=5 start=5.22  finish=11.41\n          963341) io_thd_4         cpu=11 start=5.22  finish=11.41\n          963342) io_thd_5         cpu=0 start=5.22  finish=11.41\n          963343) io_thd_6         cpu=12 start=5.22  finish=11.41\n          963344) io_thd_7         cpu=13 start=5.22  finish=11.41\n        963334) sleep            cpu=11 start=5.22  finish=11.22\n        963345) redis-benchmark  cpu=6 start=11.22 finish=11.34\n        963346) sed              cpu=1 start=11.35 finish=11.35\n<\/code><\/pre>\n\n\n\n<p>In summary, this seems like a workload that needs a different kind of analysis to characterize it well.  
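<\/p>\n\n\n\n<p>One concrete input for that is per-process memory, and \/proc\/&lt;pid&gt;\/statm is a cheap source: seven counts, in pages, that just need scaling by the page size. A minimal sketch (the helper names are mine, not proctree&#8217;s):<\/p>\n\n\n\n

```python
import os

# Page size in KiB; statm reports counts in pages, not bytes.
PAGE_KIB = os.sysconf("SC_PAGE_SIZE") // 1024 if hasattr(os, "sysconf") else 4

def parse_statm(text, page_kib=PAGE_KIB):
    """Parse the seven /proc/<pid>/statm fields into KiB values."""
    size, resident, shared, text_pages, lib, data, dt = map(int, text.split())
    return {
        "vmsize_kib": size * page_kib,    # (1) total program size, like VmSize
        "rss_kib": resident * page_kib,   # (2) resident set size (inaccurate)
        "shared_kib": shared * page_kib,  # (3) resident file-backed pages
        "data_kib": data * page_kib,      # (6) data + stack
    }

def read_statm(pid):
    with open(f"/proc/{pid}/statm") as f:
        return parse_statm(f.read())
```

<p>With 4 KiB pages, a statm size field of 480894 pages works out to the 1923576k vmsize printed for redis-server below.<\/p>\n\n\n\n<p>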
I have some of that in the exit lines for the various processes, and a good follow-up would be to decide what to decorate the tree with, e.g.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>virtual memory size<\/li>\n\n\n\n<li>resident size<\/li>\n<\/ul>\n\n\n\n<p>Here, for example, is a block after adding a -M option to proctree to print the virtual memory size:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      963597) redis            cpu=10 start=222.03 finish=228.57 vmsize=2896k\n        963598) redis-server     cpu=3 start=222.03 finish=228.61 vmsize=1923576k\n          963600) bio_close_file   cpu=14 start=222.04 finish=228.61 vmsize=1923576k\n          963601) bio_aof_fsync    cpu=7 start=222.04 finish=228.61 vmsize=1923576k\n          963602) bio_lazy_free    cpu=0 start=222.04 finish=228.61 vmsize=1923576k\n          963603) io_thd_1         cpu=12 start=222.04 finish=228.61 vmsize=1923576k\n          963604) io_thd_2         cpu=10 start=222.04 finish=228.61 vmsize=1923576k\n          963605) io_thd_3         cpu=13 start=222.04 finish=228.61 vmsize=1923576k\n          963606) io_thd_4         cpu=6 start=222.04 finish=228.61 vmsize=1923576k\n          963607) io_thd_5         cpu=15 start=222.04 finish=228.61 vmsize=1923576k\n          963608) io_thd_6         cpu=4 start=222.04 finish=228.61 vmsize=1923576k\n          963609) io_thd_7         cpu=8 start=222.04 finish=228.61 vmsize=1923576k\n        963599) sleep            cpu=13 start=222.03 finish=228.03 vmsize=8376k\n        963610) redis-benchmark  cpu=12 start=228.03 finish=228.56 vmsize=51856k\n        963611) sed              cpu=13 start=228.57 finish=228.57 vmsize=9300k\n<\/code><\/pre>\n\n\n\n<p>I would probably also want additional information from the \/proc\/pid\/statm file, which the man page describes as providing:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>       \/proc\/&#91;pid]\/statm\n              Provides information about memory usage, measured in pages.  
The columns are:\n\n                  size       (1) total program size\n                             (same as VmSize in \/proc\/&#91;pid]\/status)\n                  resident   (2) resident set size\n                             (inaccurate; same as VmRSS in \/proc\/&#91;pid]\/status)\n                  shared     (3) number of resident shared pages\n                             (i.e., backed by a file)\n                             (inaccurate; same as RssFile+RssShmem in\n                             \/proc\/&#91;pid]\/status)\n                  text       (4) text (code)\n                  lib        (5) library (unused since Linux 2.6; always 0)\n                  data       (6) data + stack\n                  dt         (7) dirty pages (unused since Linux 2.6; always 0)\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>An open-source in-memory data structure store. This tries to run up to 15 benchmarks but on AMD I have the following with errors saying they don&#8217;t run On Intel the entire process crashes with out of memory killer, even taking <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/redis\/\"><span class=\"more-msg\">Continue reading 
&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-918","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/918","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=918"}],"version-history":[{"count":3,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/918\/revisions"}],"predecessor-version":[{"id":978,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/918\/revisions\/978"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=918"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}