{"id":908,"date":"2024-01-26T01:26:05","date_gmt":"2024-01-26T01:26:05","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=908"},"modified":"2024-01-26T11:43:55","modified_gmt":"2024-01-26T11:43:55","slug":"cassandra","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/cassandra\/","title":{"rendered":"cassandra"},"content":{"rendered":"\n<p>A benchmark of the Apache Cassandra NoSQL database. There are three workloads: first writes only, then mixed read\/write at a 1:1 ratio, and then at a 3:1 ratio. The set of runnable processes appears to vary over the run.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-66.png\" alt=\"\" class=\"wp-image-940\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-66.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-66-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-66-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>The topdown profile is weighted towards frontend stalls. 
That is followed by some backend memory stalls, with an overall retirement rate lower than average.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-103.png\" alt=\"\" class=\"wp-image-942\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-103.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-103-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-103-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics show little floating point, a high amount of L2 accesses including misses, and frontend latency at almost half the total. Also interesting is that only about 1\/4 of the cores are busy.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              4332.928\non_cpu               0.258          # 4.13 \/ 16 cores\nutime                8905.936\nstime                8995.891\nnvcsw                601776779      # 94.13%\nnivcsw               37510851       # 5.87%\ninblock              9152           # 2.11\/sec\nonblock              294672         # 68.01\/sec\ncpu-clock            46605354321724 # 46605.354 seconds\ntask-clock           47003106395359 # 47003.106 seconds\npage faults          35135335       # 747.511\/sec\ncontext switches     1432379568     # 30474.147\/sec\ncpu migrations       836923710      # 17805.711\/sec\nmajor page faults    5111418        # 108.746\/sec\nminor page faults    29976133       # 637.748\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             19677696605382 # 169.462 branches per 1000 inst\nbranch misses        1174720045057  # 5.97% branch miss\nconditional          14157392473496 # 121.922 conditional branches per 1000 
inst\nindirect             398569475290   # 3.432 indirect branches per 1000 inst\ncpu-cycles           110019767419271 # 2.46 GHz\ninstructions         78514350926593 # 0.71 IPC\nslots                215057315796582 #\nretiring             26727298203613 # 12.4% (14.0%)\n-- ucode             107310119389   #     0.0%\n-- fastpath          26619988084224 #    12.4%\nfrontend             121877473591996 # 56.7% (64.0%)\n-- latency           105576764308332 #    49.1%\n-- bandwidth         16300709283664 #     7.6%\nbackend              37980810293573 # 17.7% (19.9%)\n-- cpu               3183460600102  #     1.5%\n-- memory            34797349693471 #    16.2%\nspeculation          3728650165352  #  1.7% ( 2.0%)\n-- branch mispredict 3700122721449  #     1.7%\n-- pipeline restart  28527443903    #     0.0%\nsmt-contention       24641343322125 # 11.5% ( 0.0%)\ncpu-cycles           118359556086527 # 2.48 GHz\ninstructions         83709827177511 # 0.71 IPC\ninstructions         27330974978530 # 104.378 l2 access per 1000 inst\nl2 hit from l1       2521875844280  # 40.19% l2 miss\nl2 miss from l1      913841730923   #\nl2 hit from l2 pf    98215015831    #\nl3 hit from l2 pf    193261277895   #\nl3 miss from l2 pf   39386943790    #\ninstructions         27344544465946 # 8.199 float per 1000 inst\nfloat 512            308            # 0.000 AVX-512 per 1000 inst\nfloat 256            962            # 0.000 AVX-256 per 1000 inst\nfloat 128            224187156471   # 8.199 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         33788          # 0.000 scalar per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              4097.853\non_cpu               0.265          # 4.24 \/ 16 cores\nutime                10573.212\nstime                6785.551\nnvcsw                549845910      # 94.17%\nnivcsw               34034739       # 5.83%\ninblock              
1175312        # 286.81\/sec\nonblock              255152         # 62.26\/sec\ncpu-clock            46773863724106 # 46773.864 seconds\ntask-clock           47010972033464 # 47010.972 seconds\npage faults          28928282       # 615.352\/sec\ncontext switches     1341725858     # 28540.696\/sec\ncpu migrations       871346665      # 18534.964\/sec\nmajor page faults    3548986        # 75.493\/sec\nminor page faults    25312394       # 538.436\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             17578871409480 # 162.943 branches per 1000 inst\nbranch misses        285816420521   # 1.63% branch miss\nconditional          17578871753096 # 162.943 conditional branches per 1000 inst\nindirect             3475864234757  # 32.219 indirect branches per 1000 inst\nslots                260335397251850 #\nretiring             80256736492957 # 30.8% (30.8%)\n-- ucode             6584405418875  #     2.5%\n-- fastpath          73672331074082 #    28.3%\nfrontend             90294915158826 # 34.7% (34.7%)\n-- latency           59016419380680 #    22.7%\n-- bandwidth         31278495778146 #    12.0%\nbackend              61537137199156 # 23.6% (23.6%)\n-- cpu               23158639207958 #     8.9%\n-- memory            38378497991198 #    14.7%\nspeculation          30134034973779 # 11.6% (11.6%)\n-- branch mispredict 28831919130365 #    11.1%\n-- pipeline restart  1302115843414  #     0.5%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           166264750189585 # 2.23 GHz\ninstructions         157451768214939 # 0.95 IPC\nl2 access            7724180485399  # 103.319 l2 access per 1000 inst\nl2 miss              3266744587191  # 42.29% l2 miss\n<\/code><\/pre>\n\n\n\n<p>An interesting set of process names is in use. This is a JDK program, and many of the processes seem to run throughout the benchmark. 
We lost some events at the end.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>9103 processes\n\t 85 JMX server conn      1369594.41     0.00\n\t 25 RMI TCP Connect      489389.80     0.00\n\t 64 epollEventLoopG      313049.92 140784.96\n\t1184 cluster1-nio-wo      260370.83 215330.45\n\t 38 Native-Transpor      188366.70 85160.72\n\t  4 RMI TCP Accept-      133587.60     0.00\n\t 23 MemtableFlushWr      93993.37 36670.56\n\t 23 PerDiskMemtable      93988.87 36668.69\n\t 87 VM Periodic Tas      66793.80     0.00\n\t  2 CMS Main Thread      66793.80     0.00\n\t  8 MutationStage-2      39131.24 17598.12\n\t174 java                 35845.62 22259.47\n\t  5 MutationStage-1      23210.37 10214.00\n\t  7 CompactionExecu      22095.39  9798.23\n\t 96 globalEventExec      21541.15 17823.12\n\t  4 LocalPool-Clean      19565.62  8799.06\n\t 87 Finalizer            17922.81 11129.74\n\t 42 Common-Cleaner       17921.81 11129.00\n\t  2 AsyncAppender-W       9782.81  4399.53\n\t  2 BatchlogTasks:1       9782.81  4399.53\n\t  2 COMMIT-LOG-ALLO       9782.81  4399.53\n\t  2 Callback-Map-Re       9782.81  4399.53\n\t  2 ForkJoinPool.co       9782.81  4399.53\n\t  2 GossipTasks:1         9782.81  4399.53\n\t  2 HintsWriteExecu       9782.81  4399.53\n\t  2 IndexSummaryMan       9782.81  4399.53\n\t  2 MemtablePostFlu       9782.81  4399.53\n\t  2 MemtableReclaim       9782.81  4399.53\n\t  2 Messaging-Accep       9782.81  4399.53\n\t  2 MigrationStage:       9782.81  4399.53\n\t  2 NonPeriodicTask       9782.81  4399.53\n\t  2 OptionalTasks:1       9782.81  4399.53\n\t  2 PERIODIC-COMMIT       9782.81  4399.53\n\t  2 PendingRangeCal       9782.81  4399.53\n\t  2 Reference-Reape       9782.81  4399.53\n\t  2 ScheduledFastTa       9782.81  4399.53\n\t  2 ScheduledTasks:       9782.81  4399.53\n\t  2 SecondaryIndexM       9782.81  4399.53\n\t  2 SlabPoolCleaner       9782.81  4399.53\n\t  2 SnapshotCleanup       9782.81  4399.53\n\t  2 logback-1             9782.81  4399.53\n\t  2 
logback-2             9782.81  4399.53\n\t  2 logback-3             9782.81  4399.53\n\t  2 logback-4             9782.81  4399.53\n\t  2 logback-5             9782.81  4399.53\n\t  2 logback-6             9782.81  4399.53\n\t  2 logback-7             9782.81  4399.53\n\t  2 logback-8             9782.81  4399.53\n\t  2 read-hotness-tr       9782.81  4399.53\n\t 89 StressMetrics         8901.83  6859.15\n\t 37 ObjectCleanerTh       8136.82  6729.24\n\t 37 JmxCollector:1        8136.80  6729.23\n\t 37 JmxCollector:10       8136.80  6729.23\n\t 37 JmxCollector:11       8136.80  6729.23\n\t 37 JmxCollector:12       8136.80  6729.23\n\t 37 JmxCollector:13       8136.80  6729.23\n\t 37 JmxCollector:14       8136.80  6729.23\n\t 37 JmxCollector:15       8136.80  6729.23\n\t 37 JmxCollector:16       8136.80  6729.23\n\t 37 JmxCollector:2        8136.80  6729.23\n\t 37 JmxCollector:3        8136.80  6729.23\n\t 37 JmxCollector:4        8136.80  6729.23\n\t 37 JmxCollector:5        8136.80  6729.23\n\t 37 JmxCollector:6        8136.80  6729.23\n\t 37 JmxCollector:7        8136.80  6729.23\n\t 37 JmxCollector:8        8136.80  6729.23\n\t 37 JmxCollector:9        8136.80  6729.23\n\t 37 Logging-Cleaner       8136.70  6729.17\n\t 37 Thread-0              8136.67  6729.17\n\t 37 Shutdown-checke       8136.66  6729.14\n\t 37 cluster1-connec       8136.66  6729.14\n\t 37 cluster1-timeou       8136.65  6729.14\n\t 37 cluster1-schedu       8132.43  6728.24\n\t  1 MutationStage-4       6138.06  2984.59\n\t 70 cluster1-worker       6065.67  4522.42\n\t 38 Thread-21             5089.49  4092.63\n\t 38 Thread-22             5089.49  4092.62\n\t 38 Thread-25             5089.47  4092.63\n\t 38 Thread-28             5089.47  4092.63\n\t 38 Thread-32             5089.47  4092.63\n\t 38 Thread-34             5089.47  4092.63\n\t 38 Thread-20             5089.47  4092.62\n\t 38 Thread-35             5089.46  4092.63\n\t 38 Thread-23             5089.46  4092.62\n\t 38 Thread-24             
5089.46  4092.62\n\t 38 Thread-26             5089.45  4092.62\n\t 38 Thread-27             5089.45  4092.62\n\t 38 Thread-29             5089.45  4092.62\n\t 38 Thread-31             5089.45  4092.62\n\t 38 Thread-33             5089.45  4092.62\n\t 38 Thread-30             5089.44  4092.62\n\t  1 ReadStage-10          3644.75  1414.94\n\t  1 ReadStage-18          3644.75  1414.94\n\t  1 ReadStage-19          3644.75  1414.94\n\t  1 ReadStage-20          3644.75  1414.94\n\t  1 ReadStage-25          3644.75  1414.94\n\t  1 ReadStage-31          3644.75  1414.94\n\t  1 ReadStage-4           3644.75  1414.94\n\t 13 Thread-37             3287.03  2684.24\n\t 13 Thread-39             3287.03  2684.24\n\t 13 Thread-40             3287.03  2684.24\n\t 13 Thread-41             3287.03  2684.24\n\t 13 Thread-42             3287.03  2684.24\n\t 13 Thread-44             3287.03  2684.24\n\t 13 Thread-46             3287.03  2684.24\n\t 13 Thread-47             3287.03  2684.24\n\t 13 Thread-48             3287.03  2684.24\n\t 13 Thread-51             3287.03  2684.24\n\t 13 Thread-52             3287.03  2684.24\n\t 13 Thread-43             3287.03  2684.23\n\t 13 Thread-45             3287.03  2684.23\n\t 13 Thread-49             3287.03  2684.23\n\t 13 Thread-38             3287.02  2684.24\n\t 13 Thread-50             3287.02  2684.24\n\t 38 Thread-4               523.50    82.11\n\t 38 Thread-3               523.49    82.12\n\t 38 Thread-6               523.49    82.11\n\t 38 Thread-10              523.48    82.12\n\t 38 Thread-5               523.48    82.11\n\t 38 Thread-7               523.47    82.12\n\t 38 Thread-9               523.46    82.11\n\t 38 Thread-12              523.44    82.12\n\t 38 Thread-8               523.44    82.11\n\t 38 Thread-11              523.43    82.11\n\t 38 Thread-14              523.43    82.10\n\t 38 Thread-16              523.43    82.10\n\t 38 Thread-17              523.43    82.10\n\t 38 Thread-13              523.42    82.11\n\t 
38 Thread-18              523.42    82.10\n\t 38 Thread-15              523.41    82.11\n\t  9 loadSavedCache:         51.25    25.02\n\t 31 clinfo                   8.41     3.63\n\t 19 vulkaninfo               0.73     0.76\n\t  3 find                     0.36     0.48\n\t  3 glxinfo:gdrv0            0.08     0.03\n\t  3 glxinfo:gl0              0.08     0.03\n\t  6 ldconfig.real            0.07     0.15\n\t  2 vulkani:disk$0           0.07     0.08\n\t  6 clang                    0.06     0.06\n\t  1 llvmpipe-0               0.04     0.04\n\t  1 llvmpipe-1               0.04     0.04\n\t  1 llvmpipe-10              0.04     0.04\n\t  1 llvmpipe-11              0.04     0.04\n\t  1 llvmpipe-12              0.04     0.04\n\t  1 llvmpipe-13              0.04     0.04\n\t  1 llvmpipe-14              0.04     0.04\n\t  1 llvmpipe-15              0.04     0.04\n\t  1 llvmpipe-2               0.04     0.04\n\t  1 llvmpipe-3               0.04     0.04\n\t  1 llvmpipe-4               0.04     0.04\n\t  1 llvmpipe-5               0.04     0.04\n\t  1 llvmpipe-6               0.04     0.04\n\t  1 llvmpipe-7               0.04     0.04\n\t  1 llvmpipe-8               0.04     0.04\n\t  1 llvmpipe-9               0.04     0.04\n\t  1 glxinfo                  0.04     0.01\n\t  1 glxinfo:cs0              0.04     0.01\n\t  1 glxinfo:disk$0           0.04     0.01\n\t  1 glxinfo:sh0              0.04     0.01\n\t  1 glxinfo:shlo0            0.04     0.01\n\t518 C1 CompilerThre          0.00 307397.68\n\t890 C2 CompilerThre          0.00 218273.37\n\t 87 GC Thread#0              0.00 17922.81\n\t 87 Reference Handl          0.00 17922.81\n\t 87 Service Thread           0.00 17922.81\n\t 87 Signal Dispatch          0.00 17922.81\n\t 87 VM Thread                0.00 17922.81\n\t 87 Sweeper thread           0.00 17922.80\n\t 42 GC Thread#1              0.00 17921.81\n\t 42 GC Thread#2              0.00 17921.81\n\t 42 GC Thread#3              0.00 17921.81\n\t 42 GC Thread#4      
        0.00 17921.81\n\t 42 GC Thread#5              0.00 17921.81\n\t 42 GC Thread#6              0.00 17921.81\n\t 42 GC Thread#7              0.00 17921.81\n\t 42 GC Thread#8              0.00 17921.81\n\t 42 GC Thread#9              0.00 17921.81\n\t 39 RMI Scheduler(0          0.00 17919.62\n\t 39 GC Thread#10             0.00 17919.61\n\t 39 GC Thread#11             0.00 17919.61\n\t 39 GC Thread#12             0.00 17919.61\n\t  2 CMS Thread#0             0.00  9782.81\n\t  2 CMS Thread#1             0.00  9782.81\n\t  2 CMS Thread#2             0.00  9782.81\n\t  2 CMS Thread#3             0.00  9782.81\n\t 85 G1 Conc#0                0.00  8140.00\n\t 85 G1 Refine#0              0.00  8139.88\n\t 37 G1 Conc#1                0.00  8136.82\n\t 37 G1 Conc#2                0.00  8136.82\n\t 37 RMI RenewClean-          0.00  8136.80\n\t 33 G1 Refine#1              0.00  7264.24\n\t 27 G1 Refine#2              0.00  5765.48\n\t  3 process reaper           0.00   373.36\n\t  1 G1 Refine#3              0.00   317.12\n\t  5 rm                       0.00     1.25\n\t113 cassandra                0.00     0.02\n\t327 cassandra-stres          0.00     0.00\n\t228 grep                     0.00     0.00\n\t135 awk                      0.00     0.00\n\t 89 dirname                  0.00     0.00\n\t 87 JMX client hear          0.00     0.00\n\t 85 G1 Main Marker           0.00     0.00\n\t 85 G1 Young RemSet          0.00     0.00\n\t 67 sh                       0.00     0.00\n\t 44 cut                      0.00     0.00\n\t 37 RMI GC Daemon            0.00     0.00\n\t 29 cat                      0.00     0.00\n\t 19 sleep                    0.00     0.00\n\t 12 expr                     0.00     0.00\n\t 12 gcc                      0.00     0.00\n\t  9 gsettings                0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  8 which                    0.00     0.00\n\t  7 stat                     0.00     0.00\n\t  6 llvm-link                0.00     
0.00\n\t  6 ls                       0.00     0.00\n\t  6 tr                       0.00     0.00\n\t  6 uname                    0.00     0.00\n\t  5 bash                     0.00     0.00\n\t  4 gmain                    0.00     0.00\n\t  4 phoronix-test-s          0.00     0.00\n\t  4 sed                      0.00     0.00\n\t  4 sort                     0.00     0.00\n\t  3 free                     0.00     0.00\n\t  3 getopt                   0.00     0.00\n\t  3 head                     0.00     0.00\n\t  3 mkdir                    0.00     0.00\n\t  2 dconf worker             0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lscpu                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 ps                       0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n\t  1 xset                     0.00     0.00\n238 processes running\n248 maximum processes\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>A benchmark of the Apache Cassandra NoSQL database. There are three workloads. First writes, then read\/write in ratio 1:1 and then a ratio of 3:1. Looks like a variable set of runnable processes. Topdown profile is weighted towards frontend stalls. 
<span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/cassandra\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-908","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/908","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=908"}],"version-history":[{"count":2,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/908\/revisions"}],"predecessor-version":[{"id":943,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/908\/revisions\/943"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=908"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}