{"id":910,"date":"2024-01-26T01:27:10","date_gmt":"2024-01-26T01:27:10","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=910"},"modified":"2024-01-27T03:23:36","modified_gmt":"2024-01-27T03:23:36","slug":"duckdb","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/duckdb\/","title":{"rendered":"duckdb"},"content":{"rendered":"\n<p>An in-process SQL OLAP database. There are two workloads, though the first one appears heavily abbreviated on my AMD run and not on the Intel run. The systemtime graph below, however, suggests this quick abort didn&#8217;t happen. Instead the first workload looks single-threaded and the second runs a mix of single-threaded phases and phases with one thread per core.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-68.png\" alt=\"\" class=\"wp-image-963\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-68.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-68-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-68-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>The topdown metrics use a timescale that does not match the previous graph, which also suggests this run could have quit the first workload early, so the graph mostly shows the second workload. There, retirement reaches its highest point, but it is blurred with frontend and backend stalls.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-106.png\" alt=\"\" class=\"wp-image-964\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-106.png 1280w, 
https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-106-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-106-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics also show a shorter runtime, so expect them to reflect mostly the second workload. About 1\/5 of instructions are branches and less than 1\/4 of the cores are kept busy. Floating point use is relatively low, at about 21 per 1000 instructions.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              547.152\non_cpu               0.229          # 3.67 \/ 16 cores\nutime                1829.973\nstime                175.594\nnvcsw                724734         # 95.49%\nnivcsw               34194          # 4.51%\ninblock              104            # 0.19\/sec\nonblock              66306048       # 121184.06\/sec\ncpu-clock            2006098320016  # 2006.098 seconds\ntask-clock           2006459295662  # 2006.459 seconds\npage faults          43860238       # 21859.520\/sec\ncontext switches     761385         # 379.467\/sec\ncpu migrations       11771          # 5.867\/sec\nmajor page faults    2              # 0.001\/sec\nminor page faults    43860236       # 21859.519\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             2989521993930  # 209.081 branches per 1000 inst\nbranch misses        44666476992    # 1.49% branch miss\nconditional          1855246304543  # 129.752 conditional branches per 1000 inst\nindirect             336448903110   # 23.531 indirect branches per 1000 inst\ncpu-cycles           8205492318167  # 0.99 GHz\ninstructions         14167935669400 # 1.73 IPC\nslots                16457577130686 #\nretiring             4848133219219  # 29.5% (36.1%)\n-- ucode             12640610594    #     0.1%\n-- fastpath          4835492608625  #    29.4%\nfrontend             4321707575747  # 26.3% (32.2%)\n-- latency           
2363442872946  #    14.4%\n-- bandwidth         1958264702801  #    11.9%\nbackend              3675392035681  # 22.3% (27.4%)\n-- cpu               521630866054   #     3.2%\n-- memory            3153761169627  #    19.2%\nspeculation          580918774688   #  3.5% ( 4.3%)\n-- branch mispredict 574059476463   #     3.5%\n-- pipeline restart  6859298225     #     0.0%\nsmt-contention       3031336112693  # 18.4% ( 0.0%)\ncpu-cycles           8220556664269  # 0.99 GHz\ninstructions         14166689794245 # 1.72 IPC\ninstructions         4726827381322  # 11.842 l2 access per 1000 inst\nl2 hit from l1       36650616240    # 28.10% l2 miss\nl2 miss from l1      5214331727     #\nl2 hit from l2 pf    8812526969     #\nl3 hit from l2 pf    3719577488     #\nl3 miss from l2 pf   6793561232     #\ninstructions         4722831156232  # 20.931 float per 1000 inst\nfloat 512            72             # 0.000 AVX-512 per 1000 inst\nfloat 256            642            # 0.000 AVX-256 per 1000 inst\nfloat 128            98853835720    # 20.931 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              1826.836\non_cpu               0.282          # 4.51 \/ 16 cores\nutime                7881.944\nstime                364.632\nnvcsw                2932996        # 94.39%\nnivcsw               174235         # 5.61%\ninblock              276128         # 151.15\/sec\nonblock              70939448       # 38831.86\/sec\ncpu-clock            8239226921074  # 8239.227 seconds\ntask-clock           8240397739741  # 8240.398 seconds\npage faults          116908038      # 14187.184\/sec\ncontext switches     3116153        # 378.156\/sec\ncpu migrations       121550         # 14.751\/sec\nmajor page faults    1481           # 0.180\/sec\nminor page faults    116906557      # 
14187.004\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             8685405904523  # 190.950 branches per 1000 inst\nbranch misses        129488118910   # 1.49% branch miss\nconditional          8685406471659  # 190.950 conditional branches per 1000 inst\nindirect             2611251130674  # 57.409 indirect branches per 1000 inst\nslots                56189409146582 #\nretiring             27383968130937 # 48.7% (48.7%)\n-- ucode             2029604665216  #     3.6%\n-- fastpath          25354363465721 #    45.1%\nfrontend             10622806556285 # 18.9% (18.9%)\n-- latency           4873119794808  #     8.7%\n-- bandwidth         5749686761477  #    10.2%\nbackend              11019320981325 # 19.6% (19.6%)\n-- cpu               4663030301016  #     8.3%\n-- memory            6356290680309  #    11.3%\nspeculation          7380984669753  # 13.1% (13.1%)\n-- branch mispredict 7196351812556  #    12.8%\n-- pipeline restart  184632857197   #     0.3%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           27125366791737 # 1.41 GHz\ninstructions         50101367608730 # 1.85 IPC\nl2 access            634116960124   # 22.630 l2 access per 1000 inst\nl2 miss              173861651765   # 27.42% l2 miss\n<\/code><\/pre>\n\n\n\n<p>The process overview shows many benchmark_runner processes, which account for almost all of the recorded time.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>15898 processes\n\t15540 benchmark_runne      301174.39 33694.99\n\t 68 clinfo                  16.21     6.98\n\t 38 vulkaninfo               0.57     1.71\n\t  6 glxinfo:gdrv0            0.16     0.03\n\t  6 glxinfo:gl0              0.16     0.03\n\t  2 glxinfo                  0.08     0.01\n\t  2 glxinfo:cs0              0.08     0.01\n\t  2 glxinfo:disk$0           0.08     0.01\n\t  2 glxinfo:sh0              0.08     0.01\n\t  2 glxinfo:shlo0            0.08     0.01\n\t  6 clang                    0.07     0.05\n\t  
4 vulkani:disk$0           0.06     0.18\n\t  6 php                      0.04     0.16\n\t  2 llvmpipe-0               0.03     0.09\n\t  2 llvmpipe-1               0.03     0.09\n\t  2 llvmpipe-10              0.03     0.09\n\t  2 llvmpipe-11              0.03     0.09\n\t  2 llvmpipe-12              0.03     0.09\n\t  2 llvmpipe-13              0.03     0.09\n\t  2 llvmpipe-14              0.03     0.09\n\t  2 llvmpipe-15              0.03     0.09\n\t  2 llvmpipe-2               0.03     0.09\n\t  2 llvmpipe-3               0.03     0.09\n\t  2 llvmpipe-4               0.03     0.09\n\t  2 llvmpipe-5               0.03     0.09\n\t  2 llvmpipe-6               0.03     0.09\n\t  2 llvmpipe-7               0.03     0.09\n\t  2 llvmpipe-8               0.03     0.09\n\t  2 llvmpipe-9               0.03     0.09\n\t  3 rocminfo                 0.03     0.00\n\t  1 lspci                    0.00     0.02\n\t  1 ps                       0.00     0.01\n\t 84 sh                       0.00     0.00\n\t 13 gcc                      0.00     0.00\n\t  9 gsettings                0.00     0.00\n\t  8 duckdb                   0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 gmain                    0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  2 cc                       0.00     0.00\n\t  2 dconf worker             0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 uname                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod               
     0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n0 processes running\n47 maximum processes\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>An in-process SQL OLAP database. There are two workloads, though the first one appears heavily abbreviated on my AMD run and not on the Intel run. The systemtime graph below, however, suggests this quick abort didn&#8217;t happen. Instead the first workload <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/duckdb\/\"><span class=\"more-msg\">Continue reading 
&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-910","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/910","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=910"}],"version-history":[{"count":3,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/910\/revisions"}],"predecessor-version":[{"id":965,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/910\/revisions\/965"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=910"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}