{"id":1454,"date":"2024-02-03T23:23:13","date_gmt":"2024-02-03T23:23:13","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=1454"},"modified":"2024-02-07T19:22:18","modified_gmt":"2024-02-07T19:22:18","slug":"botan","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/botan\/","title":{"rendered":"botan"},"content":{"rendered":"\n<p>botan is a cryptography library. There are six workloads for different algorithms, and all of them appear to be single-threaded.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-32.png\" alt=\"\" class=\"wp-image-1568\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-32.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-32-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-32-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>The topdown profile varies across the workloads: two have retirement rates in the 70% range, while backend stalls limit the others. 
Frontend stalls are low except in the second workload and for a brief period at the start of each run.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-34.png\" alt=\"\" class=\"wp-image-1569\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-34.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-34-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-34-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics confirm that the run is single-threaded, with almost no L2 access (0.116 per 1000 instructions). Backend stalls are split between CPU (16.2%) and memory (23.4%). There is a light amount of floating point, nearly all of it 128-bit.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              669.399\non_cpu               0.053          # 0.84 \/ 16 cores\nutime                562.839\nstime                0.988\nnvcsw                2105           # 44.72%\nnivcsw               2602           # 55.28%\ninblock              0              # 0.00\/sec\nonblock              16152          # 24.13\/sec\ncpu-clock            563924791974   # 563.925 seconds\ntask-clock           563932739166   # 563.933 seconds\npage faults          166765         # 295.718\/sec\ncontext switches     7838           # 13.899\/sec\ncpu migrations       367            # 0.651\/sec\nmajor page faults    2              # 0.004\/sec\nminor page faults    166763         # 295.714\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             139814477227   # 18.870 branches per 1000 inst\nbranch misses        203665490      # 0.15% branch miss\nconditional          97115995872    # 13.107 conditional branches per 1000 inst\nindirect             
13663343420    # 1.844 indirect branches per 1000 inst\ncpu-cycles           2610654455476  # 0.24 GHz\ninstructions         7394582566327  # 2.83 IPC\nslots                5229705578412  #\nretiring             2826810653088  # 54.1% (54.1%) high\n-- ucode             8993481439     #     0.2%\n-- fastpath          2817817171649  #    53.9%\nfrontend             325117530676   #  6.2% ( 6.2%)\n-- latency           280235511756   #     5.4%\n-- bandwidth         44882018920    #     0.9%\nbackend              2072076508020  # 39.6% (39.6%)\n-- cpu               847158851484   #    16.2%\n-- memory            1224917656536  #    23.4%\nspeculation          5536343464     #  0.1% ( 0.1%) low\n-- branch mispredict 4117725817     #     0.1%\n-- pipeline restart  1418617647     #     0.0%\nsmt-contention       164221719      #  0.0% ( 0.0%)\ncpu-cycles           3878464228704  # 0.25 GHz\ninstructions         10719706960041 # 2.76 IPC\ninstructions         3578967705036  # 0.116 l2 access per 1000 inst\nl2 hit from l1       394836268      # 6.20% l2 miss\nl2 miss from l1      15707758       #\nl2 hit from l2 pf    11116615       #\nl3 hit from l2 pf    4545213        #\nl3 miss from l2 pf   5561337        #\ninstructions         3572012090278  # 98.084 float per 1000 inst\nfloat 512            86             # 0.000 AVX-512 per 1000 inst\nfloat 256            664            # 0.000 AVX-256 per 1000 inst\nfloat 128            350357518379   # 98.084 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\ninstructions         2688067        #\nopcache              984278         # 366.166 opcache per 1000 inst\nopcache miss         525590         # 53.4% opcache miss rate\nl1 dTLB miss         6562           # 2.441 L1 dTLB per 1000 inst\nl2 dTLB miss         1164           # 0.433 L2 dTLB per 1000 inst\ninstructions         2703815        #\nicache               1321789        
# 488.861 icache per 1000 inst\nicache miss          112783         #  8.5% icache miss rate\nl1 iTLB miss         10             # 0.004 L1 iTLB per 1000 inst\nl2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst\ntlb flush            19             # 0.007 TLB flush per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics confirm memory access is minimal.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              660.876\non_cpu               0.053          # 0.84 \/ 16 cores\nutime                556.986\nstime                0.629\nnvcsw                2047           # 42.81%\nnivcsw               2735           # 57.19%\ninblock              10496          # 15.88\/sec\nonblock              4800           # 7.26\/sec\ncpu-clock            557684948455   # 557.685 seconds\ntask-clock           557691700152   # 557.692 seconds\npage faults          156021         # 279.762\/sec\ncontext switches     7880           # 14.130\/sec\ncpu migrations       392            # 0.703\/sec\nmajor page faults    52             # 0.093\/sec\nminor page faults    155969         # 279.669\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             122073618847   # 19.314 branches per 1000 inst\nbranch misses        175355007      # 0.14% branch miss\nconditional          122073632095   # 19.314 conditional branches per 1000 inst\nindirect             12056510177    # 1.908 indirect branches per 1000 inst\nslots                12677475659618 #\nretiring             6680431127271  # 52.7% (52.7%)\n-- ucode             450464298592   #     3.6%\n-- fastpath          6229966828679  #    49.1%\nfrontend             121467626633   #  1.0% ( 1.0%) low\n-- latency           37175507861    #     0.3%\n-- bandwidth         84292118772    #     0.7%\nbackend              5843301211811  # 46.1% (46.1%)\n-- cpu               5372679687169  #    42.4%\n-- memory            470621524642   #     
3.7%\nspeculation          27230529716    #  0.2% ( 0.2%) low\n-- branch mispredict 27046269202    #     0.2%\n-- pipeline restart  184260514      #     0.0%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           2113556449501  # 0.20 GHz\ninstructions         6324122251604  # 2.99 IPC\nl2 access            340667126      # 0.054 l2 access per 1000 inst\nl2 miss              103730719      # 30.45% l2 miss\ncpu-cycles           2113192432479  #  4.9% memory latency\nload stalls          102748763011   #  4.8% l1 bound\nl1 miss              1018223889     #  0.0% l2 bound\nl2 miss              480434465      #  0.0% l3 bound\nl3 miss              314612452      #  0.0% dram bound\nstore_stalls         122368374      #  0.0% store bound\n<\/code><\/pre>\n\n\n\n<p>The process summary shows that botan is the primary process.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>394 processes\n\t 36 botan                  561.84     0.00\n\t 68 clinfo                  20.17     3.25\n\t 38 vulkaninfo               0.76     1.52\n\t  6 glxinfo:gdrv0            0.15     0.04\n\t  6 glxinfo:gl0              0.15     0.04\n\t  6 php                      0.08     0.21\n\t  4 vulkani:disk$0           0.08     0.16\n\t  2 glxinfo                  0.07     0.02\n\t  2 glxinfo:cs0              0.07     0.02\n\t  2 glxinfo:disk$0           0.07     0.02\n\t  2 glxinfo:sh0              0.07     0.02\n\t  2 glxinfo:shlo0            0.07     0.02\n\t  6 clang                    0.04     0.08\n\t  2 llvmpipe-0               0.04     0.08\n\t  2 llvmpipe-1               0.04     0.08\n\t  2 llvmpipe-10              0.04     0.08\n\t  2 llvmpipe-11              0.04     0.08\n\t  2 llvmpipe-12              0.04     0.08\n\t  2 llvmpipe-13              0.04     0.08\n\t  2 llvmpipe-14              0.04     0.08\n\t  2 llvmpipe-15              0.04     0.08\n\t  2 llvmpipe-2               0.04     0.08\n\t  2 llvmpipe-3               0.04     0.08\n\t  2 llvmpipe-4               0.04  
   0.08\n\t  2 llvmpipe-5               0.04     0.08\n\t  2 llvmpipe-6               0.04     0.08\n\t  2 llvmpipe-7               0.04     0.08\n\t  2 llvmpipe-8               0.04     0.08\n\t  2 llvmpipe-9               0.04     0.08\n\t  3 rocminfo                 0.03     0.00\n\t  1 lspci                    0.00     0.02\n\t  1 ps                       0.00     0.01\n\t 92 sh                       0.00     0.00\n\t 13 gcc                      0.00     0.00\n\t 12 gsettings                0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  3 gmain                    0.00     0.00\n\t  2 cc                       0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 uname                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dconf worker             0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n0 processes running\n47 maximum processes\n<\/code><\/pre>\n\n\n\n<p>Computation structure is 
simple: each run is a botan process with a single child.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>  723566) botan            cpu=1 start=5.75  finish=35.85\n    723567) botan            cpu=6 start=5.75  finish=35.85\n  723569) botan            cpu=6 start=39.85 finish=69.95\n    723570) botan            cpu=7 start=39.86 finish=69.95\n  723571) botan            cpu=13 start=73.96 finish=104.06\n    723572) botan            cpu=14 start=73.96 finish=104.06\n  723573) sh               cpu=14 start=104.06 finish=104.06\n    723574) sh               cpu=7 start=104.06 finish=104.06<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>botan is a cryptography library. There are six workloads for different algorithms. Looks like they are all single-threaded Topdown profile shows variations among the workloads with retirement rates in the 70s for two workloads and backend stalls limiting other workloads. <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/botan\/\"><span class=\"more-msg\">Continue reading 
&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-1454","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1454","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=1454"}],"version-history":[{"count":3,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1454\/revisions"}],"predecessor-version":[{"id":1570,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1454\/revisions\/1570"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=1454"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}