{"id":210,"date":"2024-01-05T01:03:42","date_gmt":"2024-01-05T01:03:42","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=210"},"modified":"2024-01-05T01:04:16","modified_gmt":"2024-01-05T01:04:16","slug":"john-the-ripper","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/john-the-ripper\/","title":{"rendered":"john-the-ripper"},"content":{"rendered":"\n<p>Performance characterization of the john-the-ripper password cracker. There are five workloads with slightly different profiles combined together. The first (bcrypt) and third (blowfish) workloads have a very high retire rate, while the fourth (HMAC-SHA512) has a lower one, and the second (WPA PSK) and fifth (MD5) are in between. So the profile below is an aggregate of all of these together. It also looks like playing with compiler options can make a large difference (<a href=\"https:\/\/www.phoronix.com\/review\/intel-meteorlake-gcc-clang\">https:\/\/www.phoronix.com\/review\/intel-meteorlake-gcc-clang<\/a>).  
Those tests seem to have picked just the right compiler options&#8230;<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-14.png\" alt=\"\" class=\"wp-image-214\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-14.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-14-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-14-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>Here is the AMD composite profile.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              719.563\non_cpu               0.876          # 14.02 \/ 16 cores\nutime                10088.098\nstime                2.788\nnvcsw                5633           # 6.88%\nnivcsw               76278          # 93.12%\ninblock              17008          # 23.64\/sec\nonblock              6064           # 8.43\/sec\ncpu-clock            10091008298375 # 10091.008 seconds\ntask-clock           10091046378603 # 10091.046 seconds\npage faults          707665         # 70.128\/sec\ncontext switches     85296          # 8.453\/sec\ncpu migrations       230            # 0.023\/sec\nmajor page faults    89             # 0.009\/sec\nminor page faults    707576         # 70.119\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             1181761586204  # 18.861 branches per 1000 inst\nbranch misses        1953540532     # 0.17% branch miss\nconditional          875223584314   # 13.969 conditional branches per 1000 inst\nindirect             31768861224    # 0.507 indirect branches per 1000 inst\ncpu-cycles           42249053159235 # 3.67 GHz\ninstructions         62449360688413 # 1.48 IPC\nslots        
         84520599603168 #\nretiring             22972486141479 # 27.2% (42.9%)\n-- ucode             19908311589    #     0.0%\n-- fastpath          22952577829890 #    27.2%\nfrontend             649425361657   #  0.8% ( 1.2%)\n-- latency           131863651656   #     0.2%\n-- bandwidth         517561710001   #     0.6%\nbackend              29849704427132 # 35.3% (55.8%)\n-- cpu               22970067699975 #    27.2%\n-- memory            6879636727157  #     8.1%\nspeculation          44758977493    #  0.1% ( 0.1%)\n-- branch mispredict 27856028572    #     0.0%\n-- pipeline restart  16902948921    #     0.0%\nsmt-contention       31004181580785 # 36.7% ( 0.0%)\ncpu-cycles           42471271376087 # 3.69 GHz\ninstructions         62688427445805 # 1.48 IPC\ninstructions         20906863465846 # 5.588 l2 access per 1000 inst\nl2 hit from l1       98937781951    # 7.01% l2 miss\nl2 miss from l1      3949232822     #\nl2 hit from l2 pf    13648867835    #\nl3 hit from l2 pf    4231154370     #\nl3 miss from l2 pf   11181084       #\ninstructions         20900865024390 # 7.953 float per 1000 inst\nfloat 512            99             # 0.000 AVX-512 per 1000 inst\nfloat 256            1244           # 0.000 AVX-256 per 1000 inst\nfloat 128            166214135569   # 7.953 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst<\/code><\/pre>\n\n\n\n<p>Here is the Intel composite profile. One thing that stands out is a high amount of branch misprediction. What also stands out is that the test somehow ran on only about two cores (on_cpu reports 2.10 \/ 16 cores). So overall this is a somewhat squirrelly test that could use a deeper dive. It also looks like the sources detect the presence of particular ISA extensions, particularly the cryptographic ones.  
You can also see from the phoronix run that particular cryptographic libraries are linked in as compiler options.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              323.082\non_cpu               0.131          # 2.10 \/ 16 cores\nutime                531.315\nstime                146.991\nnvcsw                10503543       # 99.85%\nnivcsw               15479          # 0.15%\ninblock              8              # 0.02\/sec\nonblock              2931888        # 9074.76\/sec\ncpu-clock            670147573846   # 670.148 seconds\ntask-clock           671777403316   # 671.777 seconds\npage faults          50351069       # 74952.013\/sec\ncontext switches     10519619       # 15659.382\/sec\ncpu migrations       30361          # 45.195\/sec\nmajor page faults    1              # 0.001\/sec\nminor page faults    50351068       # 74952.012\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             342678860344   # 135.988 branches per 1000 inst\nbranch misses        3385141914     # 0.99% branch miss\nconditional          342679096184   # 135.988 conditional branches per 1000 inst\nindirect             49044340836    # 19.463 indirect branches per 1000 inst\nslots                6042017136368  #\nretiring             2057791788255  # 34.1% (34.1%)\n-- ucode             206973543335   #     3.4%\n-- fastpath          1850818244920  #    30.6%\nfrontend             869596524785   # 14.4% (14.4%)\n-- latency           441086010050   #     7.3%\n-- bandwidth         428510514735   #     7.1%\nbackend              2531687347647  # 41.9% (41.9%)\n-- cpu               625305575545   #    10.3%\n-- memory            1906381772102  #    31.6%\nspeculation          609068467938   # 10.1% (10.1%)\n-- branch mispredict 488700891743   #     8.1%\n-- pipeline restart  120367576195   #     2.0%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           3264854609930  # 0.61 
GHz\ninstructions         4152760303612  # 1.27 IPC\nl2 access            47561932109    # 20.313 l2 access per 1000 inst\nl2 miss              20055987975    # 42.17% l2 miss\n<\/code><\/pre>\n\n\n\n<p>Drilling into the speculation amounts, I see occasional bursts of much higher misses. Also apparent in this graph is that more time is spent in particular benchmarks. I believe that in some cases extra runs are made to get the results to converge, but this also adds to the totals where there are different workloads&#8230; So this test as a whole would benefit from being broken into separate cases rather than running them together.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-15.png\" alt=\"\" class=\"wp-image-215\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-15.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-15-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-15-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>Performance characterization of the john-the-ripper password crackers. There are five workloads with slightly different profiles combined together. 
The first (bcrypt) and third (blowfish) workloads have a very high retire rate while the fourth (HMAC-SHA512) has a lower one and the <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/john-the-ripper\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-210","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/210","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=210"}],"version-history":[{"count":3,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/210\/revisions"}],"predecessor-version":[{"id":218,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/210\/revisions\/218"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=210"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}