{"id":916,"date":"2024-01-26T01:30:16","date_gmt":"2024-01-26T01:30:16","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=916"},"modified":"2024-01-27T16:24:41","modified_gmt":"2024-01-27T16:24:41","slug":"nginx","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/nginx\/","title":{"rendered":"nginx"},"content":{"rendered":"\n<p>Benchmarks of the lightweight Nginx HTTP web server, run in seven different configurations with results reported in requests per second. The largest of these crashes on Intel. Depending on the workload, there is a steady stream of runnable processes and also a steady rate of interrupt processing.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-71.png\" alt=\"\" class=\"wp-image-980\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-71.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-71-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-71-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>Topdown profile shows a steady trend as clients increase. 
Frontend stalls start high and decrease, while backend stalls start lower and increase.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-109.png\" alt=\"\" class=\"wp-image-982\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-109.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-109-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-109-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics are now annotated with &#8220;high&#8221; and &#8220;low&#8221; markers for frontend stalls and speculation misses. Also surprising for a web server is the amount of floating-point code.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              1809.766\non_cpu               0.353          # 5.66 \/ 16 cores\nutime                5668.781\nstime                4566.007\nnvcsw                79619493       # 92.23%\nnivcsw               6704051        # 7.77%\ninblock              0              # 0.00\/sec\nonblock              23120          # 12.78\/sec\ncpu-clock            25121067020532 # 25121.067 seconds\ntask-clock           25136131034095 # 25136.131 seconds\npage faults          461475         # 18.359\/sec\ncontext switches     120417664      # 4790.620\/sec\ncpu migrations       13566460       # 539.719\/sec\nmajor page faults    53             # 0.002\/sec\nminor page faults    461422         # 18.357\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             8640419338678  # 98.384 branches per 1000 inst\nbranch misses        650998923797   # 7.53% branch miss\nconditional          4889857159745  # 55.679 conditional branches per 1000 
inst\nindirect             219843594998   # 2.503 indirect branches per 1000 inst\ncpu-cycles           91450317493505 # 3.32 GHz\ninstructions         83227323814403 # 0.91 IPC\nslots                182463323060250 #\nretiring             35008217268307 # 19.2% (23.0%)\n-- ucode             1779794881062  #     1.0%\n-- fastpath          33228422387245 #    18.2%\nfrontend             78788810015191 # 43.2% (51.9%) high\n-- latency           59787818345946 #    32.8%\n-- bandwidth         19000991669245 #    10.4%\nbackend              37626343922638 # 20.6% (24.8%)\n-- cpu               11254685932202 #     6.2%\n-- memory            26371657990436 #    14.5%\nspeculation          489842005426   #  0.3% ( 0.3%) low\n-- branch mispredict 488684516442   #     0.3%\n-- pipeline restart  1157488984     #     0.0%\nsmt-contention       30542150588642 # 16.7% ( 0.0%)\ncpu-cycles           97023657124856 # 3.36 GHz\ninstructions         87906505657257 # 0.91 IPC\ninstructions         29238541691844 # 98.502 l2 access per 1000 inst\nl2 hit from l1       2171586154059  # 15.68% l2 miss\nl2 miss from l1      223036763348   #\nl2 hit from l2 pf    479881576660   #\nl3 hit from l2 pf    191834413982   #\nl3 miss from l2 pf   36744326249    #\ninstructions         29238116608759 # 675.701 float per 1000 inst\nfloat 512            89             # 0.000 AVX-512 per 1000 inst\nfloat 256            613            # 0.000 AVX-256 per 1000 inst\nfloat 128            19756213607600 # 675.701 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              1733.443\non_cpu               0.340          # 5.43 \/ 16 cores\nutime                5028.702\nstime                4388.764\nnvcsw                95258431       # 91.83%\nnivcsw               8471196        # 8.17%\ninblock            
  747360         # 431.14\/sec\nonblock              11880          # 6.85\/sec\ncpu-clock            23833547986990 # 23833.548 seconds\ntask-clock           23845704074536 # 23845.704 seconds\npage faults          484089         # 20.301\/sec\ncontext switches     143167975      # 6003.932\/sec\ncpu migrations       22678228       # 951.040\/sec\nmajor page faults    4188           # 0.176\/sec\nminor page faults    479900         # 20.125\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             7316456010829  # 87.723 branches per 1000 inst\nbranch misses        31563258634    # 0.43% branch miss\nconditional          7316456043405  # 87.723 conditional branches per 1000 inst\nindirect             1729108598933  # 20.732 indirect branches per 1000 inst\nslots                99211728596570 #\nretiring             52293353050524 # 52.7% (52.7%)\n-- ucode             6304853099248  #     6.4%\n-- fastpath          45988499951276 #    46.4%\nfrontend             28850239834340 # 29.1% (29.1%)\n-- latency           14975518404750 #    15.1%\n-- bandwidth         13874721429590 #    14.0%\nbackend              16068350018892 # 16.2% (16.2%) low\n-- cpu               5117974323048  #     5.2%\n-- memory            10950375695844 #    11.0%\nspeculation          1873177440460  #  1.9% ( 1.9%)\n-- branch mispredict 1508068627405  #     1.5%\n-- pipeline restart  365108813055   #     0.4%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           69513696338102 # 2.52 GHz\ninstructions         100793724736328 # 1.45 IPC\nl2 access            3769634327220  # 75.954 l2 access per 1000 inst\nl2 miss              598194587854   # 15.87% l2 miss\n<\/code><\/pre>\n\n\n\n<p>Process tree has a steady stream of processes and fewer than I would expect with many clients. 
I wonder if these are somehow forked off to a different tree and this is only the core server remaining?<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>903 processes\n\t340 wrk                  96747.17 75918.65\n\t156 nginx                 6933.80  8023.21\n\t 68 clinfo                  19.83     6.66\n\t 38 vulkaninfo               1.14     1.52\n\t  6 glxinfo:gdrv0            0.17     0.05\n\t  6 glxinfo:gl0              0.17     0.05\n\t  6 php                      0.12     0.33\n\t  4 vulkani:disk$0           0.12     0.16\n\t  2 glxinfo                  0.07     0.03\n\t  2 glxinfo:cs0              0.07     0.03\n\t  2 glxinfo:disk$0           0.07     0.03\n\t  2 glxinfo:sh0              0.07     0.03\n\t  2 glxinfo:shlo0            0.07     0.03\n\t  2 llvmpipe-0               0.06     0.08\n\t  2 llvmpipe-1               0.06     0.08\n\t  2 llvmpipe-10              0.06     0.08\n\t  2 llvmpipe-11              0.06     0.08\n\t  2 llvmpipe-12              0.06     0.08\n\t  2 llvmpipe-13              0.06     0.08\n\t  2 llvmpipe-14              0.06     0.08\n\t  2 llvmpipe-15              0.06     0.08\n\t  2 llvmpipe-2               0.06     0.08\n\t  2 llvmpipe-3               0.06     0.08\n\t  2 llvmpipe-4               0.06     0.08\n\t  2 llvmpipe-5               0.06     0.08\n\t  2 llvmpipe-6               0.06     0.08\n\t  2 llvmpipe-7               0.06     0.08\n\t  2 llvmpipe-8               0.06     0.08\n\t  2 llvmpipe-9               0.06     0.08\n\t  6 clang                    0.06     0.06\n\t  3 rocminfo                 0.03     0.00\n\t  1 lspci                    0.01     0.02\n\t106 sh                       0.00     0.00\n\t 14 bash                     0.00     0.00\n\t 14 sleep                    0.00     0.00\n\t 13 gcc                      0.00     0.00\n\t  9 gsettings                0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  7 rm                       0.00    
 0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  4 gmain                    0.00     0.00\n\t  3 dconf worker             0.00     0.00\n\t  2 cc                       0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 uname                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 ps                       0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n0 processes running\n47 maximum processes\n<\/code><\/pre>\n\n\n\n<p>An example of the structure<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      938801) sh               cpu=13 start=4.03  finish=9.04 \n        938802) bash             cpu=13 start=4.03  finish=9.04 \n          938803) nginx            cpu=7 start=4.03  finish=4.03 \n            938804) nginx            cpu=4 start=4.03  finish=19.08\n              938805) nginx            cpu=5 start=4.03  finish=19.08\n              938806) nginx            cpu=8 start=4.03  finish=19.08\n              938807) nginx            cpu=10 start=4.03  finish=19.08\n              938808) nginx        
    cpu=6 start=4.04  finish=19.08\n              938810) nginx            cpu=15 start=4.04  finish=19.08\n              938811) nginx            cpu=4 start=4.04  finish=19.08\n              938812) nginx            cpu=14 start=4.04  finish=19.08\n              938813) nginx            cpu=13 start=4.04  finish=19.08\n              938814) nginx            cpu=2 start=4.04  finish=19.08\n              938815) nginx            cpu=7 start=4.04  finish=19.08\n              938816) nginx            cpu=9 start=4.04  finish=19.08\n              938817) nginx            cpu=11 start=4.04  finish=19.08\n              938818) nginx            cpu=12 start=4.04  finish=19.08\n              938819) nginx            cpu=0 start=4.04  finish=19.08\n              938820) nginx            cpu=1 start=4.04  finish=19.08\n              938821) nginx            cpu=3 start=4.04  finish=19.08\n          938809) sleep            cpu=7 start=4.04  finish=9.04 \n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Benchmarks of the lightweight Nginx HTTP web server. Run with seven different configurations in terms of requests per second. The largest of these crashes on Intel. 
Depending on the workload, there is a steady stream of runnable processes and also <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/nginx\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-916","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/916","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=916"}],"version-history":[{"count":3,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/916\/revisions"}],"predecessor-version":[{"id":983,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/916\/revisions\/983"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=916"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}