{"id":626,"date":"2024-01-16T16:22:18","date_gmt":"2024-01-16T16:22:18","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=626"},"modified":"2024-01-16T16:22:19","modified_gmt":"2024-01-16T16:22:19","slug":"apache","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/apache\/","title":{"rendered":"apache"},"content":{"rendered":"\n<p>The Apache benchmark runs six workloads with increasing numbers of concurrent requests, which also show up as more concurrent processes. Overall, more time is spent in interrupts than in other workloads; otherwise the CPU cores are kept busy.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-26.png\" alt=\"\" class=\"wp-image-627\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-26.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-26-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-26-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>Topdown metrics show a lot of time in frontend stalls. 
The retirement rate is constant and low.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-63.png\" alt=\"\" class=\"wp-image-628\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-63.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-63-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-63-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD topdown metrics show about 3.5 cores' worth of on-CPU time (out of 16) and a lot of context switches, roughly 23,000 per second. There is not much floating point.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              1509.346\non_cpu               0.220          # 3.52 \/ 16 cores\nutime                573.840\nstime                4744.594\nnvcsw                44951008       # 45.34%\nnivcsw               54184990       # 54.66%\ninblock              0              # 0.00\/sec\nonblock              29792          # 19.74\/sec\ncpu-clock            19676541323775 # 19676.541 seconds\ntask-clock           19719318674895 # 19719.319 seconds\npage faults          284461013      # 14425.499\/sec\ncontext switches     454358925      # 23041.310\/sec\ncpu migrations       115592313      # 5861.882\/sec\nmajor page faults    942757         # 47.809\/sec\nminor page faults    283518256      # 14377.690\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             6758295849529  # 211.749 branches per 1000 inst\nbranch misses        799095641827   # 11.82% branch miss\nconditional          3637789008753  # 113.978 conditional branches per 1000 inst\nindirect             147995495873   # 4.637 indirect branches per 1000 inst\ncpu-cycles           77915946609317 # 3.23 
GHz\ninstructions         32290962771129 # 0.41 IPC\nslots                154254506163852 #\nretiring             12893918613334 #  8.4% ( 9.0%)\n-- ucode             83973017436    #     0.1%\n-- fastpath          12809945595898 #     8.3%\nfrontend             100257449117427 # 65.0% (70.3%)\n-- latency           82212142241388 #    53.3%\n-- bandwidth         18045306876039 #    11.7%\nbackend              28359866429194 # 18.4% (19.9%)\n-- cpu               4582663552209  #     3.0%\n-- memory            23777202876985 #    15.4%\nspeculation          1132261153110  #  0.7% ( 0.8%)\n-- branch mispredict 1128091469818  #     0.7%\n-- pipeline restart  4169683292     #     0.0%\nsmt-contention       11597513599477 #  7.5% ( 0.0%)\ncpu-cycles           77816961127337 # 3.22 GHz\ninstructions         32506652562355 # 0.42 IPC\ninstructions         10709521517503 # 169.776 l2 access per 1000 inst\nl2 hit from l1       1489725697344  # 30.94% l2 miss\nl2 miss from l1      361201826450   #\nl2 hit from l2 pf    127168999107   #\nl3 hit from l2 pf    148824282710   #\nl3 miss from l2 pf   52496129210    #\ninstructions         10712155474180 # 11.051 float per 1000 inst\nfloat 512            99             # 0.000 AVX-512 per 1000 inst\nfloat 256            540            # 0.000 AVX-256 per 1000 inst\nfloat 128            118385183778   # 11.051 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics show a higher retirement rate, with frontend stalls still the largest category.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              1791.168\non_cpu               0.224          # 3.59 \/ 16 cores\nutime                678.312\nstime                5748.159\nnvcsw                88888704       # 59.64%\nnivcsw               60148521       # 40.36%\ninblock              96             # 0.05\/sec\nonblock              18616          # 10.39\/sec\ncpu-clock            23302759563125 # 23302.760 
seconds\ntask-clock           23352428926479 # 23352.429 seconds\npage faults          348216383      # 14911.356\/sec\ncontext switches     557678073      # 23880.945\/sec\ncpu migrations       199654149      # 8549.610\/sec\nmajor page faults    1369530        # 58.646\/sec\nminor page faults    346846853      # 14852.710\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             6831132246467  # 182.969 branches per 1000 inst\nbranch misses        75286038756    # 1.10% branch miss\nconditional          6831132375619  # 182.969 conditional branches per 1000 inst\nindirect             1731845496944  # 46.387 indirect branches per 1000 inst\nslots                94143096190466 #\nretiring             20164455696645 # 21.4% (21.4%)\n-- ucode             3374369293277  #     3.6%\n-- fastpath          16790086403368 #    17.8%\nfrontend             39152021412138 # 41.6% (41.6%)\n-- latency           27001499422697 #    28.7%\n-- bandwidth         12150521989441 #    12.9%\nbackend              31775642691837 # 33.8% (33.8%)\n-- cpu               8124393282042  #     8.6%\n-- memory            23651249409795 #    25.1%\nspeculation          3892549696503  #  4.1% ( 4.1%)\n-- branch mispredict 3486065436745  #     3.7%\n-- pipeline restart  406484259758   #     0.4%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           62935596128790 # 2.61 GHz\ninstructions         37441908212762 # 0.59 IPC\nl2 access            2517537664684  # 137.746 l2 access per 1000 inst\nl2 miss              848413614451   # 33.70% l2 miss\n<\/code><\/pre>\n\n\n\n<p>The process overview shows that we have (and reuse) httpd and wrk processes.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>3286 processes\n\t2594 httpd                83589.41 302478.62\n\t258 wrk                  10120.92 81770.17\n\t 68 clinfo                  20.51     5.98\n\t 38 vulkaninfo               1.33     1.33\n\t  6 
glxinfo:gdrv0            0.17     0.07\n\t  4 vulkani:disk$0           0.14     0.14\n\t  6 php                      0.10     0.28\n\t  2 glxinfo                  0.09     0.03\n\t  2 glxinfo:cs0              0.08     0.03\n\t  2 glxinfo:disk$0           0.08     0.03\n\t  2 glxinfo:sh0              0.08     0.03\n\t  2 glxinfo:shlo0            0.08     0.03\n\t  2 llvmpipe-0               0.07     0.07\n\t  2 llvmpipe-1               0.07     0.07\n\t  2 llvmpipe-10              0.07     0.07\n\t  2 llvmpipe-11              0.07     0.07\n\t  2 llvmpipe-12              0.07     0.07\n\t  2 llvmpipe-13              0.07     0.07\n\t  2 llvmpipe-14              0.07     0.07\n\t  2 llvmpipe-15              0.07     0.07\n\t  2 llvmpipe-2               0.07     0.07\n\t  2 llvmpipe-3               0.07     0.07\n\t  2 llvmpipe-4               0.07     0.07\n\t  2 llvmpipe-5               0.07     0.07\n\t  2 llvmpipe-6               0.07     0.07\n\t  2 llvmpipe-7               0.07     0.07\n\t  2 llvmpipe-8               0.07     0.07\n\t  2 llvmpipe-9               0.07     0.07\n\t  6 clang                    0.05     0.07\n\t  3 rocminfo                 0.03     0.00\n\t  1 lspci                    0.01     0.02\n\t102 sh                       0.00     0.00\n\t 24 apachectl                0.00     0.00\n\t 18 apache                   0.00     0.00\n\t 13 gcc                      0.00     0.00\n\t 12 bash                     0.00     0.00\n\t 12 sleep                    0.00     0.00\n\t  9 gsettings                0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  6 rm                       0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  4 gmain                    0.00     0.00\n\t  3 dconf worker             0.00     0.00\n\t  2 cc                       0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 uname                 
   0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 ps                       0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n0 processes running\n461 maximum processes\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Apache benchmark has six different workloads with increasing numbers of concurrent requests also shown with more concurrent processes. The overall time spent in interrupts is higher than other workloads, otherwise CPU cores are kept busy. 
Topdown metrics show a lot <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/apache\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-626","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/626","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=626"}],"version-history":[{"count":1,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/626\/revisions"}],"predecessor-version":[{"id":629,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/626\/revisions\/629"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=626"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}