Benchmarks of the lightweight Nginx HTTP web server, run in seven different configurations and measured in requests per second. The largest of these configurations crashes on Intel. Depending on the workload, there is a steady stream of runnable processes and a steady rate of interrupt processing.

The topdown profile shows a steady trend as the number of clients increases: frontend stalls start high and decrease, while backend stalls start lower and increase.

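The AMD metrics are now annotated with “high” and “low” markers, which here flag frontend stalls as high and speculation misses as low. The amount of floating-point code is also surprising for a web server: the float counters report about 676 operations per 1000 instructions, all in the 128-bit class.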

elapsed              1809.766
on_cpu               0.353          # 5.66 / 16 cores
utime                5668.781
stime                4566.007
nvcsw                79619493       # 92.23%
nivcsw               6704051        # 7.77%
inblock              0              # 0.00/sec
onblock              23120          # 12.78/sec
cpu-clock            25121067020532 # 25121.067 seconds
task-clock           25136131034095 # 25136.131 seconds
page faults          461475         # 18.359/sec
context switches     120417664      # 4790.620/sec
cpu migrations       13566460       # 539.719/sec
major page faults    53             # 0.002/sec
minor page faults    461422         # 18.357/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             8640419338678  # 98.384 branches per 1000 inst
branch misses        650998923797   # 7.53% branch miss
conditional          4889857159745  # 55.679 conditional branches per 1000 inst
indirect             219843594998   # 2.503 indirect branches per 1000 inst
cpu-cycles           91450317493505 # 3.32 GHz
instructions         83227323814403 # 0.91 IPC
slots                182463323060250 #
retiring             35008217268307 # 19.2% (23.0%)
-- ucode             1779794881062  #     1.0%
-- fastpath          33228422387245 #    18.2%
frontend             78788810015191 # 43.2% (51.9%) high
-- latency           59787818345946 #    32.8%
-- bandwidth         19000991669245 #    10.4%
backend              37626343922638 # 20.6% (24.8%)
-- cpu               11254685932202 #     6.2%
-- memory            26371657990436 #    14.5%
speculation          489842005426   #  0.3% ( 0.3%) low
-- branch mispredict 488684516442   #     0.3%
-- pipeline restart  1157488984     #     0.0%
smt-contention       30542150588642 # 16.7% ( 0.0%)
cpu-cycles           97023657124856 # 3.36 GHz
instructions         87906505657257 # 0.91 IPC
instructions         29238541691844 # 98.502 l2 access per 1000 inst
l2 hit from l1       2171586154059  # 15.68% l2 miss
l2 miss from l1      223036763348   #
l2 hit from l2 pf    479881576660   #
l3 hit from l2 pf    191834413982   #
l3 miss from l2 pf   36744326249    #
instructions         29238116608759 # 675.701 float per 1000 inst
float 512            89             # 0.000 AVX-512 per 1000 inst
float 256            613            # 0.000 AVX-256 per 1000 inst
float 128            19756213607600 # 675.701 AVX-128 per 1000 inst
float MMX            0              # 0.000 MMX per 1000 inst
float scalar         0              # 0.000 scalar per 1000 inst
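
Several of the derived values in the right-hand column can be cross-checked from the raw counters. The sketch below is only a sanity check, not the profiling tool's own code: the formulas are assumptions, chosen because they reproduce the printed numbers, and the variable names are made up for illustration.

```python
# Raw AMD counters copied from the listing above.
elapsed = 1809.766                      # wall-clock seconds
utime, stime = 5668.781, 4566.007       # user and system CPU seconds
nvcsw, nivcsw = 79619493, 6704051       # voluntary / involuntary context switches
branches, branch_misses = 8640419338678, 650998923797
cycles, instructions = 91450317493505, 83227323814403
cores = 16

# Assumed formulas (inferred from the printed values, not from the tool).
busy_cores = (utime + stime) / elapsed          # average number of busy cores
on_cpu = busy_cores / cores                     # fraction of the machine in use
voluntary_share = nvcsw / (nvcsw + nivcsw)      # share of voluntary switches
branch_miss_rate = branch_misses / branches
ipc = instructions / cycles

print(f"on_cpu        {on_cpu:.3f}  # {busy_cores:.2f} / {cores} cores")
print(f"nvcsw share   {100 * voluntary_share:.2f}%")
print(f"branch miss   {100 * branch_miss_rate:.2f}%")
print(f"IPC           {ipc:.2f}")
```

The per-1000-instruction ratios are left out on purpose: the listing reports more than one instructions count, so it is not obvious which denominator each of those ratios uses.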

Intel metrics

elapsed              1733.443
on_cpu               0.340          # 5.43 / 16 cores
utime                5028.702
stime                4388.764
nvcsw                95258431       # 91.83%
nivcsw               8471196        # 8.17%
inblock              747360         # 431.14/sec
onblock              11880          # 6.85/sec
cpu-clock            23833547986990 # 23833.548 seconds
task-clock           23845704074536 # 23845.704 seconds
page faults          484089         # 20.301/sec
context switches     143167975      # 6003.932/sec
cpu migrations       22678228       # 951.040/sec
major page faults    4188           # 0.176/sec
minor page faults    479900         # 20.125/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             7316456010829  # 87.723 branches per 1000 inst
branch misses        31563258634    # 0.43% branch miss
conditional          7316456043405  # 87.723 conditional branches per 1000 inst
indirect             1729108598933  # 20.732 indirect branches per 1000 inst
slots                99211728596570 #
retiring             52293353050524 # 52.7% (52.7%)
-- ucode             6304853099248  #     6.4%
-- fastpath          45988499951276 #    46.4%
frontend             28850239834340 # 29.1% (29.1%)
-- latency           14975518404750 #    15.1%
-- bandwidth         13874721429590 #    14.0%
backend              16068350018892 # 16.2% (16.2%) low
-- cpu               5117974323048  #     5.2%
-- memory            10950375695844 #    11.0%
speculation          1873177440460  #  1.9% ( 1.9%)
-- branch mispredict 1508068627405  #     1.5%
-- pipeline restart  365108813055   #     0.4%
smt-contention       0              #  0.0% ( 0.0%)
cpu-cycles           69513696338102 # 2.52 GHz
instructions         100793724736328 # 1.45 IPC
l2 access            3769634327220  # 75.954 l2 access per 1000 inst
l2 miss              598194587854   # 15.87% l2 miss
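
The bracketed topdown percentages in both listings look like the same slot counts renormalized after removing the smt-contention slots. That only makes a difference on AMD, where smt-contention is nonzero. The sketch below tests this reading against the printed values. The formula is an assumption inferred from the numbers, and `topdown_shares` is a hypothetical helper, not part of the profiling tool.

```python
def topdown_shares(slots, smt_contention, categories):
    """Return {name: (raw_pct, adjusted_pct)}, each rounded to one decimal.

    raw_pct      = share of all slots
    adjusted_pct = share of slots left after removing smt-contention (assumed)
    """
    usable = slots - smt_contention
    return {
        name: (round(100 * count / slots, 1), round(100 * count / usable, 1))
        for name, count in categories.items()
    }

# Slot counts copied from the AMD and Intel listings.
amd = topdown_shares(
    slots=182463323060250,
    smt_contention=30542150588642,
    categories={
        "retiring": 35008217268307,
        "frontend": 78788810015191,
        "backend": 37626343922638,
        "speculation": 489842005426,
    },
)
intel = topdown_shares(
    slots=99211728596570,
    smt_contention=0,
    categories={
        "retiring": 52293353050524,
        "frontend": 28850239834340,
        "backend": 16068350018892,
        "speculation": 1873177440460,
    },
)

for vendor, shares in (("AMD", amd), ("Intel", intel)):
    for name, (raw, adjusted) in shares.items():
        print(f"{vendor:5s} {name:11s} {raw:5.1f}% ({adjusted:5.1f}%)")
```

The bracketed values are probably the fairer basis for comparing the two machines. On that basis the frontend accounts for roughly 52% of the slots on AMD against 29% on Intel.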

The process tree shows a steady stream of processes, but fewer than I would expect with many clients. I wonder whether the per-client work is somehow forked off into a different tree, so that only the core server remains here.

903 processes
	340 wrk                  96747.17 75918.65
	156 nginx                 6933.80  8023.21
	 68 clinfo                  19.83     6.66
	 38 vulkaninfo               1.14     1.52
	  6 glxinfo:gdrv0            0.17     0.05
	  6 glxinfo:gl0              0.17     0.05
	  6 php                      0.12     0.33
	  4 vulkani:disk$0           0.12     0.16
	  2 glxinfo                  0.07     0.03
	  2 glxinfo:cs0              0.07     0.03
	  2 glxinfo:disk$0           0.07     0.03
	  2 glxinfo:sh0              0.07     0.03
	  2 glxinfo:shlo0            0.07     0.03
	  2 llvmpipe-0               0.06     0.08
	  2 llvmpipe-1               0.06     0.08
	  2 llvmpipe-10              0.06     0.08
	  2 llvmpipe-11              0.06     0.08
	  2 llvmpipe-12              0.06     0.08
	  2 llvmpipe-13              0.06     0.08
	  2 llvmpipe-14              0.06     0.08
	  2 llvmpipe-15              0.06     0.08
	  2 llvmpipe-2               0.06     0.08
	  2 llvmpipe-3               0.06     0.08
	  2 llvmpipe-4               0.06     0.08
	  2 llvmpipe-5               0.06     0.08
	  2 llvmpipe-6               0.06     0.08
	  2 llvmpipe-7               0.06     0.08
	  2 llvmpipe-8               0.06     0.08
	  2 llvmpipe-9               0.06     0.08
	  6 clang                    0.06     0.06
	  3 rocminfo                 0.03     0.00
	  1 lspci                    0.01     0.02
	106 sh                       0.00     0.00
	 14 bash                     0.00     0.00
	 14 sleep                    0.00     0.00
	 13 gcc                      0.00     0.00
	  9 gsettings                0.00     0.00
	  8 stat                     0.00     0.00
	  8 systemd-detect-          0.00     0.00
	  7 rm                       0.00     0.00
	  6 llvm-link                0.00     0.00
	  5 phoronix-test-s          0.00     0.00
	  4 gmain                    0.00     0.00
	  3 dconf worker             0.00     0.00
	  2 cc                       0.00     0.00
	  2 lscpu                    0.00     0.00
	  2 uname                    0.00     0.00
	  2 which                    0.00     0.00
	  2 xset                     0.00     0.00
	  1 date                     0.00     0.00
	  1 dirname                  0.00     0.00
	  1 dmesg                    0.00     0.00
	  1 dmidecode                0.00     0.00
	  1 grep                     0.00     0.00
	  1 ifconfig                 0.00     0.00
	  1 ip                       0.00     0.00
	  1 lsmod                    0.00     0.00
	  1 mktemp                   0.00     0.00
	  1 ps                       0.00     0.00
	  1 qdbus                    0.00     0.00
	  1 readlink                 0.00     0.00
	  1 realpath                 0.00     0.00
	  1 sed                      0.00     0.00
	  1 sort                     0.00     0.00
	  1 stty                     0.00     0.00
	  1 systemctl                0.00     0.00
	  1 template.sh              0.00     0.00
	  1 wc                       0.00     0.00
	  1 xrandr                   0.00     0.00
0 processes running
47 maximum processes

An example of the structure:

      938801) sh               cpu=13 start=4.03  finish=9.04 
        938802) bash             cpu=13 start=4.03  finish=9.04 
          938803) nginx            cpu=7 start=4.03  finish=4.03 
            938804) nginx            cpu=4 start=4.03  finish=19.08
              938805) nginx            cpu=5 start=4.03  finish=19.08
              938806) nginx            cpu=8 start=4.03  finish=19.08
              938807) nginx            cpu=10 start=4.03  finish=19.08
              938808) nginx            cpu=6 start=4.04  finish=19.08
              938810) nginx            cpu=15 start=4.04  finish=19.08
              938811) nginx            cpu=4 start=4.04  finish=19.08
              938812) nginx            cpu=14 start=4.04  finish=19.08
              938813) nginx            cpu=13 start=4.04  finish=19.08
              938814) nginx            cpu=2 start=4.04  finish=19.08
              938815) nginx            cpu=7 start=4.04  finish=19.08
              938816) nginx            cpu=9 start=4.04  finish=19.08
              938817) nginx            cpu=11 start=4.04  finish=19.08
              938818) nginx            cpu=12 start=4.04  finish=19.08
              938819) nginx            cpu=0 start=4.04  finish=19.08
              938820) nginx            cpu=1 start=4.04  finish=19.08
              938821) nginx            cpu=3 start=4.04  finish=19.08
          938809) sleep            cpu=7 start=4.04  finish=9.04
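
One likely explanation for the low process count is nginx's documented design: a master process starts a fixed set of worker processes, and each worker handles many connections in an event loop instead of forking per client. The sketch below parses the excerpt above and counts the workers under the master. The line format is assumed from this excerpt only, and the meaning of the `cpu=` field (taken here as the CPU the process was seen on) is a guess.

```python
import re
from collections import defaultdict

# Process-tree excerpt copied from the listing above; indentation encodes depth.
TREE = """\
      938801) sh               cpu=13 start=4.03  finish=9.04
        938802) bash             cpu=13 start=4.03  finish=9.04
          938803) nginx            cpu=7 start=4.03  finish=4.03
            938804) nginx            cpu=4 start=4.03  finish=19.08
              938805) nginx            cpu=5 start=4.03  finish=19.08
              938806) nginx            cpu=8 start=4.03  finish=19.08
              938807) nginx            cpu=10 start=4.03  finish=19.08
              938808) nginx            cpu=6 start=4.04  finish=19.08
              938810) nginx            cpu=15 start=4.04  finish=19.08
              938811) nginx            cpu=4 start=4.04  finish=19.08
              938812) nginx            cpu=14 start=4.04  finish=19.08
              938813) nginx            cpu=13 start=4.04  finish=19.08
              938814) nginx            cpu=2 start=4.04  finish=19.08
              938815) nginx            cpu=7 start=4.04  finish=19.08
              938816) nginx            cpu=9 start=4.04  finish=19.08
              938817) nginx            cpu=11 start=4.04  finish=19.08
              938818) nginx            cpu=12 start=4.04  finish=19.08
              938819) nginx            cpu=0 start=4.04  finish=19.08
              938820) nginx            cpu=1 start=4.04  finish=19.08
              938821) nginx            cpu=3 start=4.04  finish=19.08
          938809) sleep            cpu=7 start=4.04  finish=9.04
"""

LINE = re.compile(
    r"^( *)(\d+)\)\s+(\S+)\s+cpu=(\d+)\s+start=([\d.]+)\s+finish=([\d.]+)"
)

names, cpus = {}, {}
children = defaultdict(list)
stack = []  # (indent, pid) for each ancestor of the current line

for line in TREE.splitlines():
    m = LINE.match(line)
    if not m:
        continue
    indent, pid = len(m.group(1)), int(m.group(2))
    names[pid], cpus[pid] = m.group(3), int(m.group(4))
    # Pop siblings and deeper entries so the top of the stack is the parent.
    while stack and stack[-1][0] >= indent:
        stack.pop()
    if stack:
        children[stack[-1][1]].append(pid)
    stack.append((indent, pid))

def nginx_children(pid):
    """Direct children of pid that are themselves nginx processes."""
    return [c for c in children[pid] if names[c] == "nginx"]

# Treat the nginx process with the most nginx children as the master.
master = max(
    (p for p in names if names[p] == "nginx"),
    key=lambda p: len(nginx_children(p)),
)
workers = nginx_children(master)
worker_cpus = sorted(cpus[w] for w in workers)

print(f"master {master}: {len(workers)} workers, cpus {worker_cpus}")
```

In this excerpt the master has 16 workers, one for each `cpu=` value from 0 to 15, which matches the 16 cores in the metrics. If that pattern holds for every run, the process count follows the worker count and not the client count.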