{"id":439,"date":"2024-01-13T00:40:09","date_gmt":"2024-01-13T00:40:09","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=439"},"modified":"2024-01-13T03:30:34","modified_gmt":"2024-01-13T03:30:34","slug":"wireguard","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/wireguard\/","title":{"rendered":"wireguard"},"content":{"rendered":"\n<p>WireGuard is a test of the network stack and hence not the best test for CPU-based metrics. It creates separate network devices and sends traffic through them. I&#8217;ve extended <code>--system<\/code> to also check for network traffic. However, because these network devices are created after monitoring starts, their traffic is not recorded. The profile below shows short bursts of CPU activity interleaved with bursts of IRQ processing, along with a varying number of processes started.<\/p>\n\n\n\n<p>Note that this is the one test where my Intel box seems to run faster than my AMD box (151 seconds vs. 193 seconds).<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-2.png\" alt=\"\" class=\"wp-image-457\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-2.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-2-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-2-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>The overall graph of topdown metrics shows that the short bursts of CPU activity are mostly frontend latency bound.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-40.png\" alt=\"\" 
class=\"wp-image-458\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-40.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-40-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-40-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics show that, on average, only a little over one core&#8217;s worth of CPU is in use, even though the first graph shows the run queue reaching as many as 25 processes. The code is also branch-heavy: about 204 of every 1000 instructions are branches.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              648.250\non_cpu               0.070          # 1.12 \/ 16 cores\nutime                49.347\nstime                675.785\nnvcsw                9918151        # 27.06%\nnivcsw               26735996       # 72.94%\ninblock              0              # 0.00\/sec\nonblock              13480          # 20.79\/sec\ncpu-clock            751000202974   # 751.000 seconds\ntask-clock           759804247090   # 759.804 seconds\npage faults          251649         # 331.202\/sec\ncontext switches     36657366       # 48245.803\/sec\ncpu migrations       3114963        # 4099.691\/sec\nmajor page faults    146            # 0.192\/sec\nminor page faults    251503         # 331.010\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             447260141555   # 204.464 branches per 1000 inst\nbranch misses        50398393703    # 11.27% branch miss\nconditional          226128351861   # 103.374 conditional branches per 1000 inst\nindirect             5434863427     # 2.485 indirect branches per 1000 inst\ncpu-cycles           2937207696633  # 0.29 GHz\ninstructions         2243131537504  # 0.76 IPC\nslots                5718144840738  #\nretiring             877099306526   # 15.3% (15.8%)\n-- ucode             5812595978     #     
0.1%\n-- fastpath          871286710548   #    15.2%\nfrontend             3942628292756  # 68.9% (71.0%)\n-- latency           3333746448912  #    58.3%\n-- bandwidth         608881843844   #    10.6%\nbackend              704100712113   # 12.3% (12.7%)\n-- cpu               150382607062   #     2.6%\n-- memory            553718105051   #     9.7%\nspeculation          28504097599    #  0.5% ( 0.5%)\n-- branch mispredict 28392540859    #     0.5%\n-- pipeline restart  111556740      #     0.0%\nsmt-contention       164806460573   #  2.9% ( 0.0%)\ncpu-cycles           2965792327103  # 0.29 GHz\ninstructions         2257623287988  # 0.76 IPC\ninstructions         731520942056   # 108.288 l2 access per 1000 inst\nl2 hit from l1       68301882340    # 12.49% l2 miss\nl2 miss from l1      5649904572     #\nl2 hit from l2 pf    6671947470     #\nl3 hit from l2 pf    4075915184     #\nl3 miss from l2 pf   165354559      #\ninstructions         734388269635   # 16.385 float per 1000 inst\nfloat 512            263            # 0.000 AVX-512 per 1000 inst\nfloat 256            148351         # 0.000 AVX-256 per 1000 inst\nfloat 128            12032632291    # 16.385 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              475.534\non_cpu               0.065          # 1.05 \/ 16 cores\nutime                37.293\nstime                460.655\nnvcsw                8453989        # 26.08%\nnivcsw               23964755       # 73.92%\ninblock              424            # 0.89\/sec\nonblock              2008           # 4.22\/sec\ncpu-clock            511848591517   # 511.849 seconds\ntask-clock           516693238705   # 516.693 seconds\npage faults          235690         # 456.151\/sec\ncontext switches     32420944       # 62746.987\/sec\ncpu migrations       6844367      
  # 13246.481\/sec\nmajor page faults    150            # 0.290\/sec\nminor page faults    235540         # 455.860\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             303243666167   # 170.904 branches per 1000 inst\nbranch misses        969416905      # 0.32% branch miss\nconditional          303243689143   # 170.904 conditional branches per 1000 inst\nindirect             37894263124    # 21.357 indirect branches per 1000 inst\nslots                18241517781626 #\nretiring             5729375951369  # 31.4% (31.4%)\n-- ucode             1100533124254  #     6.0%\n-- fastpath          4628842827115  #    25.4%\nfrontend             5027917599006  # 27.6% (27.6%)\n-- latency           2850238760757  #    15.6%\n-- bandwidth         2177678838249  #    11.9%\nbackend              6724543857759  # 36.9% (36.9%)\n-- cpu               4076434284858  #    22.3%\n-- memory            2648109572901  #    14.5%\nspeculation          921400842474   #  5.1% ( 5.1%)\n-- branch mispredict 841198396395   #     4.6%\n-- pipeline restart  80202446079    #     0.4%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           1756463286214  # 0.20 GHz\ninstructions         2212636517840  # 1.26 IPC\nl2 access            153827325812   # 91.161 l2 access per 1000 inst\nl2 miss              32255204923    # 20.97% l2 miss<\/code><\/pre>\n\n\n\n<p>Process summary<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>693 processes\n\t 96 iperf3                  50.88   626.23\n\t 68 clinfo                  18.84     6.66\n\t 38 vulkaninfo               0.95     1.31\n\t  6 glxinfo:gdrv0            0.19     0.03\n\t  4 vulkani:disk$0           0.10     0.14\n\t  2 glxinfo                  0.10     0.01\n\t  2 glxinfo:cs0              0.10     0.01\n\t  2 glxinfo:disk$0           0.10     0.01\n\t  2 glxinfo:sh0              0.10     0.01\n\t  2 glxinfo:shlo0            0.10     0.01\n\t  6 php            
          0.07     0.06\n\t  6 clang                    0.07     0.05\n\t  2 llvmpipe-0               0.05     0.07\n\t  2 llvmpipe-1               0.05     0.07\n\t  2 llvmpipe-10              0.05     0.07\n\t  2 llvmpipe-11              0.05     0.07\n\t  2 llvmpipe-12              0.05     0.07\n\t  2 llvmpipe-13              0.05     0.07\n\t  2 llvmpipe-14              0.05     0.07\n\t  2 llvmpipe-15              0.05     0.07\n\t  2 llvmpipe-2               0.05     0.07\n\t  2 llvmpipe-3               0.05     0.07\n\t  2 llvmpipe-4               0.05     0.07\n\t  2 llvmpipe-5               0.05     0.07\n\t  2 llvmpipe-6               0.05     0.07\n\t  2 llvmpipe-7               0.05     0.07\n\t  2 llvmpipe-8               0.05     0.07\n\t  2 llvmpipe-9               0.05     0.07\n\t 24 bash                     0.03     0.12\n\t  3 rocminfo                 0.03     0.00\n\t 48 ss                       0.00     0.48\n\t  1 lspci                    0.00     0.03\n\t  1 ps                       0.00     0.01\n\t100 ip                       0.00     0.00\n\t 81 sh                       0.00     0.00\n\t 30 wg                       0.00     0.00\n\t 24 ping                     0.00     0.00\n\t 24 ping6                    0.00     0.00\n\t 12 gcc                      0.00     0.00\n\t 11 gsettings                0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  4 gmain                    0.00     0.00\n\t  4 readlink                 0.00     0.00\n\t  3 mount                    0.00     0.00\n\t  3 wireguard                0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 uname                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 cc                       0.00     0.00\n\t  1 date                     0.00     
0.00\n\t  1 dconf worker             0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n0 processes running\n47 maximum processes\n<\/code><\/pre>\n\n\n\n<p>The process structure looks as follows:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      36321) wireguard        cpu=6 start=5.80  finish=204.72\n        36322) bash             cpu=6 start=5.80  finish=204.72\n          36323) readlink         cpu=2 start=5.80  finish=5.80 \n          36324) mount            cpu=3 start=5.81  finish=5.81 \n          36325) ip               cpu=2 start=5.81  finish=5.81 \n          36326) ip               cpu=13 start=5.81  finish=5.82 \n          36327) ip               cpu=15 start=5.82  finish=5.82 \n          36328) ip               cpu=2 start=5.82  finish=5.82 \n          36329) ip               cpu=13 start=5.82  finish=5.83 \n          36330) ip               cpu=15 start=5.83  finish=5.83 \n          36331) ip               cpu=3 start=5.83  finish=5.83 \n          36332) ip               cpu=2 start=5.83  finish=5.83 \n          36334) ip               cpu=15 start=5.83  finish=5.89 \n          36335) ip               cpu=3 start=5.89  finish=5.90 \n          36337) ip               cpu=4 start=5.90  finish=5.96 \n          36338) wg               cpu=3 start=5.96  finish=5.96 
\n          36339) wg               cpu=3 start=5.96  finish=5.96 \n          36340) bash             cpu=15 start=5.96  finish=5.96 \n            36341) wg               cpu=4 start=5.96  finish=5.96 \n          36342) bash             cpu=3 start=5.96  finish=5.97 \n            36343) wg               cpu=8 start=5.97  finish=5.97 \n          36344) ip               cpu=4 start=5.97  finish=5.97 \n          36345) ip               cpu=3 start=5.97  finish=5.97 \n          36346) ip               cpu=4 start=5.97  finish=5.98 \n          36347) ip               cpu=8 start=5.98  finish=5.98 \n          36348) bash             cpu=15 start=5.98  finish=5.98 \n          36349) wg               cpu=11 start=5.98  finish=5.98 \n          36350) bash             cpu=15 start=5.99  finish=5.99 \n          36351) wg               cpu=8 start=5.99  finish=5.99 \n          36352) ip               cpu=15 start=5.99  finish=5.99 \n          36353) ip               cpu=11 start=5.99  finish=5.99 \n          36354) ip               cpu=15 start=6.00  finish=6.00 \n          36355) wg               cpu=4 start=6.00  finish=6.00 \n          36356) wg               cpu=15 start=6.00  finish=6.00 \n          36357) ping             cpu=4 start=6.00  finish=6.01 \n          36358) ping             cpu=9 start=6.01  finish=6.02 \n          36359) ping6            cpu=12 start=6.02  finish=6.02 \n          36360) ping6            cpu=11 start=6.02  finish=6.03 \n          36361) iperf3           cpu=1 start=6.03  finish=25.94\n          36362) ss               cpu=4 start=6.03  finish=6.05 \n          36363) iperf3           cpu=2 start=6.05  finish=25.94\n          36365) iperf3           cpu=15 start=25.95 finish=46.30\n          36366) ss               cpu=12 start=25.95 finish=25.96\n          36367) iperf3           cpu=12 start=25.96 finish=46.30\n          36369) iperf3           cpu=5 start=46.30 finish=68.88\n          36370) ss               cpu=12 start=46.30 
finish=46.31\n          36371) iperf3           cpu=4 start=46.31 finish=68.88\n          36372) iperf3           cpu=2 start=68.89 finish=93.43\n          36373) ss               cpu=10 start=68.89 finish=68.90\n          36374) iperf3           cpu=0 start=68.90 finish=93.43\n          36375) ip               cpu=11 start=93.43 finish=93.44\n          36376) ip               cpu=10 start=93.44 finish=93.44\n          36377) ping             cpu=7 start=93.44 finish=93.44\n          36378) ping             cpu=9 start=93.44 finish=93.45\n          36379) ping6            cpu=6 start=93.45 finish=93.45\n          36380) ping6            cpu=6 start=93.45 finish=93.45\n          36381) iperf3           cpu=10 start=93.45 finish=96.05\n          36382) ss               cpu=11 start=93.45 finish=93.47\n          36383) iperf3           cpu=8 start=93.47 finish=96.05\n          36386) iperf3           cpu=7 start=96.05 finish=98.67\n          36387) ss               cpu=15 start=96.05 finish=96.07\n          36388) iperf3           cpu=8 start=96.07 finish=98.67\n          36389) iperf3           cpu=9 start=98.67 finish=102.12\n          36390) ss               cpu=7 start=98.67 finish=98.69\n          36391) iperf3           cpu=4 start=98.69 finish=102.12\n          36392) iperf3           cpu=15 start=102.12 finish=105.81\n          36393) ss               cpu=3 start=102.12 finish=102.14\n          36394) iperf3           cpu=12 start=102.14 finish=105.81\n          36395) ip               cpu=3 start=105.81 finish=105.81\n          36396) ip               cpu=3 start=105.82 finish=105.82\n          36397) wg               cpu=3 start=105.82 finish=105.82\n          36398) wg               cpu=3 start=105.82 finish=105.82\n          36399) ping             cpu=3 start=105.82 finish=105.83\n          36400) ping             cpu=12 start=105.83 finish=105.83\n          36401) ping6            cpu=6 start=105.83 finish=105.83\n          36402) ping6            cpu=12 
start=105.83 finish=105.84\n          36403) iperf3           cpu=15 start=105.84 finish=125.84\n          36404) ss               cpu=10 start=105.84 finish=105.85\n          36405) iperf3           cpu=14 start=105.86 finish=125.84\n          36406) iperf3           cpu=5 start=125.84 finish=146.22\n          36407) ss               cpu=2 start=125.84 finish=125.86\n          36408) iperf3           cpu=0 start=125.86 finish=146.22\n          36410) iperf3           cpu=4 start=146.22 finish=168.64\n          36411) ss               cpu=1 start=146.23 finish=146.24\n          36412) iperf3           cpu=11 start=146.24 finish=168.64\n          36413) iperf3           cpu=8 start=168.64 finish=192.19\n          36414) ss               cpu=2 start=168.64 finish=168.66\n          36415) iperf3           cpu=2 start=168.66 finish=192.19\n          36416) ip               cpu=1 start=192.19 finish=192.19\n          36417) ip               cpu=1 start=192.19 finish=192.19\n          36418) ping             cpu=1 start=192.19 finish=192.20\n          36419) ping             cpu=7 start=192.20 finish=192.20\n          36420) ping6            cpu=1 start=192.20 finish=192.20\n          36421) ping6            cpu=6 start=192.21 finish=192.21\n          36422) iperf3           cpu=4 start=192.21 finish=194.74\n          36423) ss               cpu=1 start=192.21 finish=192.22\n          36424) iperf3           cpu=2 start=192.22 finish=194.74\n          36425) iperf3           cpu=3 start=194.74 finish=197.24\n          36426) ss               cpu=7 start=194.74 finish=194.75\n          36427) iperf3           cpu=12 start=194.75 finish=197.24\n          36428) iperf3           cpu=6 start=197.24 finish=200.68\n          36429) ss               cpu=10 start=197.24 finish=197.25\n          36430) iperf3           cpu=11 start=197.25 finish=200.68\n          36431) iperf3           cpu=3 start=200.69 finish=204.30\n          36432) ss               cpu=8 start=200.69 
finish=200.70\n          36433) iperf3           cpu=12 start=200.70 finish=204.30\n          36434) ip               cpu=9 start=204.30 finish=204.30\n          36435) ip               cpu=6 start=204.30 finish=204.47\n          36436) ip               cpu=2 start=204.47 finish=204.70\n          36437) bash             cpu=7 start=204.70 finish=204.70\n            36438) ip               cpu=1 start=204.70 finish=204.70\n          36439) bash             cpu=3 start=204.70 finish=204.71\n            36440) ip               cpu=12 start=204.70 finish=204.70\n          36441) bash             cpu=13 start=204.71 finish=204.71\n            36442) ip               cpu=7 start=204.71 finish=204.71\n          36443) ip               cpu=1 start=204.71 finish=204.71\n          36444) ip               cpu=12 start=204.71 finish=204.71\n          36445) ip               cpu=1 start=204.71 finish=204.72\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>WireGuard is a test of the network stack and hence not the best test for CPU-based metrics. It creates separate network devices and sends traffic through them. I&#8217;ve extended --system to also check for network traffic. 
However, because these network <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/wireguard\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-439","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/439","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=439"}],"version-history":[{"count":4,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/439\/revisions"}],"predecessor-version":[{"id":479,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/439\/revisions\/479"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=439"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}