{"id":912,"date":"2024-01-26T01:28:09","date_gmt":"2024-01-26T01:28:09","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=912"},"modified":"2024-01-27T03:29:49","modified_gmt":"2024-01-27T03:29:49","slug":"easywave","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/easywave\/","title":{"rendered":"easywave"},"content":{"rendered":"\n<p>Software to simulate tsunami generation and propagation in the context of early warning systems. The test runs three input sizes, each taking progressively longer. In the chart below, most of the time is spent on the third and very little on the first. It runs on all cores most of the time.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-69.png\" alt=\"\" class=\"wp-image-966\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-69.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-69-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-69-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>The topdown profile shows almost all time spent in backend stalls, with a low retirement rate and few frontend stalls.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-107.png\" alt=\"\" class=\"wp-image-968\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-107.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-107-1024x768.png 1024w, 
https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-107-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics confirm that this is a floating point application running on all cores, with 78% of the time in backend memory stalls. There is also a moderate number of branches.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              1922.175\non_cpu               0.909          # 14.55 \/ 16 cores\nutime                27915.570\nstime                46.870\nnvcsw                39963          # 12.49%\nnivcsw               279964         # 87.51%\ninblock              0              # 0.00\/sec\nonblock              69134864       # 35967.00\/sec\ncpu-clock            27976304339567 # 27976.304 seconds\ntask-clock           27977334574561 # 27977.335 seconds\npage faults          696190         # 24.884\/sec\ncontext switches     329321         # 11.771\/sec\ncpu migrations       2685           # 0.096\/sec\nmajor page faults    2              # 0.000\/sec\nminor page faults    696188         # 24.884\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             6576286403261  # 197.858 branches per 1000 inst\nbranch misses        12811375603    # 0.19% branch miss\nconditional          6436339187318  # 193.648 conditional branches per 1000 inst\nindirect             45656474482    # 1.374 indirect branches per 1000 inst\ncpu-cycles           128164885936695 # 4.13 GHz\ninstructions         33244978034460 # 0.26 IPC\nslots                256292148576894 #\nretiring             11974794179054 #  4.7% ( 5.2%)\n-- ucode             55136358437    #     0.0%\n-- fastpath          11919657820617 #     4.7%\nfrontend             6620260644584  #  2.6% ( 2.9%)\n-- latency           2623166581578  #     1.0%\n-- bandwidth         3997094063006  #     1.6%\nbackend              212819428524789 # 83.0% (91.9%)\n-- cpu               13141066975582 
#     5.1%\n-- memory            199678361549207 #    77.9%\nspeculation          220483707629   #  0.1% ( 0.1%)\n-- branch mispredict 213211749567   #     0.1%\n-- pipeline restart  7271958062     #     0.0%\nsmt-contention       24657088493168 #  9.6% ( 0.0%)\ncpu-cycles           128093523958535 # 4.11 GHz\ninstructions         33242235307082 # 0.26 IPC\ninstructions         11083055466426 # 83.023 l2 access per 1000 inst\nl2 hit from l1       525246389780   # 16.29% l2 miss\nl2 miss from l1      15185401164    #\nl2 hit from l2 pf    260187242099   #\nl3 hit from l2 pf    5910338961     #\nl3 miss from l2 pf   128808134580   #\ninstructions         11078878763592 # 357.411 float per 1000 inst\nfloat 512            88             # 0.000 AVX-512 per 1000 inst\nfloat 256            466            # 0.000 AVX-256 per 1000 inst\nfloat 128            3959710616508  # 357.411 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              4510.805\non_cpu               0.894          # 14.31 \/ 16 cores\nutime                64492.705\nstime                48.516\nnvcsw                64519          # 10.62%\nnivcsw               542930         # 89.38%\ninblock              16             # 0.00\/sec\nonblock              108247000      # 23997.27\/sec\ncpu-clock            64550853397335 # 64550.853 seconds\ntask-clock           64551668228515 # 64551.668 seconds\npage faults          1051104        # 16.283\/sec\ncontext switches     629765         # 9.756\/sec\ncpu migrations       25633          # 0.397\/sec\nmajor page faults    0              # 0.000\/sec\nminor page faults    1051104        # 16.283\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             10206746066589 # 196.935 branches per 1000 
inst\nbranch misses        14336969272    # 0.14% branch miss\nconditional          10206746087517 # 196.935 conditional branches per 1000 inst\nindirect             1479536149642  # 28.547 indirect branches per 1000 inst\nslots                179247222475460 #\nretiring             21364128798497 # 11.9% (11.9%)\n-- ucode             3236812364270  #     1.8%\n-- fastpath          18127316434227 #    10.1%\nfrontend             9382535179469  #  5.2% ( 5.2%)\n-- latency           7107848453687  #     4.0%\n-- bandwidth         2274686725782  #     1.3%\nbackend              148857931511379 # 83.0% (83.0%)\n-- cpu               50435694393639 #    28.1%\n-- memory            98422237117740 #    54.9%\nspeculation          1685655017579  #  0.9% ( 0.9%)\n-- branch mispredict 1663186096045  #     0.9%\n-- pipeline restart  22468921534    #     0.0%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           106965986680387 # 2.32 GHz\ninstructions         32118256495506 # 0.30 IPC\nl2 access            980416843402   # 55.491 l2 access per 1000 inst\nl2 miss              448501745047   # 45.75% l2 miss\n<\/code><\/pre>\n\n\n\n<p>Process overview shows most time in the easywave processes.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>523 processes\n\t153 easywave             447987.04   640.32\n\t 68 clinfo                  17.44     6.27\n\t 38 vulkaninfo               1.14     1.14\n\t  6 php                      0.32     1.20\n\t  4 vulkani:disk$0           0.12     0.12\n\t  6 glxinfo:gdrv0            0.12     0.05\n\t  6 glxinfo:gl0              0.12     0.05\n\t  2 llvmpipe-0               0.06     0.06\n\t  2 llvmpipe-1               0.06     0.06\n\t  2 llvmpipe-10              0.06     0.06\n\t  2 llvmpipe-11              0.06     0.06\n\t  2 llvmpipe-12              0.06     0.06\n\t  2 llvmpipe-13              0.06     0.06\n\t  2 llvmpipe-14              0.06     0.06\n\t  2 llvmpipe-15              0.06     0.06\n\t  2 llvmpipe-2           
    0.06     0.06\n\t  2 llvmpipe-3               0.06     0.06\n\t  2 llvmpipe-4               0.06     0.06\n\t  2 llvmpipe-5               0.06     0.06\n\t  2 llvmpipe-6               0.06     0.06\n\t  2 llvmpipe-7               0.06     0.06\n\t  2 llvmpipe-8               0.06     0.06\n\t  2 llvmpipe-9               0.06     0.06\n\t  2 glxinfo                  0.06     0.02\n\t  2 glxinfo:cs0              0.06     0.02\n\t  2 glxinfo:disk$0           0.06     0.02\n\t  2 glxinfo:sh0              0.06     0.02\n\t  2 glxinfo:shlo0            0.06     0.02\n\t  6 clang                    0.03     0.09\n\t 18 rm                       0.00     4.59\n\t  3 rocminfo                 0.00     0.03\n\t  1 lspci                    0.00     0.02\n\t  1 ps                       0.00     0.01\n\t 86 sh                       0.00     0.00\n\t 13 gcc                      0.00     0.00\n\t 11 gsettings                0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  4 gmain                    0.00     0.00\n\t  2 cc                       0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 uname                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dconf worker             0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t 
 1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n0 processes running\n47 maximum processes\n<\/code><\/pre>\n\n\n\n<p>Process structure is simple with one process on each core.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      931467) easywave         cpu=3 start=5.58  finish=15.07\n        931468) easywave         cpu=0 start=5.58  finish=15.04\n          931469) easywave         cpu=12 start=6.48  finish=15.04\n          931470) easywave         cpu=14 start=6.48  finish=15.04\n          931471) easywave         cpu=7 start=6.48  finish=15.04\n          931472) easywave         cpu=8 start=6.48  finish=15.04\n          931473) easywave         cpu=9 start=6.48  finish=15.04\n          931474) easywave         cpu=2 start=6.48  finish=15.04\n          931475) easywave         cpu=11 start=6.48  finish=15.04\n          931476) easywave         cpu=4 start=6.48  finish=15.04\n          931477) easywave         cpu=13 start=6.48  finish=15.04\n          931478) easywave         cpu=6 start=6.48  finish=15.04\n          931479) easywave         cpu=15 start=6.48  finish=15.04\n          931480) easywave         cpu=1 start=6.48  finish=15.04\n          931481) easywave         cpu=10 start=6.48  finish=15.04\n          931482) easywave         cpu=3 start=6.48  finish=15.04\n          931483) easywave         cpu=5 start=6.48  finish=15.04\n        931484) rm               cpu=0 start=15.04 finish=15.07\n        931485) rm               cpu=6 start=15.07 finish=15.07\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Software to simulate tsunami generation and propagation in context of early warning systems. There are three different sizes taking progressively longer time. 
In the chart below most of the time is spent on the third and very little on the <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/easywave\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-912","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/912","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=912"}],"version-history":[{"count":3,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/912\/revisions"}],"predecessor-version":[{"id":969,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/912\/revisions\/969"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=912"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}