{"id":704,"date":"2024-01-20T00:33:43","date_gmt":"2024-01-20T00:33:43","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=704"},"modified":"2024-01-20T00:33:44","modified_gmt":"2024-01-20T00:33:44","slug":"specfem3d","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/specfem3d\/","title":{"rendered":"specfem3d"},"content":{"rendered":"\n<p>An acoustic modeling program with five workloads. Mostly keeps the CPU busy and runs with all cores.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-38.png\" alt=\"\" class=\"wp-image-705\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-38.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-38-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-38-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>Topdown metrics. 
Backend stalls are the largest share, with few frontend stalls, leaving a moderate retirement rate.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-76.png\" alt=\"\" class=\"wp-image-706\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-76.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-76-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-76-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics show floating-point code with few L2 accesses (about 32 per 1000 instructions) and few branches (about 69 per 1000 instructions).<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              3762.238\non_cpu               0.868          # 13.89 \/ 16 cores\nutime                52056.538\nstime                185.652\nnvcsw                1302099        # 75.38%\nnivcsw               425392         # 24.62%\ninblock              2760           # 0.73\/sec\nonblock              54832224       # 14574.36\/sec\ncpu-clock            52248285770710 # 52248.286 seconds\ntask-clock           52249743951891 # 52249.744 seconds\npage faults          12748154       # 243.985\/sec\ncontext switches     1745081        # 33.399\/sec\ncpu migrations       26810          # 0.513\/sec\nmajor page faults    4149           # 0.079\/sec\nminor page faults    12744005       # 243.906\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             23504693240647 # 68.840 branches per 1000 inst\nbranch misses        13097079745    # 0.06% branch miss\nconditional          17987708794657 # 52.682 conditional branches per 1000 inst\nindirect             655636174293   # 1.920 indirect branches per 1000 inst\ncpu-cycles     
      210656764084175 # 3.46 GHz\ninstructions         342832809611451 # 1.63 IPC\nslots                421307158364352 #\nretiring             118910637523092 # 28.2% (39.0%)\n-- ucode             620443396647   #     0.1%\n-- fastpath          118290194126445 #    28.1%\nfrontend             19634413076632 #  4.7% ( 6.4%)\n-- latency           7202348283648  #     1.7%\n-- bandwidth         12432064792984 #     3.0%\nbackend              165406079832537 # 39.3% (54.2%)\n-- cpu               50571970115083 #    12.0%\n-- memory            114834109717454 #    27.3%\nspeculation          1024838848568  #  0.2% ( 0.3%)\n-- branch mispredict 254966966749   #     0.1%\n-- pipeline restart  769871881819   #     0.2%\nsmt-contention       116330770486487 # 27.6% ( 0.0%)\ncpu-cycles           176597097461140 # 3.47 GHz\ninstructions         285399042943448 # 1.62 IPC\ninstructions         95129255715027 # 31.569 l2 access per 1000 inst\nl2 hit from l1       1993274139610  # 10.04% l2 miss\nl2 miss from l1      53313585716    #\nl2 hit from l2 pf    761553213145   #\nl3 hit from l2 pf    23601262979    #\nl3 miss from l2 pf   224661217208   #\ninstructions         95092350669095 # 288.511 float per 1000 inst\nfloat 512            378            # 0.000 AVX-512 per 1000 inst\nfloat 256            1468           # 0.000 AVX-256 per 1000 inst\nfloat 128            27435143678913 # 288.511 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         82189          # 0.000 scalar per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              2703.437\non_cpu               0.715          # 11.43 \/ 16 cores\nutime                30811.972\nstime                98.515\nnvcsw                479608         # 86.30%\nnivcsw               76110          # 13.70%\ninblock              5472           # 2.02\/sec\nonblock              23388688       # 8651.47\/sec\ncpu-clock        
    30911106052904 # 30911.106 seconds\ntask-clock           30911215860555 # 30911.216 seconds\npage faults          7715342        # 249.597\/sec\ncontext switches     568289         # 18.385\/sec\ncpu migrations       63222          # 2.045\/sec\nmajor page faults    2773           # 0.090\/sec\nminor page faults    7712569        # 249.507\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             17379440069501 # 80.716 branches per 1000 inst\nbranch misses        46914036018    # 0.27% branch miss\nconditional          17379440133789 # 80.716 conditional branches per 1000 inst\nindirect             3085933427959  # 14.332 indirect branches per 1000 inst\nslots                192018763461350 #\nretiring             109512682772057 # 57.0% (57.0%)\n-- ucode             8868767675421  #     4.6%\n-- fastpath          100643915096636 #    52.4%\nfrontend             8584569166548  #  4.5% ( 4.5%)\n-- latency           3782079896939  #     2.0%\n-- bandwidth         4802489269609  #     2.5%\nbackend              68661134260813 # 35.8% (35.8%)\n-- cpu               37059672708408 #    19.3%\n-- memory            31601461552405 #    16.5%\nspeculation          5165879176000  #  2.7% ( 2.7%)\n-- branch mispredict 3677065606214  #     1.9%\n-- pipeline restart  1488813569786  #     0.8%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           83175153659116 # 1.92 GHz\ninstructions         284864692479299 # 3.42 IPC\nl2 access            759190674835   # 6.941 l2 access per 1000 inst\nl2 miss              298370738806   # 39.30% l2 miss\n<\/code><\/pre>\n\n\n\n<p>Process overview<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>1805 processes\n\t480 xspecfem3D           107646.66   215.23\n\t360 xgenerate_datab        495.93    68.34\n\t 68 clinfo                  16.53     6.33\n\t180 mpirun                   7.70    20.65\n\t 15 xdecompose_mesh          7.14     0.41\n\t 38 
vulkaninfo               0.95     1.34\n\t  3 awk                      0.23     0.02\n\t  6 php                      0.20     0.34\n\t  6 glxinfo:gdrv0            0.13     0.06\n\t  4 vulkani:disk$0           0.11     0.14\n\t  2 glxinfo                  0.07     0.03\n\t  2 glxinfo:cs0              0.07     0.02\n\t  2 glxinfo:disk$0           0.07     0.02\n\t  2 glxinfo:sh0              0.07     0.02\n\t  2 glxinfo:shlo0            0.07     0.02\n\t  6 clang                    0.06     0.06\n\t  2 llvmpipe-0               0.05     0.07\n\t  2 llvmpipe-1               0.05     0.07\n\t  2 llvmpipe-10              0.05     0.07\n\t  2 llvmpipe-11              0.05     0.07\n\t  2 llvmpipe-12              0.05     0.07\n\t  2 llvmpipe-13              0.05     0.07\n\t  2 llvmpipe-14              0.05     0.07\n\t  2 llvmpipe-15              0.05     0.07\n\t  2 llvmpipe-2               0.05     0.07\n\t  2 llvmpipe-3               0.05     0.07\n\t  2 llvmpipe-4               0.05     0.07\n\t  2 llvmpipe-5               0.05     0.07\n\t  2 llvmpipe-6               0.05     0.07\n\t  2 llvmpipe-7               0.05     0.07\n\t  2 llvmpipe-8               0.05     0.07\n\t  2 llvmpipe-9               0.05     0.07\n\t 63 run_this_exampl          0.04     0.02\n\t  3 rocminfo                 0.03     0.00\n\t 45 rm                       0.00     2.01\n\t  1 lspci                    0.00     0.02\n\t 90 sh                       0.00     0.00\n\t 51 mkdir                    0.00     0.00\n\t 49 grep                     0.00     0.00\n\t 45 cp                       0.00     0.00\n\t 45 ln                       0.00     0.00\n\t 33 cut                      0.00     0.00\n\t 31 date                     0.00     0.00\n\t 16 sed                      0.00     0.00\n\t 15 cat                      0.00     0.00\n\t 15 gsettings                0.00     0.00\n\t 15 specfem3d                0.00     0.00\n\t 13 gcc                      0.00     0.00\n\t  8 stat                  
   0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  6 create_tomograp          0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  6 mv                       0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  2 cc                       0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 uname                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 gmain                    0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 ps                       0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n0 processes running\n48 maximum processes\n<\/code><\/pre>\n\n\n\n<p>Computation structure<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      2601293) specfem3d        cpu=5 start=5.84  finish=77.48\n        2601294) rm               cpu=2 start=5.84  finish=5.84 \n        2601295) sed              cpu=6 start=5.84  finish=5.85 \n        2601296) run_this_exampl  cpu=15 start=5.85  finish=77.48\n          2601297) date             cpu=1 start=5.85  finish=5.85 \n          2601298) run_this_exampl  cpu=2 start=5.85  finish=5.85 \n          2601299) mkdir            cpu=3 start=5.85  finish=5.85 \n          2601300) rm               cpu=6 start=5.85  finish=5.92 \n          2601301) 
mkdir            cpu=3 start=5.92  finish=5.92 \n          2601302) rm               cpu=9 start=5.92  finish=5.92 \n          2601303) ln               cpu=2 start=5.92  finish=5.92 \n          2601304) ln               cpu=9 start=5.92  finish=5.93 \n          2601305) ln               cpu=2 start=5.93  finish=5.93 \n          2601306) cp               cpu=3 start=5.93  finish=5.93 \n          2601307) cp               cpu=9 start=5.93  finish=5.93 \n          2601308) cp               cpu=2 start=5.93  finish=5.93 \n          2601309) run_this_exampl  cpu=3 start=5.93  finish=5.94 \n            2601310) grep             cpu=14 start=5.94  finish=5.94 \n            2601311) grep             cpu=4 start=5.94  finish=5.94 \n            2601312) cut              cpu=0 start=5.94  finish=5.94 \n          2601313) run_this_exampl  cpu=2 start=5.94  finish=5.94 \n            2601314) grep             cpu=9 start=5.94  finish=5.94 \n            2601315) cut              cpu=5 start=5.94  finish=5.94 \n          2601316) mkdir            cpu=3 start=5.94  finish=5.94 \n          2601317) xdecompose_mesh  cpu=0 start=5.95  finish=6.39 \n          2601318) mpirun           cpu=8 start=6.39  finish=8.25 \n            2601322) mpirun           cpu=0 start=6.97  finish=8.25 \n            2601323) mpirun           cpu=11 start=6.97  finish=6.97 \n            2601324) mpirun           cpu=4 start=6.99  finish=8.24 \n            2601325) mpirun           cpu=13 start=7.48  finish=8.24 \n            2601326) mpirun           cpu=1 start=7.48  finish=8.24 \n            2601327) xgenerate_datab  cpu=12 start=7.49  finish=8.23 \n              2601329) xgenerate_datab  cpu=13 start=7.49  finish=8.23 \n              2601331) xgenerate_datab  cpu=7 start=7.50  finish=8.23 \n            2601328) xgenerate_datab  cpu=4 start=7.49  finish=8.23 \n              2601332) xgenerate_datab  cpu=8 start=7.50  finish=8.23 \n              2601335) xgenerate_datab  cpu=14 start=7.50  finish=8.23 \n 
           2601330) xgenerate_datab  cpu=11 start=7.50  finish=8.23 \n              2601334) xgenerate_datab  cpu=1 start=7.50  finish=8.23 \n              2601338) xgenerate_datab  cpu=5 start=7.51  finish=8.23 \n            2601333) xgenerate_datab  cpu=6 start=7.50  finish=8.23 \n              2601337) xgenerate_datab  cpu=3 start=7.51  finish=8.23 \n              2601340) xgenerate_datab  cpu=15 start=7.51  finish=8.23 \n            2601336) xgenerate_datab  cpu=0 start=7.51  finish=8.23 \n              2601341) xgenerate_datab  cpu=13 start=7.51  finish=8.23 \n              2601344) xgenerate_datab  cpu=12 start=7.52  finish=8.23 \n            2601339) xgenerate_datab  cpu=13 start=7.51  finish=8.23 \n              2601343) xgenerate_datab  cpu=15 start=7.52  finish=8.23 \n              2601347) xgenerate_datab  cpu=3 start=7.53  finish=8.23 \n            2601342) xgenerate_datab  cpu=1 start=7.52  finish=8.23 \n              2601346) xgenerate_datab  cpu=2 start=7.52  finish=8.23 \n              2601349) xgenerate_datab  cpu=2 start=7.53  finish=8.23 \n            2601345) xgenerate_datab  cpu=7 start=7.52  finish=8.23 \n              2601348) xgenerate_datab  cpu=4 start=7.53  finish=8.23 \n              2601350) xgenerate_datab  cpu=8 start=7.54  finish=8.23 \n          2601351) mpirun           cpu=7 start=8.28  finish=77.45\n            2601356) mpirun           cpu=11 start=8.84  finish=77.45\n            2601357) mpirun           cpu=13 start=8.84  finish=8.84 \n            2601358) mpirun           cpu=2 start=8.86  finish=77.44\n            2601360) mpirun           cpu=0 start=9.36  finish=77.44\n            2601361) mpirun           cpu=9 start=9.36  finish=77.45\n            2601362) xspecfem3D       cpu=10 start=9.37  finish=77.44\n              2601364) xspecfem3D       cpu=2 start=9.38  finish=77.44\n              2601367) xspecfem3D       cpu=2 start=9.38  finish=77.44\n              2601389) xspecfem3D       cpu=3 start=9.66  finish=77.44\n    
        2601363) xspecfem3D       cpu=6 start=9.38  finish=77.44\n              2601366) xspecfem3D       cpu=3 start=9.38  finish=77.44\n              2601370) xspecfem3D       cpu=15 start=9.39  finish=77.44\n              2601388) xspecfem3D       cpu=0 start=9.66  finish=77.44\n            2601365) xspecfem3D       cpu=13 start=9.38  finish=77.44\n              2601369) xspecfem3D       cpu=15 start=9.39  finish=77.44\n              2601373) xspecfem3D       cpu=9 start=9.39  finish=77.44\n              2601386) xspecfem3D       cpu=15 start=9.66  finish=77.44\n            2601368) xspecfem3D       cpu=8 start=9.39  finish=77.44\n              2601372) xspecfem3D       cpu=9 start=9.39  finish=77.44\n              2601376) xspecfem3D       cpu=3 start=9.40  finish=77.44\n              2601392) xspecfem3D       cpu=5 start=9.67  finish=77.44\n            2601371) xspecfem3D       cpu=4 start=9.39  finish=77.44\n              2601375) xspecfem3D       cpu=14 start=9.40  finish=77.44\n              2601380) xspecfem3D       cpu=4 start=9.40  finish=77.44\n              2601391) xspecfem3D       cpu=3 start=9.66  finish=77.44\n            2601374) xspecfem3D       cpu=11 start=9.39  finish=77.44\n              2601378) xspecfem3D       cpu=2 start=9.40  finish=77.44\n              2601382) xspecfem3D       cpu=8 start=9.41  finish=77.44\n              2601393) xspecfem3D       cpu=0 start=9.67  finish=77.44\n            2601377) xspecfem3D       cpu=0 start=9.40  finish=77.44\n              2601381) xspecfem3D       cpu=13 start=9.41  finish=77.44\n              2601384) xspecfem3D       cpu=5 start=9.41  finish=77.43\n              2601390) xspecfem3D       cpu=1 start=9.66  finish=77.44\n            2601379) xspecfem3D       cpu=1 start=9.40  finish=77.44\n              2601383) xspecfem3D       cpu=5 start=9.41  finish=77.44\n              2601385) xspecfem3D       cpu=6 start=9.42  finish=77.44\n              2601387) xspecfem3D       cpu=9 start=9.66  
finish=77.44\n          2601396) date             cpu=6 start=77.47 finish=77.47\n        2601397) cat              cpu=1 start=77.48 finish=77.48\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>An acoustic modeling program with five workloads. Mostly keeps the CPU busy and runs with all cores. Topdown metrics. Backend stalls are the largest share, with few frontend stalls, leaving a moderate retirement rate. AMD metrics show floating-point <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/specfem3d\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-704","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/704","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=704"}],"version-history":[{"count":1,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/704\/revisions"}],"predecessor-version":[{"id":707,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/704\/revisions\/707"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=704"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}