{"id":481,"date":"2024-01-13T12:12:59","date_gmt":"2024-01-13T12:12:59","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=481"},"modified":"2024-01-13T12:12:59","modified_gmt":"2024-01-13T12:12:59","slug":"openradioss","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/openradioss\/","title":{"rendered":"openradioss"},"content":{"rendered":"\n<p>openradioss is a finite element solver. It has a high IPC and a relatively high retirement rate. It seems to run on cores without hyperthreading.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-7.png\" alt=\"\" class=\"wp-image-482\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-7.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-7-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-7-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>The overall characteristic is a high retirement rate with some backend stalls and lower-than-average frontend stalls.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-45.png\" alt=\"\" class=\"wp-image-483\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-45.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-45-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-45-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics show this is floating point code 
with low amounts of L2 access and predictable branches.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              531.485\non_cpu               0.464          # 7.42 \/ 16 cores\nutime                3795.661\nstime                147.925\nnvcsw                43472          # 79.88%\nnivcsw               10952          # 20.12%\ninblock              0              # 0.00\/sec\nonblock              731552         # 1376.43\/sec\ncpu-clock            3943618364987  # 3943.618 seconds\ntask-clock           3943649988277  # 3943.650 seconds\npage faults          914997         # 232.018\/sec\ncontext switches     56878          # 14.423\/sec\ncpu migrations       6942           # 1.760\/sec\nmajor page faults    355            # 0.090\/sec\nminor page faults    914642         # 231.928\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             5557698857321  # 121.437 branches per 1000 inst\nbranch misses        29875479009    # 0.54% branch miss\nconditional          4141825327173  # 90.500 conditional branches per 1000 inst\nindirect             268280512784   # 5.862 indirect branches per 1000 inst\ncpu-cycles           16471095334701 # 1.94 GHz\ninstructions         45771448504477 # 2.78 IPC\nslots                32949521879028 #\nretiring             15391256553201 # 46.7% (46.7%)\n-- ucode             9344807545     #     0.0%\n-- fastpath          15381911745656 #    46.7%\nfrontend             3828359236713  # 11.6% (11.6%)\n-- latency           2317004612910  #     7.0%\n-- bandwidth         1511354623803  #     4.6%\nbackend              13351083160137 # 40.5% (40.5%)\n-- cpu               4635525039147  #    14.1%\n-- memory            8715558120990  #    26.5%\nspeculation          362246006081   #  1.1% ( 1.1%)\n-- branch mispredict 336608096267   #     1.0%\n-- pipeline restart  25637909814    #     0.1%\nsmt-contention       16563806243    #  0.1% ( 0.0%)\ncpu-cycles           
16546016719780 # 1.94 GHz\ninstructions         46064185230041 # 2.78 IPC\ninstructions         15360678440271 # 33.454 l2 access per 1000 inst\nl2 hit from l1       413590215669   # 13.24% l2 miss\nl2 miss from l1      36696882459    #\nl2 hit from l2 pf    68932385772    #\nl3 hit from l2 pf    23076349381    #\nl3 miss from l2 pf   8277129782     #\ninstructions         15349691378355 # 267.046 float per 1000 inst\nfloat 512            49             # 0.000 AVX-512 per 1000 inst\nfloat 256            434            # 0.000 AVX-256 per 1000 inst\nfloat 128            4099081048403  # 267.046 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics show this runs on all cores even efficiency cores.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              847.283\non_cpu               0.723          # 11.56 \/ 16 cores\nutime                9595.125\nstime                199.603\nnvcsw                60837          # 73.48%\nnivcsw               21957          # 26.52%\ninblock              18800          # 22.19\/sec\nonblock              750528         # 885.81\/sec\ncpu-clock            9794905769811  # 9794.906 seconds\ntask-clock           9794944473121  # 9794.944 seconds\npage faults          1004870        # 102.591\/sec\ncontext switches     86826          # 8.864\/sec\ncpu migrations       10636          # 1.086\/sec\nmajor page faults    834            # 0.085\/sec\nminor page faults    1004036        # 102.506\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             9104033983434  # 135.404 branches per 1000 inst\nbranch misses        26726595189    # 0.29% branch miss\nconditional          9104033999338  # 135.404 conditional branches per 1000 inst\nindirect             2480353523643  # 36.890 indirect branches per 1000 inst\nslots         
       58096258688948 #\nretiring             37262987672189 # 64.1% (64.1%)\n-- ucode             2394267712513  #     4.1%\n-- fastpath          34868719959676 #    60.0%\nfrontend             6405281429073  # 11.0% (11.0%)\n-- latency           2856116536906  #     4.9%\n-- bandwidth         3549164892167  #     6.1%\nbackend              12246441453518 # 21.1% (21.1%)\n-- cpu               6987019337033  #    12.0%\n-- memory            5259422116485  #     9.1%\nspeculation          2739914923365  #  4.7% ( 4.7%)\n-- branch mispredict 2224112949461  #     3.8%\n-- pipeline restart  515801973904   #     0.9%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           24281497297185 # 1.78 GHz\ninstructions         95579973489733 # 3.94 IPC\nl2 access            466273512227   # 12.072 l2 access per 1000 inst\nl2 miss              105967016868   # 22.73% l2 miss\n<\/code><\/pre>\n\n\n\n<p>Process summary shows the engine_linux64 as main process.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>444 processes\n\t 75 engine_linux64_      11763.26   451.87\n\t 67 clinfo                  18.52     5.25\n\t  3 starter_linux64          1.26     0.41\n\t 38 vulkaninfo               0.95     1.15\n\t 18 mpirun                   0.80     3.64\n\t  6 glxinfo:gdrv0            0.16     0.09\n\t  4 vulkani:disk$0           0.10     0.12\n\t  2 glxinfo                  0.08     0.03\n\t  2 glxinfo:cs0              0.08     0.03\n\t  2 glxinfo:disk$0           0.08     0.03\n\t  2 glxinfo:sh0              0.08     0.03\n\t  2 glxinfo:shlo0            0.08     0.03\n\t  6 clang                    0.07     0.04\n\t  6 php                      0.06     0.13\n\t  2 llvmpipe-0               0.05     0.06\n\t  2 llvmpipe-1               0.05     0.06\n\t  2 llvmpipe-10              0.05     0.06\n\t  2 llvmpipe-11              0.05     0.06\n\t  2 llvmpipe-12              0.05     0.06\n\t  2 llvmpipe-13              0.05     0.06\n\t  2 llvmpipe-14              0.05     
0.06\n\t  2 llvmpipe-15              0.05     0.06\n\t  2 llvmpipe-2               0.05     0.06\n\t  2 llvmpipe-3               0.05     0.06\n\t  2 llvmpipe-4               0.05     0.06\n\t  2 llvmpipe-5               0.05     0.06\n\t  2 llvmpipe-6               0.05     0.06\n\t  2 llvmpipe-7               0.05     0.06\n\t  2 llvmpipe-8               0.05     0.06\n\t  2 llvmpipe-9               0.05     0.06\n\t  3 rocminfo                 0.03     0.00\n\t  1 lspci                    0.00     0.02\n\t  1 ps                       0.00     0.01\n\t 81 sh                       0.00     0.00\n\t 12 gcc                      0.00     0.00\n\t 10 gsettings                0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  6 rm                       0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  4 gmain                    0.00     0.00\n\t  3 openradioss              0.00     0.00\n\t  2 dconf worker             0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 uname                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 cc                       0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl 
               0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n0 processes running\n47 maximum processes\n<\/code><\/pre>\n\n\n\n<p>The program runs using mpirun on the cores.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      55998) openradioss      cpu=14 start=5.72  finish=177.15\n        55999) starter_linux64  cpu=10 start=5.73  finish=6.30 \n        56000) mpirun           cpu=4 start=6.31  finish=177.11\n          56003) mpirun           cpu=15 start=6.89  finish=177.11\n          56004) mpirun           cpu=2 start=6.89  finish=6.89 \n          56005) mpirun           cpu=15 start=6.92  finish=177.11\n          56006) mpirun           cpu=1 start=7.40  finish=177.11\n          56007) mpirun           cpu=12 start=7.40  finish=177.11\n          56008) engine_linux64_  cpu=13 start=7.44  finish=177.10\n            56013) engine_linux64_  cpu=6 start=7.46  finish=176.47\n            56019) engine_linux64_  cpu=12 start=7.47  finish=176.47\n            56032) engine_linux64_  cpu=6 start=7.77  finish=176.98\n          56009) engine_linux64_  cpu=1 start=7.44  finish=176.47\n            56015) engine_linux64_  cpu=15 start=7.46  finish=176.47\n            56021) engine_linux64_  cpu=15 start=7.47  finish=176.47\n          56010) engine_linux64_  cpu=4 start=7.45  finish=176.47\n            56017) engine_linux64_  cpu=11 start=7.47  finish=176.47\n            56022) engine_linux64_  cpu=8 start=7.48  finish=176.47\n          56011) engine_linux64_  cpu=2 start=7.45  finish=176.47\n            56020) engine_linux64_  cpu=15 start=7.47  finish=176.47\n            56024) engine_linux64_  cpu=11 start=7.48  finish=176.47\n          56012) engine_linux64_  cpu=9 start=7.46  finish=176.47\n            56023) engine_linux64_  cpu=6 start=7.48  finish=176.47\n            56027) engine_linux64_  cpu=7 start=7.49  finish=176.47\n          56014) engine_linux64_  cpu=0 
start=7.46  finish=176.47\n            56025) engine_linux64_  cpu=12 start=7.49  finish=176.47\n            56028) engine_linux64_  cpu=6 start=7.50  finish=176.47\n          56016) engine_linux64_  cpu=3 start=7.47  finish=176.47\n            56026) engine_linux64_  cpu=14 start=7.49  finish=176.47\n            56030) engine_linux64_  cpu=9 start=7.50  finish=176.47\n          56018) engine_linux64_  cpu=10 start=7.47  finish=176.47\n            56029) engine_linux64_  cpu=14 start=7.50  finish=176.47\n            56031) engine_linux64_  cpu=13 start=7.51  finish=176.47\n        56037) rm               cpu=0 start=177.14 finish=177.15\n        56038) rm               cpu=9 start=177.15 finish=177.15\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>openradioss is a finite element solver. It has high IPC and relatively high retirement rate. It seems to run on cores w\/o hyperthreading. Overall characteristic is a high retirement rate with some backend stalls and lower than average frontend stalls. 
<span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/openradioss\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-481","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/481","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=481"}],"version-history":[{"count":1,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/481\/revisions"}],"predecessor-version":[{"id":484,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/481\/revisions\/484"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=481"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}