{"id":569,"date":"2024-01-14T22:34:59","date_gmt":"2024-01-14T22:34:59","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=569"},"modified":"2024-01-14T22:35:01","modified_gmt":"2024-01-14T22:35:01","slug":"mrbayes","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/mrbayes\/","title":{"rendered":"mrbayes"},"content":{"rendered":"\n<p>A test of Bayesian analysis with very high IPC and retirement rate. It is also a case where my AMD chip is more than 2x faster than my Intel chip (270 vs. 588 seconds elapsed). Overall, roughly half of the 16 cores are busy (7.4 on AMD, 9.1 on Intel).<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-18.png\" alt=\"\" class=\"wp-image-570\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-18.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-18-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-18-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>Topdown metrics highlight a high retirement rate. 
Backend stalls come mostly from the CPU core rather than from memory (18.3% vs. 3.1% of slots on AMD).<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-56.png\" alt=\"\" class=\"wp-image-571\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-56.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-56-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-56-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics show floating-point code (about 201 AVX-128 operations per 1000 instructions) and low L2 access (about 19 accesses per 1000 instructions). I expect this code mostly runs out of the smaller caches.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              269.927\non_cpu               0.463          # 7.41 \/ 16 cores\nutime                1977.824\nstime                21.480\nnvcsw                85907          # 93.51%\nnivcsw               5958           # 6.49%\ninblock              0              # 0.00\/sec\nonblock              304448         # 1127.89\/sec\ncpu-clock            1999166570466  # 1999.167 seconds\ntask-clock           1999209718504  # 1999.210 seconds\npage faults          204079         # 102.080\/sec\ncontext switches     93020          # 46.528\/sec\ncpu migrations       3757           # 1.879\/sec\nmajor page faults    21             # 0.011\/sec\nminor page faults    204058         # 102.069\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             3162931731939  # 118.634 branches per 1000 inst\nbranch misses        25621198898    # 0.81% branch miss\nconditional          1856981848725  # 69.651 conditional branches per 1000 inst\nindirect             417142083066   # 15.646 indirect branches per 1000 inst\ncpu-cycles           
8074389261324  # 1.87 GHz\ninstructions         26661505226380 # 3.30 IPC\nslots                16157107428102 #\nretiring             9088398115318  # 56.3% (56.3%)\n-- ucode             1179133273     #     0.0%\n-- fastpath          9087218982045  #    56.2%\nfrontend             3084602572199  # 19.1% (19.1%)\n-- latency           1923637763226  #    11.9%\n-- bandwidth         1160964808973  #     7.2%\nbackend              3449463140862  # 21.3% (21.4%)\n-- cpu               2951291320114  #    18.3%\n-- memory            498171820748   #     3.1%\nspeculation          512392460847   #  3.2% ( 3.2%)\n-- branch mispredict 506025917807   #     3.1%\n-- pipeline restart  6366543040     #     0.0%\nsmt-contention       22220399884    #  0.1% ( 0.0%)\ncpu-cycles           8098416904667  # 1.86 GHz\ninstructions         26659435250290 # 3.29 IPC\ninstructions         8890128447153  # 18.809 l2 access per 1000 inst\nl2 hit from l1       144952426458   # 1.71% l2 miss\nl2 miss from l1      1287365557     #\nl2 hit from l2 pf    20689080276    #\nl3 hit from l2 pf    1567049370     #\nl3 miss from l2 pf   7639525        #\ninstructions         8886841340745  # 201.198 float per 1000 inst\nfloat 512            66             # 0.000 AVX-512 per 1000 inst\nfloat 256            2078           # 0.000 AVX-256 per 1000 inst\nfloat 128            1788012564286  # 201.198 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              588.389\non_cpu               0.571          # 9.14 \/ 16 cores\nutime                5356.216\nstime                19.068\nnvcsw                125957         # 90.00%\nnivcsw               14002          # 10.00%\ninblock              6480           # 11.01\/sec\nonblock              439592         # 747.11\/sec\ncpu-clock            5375028742520  
# 5375.029 seconds\ntask-clock           5375084266375  # 5375.084 seconds\npage faults          188727         # 35.111\/sec\ncontext switches     142691         # 26.547\/sec\ncpu migrations       4349           # 0.809\/sec\nmajor page faults    87             # 0.016\/sec\nminor page faults    188640         # 35.095\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             4764011541426  # 118.508 branches per 1000 inst\nbranch misses        56769803842    # 1.19% branch miss\nconditional          4764011554770  # 118.508 conditional branches per 1000 inst\nindirect             1535012956749  # 38.184 indirect branches per 1000 inst\nslots                43147684016834 #\nretiring             25314831849978 # 58.7% (58.7%)\n-- ucode             1933281821437  #     4.5%\n-- fastpath          23381550028541 #    54.2%\nfrontend             5572332160739  # 12.9% (12.9%)\n-- latency           1951907585256  #     4.5%\n-- bandwidth         3620424575483  #     8.4%\nbackend              8926440911379  # 20.7% (20.7%)\n-- cpu               8234468738720  #    19.1%\n-- memory            691972172659   #     1.6%\nspeculation          3396035279367  #  7.9% ( 7.9%)\n-- branch mispredict 3310807856172  #     7.7%\n-- pipeline restart  85227423195    #     0.2%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           17676562915344 # 1.89 GHz\ninstructions         61132090382453 # 3.46 IPC\nl2 access            346458995583   # 13.864 l2 access per 1000 inst\nl2 miss              2051152576     # 0.59% l2 miss\n<\/code><\/pre>\n\n\n\n<p>Process summary<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>387 processes\n\t 24 mb                    1971.54    19.39\n\t 68 clinfo                  16.38     5.99\n\t 18 mpiexec                  1.73     6.87\n\t 38 vulkaninfo               1.31     1.13\n\t  6 php                      0.15     0.31\n\t  6 glxinfo:gdrv0            0.14     
0.10\n\t  4 vulkani:disk$0           0.13     0.12\n\t  2 llvmpipe-0               0.07     0.06\n\t  2 llvmpipe-1               0.07     0.06\n\t  2 llvmpipe-10              0.07     0.06\n\t  2 llvmpipe-11              0.07     0.06\n\t  2 llvmpipe-12              0.07     0.06\n\t  2 llvmpipe-13              0.07     0.06\n\t  2 llvmpipe-14              0.07     0.06\n\t  2 llvmpipe-15              0.07     0.06\n\t  2 llvmpipe-2               0.07     0.06\n\t  2 llvmpipe-3               0.07     0.06\n\t  2 llvmpipe-4               0.07     0.06\n\t  2 llvmpipe-5               0.07     0.06\n\t  2 llvmpipe-6               0.07     0.06\n\t  2 llvmpipe-7               0.07     0.06\n\t  2 llvmpipe-8               0.07     0.06\n\t  2 llvmpipe-9               0.07     0.06\n\t  6 clang                    0.06     0.06\n\t  2 glxinfo                  0.06     0.04\n\t  2 glxinfo:cs0              0.06     0.04\n\t  2 glxinfo:disk$0           0.06     0.04\n\t  2 glxinfo:sh0              0.06     0.04\n\t  2 glxinfo:shlo0            0.06     0.04\n\t  3 rocminfo                 0.03     0.00\n\t  1 lspci                    0.00     0.02\n\t 82 sh                       0.00     0.00\n\t 14 gsettings                0.00     0.00\n\t 13 gcc                      0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  3 mrbayes                  0.00     0.00\n\t  2 cc                       0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 uname                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dconf worker             0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 gmain     
               0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 ps                       0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n0 processes running\n47 maximum processes\n<\/code><\/pre>\n\n\n\n<p>The program runs via MPI: the <code>mrbayes<\/code> wrapper launches <code>mpiexec<\/code>, which starts eight <code>mb<\/code> worker processes, consistent with roughly half of the 16 cores being busy.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      259388) mrbayes          cpu=2 start=5.74  finish=89.86\n        259389) mpiexec          cpu=0 start=5.74  finish=89.83\n          259393) mpiexec          cpu=12 start=6.32  finish=89.83\n          259394) mpiexec          cpu=14 start=6.32  finish=6.32 \n          259395) mpiexec          cpu=11 start=6.34  finish=89.83\n          259397) mpiexec          cpu=15 start=6.83  finish=89.83\n          259398) mpiexec          cpu=9 start=6.83  finish=89.83\n          259399) mb               cpu=8 start=6.86  finish=89.69\n          259400) mb               cpu=3 start=6.86  finish=89.55\n          259401) mb               cpu=12 start=6.86  finish=89.49\n          259402) mb               cpu=13 start=6.87  finish=89.80\n          259403) mb               cpu=14 start=6.87  finish=89.66\n          259404) mb               cpu=7 start=6.88  finish=89.21\n          259405) mb               cpu=2 start=6.88  finish=89.73\n          259406) mb               cpu=1 start=6.88  finish=89.83\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>A test of Bayesian 
analysis with very high IPC and retirement rate. Also a case where my AMD chip is more than 2x faster than my Intel chip. Overall looks like half the cores are used. Topdown metrics highlight a <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/mrbayes\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-569","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/569","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=569"}],"version-history":[{"count":1,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/569\/revisions"}],"predecessor-version":[{"id":572,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/569\/revisions\/572"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=569"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}