{"id":1027,"date":"2024-01-28T21:27:35","date_gmt":"2024-01-28T21:27:35","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=1027"},"modified":"2024-01-29T10:19:56","modified_gmt":"2024-01-29T10:19:56","slug":"gmpbench","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/gmpbench\/","title":{"rendered":"gmpbench"},"content":{"rendered":"\n<p>Testing the GNU Multiple Precision Arithmetic Library (GMP) with a single-threaded program that reports a GMPbench score.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-78.png\" alt=\"\" class=\"wp-image-1049\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-78.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-78-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-78-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>The topdown profile suggests multiple sub-tests. Overall, the retirement rate is high and frontend stalls are low, while backend stalls vary with the test case.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-115.png\" alt=\"\" class=\"wp-image-1051\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-115.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-115-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-115-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics show little floating-point work (1.140 AVX-128 operations per 1000 instructions) and little L2 access (2.900 per 1000 instructions). Backend stalls are core-bound (32.1% of slots) rather than memory-bound (6.3%).<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              427.784\non_cpu               0.062          # 0.99 \/ 16 cores\nutime                417.815\nstime                4.074\nnvcsw                3074           # 64.89%\nnivcsw               1663           # 35.11%\ninblock              0              # 0.00\/sec\nonblock              14296          # 33.42\/sec\ncpu-clock            421916906517   # 421.917 seconds\ntask-clock           421922454058   # 421.922 seconds\npage faults          2756183        # 6532.440\/sec\ncontext switches     6298           # 14.927\/sec\ncpu migrations       444            # 1.052\/sec\nmajor page faults    3              # 0.007\/sec\nminor page faults    2756180        # 6532.433\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             501523528799   # 82.260 branches per 1000 inst\nbranch misses        3061059185     # 0.61% branch miss\nconditional          382644488334   # 62.761 conditional branches per 1000 inst\nindirect             14876152172    # 2.440 indirect branches per 1000 
inst\ncpu-cycles           1965040411588  # 0.29 GHz\ninstructions         6072094172754  # 3.09 IPC high\nslots                3937816545756  #\nretiring             2247636929437  # 57.1% (57.1%) high\n-- ucode             776956035      #     0.0%\n-- fastpath          2246859973402  #    57.1%\nfrontend             116276658154   #  3.0% ( 3.0%) low\n-- latency           80906720016    #     2.1%\n-- bandwidth         35369938138    #     0.9%\nbackend              1509881987155  # 38.3% (38.3%)\n-- cpu               1262260815603  #    32.1%\n-- memory            247621171552   #     6.3%\nspeculation          63519286739    #  1.6% ( 1.6%)\n-- branch mispredict 60232902373    #     1.5%\n-- pipeline restart  3286384366     #     0.1%\nsmt-contention       501252396      #  0.0% ( 0.0%)\ncpu-cycles           1965474699019  # 0.29 GHz\ninstructions         6059932908034  # 3.08 IPC high\ninstructions         2023359162871  # 2.900 l2 access per 1000 inst\nl2 hit from l1       3554993713     # 10.74% l2 miss\nl2 miss from l1      138068096      #\nl2 hit from l2 pf    1820413091     #\nl3 hit from l2 pf    483630493      #\nl3 miss from l2 pf   8709715        #\ninstructions         2021651448545  # 1.140 float per 1000 inst\nfloat 512            213            # 0.000 AVX-512 per 1000 inst\nfloat 256            638            # 0.000 AVX-256 per 1000 inst\nfloat 128            2303804812     # 1.140 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              429.153\non_cpu               0.062          # 0.99 \/ 16 cores\nutime                421.186\nstime                2.284\nnvcsw                2764           # 62.44%\nnivcsw               1663           # 37.56%\ninblock              248            # 0.58\/sec\nonblock              2720           # 
6.34\/sec\ncpu-clock            423457398211   # 423.457 seconds\ntask-clock           423463418970   # 423.463 seconds\npage faults          2434289        # 5748.523\/sec\ncontext switches     5991           # 14.148\/sec\ncpu migrations       466            # 1.100\/sec\nmajor page faults    1              # 0.002\/sec\nminor page faults    2434288        # 5748.520\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             444883853274   # 81.002 branches per 1000 inst\nbranch misses        2834417364     # 0.64% branch miss\nconditional          444883878586   # 81.002 conditional branches per 1000 inst\nindirect             14177570230    # 2.581 indirect branches per 1000 inst\nslots                9621835222010  #\nretiring             6303423323147  # 65.5% (65.5%) high\n-- ucode             987932791689   #    10.3%\n-- fastpath          5315490531458  #    55.2%\nfrontend             487924713905   #  5.1% ( 5.1%)\n-- latency           108479845700   #     1.1%\n-- bandwidth         379444868205   #     3.9%\nbackend              2464878156935  # 25.6% (25.6%)\n-- cpu               2307296971385  #    24.0%\n-- memory            157581185550   #     1.6%\nspeculation          366841291386   #  3.8% ( 3.8%)\n-- branch mispredict 355601817820   #     3.7%\n-- pipeline restart  11239473566    #     0.1%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           1602571643660  # 0.23 GHz\ninstructions         5493598362825  # 3.43 IPC high\nl2 access            9495709247     # 1.729 l2 access per 1000 inst\nl2 miss              1735896974     # 18.28% l2 miss\n<\/code><\/pre>\n\n\n\n<p>Process overview suggests different operations being tested in separate processes.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>772 processes\n\t 15 multiply               157.50     1.91\n\t  8 divide                  86.34     0.79\n\t  5 gcd                     53.60     0.01\n\t  5 
gcdext                  53.39     0.00\n\t  3 rsa                     33.69     0.00\n\t  3 pi                      31.87     0.25\n\t 67 clinfo                  15.59     7.19\n\t 38 vulkaninfo               0.58     1.71\n\t  6 glxinfo:gdrv0            0.14     0.03\n\t  6 glxinfo:gl0              0.14     0.03\n\t  4 vulkani:disk$0           0.07     0.18\n\t  6 php                      0.07     0.10\n\t  6 clang                    0.07     0.05\n\t  2 glxinfo                  0.07     0.02\n\t  2 glxinfo:cs0              0.06     0.01\n\t  2 glxinfo:disk$0           0.06     0.01\n\t  2 glxinfo:sh0              0.06     0.01\n\t  2 glxinfo:shlo0            0.06     0.01\n\t  2 llvmpipe-0               0.04     0.09\n\t  2 llvmpipe-1               0.04     0.09\n\t  2 llvmpipe-10              0.04     0.09\n\t  2 llvmpipe-11              0.04     0.09\n\t  2 llvmpipe-12              0.04     0.09\n\t  2 llvmpipe-13              0.04     0.09\n\t  2 llvmpipe-14              0.04     0.09\n\t  2 llvmpipe-15              0.04     0.09\n\t  2 llvmpipe-2               0.04     0.09\n\t  2 llvmpipe-3               0.04     0.09\n\t  2 llvmpipe-4               0.04     0.09\n\t  2 llvmpipe-5               0.04     0.09\n\t  2 llvmpipe-6               0.04     0.09\n\t  2 llvmpipe-7               0.04     0.09\n\t  2 llvmpipe-8               0.04     0.09\n\t  2 llvmpipe-9               0.04     0.09\n\t  3 rocminfo                 0.03     0.00\n\t118 runbench                 0.01     0.04\n\t  1 lspci                    0.00     0.03\n\t  1 ps                       0.00     0.01\n\t150 gexpr                    0.00     0.00\n\t 80 sh                       0.00     0.00\n\t 79 sed                      0.00     0.00\n\t 40 grep                     0.00     0.00\n\t 13 gcc                      0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  7 gsettings                0.00     0.00\n\t  6 llvm-link             
   0.00     0.00\n\t  5 gmain                    0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  4 dconf worker             0.00     0.00\n\t  2 cc                       0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 uname                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 cat                      0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 gmpbench                 0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n0 processes running\n47 maximum processes\n<\/code><\/pre>\n\n\n\n<p>An example from the computation section of the process tree: runbench launches the multiply sub-test repeatedly, each run taking roughly 8 to 12 seconds, with short-lived grep, sed and gexpr processes in between.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      1170498) gmpbench         cpu=0 start=5.47  finish=425.53\n        1170499) runbench         cpu=0 start=5.47  finish=425.53\n          1170500) cat              cpu=15 start=5.47  finish=5.47 \n          1170501) runbench         cpu=14 start=5.48  finish=5.48 \n            1170502) runbench         cpu=3 start=5.48  finish=5.48 \n            1170503) sed              cpu=2 start=5.48  finish=5.48 \n          1170504) multiply         cpu=1 start=5.48  finish=13.29\n          1170507) runbench         cpu=14 start=13.29 finish=13.29\n            1170508) grep            
 cpu=15 start=13.29 finish=13.29\n            1170509) sed              cpu=3 start=13.29 finish=13.29\n          1170510) gexpr            cpu=15 start=13.29 finish=13.29\n          1170511) gexpr            cpu=3 start=13.29 finish=13.29\n          1170512) gexpr            cpu=4 start=13.29 finish=13.29\n          1170513) runbench         cpu=1 start=13.29 finish=13.29\n            1170514) runbench         cpu=4 start=13.29 finish=13.29\n            1170515) sed              cpu=15 start=13.29 finish=13.29\n          1170516) multiply         cpu=14 start=13.29 finish=24.10\n          1170518) runbench         cpu=7 start=24.10 finish=24.10\n            1170519) grep             cpu=1 start=24.10 finish=24.10\n            1170520) sed              cpu=3 start=24.10 finish=24.10\n          1170521) gexpr            cpu=4 start=24.10 finish=24.10\n          1170522) gexpr            cpu=1 start=24.10 finish=24.10\n          1170523) gexpr            cpu=3 start=24.10 finish=24.10\n          1170524) runbench         cpu=14 start=24.10 finish=24.11\n            1170525) runbench         cpu=7 start=24.10 finish=24.10\n            1170526) sed              cpu=2 start=24.10 finish=24.11\n          1170527) multiply         cpu=4 start=24.11 finish=34.75\n          1170528) runbench         cpu=6 start=34.75 finish=34.75\n            1170529) grep             cpu=7 start=34.75 finish=34.75\n            1170530) sed              cpu=1 start=34.75 finish=34.75\n          1170531) gexpr            cpu=10 start=34.76 finish=34.76\n          1170532) gexpr            cpu=6 start=34.76 finish=34.76\n          1170533) gexpr            cpu=7 start=34.76 finish=34.76\n          1170534) runbench         cpu=11 start=34.76 finish=34.76\n            1170535) runbench         cpu=12 start=34.76 finish=34.76\n            1170536) sed              cpu=10 start=34.76 finish=34.76\n          1170537) multiply         cpu=6 start=34.76 finish=45.30\n          1170538) runbench     
    cpu=6 start=45.30 finish=45.30\n            1170539) grep             cpu=15 start=45.30 finish=45.30\n            1170540) sed              cpu=1 start=45.30 finish=45.30\n          1170541) gexpr            cpu=10 start=45.30 finish=45.30\n          1170542) gexpr            cpu=15 start=45.30 finish=45.30\n          1170543) gexpr            cpu=1 start=45.30 finish=45.30\n          1170544) runbench         cpu=3 start=45.30 finish=45.30\n            1170545) runbench         cpu=12 start=45.30 finish=45.30\n            1170546) sed              cpu=6 start=45.30 finish=45.30\n          1170547) multiply         cpu=1 start=45.31 finish=57.07\n          1170548) runbench         cpu=6 start=57.07 finish=57.07\n            1170549) grep             cpu=7 start=57.07 finish=57.07\n            1170550) sed              cpu=10 start=57.07 finish=57.07\n          1170551) gexpr            cpu=3 start=57.07 finish=57.08\n          1170552) gexpr            cpu=4 start=57.08 finish=57.08\n          1170553) gexpr            cpu=6 start=57.08 finish=57.08\n          1170554) runbench         cpu=1 start=57.08 finish=57.08\n            1170555) runbench         cpu=10 start=57.08 finish=57.08\n            1170556) sed              cpu=3 start=57.08 finish=57.08\n          1170557) multiply         cpu=6 start=57.08 finish=67.69\n          1170558) runbench         cpu=7 start=67.69 finish=67.69\n            1170559) grep             cpu=9 start=67.69 finish=67.69\n            1170560) sed              cpu=10 start=67.69 finish=67.69\n          1170561) gexpr            cpu=3 start=67.69 finish=67.69\n          1170562) gexpr            cpu=6 start=67.69 finish=67.69\n          1170563) gexpr            cpu=7 start=67.69 finish=67.69\n          1170564) runbench         cpu=9 start=67.70 finish=67.70\n            1170565) runbench         cpu=10 start=67.70 finish=67.70\n            1170566) sed              cpu=11 start=67.70 finish=67.70\n          1170567) 
multiply         cpu=4 start=67.70 finish=78.30\n          1170568) runbench         cpu=14 start=78.30 finish=78.31\n            1170569) grep             cpu=7 start=78.30 finish=78.31\n            1170570) sed              cpu=9 start=78.30 finish=78.31\n          1170571) gexpr            cpu=10 start=78.31 finish=78.31\n          1170572) gexpr            cpu=3 start=78.31 finish=78.31\n          1170573) gexpr            cpu=14 start=78.31 finish=78.31\n          1170574) runbench         cpu=7 start=78.31 finish=78.31\n            1170575) runbench         cpu=9 start=78.31 finish=78.31\n            1170576) sed              cpu=10 start=78.31 finish=78.31\n          1170577) multiply         cpu=1 start=78.31 finish=88.83\n          1170578) runbench         cpu=14 start=88.83 finish=88.83\n            1170579) grep             cpu=15 start=88.83 finish=88.83\n            1170580) sed              cpu=10 start=88.83 finish=88.83\n          1170581) gexpr            cpu=1 start=88.83 finish=88.83\n          1170582) gexpr            cpu=3 start=88.83 finish=88.84\n          1170583) gexpr            cpu=15 start=88.84 finish=88.84\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Testing the GNU Multiple Precision Arithmetic Library (GMP) with a single-threaded program that reports a GMPbench score. The topdown profile suggests multiple sub-tests. Overall, the retirement rate is high and frontend stalls are low, while backend stalls vary with the test case. 
AMD metrics show little floating <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/gmpbench\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-1027","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1027","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=1027"}],"version-history":[{"count":2,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1027\/revisions"}],"predecessor-version":[{"id":1052,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1027\/revisions\/1052"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=1027"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}