{"id":1902,"date":"2024-03-02T04:00:20","date_gmt":"2024-03-02T04:00:20","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=1902"},"modified":"2024-03-02T13:21:23","modified_gmt":"2024-03-02T13:21:23","slug":"cpp-perf-bench","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/cpp-perf-bench\/","title":{"rendered":"cpp-perf-bench"},"content":{"rendered":"\n<p>A set of C++ compiler performance benchmarks. There are seven workloads, but the fourth takes about 75% of the run time and the third about 13%. It is mostly single-threaded, keeping on average only 0.97 of 16 cores busy.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-8.png\" alt=\"\" class=\"wp-image-1907\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-8.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-8-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-8-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>The topdown profile shows that the longest-running workload is backend bound.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-8.png\" alt=\"\" class=\"wp-image-1909\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-8.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-8-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-8-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n
<p>The AMD metrics show few frontend stalls (8.8% of slots), a moderate floating-point rate (about 166 per 1000 instructions) and very few L2 accesses (about 0.39 per 1000 instructions).<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              4152.282\non_cpu               0.061          # 0.97 \/ 16 cores\nutime                4028.006\nstime                0.861\nnvcsw                2304           # 10.11%\nnivcsw               20482          # 89.89%\ninblock              0              # 0.00\/sec\nonblock              13664          # 3.29\/sec\ncpu-clock            4029353979373  # 4029.354 seconds\ntask-clock           4029386424465  # 4029.386 seconds\npage faults          161021         # 39.962\/sec\ncontext switches     43316          # 10.750\/sec\ncpu migrations       570            # 0.141\/sec\nmajor page faults    2              # 0.000\/sec\nminor page faults    161019         # 39.961\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             5053525210204  # 148.848 branches per 1000 inst\nbranch misses        86945679275    # 1.72% branch miss\nconditional          4056265160749  # 119.474 conditional branches per 1000 inst\nindirect             209326902213   # 6.166 indirect branches per 1000 inst\ncpu-cycles           18870987219583 # 0.28 GHz\ninstructions         33949846979899 # 1.80 IPC\nslots                37743699269016 #\nretiring             11352214902392 # 30.1% (30.1%)\n-- ucode             6928674796     #     0.0%\n-- fastpath          11345286227596 #    30.1%\nfrontend             3328560349222  #  8.8% ( 8.8%)\n-- latency           2235953077110  #     5.9%\n-- bandwidth         1092607272112  #     2.9%\nbackend              21313619382131 # 56.5% (56.5%)\n-- cpu               10866513489890 #    28.8%\n-- memory            10447105892241 #    27.7%\nspeculation          1747934988006  #  4.6% ( 4.6%)\n-- branch mispredict 1746614988191  #     4.6%\n-- pipeline restart  1319999815     #     0.0%\nsmt-contention       1368585558     #  0.0% ( 0.0%)\ncpu-cycles         
  18866636234420 # 0.28 GHz\ninstructions         33951875487897 # 1.80 IPC\ninstructions         11323796838173 # 0.388 l2 access per 1000 inst\nl2 hit from l1       4168381036     # 0.81% l2 miss\nl2 miss from l1      24526213       #\nl2 hit from l2 pf    214160338      #\nl3 hit from l2 pf    6348965        #\nl3 miss from l2 pf   4623318        #\ninstructions         11315727437411 # 166.242 float per 1000 inst\nfloat 512            68             # 0.000 AVX-512 per 1000 inst\nfloat 256            632            # 0.000 AVX-256 per 1000 inst\nfloat 128            1881152882630  # 166.242 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\ninstructions         33950818366915 #\nopcache              4901373308194  # 144.367 opcache per 1000 inst\nopcache miss         2315040492     #  0.0% opcache miss rate\nl1 dTLB miss         31795935       # 0.001 L1 dTLB per 1000 inst\nl2 dTLB miss         5675743        # 0.000 L2 dTLB per 1000 inst\ninstructions         33950996153328 #\nicache               5905104655     # 0.174 icache per 1000 inst\nicache miss          897406472      # 15.2% icache miss rate\nl1 iTLB miss         7905312        # 0.000 L1 iTLB per 1000 inst\nl2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst\ntlb flush            19119          # 0.000 TLB flush per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              4467.158\non_cpu               0.061          # 0.97 \/ 16 cores\nutime                4342.517\nstime                0.608\nnvcsw                2027           # 9.04%\nnivcsw               20386          # 90.96%\ninblock              48             # 0.01\/sec\nonblock              2416           # 0.54\/sec\ncpu-clock            4343393637491  # 4343.394 seconds\ntask-clock           4343415265454  # 4343.415 seconds\npage faults          152329         
# 35.071\/sec\ncontext switches     44509          # 10.247\/sec\ncpu migrations       1413           # 0.325\/sec\nmajor page faults    1              # 0.000\/sec\nminor page faults    152328         # 35.071\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             5050781688168  # 148.766 branches per 1000 inst\nbranch misses        105902632923   # 2.10% branch miss\nconditional          5050781701544  # 148.766 conditional branches per 1000 inst\nindirect             209332837884   # 6.166 indirect branches per 1000 inst\nslots                98709135518408 #\nretiring             36963729109576 # 37.4% (37.4%)\n-- ucode             2799260266597  #     2.8%\n-- fastpath          34164468842979 #    34.6%\nfrontend             2241641598206  #  2.3% ( 2.3%) low\n-- latency           833581705701   #     0.8%\n-- bandwidth         1408059892505  #     1.4%\nbackend              57810001474260 # 58.6% (58.6%)\n-- cpu               47545445041758 #    48.2%\n-- memory            10264556432502 #    10.4%\nspeculation          1590680411824  #  1.6% ( 1.6%)\n-- branch mispredict 1587076175543  #     1.6%\n-- pipeline restart  3604236281     #     0.0%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           16468750380538 # 0.23 GHz\ninstructions         33952046966307 # 2.06 IPC\nl2 access            11355265062    # 0.334 l2 access per 1000 inst\nl2 miss              185670977      # 1.64% l2 miss\ncpu-cycles           16465770641254 #  7.9% memory latency\nload stalls          1286875525462  #  7.7% l1 bound\nl1 miss              11102829142    #  0.1% l2 bound\nl2 miss              800643499      #  0.0% l3 bound\nl3 miss              306221980      #  0.0% dram bound\nstore_stalls         6164687781     #  0.0% store bound\n<\/code><\/pre>\n\n\n\n
<p>The process overview crashed partway through, but up to that point the most time was spent in the random_numbers application.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>260 processes\n\t  2 random_numbers        1995.58     0.00\n\t  3 mathlib                607.24     0.00\n\t  3 ctype                  116.07     0.00\n\t  3 atol                    90.41     0.00\n\t 34 clinfo                  10.25     3.00\n\t 19 vulkaninfo               0.57     0.94\n\t  2 vulkani:disk$0           0.06     0.10\n\t  6 clang                    0.06     0.06\n\t  3 glxinfo:gdrv0            0.05     0.06\n\t  3 glxinfo:gl0              0.05     0.06\n\t  1 llvmpipe-0               0.03     0.05\n\t  1 llvmpipe-1               0.03     0.05\n\t  1 llvmpipe-10              0.03     0.05\n\t  1 llvmpipe-11              0.03     0.05\n\t  1 llvmpipe-12              0.03     0.05\n\t  1 llvmpipe-13              0.03     0.05\n\t  1 llvmpipe-14              0.03     0.05\n\t  1 llvmpipe-15              0.03     0.05\n\t  1 llvmpipe-2               0.03     0.05\n\t  1 llvmpipe-3               0.03     0.05\n\t  1 llvmpipe-4               0.03     0.05\n\t  1 llvmpipe-5               0.03     0.05\n\t  1 llvmpipe-6               0.03     0.05\n\t  1 llvmpipe-7               0.03     0.05\n\t  1 llvmpipe-8               0.03     0.05\n\t  1 llvmpipe-9               0.03     0.05\n\t  1 glxinfo                  0.03     0.02\n\t  1 glxinfo:cs0              0.03     0.02\n\t  1 glxinfo:disk$0           0.03     0.02\n\t  1 glxinfo:sh0              0.03     0.02\n\t  1 glxinfo:shlo0            0.03     0.02\n\t  1 ps                       0.00     0.01\n\t 64 sh                       0.00     0.00\n\t 13 gcc                      0.00     0.00\n\t 11 cpp-perf-bench           0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  7 gsettings                0.00     0.00\n\t  7 stat                     0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  4 dconf worker             0.00     0.00\n\t  4 gmain                    0.00     0.00\n\t  4 phoronix-test-s          0.00     0.00\n\t  2 which                    0.00     0.00\n\t  1 cc                       0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lscpu                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n\t  1 xset                     0.00     0.00\n11 processes running\n47 maximum processes\n<\/code><\/pre>\n\n\n\n
<p>The crash also happened during random_numbers; the process tree ends with unidentified (??) processes that have no recorded finish time.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      841561) cpp-perf-bench   cpu=9 start=874.16 finish=1871.99\n        841562) random_numbers   cpu=3 start=874.16 finish=1871.99\n      842462) cpp-perf-bench   cpu=9 start=1876.00 finish=2873.75\n        842463) random_numbers   cpu=3 start=1876.00 finish=2873.75\n      842636) ?? cpu=0 start=2877.75 finish=0.00 \n        842637) ?? cpu=0 start=2877.76 finish=0.00 \n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>A set of C++ compiler performance benchmarks. There are seven workloads but the 4th workload takes ~75% of the time and the 3rd workload takes 13% of the time. It looks to be mostly single-threaded. 
Topdown profile shows the longest <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/cpp-perf-bench\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-1902","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1902","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=1902"}],"version-history":[{"count":3,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1902\/revisions"}],"predecessor-version":[{"id":1910,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1902\/revisions\/1910"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=1902"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}