{"id":2174,"date":"2024-03-24T01:30:40","date_gmt":"2024-03-24T01:30:40","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=2174"},"modified":"2024-03-24T14:25:18","modified_gmt":"2024-03-24T14:25:18","slug":"polybench-c","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/polybench-c\/","title":{"rendered":"polybench-c"},"content":{"rendered":"\n<p>A set of C-language polyhedral benchmarks. There are three quick running tests.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-42.png\" alt=\"\" class=\"wp-image-2182\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-42.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-42-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-42-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>Topdown profile is similarly sparse.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-44.png\" alt=\"\" class=\"wp-image-2184\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-44.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-44-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-44-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics show less than 1\/4 of core is used<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              67.382\non_cpu               0.015          # 0.24 \/ 16 cores\nutime                14.928\nstime                1.044\nnvcsw                2036           # 88.91%\nnivcsw               254            # 11.09%\ninblock              0              # 0.00\/sec\nonblock              12816          # 190.20\/sec\ncpu-clock            15999961726    # 16.000 seconds\ntask-clock           16003329029    # 16.003 seconds\npage faults          299228         # 18697.860\/sec\ncontext switches     2446           # 152.843\/sec\ncpu migrations       262            # 16.372\/sec\nmajor page faults    2              # 0.125\/sec\nminor page faults    299226         # 18697.735\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             16223912644    # 137.618 branches per 1000 inst\nbranch misses        106135332      # 0.65% branch miss\nconditional          15504802149    # 131.519 conditional branches per 1000 inst\nindirect             46318603       # 0.393 indirect branches per 1000 inst\ncpu-cycles           56802673451    # 0.05 GHz\ninstructions         116563864786   # 2.05 IPC\nslots                115810833066   #\nretiring             34742752204    # 30.0% (30.0%)\n-- ucode             10738983       #     0.0%\n-- fastpath          34732013221    #    30.0%\nfrontend             8382785080     #  7.2% ( 7.2%)\n-- latency           5697719298     #     4.9%\n-- bandwidth         2685065782     #     2.3%\nbackend              72008995376    # 62.2% (62.2%)\n-- cpu               45273184687    #    39.1%\n-- memory            26735810689    #    23.1%\nspeculation          645574399      #  0.6% ( 0.6%) low\n-- branch mispredict 638080901      #     0.6%\n-- pipeline restart  7493498        #     0.0%\nsmt-contention       30442664       #  0.0% ( 0.0%)\ncpu-cycles           56803940933    # 0.05 GHz\ninstructions         116378467062   # 2.05 IPC\ninstructions         39369231008    # 276.827 l2 access per 1000 inst\nl2 hit from l1       7276325564     # 5.84% l2 miss\nl2 miss from l1      361027290      #\nl2 hit from l2 pf    3346812187     #\nl3 hit from l2 pf    266835347      #\nl3 miss from l2 pf   8491441        #\ninstructions         39274623265    # 241.655 float per 1000 inst\nfloat 512            55             # 0.000 AVX-512 per 1000 inst\nfloat 256            538            # 0.000 AVX-256 per 1000 inst\nfloat 128            9490912630     # 241.655 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\ninstructions         117774240339   #\nopcache              4186166650     # 35.544 opcache per 1000 inst\nopcache miss         717023170      # 17.1% opcache miss rate\nl1 dTLB miss         18215253201    # 154.662 L1 dTLB per 1000 inst\nl2 dTLB miss         105778786      # 0.898 L2 dTLB per 1000 inst\ninstructions         117784192075   #\nicache               1513163017     # 12.847 icache per 1000 inst\nicache miss          174490262      # 11.5% icache miss rate\nl1 iTLB miss         8467310        # 0.072 L1 iTLB per 1000 inst\nl2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst\ntlb flush            17414          # 0.000 TLB flush per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics are similarly quick, in this case showing backend bound nature in L2<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              79.629\non_cpu               0.023          # 0.37 \/ 16 cores\nutime                28.562\nstime                0.560\nnvcsw                1831           # 92.10%\nnivcsw               157            # 7.90%\ninblock              10216          # 128.30\/sec\nonblock              1576           # 19.79\/sec\ncpu-clock            29150212553    # 29.150 seconds\ntask-clock           29153359668    # 29.153 seconds\npage faults          236174         # 8101.090\/sec\ncontext switches     2204           # 75.600\/sec\ncpu migrations       178            # 6.106\/sec\nmajor page faults    40             # 1.372\/sec\nminor page faults    236134         # 8099.718\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             14991982033    # 153.812 branches per 1000 inst\nbranch misses        22893515       # 0.15% branch miss\nconditional          14991992401    # 153.813 conditional branches per 1000 inst\nindirect             21024144       # 0.216 indirect branches per 1000 inst\nslots                1796147433314  #\nretiring             182199830495   # 10.1% (10.1%) low\n-- ucode             21793493705    #     1.2%\n-- fastpath          160406336790   #     8.9%\nfrontend             11742967638    #  0.7% ( 0.7%) low\n-- latency           9138659764     #     0.5%\n-- bandwidth         2604307874     #     0.1%\nbackend              1601029301507  # 89.1% (89.1%) high\n-- cpu               452951150880   #    25.2%\n-- memory            1148078150627  #    63.9%\nspeculation          3731819750     #  0.2% ( 0.2%) low\n-- branch mispredict 2344944033     #     0.1%\n-- pipeline restart  1386875717     #     0.1%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           259085483965   # 0.10 GHz\ninstructions         171145304834   # 0.66 IPC low\nl2 access            53673106299    # 313.859 l2 access per 1000 inst\nl2 miss              4743715242     # 8.84% l2 miss\ncpu-cycles           261475873410   # 66.0% memory latency\nload stalls          172553812136   #  0.0% l1 bound\nl1 miss              173873149616   # 58.8% l2 bound\nl2 miss              20114165538    #  4.8% l3 bound\nl3 miss              7575873524     #  2.9% dram bound\nstore_stalls         114286947      #  0.0% store bound\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>A set of C-language polyhedral benchmarks. There are three quick running tests. Topdown profile is similarly sparse. AMD metrics show less than 1\/4 of core is used Intel metrics are similarly quick, in this case showing backend bound nature in <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/polybench-c\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-2174","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2174","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=2174"}],"version-history":[{"count":2,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2174\/revisions"}],"predecessor-version":[{"id":2185,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2174\/revisions\/2185"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=2174"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}