{"id":137,"date":"2023-12-31T00:47:33","date_gmt":"2023-12-31T00:47:33","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?p=137"},"modified":"2023-12-31T00:47:34","modified_gmt":"2023-12-31T00:47:34","slug":"creating-basic-metrics-and-adding-topdown-plots","status":"publish","type":"post","link":"https:\/\/mvermeulen.org\/perf\/2023\/12\/31\/creating-basic-metrics-and-adding-topdown-plots\/","title":{"rendered":"Creating basic metrics and adding topdown plots"},"content":{"rendered":"\n<p>I have made several enhancements to the topdown tool. I also have some fragile things I still need to sort out along the way.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>I have added metrics for --topdown2, --cache2, --float, --branch and --opcache. These behave as I expect on AMD systems. I am still sorting things out on Intel systems, where something acts strangely with my topdown2 counters. If I use them alone, all is well, but when I combine them with other counters, the perf_event_open call fails with an invalid argument error (EINVAL).<\/li>\n\n\n\n<li>I have done a first implementation of level 1 caches (--dcache, --icache) and TLB (--tlb). All of these use the PERF_TYPE_HW_CACHE type from perf_event_open(2). However, the results don&#8217;t quite seem right, so I may look at adding the corresponding events as PERF_TYPE_RAW events to see if they make more sense.<\/li>\n\n\n\n<li>I did an initial implementation for --memory using the LS core counters for memory operations. This is the same recipe likwid uses for local\/remote memory. However, the numbers are lower than what stream reports for memory traffic, so I am not sure this is the right counter recipe. I also have references to the \/sys\/devices\/amd_df counters and can see them after loading the driver. 
However, I am not yet sure which counter to use for memory channel reads and writes.<\/li>\n\n\n\n<li>I have created an initial summary block &#8220;topdown.txt&#8221; for the counters that work as I expect, and I have a high-level summary for both AMD and Intel processors, shown below.<\/li>\n\n\n\n<li>I have implemented the &#8220;--interval&#8221; option, which lets me sample counters periodically. Combined with gnuplot and the --csv and -o options, this lets me create *.png files that plot topdown metrics.<\/li>\n<\/ul>\n\n\n\n<p>The net combination is best seen below, where I include both a topdown metrics summary (created from three runs of &#8220;topdown&#8221; with different options) and a topdown chart (created from a fourth run with additional options). This is a fair step along the way towards having a basic analysis tool for looking at benchmark loads. In addition to clearing up some of the issues above, I also want to add a &#8220;--tree&#8221; option to plot a process tree. 
Once I have that, I&#8217;ll have most of the useful bits of the program formerly named &#8220;wspy&#8221; and might also have &#8220;topdown&#8221; accept the &#8220;wspy&#8221; name.<\/p>\n\n\n\n<p>Here is an AMD summary block that includes the major metrics for coremark:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              83.410\non_cpu               0.747          # 11.95 \/ 16 cores\nutime                996.029\nstime                0.451\nnvcsw                1162           # 12.25%\nnivcsw               8320           # 87.75%\ninblock              0\nonblock              1096\ncpu-clock            996492501279   # 996.493 seconds\ntask-clock           996497240698   # 996.497 seconds\npage faults          49987          # 50.163\/sec\ncontext switches     9695           # 9.729\/sec\ncpu migrations       136            # 0.136\/sec\nmajor page faults    0              # 0.000\/sec\nminor page faults    49985          # 50.161\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             1905721388306  # 189.110 branches per 1000 inst\nbranch misses        3005711443     # 0.16% branch miss\nconditional          1674633740961  # 166.178 conditional branches per 1000 inst\nindirect             9422915848     # 0.935 indirect branches per 1000 inst\ncpu-cycles           4319923640733  # 3.23 GHz\ninstructions         10080742579393 # 2.33 IPC\nslots                8640874657662  #\nretiring             3015427410903  # 34.9% (58.9%)\n-- ucode             6726058        #     0.0%\n-- fastpath          3015420684845  #    34.9%\nfrontend             1175050211309  # 13.6% (22.9%)\n-- latency           530224174536   #     6.1%\n-- bandwidth         644826036773   #     7.5%\nbackend              894468621667   # 10.4% (17.5%)\n-- cpu               270749606784   #     3.1%\n-- memory            623719014883   #     7.2%\nspeculation          36309001429    #  0.4% ( 
0.7%)\n-- branch mispredict 34321580391    #     0.4%\n-- pipeline restart  1987421038     #     0.0%\nsmt-contention       3519610791947  # 40.7% ( 0.0%)\ninstructions         5040563575655  # 0.024 l2 access per 1000 inst\nl2 hit from l1       114170557      # 8.80% l2 miss\nl2 miss from l1      7864844        #\nl2 hit from l2 pf    5961997        #\nl3 hit from l2 pf    1759222        #\nl3 miss from l2 pf   1202870        #\ninstructions         5036908689193  # 0.085 float per 1000 inst\nfloat 512            92             # 0.000 AVX-512 per 1000 inst\nfloat 256            852            # 0.000 AVX-256 per 1000 inst\nfloat 128            427687605      # 0.085 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst<\/code><\/pre>\n\n\n\n<p>Here is the corresponding Intel summary block, also for coremark:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              82.626\non_cpu               0.707          # 11.31 \/ 16 cores\nutime                934.350\nstime                0.259\nnvcsw                1122           # 16.56%\nnivcsw               5653           # 83.44%\ninblock              0\nonblock              1064\ncpu-clock            934609836035   # 934.610 seconds\ntask-clock           934612788300   # 934.613 seconds\npage faults          74644          # 79.866\/sec\ncontext switches     6966           # 7.453\/sec\ncpu migrations       190            # 0.203\/sec\nmajor page faults    0              # 0.000\/sec\nminor page faults    74644          # 79.866\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             1487191047680  # 189.103 branches per 1000 inst\nbranch misses        3750608715     # 0.25% branch miss\nconditional          1487191057952  # 189.103 conditional branches per 1000 inst\nindirect             441335072192   # 56.118 indirect branches per 1000 
inst\nslots                6076449129938  #\nretiring             3906991250131  # 64.3% (64.3%)\n-- ucode             67666336195    #     1.1%\n-- fastpath          3839324913936  #    63.2%\nfrontend             1246450345074  # 20.5% (20.5%)\n-- latency           751572503238   #    12.4%\n-- bandwidth         494877841836   #     8.1%\nbackend              629022362428   # 10.4% (10.4%)\n-- cpu               335343935853   #     5.5%\n-- memory            293678426575   #     4.8%\nspeculation          272715027078   #  4.5% ( 4.5%)\n-- branch mispredict 256635566653   #     4.2%\n-- pipeline restart  16079460425    #     0.3%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           3907422305230  # 2.65 GHz\ninstructions         9072449306543  # 2.32 IPC\nl2 access            130609511      # 0.029 l2 access per 1000 inst\nl2 miss              41959615       # 32.13% l2 miss<\/code><\/pre>\n\n\n\n<p>Here is the plot of topdown metrics for coremark, followed by the one for stream. From these you can see the repetition in the different benchmarks as well as how the overall patterns (backend-bound stream, mostly retiring coremark) show up.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"640\" height=\"480\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2023\/12\/coremark.png\" alt=\"\" class=\"wp-image-138\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"640\" height=\"480\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2023\/12\/stream.png\" alt=\"\" class=\"wp-image-139\"\/><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>I have made several enhancements to the topdown tool. I also have some fragile things I still need to sort out along the way. 
The net combination is best seen below where I include both a topdown metrics summary (created <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/2023\/12\/31\/creating-basic-metrics-and-adding-topdown-plots\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[5],"tags":[19,7,8],"class_list":["post-137","post","type-post","status-publish","format-standard","hentry","category-tools","tag-gnuplot","tag-performance-counters","tag-topdown"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/posts\/137","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=137"}],"version-history":[{"count":1,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/posts\/137\/revisions"}],"predecessor-version":[{"id":140,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/posts\/137\/revisions\/140"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=137"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/categories?post=137"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/tags?post=137"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}