{"id":2122,"date":"2024-03-20T12:18:30","date_gmt":"2024-03-20T12:18:30","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=2122"},"modified":"2024-03-22T00:32:28","modified_gmt":"2024-03-22T00:32:28","slug":"toybrot","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/toybrot\/","title":{"rendered":"toybrot"},"content":{"rendered":"\n<p>A Mandelbrot fractal generator with four workloads using different parallelism methods: TBB, OpenMP, C++ tasks and C++ threads. The four also differ in their level of parallelism: C++ threads create the most threads, C++ tasks create moderately more than the number of cores, and OpenMP\/TBB match the number of cores.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-37.png\" alt=\"\" class=\"wp-image-2144\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-37.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-37-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-37-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>The topdown profile is surprisingly similar across the four methods, despite their different levels of parallelism.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-39.png\" alt=\"\" class=\"wp-image-2145\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-39.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-39-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-39-768x576.png 768w\" sizes=\"auto, 
(max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics confirm this is floating point code that has very little L2 access. Retirement rate is high.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              534.380\non_cpu               0.863          # 13.81 \/ 16 cores\nutime                7377.820\nstime                2.399\nnvcsw                3143           # 0.44%\nnivcsw               716953         # 99.56%\ninblock              0              # 0.00\/sec\nonblock              14408          # 26.96\/sec\ncpu-clock            7382597325204  # 7382.597 seconds\ntask-clock           7382631028362  # 7382.631 seconds\npage faults          345620         # 46.815\/sec\ncontext switches     722570         # 97.874\/sec\ncpu migrations       2794           # 0.378\/sec\nmajor page faults    2              # 0.000\/sec\nminor page faults    345618         # 46.815\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             8083876110088  # 132.798 branches per 1000 inst\nbranch misses        80137273285    # 0.99% branch miss\nconditional          7915372159838  # 130.030 conditional branches per 1000 inst\nindirect             64437985819    # 1.059 indirect branches per 1000 inst\ncpu-cycles           31071237393434 # 3.63 GHz\ninstructions         60879442564782 # 1.96 IPC\nslots                62176734680100 #\nretiring             22110032250570 # 35.6% (64.9%) high\n-- ucode             10771596099    #     0.0%\n-- fastpath          22099260654471 #    35.5%\nfrontend             1744368334608  #  2.8% ( 5.1%)\n-- latency           894253211706   #     1.4%\n-- bandwidth         850115122902   #     1.4%\nbackend              9010849789479  # 14.5% (26.5%)\n-- cpu               8877823608547  #    14.3%\n-- memory            133026180932   #     0.2%\nspeculation          1192503018886  #  1.9% ( 3.5%)\n-- branch mispredict 1192479459812  #     1.9%\n-- pipeline 
restart  23559074       #     0.0%\nsmt-contention       28118884517461 # 45.2% ( 0.0%)\ncpu-cycles           31064294398784 # 3.62 GHz\ninstructions         60860623406151 # 1.96 IPC\ninstructions         20295654465016 # 0.039 l2 access per 1000 inst\nl2 hit from l1       717271512      # 9.91% l2 miss\nl2 miss from l1      39658005       #\nl2 hit from l2 pf    31850550       #\nl3 hit from l2 pf    24623451       #\nl3 miss from l2 pf   13728835       #\ninstructions         20287877027354 # 363.504 float per 1000 inst\nfloat 512            59             # 0.000 AVX-512 per 1000 inst\nfloat 256            616            # 0.000 AVX-256 per 1000 inst\nfloat 128            7374729455923  # 363.504 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         1              # 0.000 scalar per 1000 inst\ninstructions         60874438174522 #\nopcache              5446273322626  # 89.467 opcache per 1000 inst\nopcache miss         3328964533     #  0.1% opcache miss rate\nl1 dTLB miss         111328090      # 0.002 L1 dTLB per 1000 inst\nl2 dTLB miss         23225462       # 0.000 L2 dTLB per 1000 inst\ninstructions         60875026269815 #\nicache               7052704318     # 0.116 icache per 1000 inst\nicache miss          652940617      #  9.3% icache miss rate\nl1 iTLB miss         9231663        # 0.000 L1 iTLB per 1000 inst\nl2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst\ntlb flush            38525          # 0.000 TLB flush per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics also show high retirement rate and low backend stalls.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              638.535\non_cpu               0.858          # 13.72 \/ 16 cores\nutime                8759.240\nstime                1.467\nnvcsw                4116           # 0.49%\nnivcsw               842516         # 99.51%\ninblock              12336          # 19.32\/sec\nonblock              3176          
 # 4.97\/sec\ncpu-clock            8763097608450  # 8763.098 seconds\ntask-clock           8763121289241  # 8763.121 seconds\npage faults          332556         # 37.949\/sec\ncontext switches     849614         # 96.953\/sec\ncpu migrations       3215           # 0.367\/sec\nmajor page faults    93             # 0.011\/sec\nminor page faults    332463         # 37.939\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             8083416801050  # 132.788 branches per 1000 inst\nbranch misses        56866033035    # 0.70% branch miss\nconditional          8083416852346  # 132.788 conditional branches per 1000 inst\nindirect             1384972721313  # 22.751 indirect branches per 1000 inst\nslots                47671596043970 #\nretiring             31887711190056 # 66.9% (66.9%) high\n-- ucode             15614157267    #     0.0%\n-- fastpath          31872097032789 #    66.9%\nfrontend             9325764960018  # 19.6% (19.6%)\n-- latency           8097339304921  #    17.0%\n-- bandwidth         1228425655097  #     2.6%\nbackend              3227716735681  #  6.8% ( 6.8%) low\n-- cpu               2892801822478  #     6.1%\n-- memory            334914913203   #     0.7%\nspeculation          3365729603445  #  7.1% ( 7.1%)\n-- branch mispredict 3363574411730  #     7.1%\n-- pipeline restart  2155191715     #     0.0%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           25692550556350 # 2.33 GHz\ninstructions         54368912295309 # 2.12 IPC\nl2 access            626319059      # 0.019 l2 access per 1000 inst\nl2 miss              189667419      # 30.28% l2 miss\ncpu-cycles           16897128544726 #  7.4% memory latency\nload stalls          1244125349928  #  7.4% l1 bound\nl1 miss              1996402438     #  0.0% l2 bound\nl2 miss              1032469931     #  0.0% l3 bound\nl3 miss              322943920      #  0.0% dram bound\nstore_stalls         377235985      #  
0.0% store bound\n<\/code><\/pre>\n\n\n\n<p>Process overview shows &#8220;rm*&#8221; for each different type of parallelism<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>1602 processes\n\t963 rmSTD_THREADS        526068.01   145.63\n\t195 rmSTD_TASKS          117381.36    19.48\n\t 48 rmOpenMP             29588.00     2.56\n\t 30 rmTBB                18339.68     4.18\n\t 68 clinfo                  16.52     6.33\n\t 38 vulkaninfo               1.34     1.14\n\t  4 vulkani:disk$0           0.15     0.12\n\t  6 php                      0.09     0.09\n\t  2 llvmpipe-0               0.07     0.06\n\t  2 llvmpipe-1               0.07     0.06\n\t  2 llvmpipe-10              0.07     0.06\n\t  2 llvmpipe-11              0.07     0.06\n\t  2 llvmpipe-12              0.07     0.06\n\t  2 llvmpipe-13              0.07     0.06\n\t  2 llvmpipe-14              0.07     0.06\n\t  2 llvmpipe-15              0.07     0.06\n\t  2 llvmpipe-2               0.07     0.06\n\t  2 llvmpipe-3               0.07     0.06\n\t  2 llvmpipe-4               0.07     0.06\n\t  2 llvmpipe-5               0.07     0.06\n\t  2 llvmpipe-6               0.07     0.06\n\t  2 llvmpipe-7               0.07     0.06\n\t  2 llvmpipe-8               0.07     0.06\n\t  2 llvmpipe-9               0.07     0.06\n\t  6 clang                    0.04     0.08\n\t  1 lspci                    0.00     0.02\n\t  3 rocminfo                 0.00     0.01\n\t  1 ps                       0.00     0.01\n\t 89 sh                       0.00     0.00\n\t 14 gsettings                0.00     0.00\n\t 13 gcc                      0.00     0.00\n\t 12 toybrot                  0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  4 glxinfo                  0.00     0.00\n\t  2 cc                       0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 setterm    
              0.00     0.00\n\t  2 uname                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dconf worker             0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 gmain                    0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n18 processes running\n349 maximum processes\n<\/code><\/pre>\n\n\n\n<p>TBB and OpenMP sections<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      189095) toybrot          cpu=8 start=90.09 finish=128.74\n        189096) rmTBB            cpu=9 start=90.10 finish=128.74\n          189097) ?? cpu=0 start=90.13 finish=0.00 \n            189099) rmTBB            cpu=5 start=90.13 finish=128.74\n              189102) rmTBB            cpu=4 start=90.13 finish=128.74\n                189109) ?? 
cpu=0 start=90.13 finish=0.00 \n              189105) rmTBB            cpu=14 start=90.13 finish=128.74\n            189100) rmTBB            cpu=8 start=90.13 finish=128.74\n              189108) rmTBB            cpu=12 start=90.13 finish=128.74\n              189111) rmTBB            cpu=9 start=90.13 finish=128.74\n          189098) rmTBB            cpu=3 start=90.13 finish=128.74\n            189101) rmTBB            cpu=6 start=90.13 finish=128.74\n              189104) rmTBB            cpu=13 start=90.13 finish=128.74\n              189107) rmTBB            cpu=15 start=90.13 finish=128.74\n            189103) rmTBB            cpu=7 start=90.13 finish=128.74\n              189106) rmTBB            cpu=0 start=90.13 finish=128.74\n              189110) rmTBB            cpu=11 start=90.13 finish=128.74\n      189112) sh               cpu=5 start=128.74 finish=128.74\n        189113) sh               cpu=2 start=128.74 finish=128.74\n      189114) toybrot          cpu=12 start=138.95 finish=177.74\n        189115) rmOpenMP         cpu=6 start=138.96 finish=177.74\n          189116) rmOpenMP         cpu=9 start=138.99 finish=177.74\n          189117) rmOpenMP         cpu=15 start=138.99 finish=177.74\n          189118) rmOpenMP         cpu=8 start=138.99 finish=177.74\n          189119) rmOpenMP         cpu=5 start=138.99 finish=177.74\n          189120) rmOpenMP         cpu=14 start=138.99 finish=177.74\n          189121) rmOpenMP         cpu=3 start=138.99 finish=177.74\n          189122) rmOpenMP         cpu=4 start=138.99 finish=177.74\n          189123) rmOpenMP         cpu=13 start=138.99 finish=177.74\n          189124) rmOpenMP         cpu=7 start=138.99 finish=177.74\n          189125) rmOpenMP         cpu=0 start=138.99 finish=177.74\n          189126) rmOpenMP         cpu=2 start=138.99 finish=177.74\n          189127) rmOpenMP         cpu=10 start=138.99 finish=177.74\n          189128) rmOpenMP         cpu=1 start=138.99 finish=177.74\n          
189129) rmOpenMP         cpu=11 start=138.99 finish=177.74\n          189130) rmOpenMP         cpu=12 start=138.99 finish=177.74\n<\/code><\/pre>\n\n\n\n<p>Tasks (and threads) look more like<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      189167) toybrot          cpu=12 start=273.86 finish=312.40\n        189168) rmSTD_TASKS      cpu=7 start=273.86 finish=312.40\n          189169) rmSTD_TASKS      cpu=5 start=273.90 finish=311.35\n          189170) rmSTD_TASKS      cpu=5 start=273.90 finish=311.38\n          189171) rmSTD_TASKS      cpu=10 start=273.90 finish=310.99\n          189172) rmSTD_TASKS      cpu=1 start=273.90 finish=310.67\n          189173) rmSTD_TASKS      cpu=2 start=273.90 finish=310.94\n          189174) rmSTD_TASKS      cpu=11 start=273.90 finish=310.64\n          189175) rmSTD_TASKS      cpu=4 start=273.90 finish=310.25\n          189176) rmSTD_TASKS      cpu=13 start=273.90 finish=309.85\n          189177) rmSTD_TASKS      cpu=8 start=273.90 finish=309.94\n          189178) rmSTD_TASKS      cpu=9 start=273.90 finish=310.13\n          189179) rmSTD_TASKS      cpu=10 start=273.90 finish=310.34\n          189180) rmSTD_TASKS      cpu=5 start=273.90 finish=310.51\n          189181) rmSTD_TASKS      cpu=11 start=273.90 finish=311.01\n          189182) rmSTD_TASKS      cpu=6 start=273.90 finish=310.93\n          189183) rmSTD_TASKS      cpu=4 start=273.90 finish=310.99\n          189184) rmSTD_TASKS      cpu=5 start=273.90 finish=310.71\n          189185) rmSTD_TASKS      cpu=5 start=273.90 finish=310.55\n          189186) rmSTD_TASKS      cpu=13 start=273.90 finish=310.43\n          189187) rmSTD_TASKS      cpu=13 start=273.90 finish=310.40\n          189188) rmSTD_TASKS      cpu=5 start=273.90 finish=310.53\n          189189) rmSTD_TASKS      cpu=0 start=273.90 finish=311.49\n          189190) rmSTD_TASKS      cpu=7 start=273.90 finish=311.45\n          189191) rmSTD_TASKS      cpu=2 start=273.90 finish=311.15\n          189192) rmSTD_TASKS      
cpu=11 start=273.90 finish=311.49\n          189193) rmSTD_TASKS      cpu=14 start=273.90 finish=311.55\n          189194) rmSTD_TASKS      cpu=5 start=273.90 finish=311.49\n          189195) rmSTD_TASKS      cpu=12 start=273.90 finish=311.50\n          189196) rmSTD_TASKS      cpu=8 start=273.90 finish=311.93\n          189197) rmSTD_TASKS      cpu=9 start=273.90 finish=311.96\n          189198) rmSTD_TASKS      cpu=10 start=273.90 finish=311.89\n          189199) rmSTD_TASKS      cpu=14 start=273.90 finish=312.27\n          189200) rmSTD_TASKS      cpu=6 start=273.90 finish=312.13\n          189201) rmSTD_TASKS      cpu=13 start=273.90 finish=312.13\n          189202) rmSTD_TASKS      cpu=10 start=273.90 finish=312.14\n          189203) rmSTD_TASKS      cpu=1 start=273.90 finish=312.21\n          189204) rmSTD_TASKS      cpu=2 start=273.90 finish=312.27\n          189205) rmSTD_TASKS      cpu=5 start=273.90 finish=311.94\n          189206) rmSTD_TASKS      cpu=5 start=273.90 finish=312.32\n          189207) rmSTD_TASKS      cpu=0 start=273.90 finish=312.33\n          189208) rmSTD_TASKS      cpu=15 start=273.90 finish=312.39\n          189209) rmSTD_TASKS      cpu=2 start=273.90 finish=312.08\n          189210) rmSTD_TASKS      cpu=2 start=273.90 finish=312.16\n          189211) rmSTD_TASKS      cpu=9 start=273.90 finish=312.29\n          189212) rmSTD_TASKS      cpu=4 start=273.90 finish=312.31\n          189213) rmSTD_TASKS      cpu=11 start=273.90 finish=312.29\n          189214) rmSTD_TASKS      cpu=8 start=273.90 finish=312.25\n          189215) rmSTD_TASKS      cpu=10 start=273.90 finish=311.96\n          189216) rmSTD_TASKS      cpu=4 start=273.90 finish=312.10\n          189217) rmSTD_TASKS      cpu=14 start=273.90 finish=312.01\n          189218) rmSTD_TASKS      cpu=7 start=273.90 finish=312.13\n          189219) rmSTD_TASKS      cpu=0 start=273.90 finish=312.11\n          189220) rmSTD_TASKS      cpu=12 start=273.90 finish=312.16\n          189221) 
rmSTD_TASKS      cpu=12 start=273.90 finish=312.03\n          189222) rmSTD_TASKS      cpu=14 start=273.90 finish=312.03\n          189223) rmSTD_TASKS      cpu=10 start=273.90 finish=311.37\n          189224) rmSTD_TASKS      cpu=13 start=273.90 finish=311.55\n          189225) rmSTD_TASKS      cpu=9 start=273.90 finish=311.38\n          189226) rmSTD_TASKS      cpu=0 start=273.90 finish=311.51\n          189227) rmSTD_TASKS      cpu=3 start=273.90 finish=311.26\n          189228) rmSTD_TASKS      cpu=1 start=273.90 finish=311.36\n          189229) rmSTD_TASKS      cpu=11 start=273.90 finish=311.53\n          189230) rmSTD_TASKS      cpu=2 start=273.90 finish=310.92\n          189231) rmSTD_TASKS      cpu=14 start=273.90 finish=310.84\n          189232) rmSTD_TASKS      cpu=6 start=273.90 finish=310.83\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>A Mandelbrot fractal generator with four workloads using different parallelism methods: TBB, OpenMP, C++ tasks and C++ threads. 
Also showing different levels of parallelism with C++ threads the highest and C++ tasks moderately higher and OpenMP\/TBB matching the number of <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/toybrot\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-2122","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2122","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=2122"}],"version-history":[{"count":2,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2122\/revisions"}],"predecessor-version":[{"id":2146,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2122\/revisions\/2146"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=2122"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}