{"id":2333,"date":"2024-06-02T23:40:03","date_gmt":"2024-06-02T23:40:03","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=2333"},"modified":"2024-06-04T12:10:36","modified_gmt":"2024-06-04T12:10:36","slug":"526-blender_r","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/cpu2017\/526-blender_r\/","title":{"rendered":"526.blender_r"},"content":{"rendered":"\n<p>blender is a SPEC CPU(R) benchmark described&nbsp;<a href=\"https:\/\/spec.org\/cpu2017\/Docs\/benchmarks\/526.blender_r.html\">here<\/a>&nbsp;and written in C and C++. The workload runs on all logical cores.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-12.png\" alt=\"\" class=\"wp-image-2371\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-12.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-12-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-12-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>Topdown profile shows moderate retirement rate with backend stalls. 
The branch misprediction rate is also somewhat elevated, at 2.09%.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-13.png\" alt=\"\" class=\"wp-image-2372\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-13.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-13-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-13-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics show a large number of conditional branches and a lot of floating point. L2 access, in contrast, is lower.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              1003.543\non_cpu               0.976          # 15.62 \/ 16 cores\nutime                15577.792\nstime                93.492\nnvcsw                25699          # 14.13%\nnivcsw               156210         # 85.87%\ninblock              0              # 0.00\/sec\nonblock              657472         # 655.15\/sec\ncpu-clock            15672524593811 # 15672.525 seconds\ntask-clock           15672677832025 # 15672.678 seconds\npage faults          31056598       # 1981.576\/sec\ncontext switches     181238         # 11.564\/sec\ncpu migrations       217            # 0.014\/sec\nmajor page faults    3799           # 0.242\/sec\nminor page faults    31052799       # 1981.333\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             9162902970976  # 134.627 branches per 1000 inst\nbranch misses        191136129020   # 2.09% branch miss\nconditional          8195526125542  # 120.414 conditional branches per 1000 inst\nindirect             85457359205    # 1.256 indirect branches per 1000 inst\ncpu-cycles           62796772217200 
# 3.88 GHz\ninstructions         68050948821278 # 1.08 IPC\nslots                125582309615328 #\nretiring             22462534406699 # 17.9% (24.9%)\n-- ucode             4511747160     #     0.0%\n-- fastpath          22458022659539 #    17.9%\nfrontend             5992219329471  #  4.8% ( 6.6%)\n-- latency           4651956387924  #     3.7%\n-- bandwidth         1340262941547  #     1.1%\nbackend              56277781669765 # 44.8% (62.3%)\n-- cpu               17120809186613 #    13.6%\n-- memory            39156972483152 #    31.2%\nspeculation          5623383914045  #  4.5% ( 6.2%)\n-- branch mispredict 5590516227249  #     4.5%\n-- pipeline restart  32867686796    #     0.0%\nsmt-contention       35226327822583 # 28.1% ( 0.0%)\ncpu-cycles           62413646220576 # 3.88 GHz\ninstructions         68057299082957 # 1.09 IPC\ninstructions         22689634557097 # 31.225 l2 access per 1000 inst\nl2 hit from l1       625483301949   # 15.56% l2 miss\nl2 miss from l1      61732557357    #\nl2 hit from l2 pf    34508483022    #\nl3 hit from l2 pf    15862924695    #\nl3 miss from l2 pf   32620726863    #\ninstructions         22676887930929 # 397.608 float per 1000 inst\nfloat 512            236            # 0.000 AVX-512 per 1000 inst\nfloat 256            194075558      # 0.009 AVX-256 per 1000 inst\nfloat 128            9016321382594  # 397.600 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\ninstructions         68048780839108 #\nopcache              11281642777070 # 165.788 opcache per 1000 inst\nopcache miss         110863695024   #  1.0% opcache miss rate\nl1 dTLB miss         795057958815   # 11.684 L1 dTLB per 1000 inst\nl2 dTLB miss         20403293260    # 0.300 L2 dTLB per 1000 inst\ninstructions         68048507612293 #\nicache               177196855525   # 2.604 icache per 1000 inst\nicache miss          50061354268    # 28.3% icache miss rate\nl1 
iTLB miss         4181404456     # 0.061 L1 iTLB per 1000 inst\nl2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst\ntlb flush            10990627       # 0.000 TLB flush per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Process overview shows blender_r_base. as the primary process.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>691 processes\n\t 48 blender_r_base.      15538.72    89.32\n\t 71 specperl                17.90     1.95\n\t 48 imagevalidate_5          1.20     0.10\n\t  2 clang                    0.02     0.00\n\t 10 ps                       0.01     0.01\n\t  2 clang++                  0.01     0.01\n\t  1 lsb_release              0.01     0.00\n\t225 sh                       0.00     0.00\n\t 54 specrxp                  0.00     0.00\n\t 48 bash                     0.00     0.00\n\t 41 specinvoke               0.00     0.00\n\t 22 cat                      0.00     0.00\n\t 21 grep                     0.00     0.00\n\t 12 uniq                     0.00     0.00\n\t 11 sort                     0.00     0.00\n\t 10 expand                   0.00     0.00\n\t  7 specmake                 0.00     0.00\n\t  6 pwd                      0.00     0.00\n\t  5 basename                 0.00     0.00\n\t  5 systemctl                0.00     0.00\n\t  4 rm                       0.00     0.00\n\t  4 specpp                   0.00     0.00\n\t  4 uname                    0.00     0.00\n\t  3 dirname                  0.00     0.00\n\t  3 dmidecode                0.00     0.00\n\t  3 lscpu                    0.00     0.00\n\t  2 df                       0.00     0.00\n\t  2 dpkg                     0.00     0.00\n\t  2 runcpu                   0.00     0.00\n\t  2 specsha512sum            0.00     0.00\n\t  2 specxz                   0.00     0.00\n\t  2 who                      0.00     0.00\n\t  1 cpupower                 0.00     0.00\n\t  1 head                     0.00     0.00\n\t  1 logname                  0.00     0.00\n\t  1 ls                       0.00    
 0.00\n\t  1 numactl                  0.00     0.00\n\t  1 sysctl                   0.00     0.00\n\t  1 w                        0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 which                    0.00     0.00\n0 processes running\n53 maximum processes\n<\/code><\/pre>\n\n\n\n<p>specinvoke launches the blender processes in parallel, one per logical core.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>    430775) specinvoke       cpu=5 start=333.05 finish=660.87\n      430777) sh               cpu=0 start=333.05 finish=656.15\n        430785) bash             cpu=0 start=333.06 finish=656.15\n          430809) blender_r_base.  cpu=0 start=333.06 finish=656.10\n      430778) sh               cpu=5 start=333.05 finish=659.52\n        430786) bash             cpu=1 start=333.06 finish=659.52\n          430810) blender_r_base.  cpu=1 start=333.06 finish=659.47\n      430779) sh               cpu=0 start=333.05 finish=659.91\n        430787) bash             cpu=2 start=333.06 finish=659.91\n          430811) blender_r_base.  cpu=2 start=333.06 finish=659.87\n      430780) sh               cpu=5 start=333.05 finish=659.62\n        430788) bash             cpu=3 start=333.06 finish=659.62\n          430813) blender_r_base.  cpu=3 start=333.06 finish=659.57\n      430781) sh               cpu=10 start=333.05 finish=660.86\n        430796) bash             cpu=4 start=333.06 finish=660.86\n          430816) blender_r_base.  cpu=4 start=333.06 finish=660.81\n      430782) sh               cpu=5 start=333.05 finish=658.54\n        430794) bash             cpu=5 start=333.06 finish=658.54\n          430815) blender_r_base.  cpu=5 start=333.06 finish=658.50\n      430783) sh               cpu=13 start=333.05 finish=659.24\n        430791) bash             cpu=6 start=333.06 finish=659.24\n          430812) blender_r_base.  
cpu=6 start=333.06 finish=659.19\n      430784) sh               cpu=6 start=333.06 finish=659.63\n        430797) bash             cpu=7 start=333.06 finish=659.63\n          430818) blender_r_base.  cpu=7 start=333.06 finish=659.57\n      430789) sh               cpu=8 start=333.06 finish=655.29\n        430799) bash             cpu=8 start=333.06 finish=655.29\n          430814) blender_r_base.  cpu=8 start=333.06 finish=655.23\n      430790) sh               cpu=7 start=333.06 finish=660.06\n        430800) bash             cpu=9 start=333.06 finish=660.06\n          430817) blender_r_base.  cpu=9 start=333.06 finish=660.02\n      430792) sh               cpu=3 start=333.06 finish=659.83\n        430802) bash             cpu=10 start=333.06 finish=659.83\n          430819) blender_r_base.  cpu=10 start=333.06 finish=659.77\n      430793) sh               cpu=13 start=333.06 finish=660.11\n        430804) bash             cpu=11 start=333.06 finish=660.11\n          430820) blender_r_base.  cpu=11 start=333.06 finish=660.08\n      430795) sh               cpu=11 start=333.06 finish=660.87\n        430805) bash             cpu=12 start=333.06 finish=660.87\n          430821) blender_r_base.  cpu=12 start=333.06 finish=660.82\n      430798) sh               cpu=8 start=333.06 finish=657.84\n        430806) bash             cpu=13 start=333.06 finish=657.84\n          430822) blender_r_base.  cpu=13 start=333.06 finish=657.78\n      430801) sh               cpu=14 start=333.06 finish=658.27\n        430807) bash             cpu=14 start=333.06 finish=658.27\n          430823) blender_r_base.  cpu=14 start=333.06 finish=658.21\n      430803) sh               cpu=14 start=333.06 finish=659.98\n        430808) bash             cpu=15 start=333.06 finish=659.98\n          430824) blender_r_base.  
cpu=15 start=333.06 finish=659.94\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>blender is a SPEC CPU(R) benchmark described&nbsp;here&nbsp;and written in C and C++. The workload runs on all logical cores. Topdown profile shows moderate retirement rate with backend stalls. Also a somewhat higher branch mis-prediction ratio. AMD metrics show a large <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/cpu2017\/526-blender_r\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":2297,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-2333","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2333","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=2333"}],"version-history":[{"count":2,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2333\/revisions"}],"predecessor-version":[{"id":2374,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2333\/revisions\/2374"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2297"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=2333"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}