{"id":2318,"date":"2024-06-02T19:30:27","date_gmt":"2024-06-02T19:30:27","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=2318"},"modified":"2024-06-03T23:51:43","modified_gmt":"2024-06-03T23:51:43","slug":"511-povray_r","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/cpu2017\/511-povray_r\/","title":{"rendered":"511.povray_r"},"content":{"rendered":"\n<p>povray is a SPEC CPU(R) benchmark written in C and C++ and described&nbsp;<a href=\"https:\/\/spec.org\/cpu2017\/Docs\/benchmarks\/511.povray_r.html\">here<\/a>.&nbsp;The workload runs on all logical cores.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-9.png\" alt=\"\" class=\"wp-image-2348\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-9.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-9-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-9-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>Topdown profile shows a high retirement rate with some backend stalls.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-10.png\" alt=\"\" class=\"wp-image-2350\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-10.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-10-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-10-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics 
confirm this runs on all cores (on_cpu is 0.975, or 15.60 of 16 logical cores). The backend stalls are a mixture of memory and CPU. There is a moderate level of L2 access (about 63 accesses per 1000 instructions) and almost no L2 misses (0.06%).<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              1207.797\non_cpu               0.975          # 15.60 \/ 16 cores\nutime                18835.978\nstime                7.810\nnvcsw                26880          # 13.67%\nnivcsw               169808         # 86.33%\ninblock              0              # 0.00\/sec\nonblock              1750920        # 1449.68\/sec\ncpu-clock            18843959086829 # 18843.959 seconds\ntask-clock           18844041233899 # 18844.041 seconds\npage faults          1200044        # 63.683\/sec\ncontext switches     196017         # 10.402\/sec\ncpu migrations       187            # 0.010\/sec\nmajor page faults    848            # 0.045\/sec\nminor page faults    1199196        # 63.638\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             20597032052711 # 157.689 branches per 1000 inst\nbranch misses        52441407427    # 0.25% branch miss\nconditional          14324263755382 # 109.665 conditional branches per 1000 inst\nindirect             1423684510013  # 10.900 indirect branches per 1000 inst\ncpu-cycles           72261524343074 # 3.78 GHz\ninstructions         130633370200321 # 1.81 IPC\nslots                144529562538048 #\nretiring             45790610884003 # 31.7% (52.4%)\n-- ucode             452654600211   #     0.3%\n-- fastpath          45337956283792 #    31.4%\nfrontend             5742560763746  #  4.0% ( 6.6%)\n-- latency           3599231892750  #     2.5%\n-- bandwidth         2143328870996  #     1.5%\nbackend              34057388037444 # 23.6% (38.9%)\n-- cpu               16179137901330 #    11.2%\n-- memory            17878250136114 #    12.4%\nspeculation          1877350050617  #  1.3% ( 2.1%)\n-- branch mispredict 1551312091401  #     1.1%\n-- pipeline restart  
326037959216   #     0.2%\nsmt-contention       57061496732727 # 39.5% ( 0.0%)\ncpu-cycles           72310084126317 # 3.76 GHz\ninstructions         130637928317245 # 1.81 IPC\ninstructions         43547668810124 # 63.187 l2 access per 1000 inst\nl2 hit from l1       2423517159190  # 0.06% l2 miss\nl2 miss from l1      873514352      #\nl2 hit from l2 pf    327458307998   #\nl3 hit from l2 pf    606029657      #\nl3 miss from l2 pf   59682300       #\ninstructions         43529508664435 # 244.832 float per 1000 inst\nfloat 512            289            # 0.000 AVX-512 per 1000 inst\nfloat 256            2219972971     # 0.051 AVX-256 per 1000 inst\nfloat 128            10655211619067 # 244.781 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\ninstructions         130617322132691 #\nopcache              20744462326405 # 158.819 opcache per 1000 inst\nopcache miss         771836069050   #  3.7% opcache miss rate\nl1 dTLB miss         658886665708   # 5.044 L1 dTLB per 1000 inst\nl2 dTLB miss         8891149600     # 0.068 L2 dTLB per 1000 inst\ninstructions         130617394601927 #\nicache               1101558221297  # 8.433 icache per 1000 inst\nicache miss          230432987096   # 20.9% icache miss rate\nl1 iTLB miss         73327277129    # 0.561 L1 iTLB per 1000 inst\nl2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst\ntlb flush            105996         # 0.000 TLB flush per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Process overview shows almost all time spent in povray_r_base.m<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>691 processes\n\t 48 povray_r_base.m      18713.39     2.26\n\t 71 specperl                41.33     1.60\n\t 48 imagevalidate_5          8.77     1.21\n\t  2 clang++                  0.02     0.01\n\t  2 clang                    0.01     0.01\n\t 10 ps                       0.00     0.01\n\t225 sh                       0.00     
0.00\n\t 54 specrxp                  0.00     0.00\n\t 48 bash                     0.00     0.00\n\t 41 specinvoke               0.00     0.00\n\t 22 cat                      0.00     0.00\n\t 21 grep                     0.00     0.00\n\t 12 uniq                     0.00     0.00\n\t 11 sort                     0.00     0.00\n\t 10 expand                   0.00     0.00\n\t  7 specmake                 0.00     0.00\n\t  6 pwd                      0.00     0.00\n\t  5 basename                 0.00     0.00\n\t  5 systemctl                0.00     0.00\n\t  4 rm                       0.00     0.00\n\t  4 specpp                   0.00     0.00\n\t  4 uname                    0.00     0.00\n\t  3 dirname                  0.00     0.00\n\t  3 dmidecode                0.00     0.00\n\t  3 lscpu                    0.00     0.00\n\t  2 df                       0.00     0.00\n\t  2 dpkg                     0.00     0.00\n\t  2 runcpu                   0.00     0.00\n\t  2 specsha512sum            0.00     0.00\n\t  2 specxz                   0.00     0.00\n\t  2 who                      0.00     0.00\n\t  1 cpupower                 0.00     0.00\n\t  1 head                     0.00     0.00\n\t  1 logname                  0.00     0.00\n\t  1 ls                       0.00     0.00\n\t  1 lsb_release              0.00     0.00\n\t  1 numactl                  0.00     0.00\n\t  1 sysctl                   0.00     0.00\n\t  1 w                        0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 which                    0.00     0.00\n0 processes running\n53 maximum processes\n<\/code><\/pre>\n\n\n\n<p>specinvoke fires up separate processes for each logical core.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>    400740) specinvoke       cpu=2 start=3.81  finish=395.96\n      400742) sh               cpu=13 start=3.81  finish=393.80\n        400749) bash             cpu=0 start=3.81  finish=393.80\n          400775) povray_r_base.m  cpu=0 start=3.81  
finish=393.79\n      400743) sh               cpu=2 start=3.81  finish=393.56\n        400752) bash             cpu=1 start=3.81  finish=393.56\n          400776) povray_r_base.m  cpu=1 start=3.81  finish=393.56\n      400744) sh               cpu=2 start=3.81  finish=391.13\n        400751) bash             cpu=2 start=3.81  finish=391.13\n          400777) povray_r_base.m  cpu=2 start=3.81  finish=391.13\n      400745) sh               cpu=15 start=3.81  finish=393.58\n        400753) bash             cpu=3 start=3.81  finish=393.58\n          400774) povray_r_base.m  cpu=3 start=3.81  finish=393.58\n      400746) sh               cpu=2 start=3.81  finish=392.22\n        400759) bash             cpu=4 start=3.81  finish=392.22\n          400779) povray_r_base.m  cpu=4 start=3.81  finish=392.22\n      400747) sh               cpu=8 start=3.81  finish=395.02\n        400754) bash             cpu=5 start=3.81  finish=395.02\n          400778) povray_r_base.m  cpu=5 start=3.81  finish=395.02\n      400748) sh               cpu=9 start=3.81  finish=395.96\n        400757) bash             cpu=6 start=3.81  finish=395.96\n          400780) povray_r_base.m  cpu=6 start=3.82  finish=395.96\n      400750) sh               cpu=7 start=3.81  finish=393.05\n        400764) bash             cpu=7 start=3.81  finish=393.05\n          400782) povray_r_base.m  cpu=7 start=3.82  finish=393.05\n      400755) sh               cpu=9 start=3.81  finish=394.66\n        400765) bash             cpu=8 start=3.81  finish=394.65\n          400781) povray_r_base.m  cpu=8 start=3.82  finish=394.65\n      400756) sh               cpu=2 start=3.81  finish=393.30\n        400763) bash             cpu=9 start=3.81  finish=393.30\n          400783) povray_r_base.m  cpu=9 start=3.82  finish=393.30\n      400758) sh               cpu=12 start=3.81  finish=394.22\n        400768) bash             cpu=10 start=3.81  finish=394.22\n          400784) povray_r_base.m  cpu=10 start=3.82  finish=394.22\n 
     400760) sh               cpu=2 start=3.81  finish=392.57\n        400769) bash             cpu=11 start=3.81  finish=392.57\n          400787) povray_r_base.m  cpu=11 start=3.82  finish=392.57\n      400761) sh               cpu=2 start=3.81  finish=393.60\n        400770) bash             cpu=12 start=3.81  finish=393.60\n          400785) povray_r_base.m  cpu=12 start=3.82  finish=393.60\n      400762) sh               cpu=15 start=3.81  finish=393.13\n        400771) bash             cpu=13 start=3.81  finish=393.13\n          400789) povray_r_base.m  cpu=13 start=3.82  finish=393.13\n      400766) sh               cpu=10 start=3.81  finish=395.03\n        400772) bash             cpu=14 start=3.81  finish=395.03\n          400786) povray_r_base.m  cpu=14 start=3.82  finish=395.03\n      400767) sh               cpu=2 start=3.81  finish=393.01\n        400773) bash             cpu=15 start=3.81  finish=393.01\n          400788) povray_r_base.m  cpu=15 start=3.82  finish=393.01\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>povray is a SPEC CPU(R) benchmark written in C and C++ and described&nbsp;here.&nbsp;The workload runs on all logical cores. Topdown profile shows a high retirement rate with some backend stalls. AMD metrics confirm this runs on all cores. 
The backend <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/cpu2017\/511-povray_r\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":2297,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-2318","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2318","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=2318"}],"version-history":[{"count":3,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2318\/revisions"}],"predecessor-version":[{"id":2351,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2318\/revisions\/2351"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2297"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=2318"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}