{"id":2316,"date":"2024-06-02T19:28:11","date_gmt":"2024-06-02T19:28:11","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=2316"},"modified":"2024-06-03T10:22:23","modified_gmt":"2024-06-03T10:22:23","slug":"510-parest_r","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/cpu2017\/510-parest_r\/","title":{"rendered":"510.parest_r"},"content":{"rendered":"\n<p>parest is a SPEC CPU(R) benchmark written in C++ and described <a href=\"https:\/\/spec.org\/cpu2017\/Docs\/benchmarks\/510.parest_r.html\">here.<\/a> The workload runs on all logical cores.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-8.png\" alt=\"\" class=\"wp-image-2337\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-8.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-8-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-8-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>Topdown profile shows a backend-bound workload with several transition periods.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-9.png\" alt=\"\" class=\"wp-image-2339\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-9.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-9-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-9-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics for 7840 
processor confirm the memory-bound topdown result. There are approximately 93 L2 accesses per 1000 instructions, with a 35% L2 miss rate.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              4550.517\non_cpu               0.987          # 15.80 \/ 16 cores\nutime                71807.447\nstime                70.527\nnvcsw                97192          # 12.30%\nnivcsw               693175         # 87.70%\ninblock              0              # 0.00\/sec\nonblock              76808          # 16.88\/sec\ncpu-clock            71894556471080 # 71894.556 seconds\ntask-clock           71895682964243 # 71895.683 seconds\npage faults          15322746       # 213.125\/sec\ncontext switches     788968         # 10.974\/sec\ncpu migrations       575            # 0.008\/sec\nmajor page faults    3608           # 0.050\/sec\nminor page faults    15319138       # 213.075\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             9946444006569  # 106.362 branches per 1000 inst\nbranch misses        158093543610   # 1.59% branch miss\nconditional          8506498396961  # 90.964 conditional branches per 1000 inst\nindirect             303832503081   # 3.249 indirect branches per 1000 inst\ncpu-cycles           323469504283127 # 4.41 GHz\ninstructions         93523365819643 # 0.29 IPC low\nslots                646833742714674 #\nretiring             31158742609143 #  4.8% ( 5.8%) low\n-- ucode             44362326367    #     0.0%\n-- fastpath          31114380282776 #     4.8%\nfrontend             22505474246514 #  3.5% ( 4.2%) low\n-- latency           12087807418272 #     1.9%\n-- bandwidth         10417666828242 #     1.6%\nbackend              479910375744452 # 74.2% (89.7%) high\n-- cpu               44026151893931 #     6.8%\n-- memory            435884223850521 #    67.4%\nspeculation          1582705091260  #  0.2% ( 0.3%) low\n-- branch mispredict 1526293395642  #     0.2%\n-- pipeline restart  56411695618    #     
0.0%\nsmt-contention       111676232212003 # 17.3% ( 0.0%)\ncpu-cycles           322909243440924 # 4.41 GHz\ninstructions         93529964190307 # 0.29 IPC low\ninstructions         31187453335685 # 93.116 l2 access per 1000 inst\nl2 hit from l1       2031473661222  # 34.77% l2 miss\nl2 miss from l1      363444129199   #\nl2 hit from l2 pf    226198887767   #\nl3 hit from l2 pf    141275266043   #\nl3 miss from l2 pf   505093333915   #\ninstructions         31162917306696 # 338.504 float per 1000 inst\nfloat 512            465            # 0.000 AVX-512 per 1000 inst\nfloat 256            6475912        # 0.000 AVX-256 per 1000 inst\nfloat 128            10548758414246 # 338.504 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\ninstructions         93509220691832 #\nopcache              12838123147590 # 137.293 opcache per 1000 inst\nopcache miss         185854581876   #  1.4% opcache miss rate\nl1 dTLB miss         94458089967    # 1.010 L1 dTLB per 1000 inst\nl2 dTLB miss         7505584655     # 0.080 L2 dTLB per 1000 inst\ninstructions         93521241174553 #\nicache               277631017236   # 2.969 icache per 1000 inst\nicache miss          64769736080    # 23.3% icache miss rate\nl1 iTLB miss         2459809558     # 0.026 L1 iTLB per 1000 inst\nl2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst\ntlb flush            1618639        # 0.000 TLB flush per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>The process overview shows parest_r_base.m as the primary process.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>1064 processes\n\t 32 parest_r_base.m      47764.64    29.12\n\t334 specperl                43.05     7.08\n\t 10 ps                       0.01     0.01\n\t  1 lsb_release              0.01     0.00\n\t 33 specinvoke               0.00     0.05\n\t  1 clang++                  0.00     0.01\n\t428 sh                       0.00     0.00\n\t 32 
bash                     0.00     0.00\n\t 21 grep                     0.00     0.00\n\t 20 cat                      0.00     0.00\n\t 12 uniq                     0.00     0.00\n\t 11 sort                     0.00     0.00\n\t 10 expand                   0.00     0.00\n\t  6 pwd                      0.00     0.00\n\t  5 basename                 0.00     0.00\n\t  5 specmake                 0.00     0.00\n\t  5 specrxp                  0.00     0.00\n\t  5 systemctl                0.00     0.00\n\t  4 specpp                   0.00     0.00\n\t  4 uname                    0.00     0.00\n\t  3 dirname                  0.00     0.00\n\t  3 dmidecode                0.00     0.00\n\t  3 lscpu                    0.00     0.00\n\t  2 df                       0.00     0.00\n\t  2 dpkg                     0.00     0.00\n\t  2 rm                       0.00     0.00\n\t  2 runcpu                   0.00     0.00\n\t  2 specsha512sum            0.00     0.00\n\t  2 specxz                   0.00     0.00\n\t  2 who                      0.00     0.00\n\t  1 cpupower                 0.00     0.00\n\t  1 head                     0.00     0.00\n\t  1 logname                  0.00     0.00\n\t  1 ls                       0.00     0.00\n\t  1 numactl                  0.00     0.00\n\t  1 sysctl                   0.00     0.00\n\t  1 w                        0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 which                    0.00     0.00\n53 processes running\n53 maximum processes\n<\/code><\/pre>\n\n\n\n<p>specinvoke fires up separate processes on each core.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>    392467) specinvoke       cpu=2 start=3.38  finish=1513.77\n      392469) sh               cpu=10 start=3.38  finish=1507.68\n        392476) bash             cpu=0 start=3.38  finish=1507.68\n          392502) parest_r_base.m  cpu=0 start=3.38  finish=1507.65\n      392470) sh               cpu=0 start=3.38  finish=1511.12\n        392481) bash             
cpu=1 start=3.38  finish=1511.12\n          392507) parest_r_base.m  cpu=1 start=3.38  finish=1511.09\n      392471) sh               cpu=12 start=3.38  finish=1477.99\n        392479) bash             cpu=2 start=3.38  finish=1477.99\n          392505) parest_r_base.m  cpu=2 start=3.38  finish=1477.94\n      392472) sh               cpu=0 start=3.38  finish=1512.29\n        392478) bash             cpu=3 start=3.38  finish=1512.29\n          392501) parest_r_base.m  cpu=3 start=3.38  finish=1512.26\n      392473) sh               cpu=12 start=3.38  finish=1476.53\n        392483) bash             cpu=4 start=3.38  finish=1476.53\n          392503) parest_r_base.m  cpu=4 start=3.38  finish=1476.48\n      392474) sh               cpu=0 start=3.38  finish=1513.76\n        392485) bash             cpu=5 start=3.38  finish=1513.76\n          392504) parest_r_base.m  cpu=5 start=3.38  finish=1513.74\n      392475) sh               cpu=4 start=3.38  finish=1492.40\n        392486) bash             cpu=6 start=3.38  finish=1492.40\n          392506) parest_r_base.m  cpu=6 start=3.38  finish=1492.36\n      392477) sh               cpu=10 start=3.38  finish=1504.26\n        392488) bash             cpu=7 start=3.38  finish=1504.26\n          392508) parest_r_base.m  cpu=7 start=3.38  finish=1504.23\n      392480) sh               cpu=6 start=3.38  finish=1507.65\n        392491) bash             cpu=8 start=3.38  finish=1507.65\n          392509) parest_r_base.m  cpu=8 start=3.38  finish=1507.62\n      392482) sh               cpu=14 start=3.38  finish=1511.41\n        392494) bash             cpu=9 start=3.38  finish=1511.41\n          392515) parest_r_base.m  cpu=9 start=3.39  finish=1511.39\n      392484) sh               cpu=10 start=3.38  finish=1479.81\n        392495) bash             cpu=10 start=3.38  finish=1479.81\n          392510) parest_r_base.m  cpu=10 start=3.39  finish=1479.77\n      392487) sh               cpu=9 start=3.38  finish=1512.24\n        392496) 
bash             cpu=11 start=3.38  finish=1512.24\n          392511) parest_r_base.m  cpu=11 start=3.39  finish=1512.21\n      392489) sh               cpu=12 start=3.38  finish=1474.79\n        392497) bash             cpu=12 start=3.38  finish=1474.79\n          392512) parest_r_base.m  cpu=12 start=3.39  finish=1474.73\n      392490) sh               cpu=15 start=3.38  finish=1513.68\n        392498) bash             cpu=13 start=3.38  finish=1513.68\n          392513) parest_r_base.m  cpu=13 start=3.39  finish=1513.65\n      392492) sh               cpu=10 start=3.38  finish=1486.93\n        392499) bash             cpu=14 start=3.38  finish=1486.93\n          392514) parest_r_base.m  cpu=14 start=3.39  finish=1486.88\n      392493) sh               cpu=10 start=3.38  finish=1502.00\n        392500) bash             cpu=15 start=3.38  finish=1502.00\n          392516) parest_r_base.m  cpu=15 start=3.39  finish=1501.95\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>parest is a SPEC CPU(R) benchmark written in C++ and described here. The workload runs on all logical cores. Topdown profile shows a backend-bound workload with several transition periods. AMD metrics for 7840 processor confirm topdown memory bound workload. 
Approximately <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/cpu2017\/510-parest_r\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":2297,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-2316","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2316","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=2316"}],"version-history":[{"count":2,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2316\/revisions"}],"predecessor-version":[{"id":2340,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2316\/revisions\/2340"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2297"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=2316"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}