{"id":2360,"date":"2024-06-04T00:01:44","date_gmt":"2024-06-04T00:01:44","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=2360"},"modified":"2024-06-05T11:47:31","modified_gmt":"2024-06-05T11:47:31","slug":"549-fotonik3d_r","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/cpu2017\/549-fotonik3d_r\/","title":{"rendered":"549.fotonik3d_r"},"content":{"rendered":"\n<p>fotonik3d is a SPEC CPU(R) benchmark described <a href=\"https:\/\/spec.org\/cpu2017\/Docs\/benchmarks\/549.fotonik3d_r.html\">here<\/a> and written in Fortran. The workload runs on all logical cores.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-17.png\" alt=\"\" class=\"wp-image-2426\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-17.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-17-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-17-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>Topdown profile shows a backend-bound workload.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-18.png\" alt=\"\" class=\"wp-image-2427\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-18.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-18-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-18-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics on 7840 shows high  
memory stalls, low branching and ~137 L2 access per 1000 instructions with a 44% L2 miss rate.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              4829.371\non_cpu               0.994          # 15.91 \/ 16 cores\nutime                76744.860\nstime                67.845\nnvcsw                98342          # 11.11%\nnivcsw               786651         # 88.89%\ninblock              0              # 0.00\/sec\nonblock              159088         # 32.94\/sec\ncpu-clock            76838330970840 # 76838.331 seconds\ntask-clock           76840232519543 # 76840.233 seconds\npage faults          11120194       # 144.718\/sec\ncontext switches     884416         # 11.510\/sec\ncpu migrations       191            # 0.002\/sec\nmajor page faults    822            # 0.011\/sec\nminor page faults    11119372       # 144.708\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             1394664132044  # 36.518 branches per 1000 inst\nbranch misses        6248634714     # 0.45% branch miss\nconditional          1294948939127  # 33.907 conditional branches per 1000 inst\nindirect             17971867181    # 0.471 indirect branches per 1000 inst\ncpu-cycles           351032269380617 # 4.54 GHz\ninstructions         38201948431292 # 0.11 IPC low\nslots                701926456849818 #\nretiring             13496311935245 #  1.9% ( 2.0%) low\n-- ucode             1040831056     #     0.0%\n-- fastpath          13495271104189 #     1.9%\nfrontend             12762747845567 #  1.8% ( 1.9%) low\n-- latency           9390303604080  #     1.3%\n-- bandwidth         3372444241487  #     0.5%\nbackend              663082120573306 # 94.5% (96.1%) high\n-- cpu               18630364177550 #     2.7%\n-- memory            644451756395756 #    91.8%\nspeculation          375625644534   #  0.1% ( 0.1%) low\n-- branch mispredict 151442673430   #     0.0%\n-- pipeline restart  224182971104   #     
0.0%\nsmt-contention       12209406487594 #  1.7% ( 0.0%)\ncpu-cycles           350708633161581 # 4.54 GHz\ninstructions         38203258107427 # 0.11 IPC low\ninstructions         12734020931776 # 137.645 l2 access per 1000 inst\nl2 hit from l1       1292934828906  # 44.60% l2 miss\nl2 miss from l1      423876374872   #\nl2 hit from l2 pf    101912368480   #\nl3 hit from l2 pf    6918925782     #\nl3 miss from l2 pf   351009603803   #\ninstructions         12732077618986 # 286.386 float per 1000 inst\nfloat 512            267            # 0.000 AVX-512 per 1000 inst\nfloat 256            3758           # 0.000 AVX-256 per 1000 inst\nfloat 128            3646292606751  # 286.386 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\ninstructions         38186548499500 #\nopcache              3472233299633  # 90.928 opcache per 1000 inst\nopcache miss         237736886993   #  6.8% opcache miss rate\nl1 dTLB miss         136945139539   # 3.586 L1 dTLB per 1000 inst\nl2 dTLB miss         101063820423   # 2.647 L2 dTLB per 1000 inst\ninstructions         38186835935369 #\nicache               320200387260   # 8.385 icache per 1000 inst\nicache miss          41085810541    # 12.8% icache miss rate\nl1 iTLB miss         227493969      # 0.006 L1 iTLB per 1000 inst\nl2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst\ntlb flush            226825         # 0.000 TLB flush per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Process overview shows time spent in fotonik3d_r_bas<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>498 processes\n\t 32 fotonik3d_r_bas      51237.24    29.10\n\t 47 specperl                17.10     4.07\n\t  3 specxz                   0.06     0.03\n\t  1 lsb_release              0.01     0.00\n\t 10 ps                       0.00     0.01\n\t  1 flang                    0.00     0.01\n\t140 sh                       0.00     0.00\n\t 38 specinvoke       
        0.00     0.00\n\t 33 bash                     0.00     0.00\n\t 21 grep                     0.00     0.00\n\t 20 cat                      0.00     0.00\n\t 12 uniq                     0.00     0.00\n\t 11 sort                     0.00     0.00\n\t 10 expand                   0.00     0.00\n\t  7 pwd                      0.00     0.00\n\t  5 basename                 0.00     0.00\n\t  5 specmake                 0.00     0.00\n\t  5 specrxp                  0.00     0.00\n\t  5 systemctl                0.00     0.00\n\t  4 specpp                   0.00     0.00\n\t  4 uname                    0.00     0.00\n\t  3 dirname                  0.00     0.00\n\t  3 dmidecode                0.00     0.00\n\t  3 lscpu                    0.00     0.00\n\t  2 df                       0.00     0.00\n\t  2 dpkg                     0.00     0.00\n\t  2 rm                       0.00     0.00\n\t  2 runcpu                   0.00     0.00\n\t  2 specsha512sum            0.00     0.00\n\t  2 who                      0.00     0.00\n\t  1 cpupower                 0.00     0.00\n\t  1 head                     0.00     0.00\n\t  1 logname                  0.00     0.00\n\t  1 ls                       0.00     0.00\n\t  1 numactl                  0.00     0.00\n\t  1 sysctl                   0.00     0.00\n\t  1 w                        0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 which                    0.00     0.00\n54 processes running\n54 maximum processes\n<\/code><\/pre>\n\n\n\n<p>specinvoke fires up these processes in parallel<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>    471030) specinvoke       cpu=0 start=3.83  finish=1612.78\n      471032) sh               cpu=0 start=3.83  finish=1606.51\n        471037) bash             cpu=0 start=3.83  finish=1606.51\n          471064) fotonik3d_r_bas  cpu=0 start=3.84  finish=1606.31\n      471033) sh               cpu=14 start=3.83  finish=1610.14\n        471042) bash             cpu=1 start=3.83  
finish=1610.14\n          471068) fotonik3d_r_bas  cpu=1 start=3.84  finish=1610.03\n      471034) ?? cpu=0 start=3.83  finish=0.00 \n        471045) bash             cpu=2 start=3.83  finish=1607.81\n          471067) fotonik3d_r_bas  cpu=2 start=3.84  finish=1607.61\n      471035) sh               cpu=0 start=3.83  finish=1612.78\n        471041) bash             cpu=3 start=3.83  finish=1612.78\n          471065) fotonik3d_r_bas  cpu=3 start=3.84  finish=1612.72\n      471036) sh               cpu=14 start=3.83  finish=1609.92\n        471047) bash             cpu=4 start=3.83  finish=1609.92\n          471069) fotonik3d_r_bas  cpu=4 start=3.84  finish=1609.80\n      471038) sh               cpu=10 start=3.83  finish=1612.42\n        471043) bash             cpu=5 start=3.83  finish=1612.42\n          471066) fotonik3d_r_bas  cpu=5 start=3.84  finish=1612.35\n      471039) sh               cpu=10 start=3.83  finish=1609.19\n        471049) bash             cpu=6 start=3.83  finish=1609.19\n          471071) fotonik3d_r_bas  cpu=6 start=3.84  finish=1609.03\n      471040) sh               cpu=6 start=3.83  finish=1611.71\n        471052) bash             cpu=7 start=3.83  finish=1611.70\n          471070) fotonik3d_r_bas  cpu=7 start=3.84  finish=1611.62\n      471044) sh               cpu=8 start=3.83  finish=1606.42\n        471053) bash             cpu=8 start=3.83  finish=1606.42\n          471072) fotonik3d_r_bas  cpu=8 start=3.84  finish=1606.16\n      471046) sh               cpu=4 start=3.83  finish=1610.11\n        471056) bash             cpu=9 start=3.83  finish=1610.11\n          471074) fotonik3d_r_bas  cpu=9 start=3.84  finish=1609.98\n      471048) sh               cpu=10 start=3.83  finish=1607.89\n        471058) bash             cpu=10 start=3.83  finish=1607.89\n          471073) fotonik3d_r_bas  cpu=10 start=3.84  finish=1607.71\n      471050) sh               cpu=12 start=3.83  finish=1612.76\n        471059) bash             cpu=11 
start=3.84  finish=1612.76\n          471077) fotonik3d_r_bas  cpu=11 start=3.84  finish=1612.69\n      471051) sh               cpu=14 start=3.83  finish=1609.96\n        471060) bash             cpu=12 start=3.84  finish=1609.96\n          471078) fotonik3d_r_bas  cpu=12 start=3.84  finish=1609.85\n      471054) sh               cpu=6 start=3.83  finish=1612.40\n        471061) bash             cpu=13 start=3.84  finish=1612.40\n          471075) fotonik3d_r_bas  cpu=13 start=3.84  finish=1612.31\n      471055) sh               cpu=14 start=3.83  finish=1609.22\n        471062) bash             cpu=14 start=3.84  finish=1609.22\n          471076) fotonik3d_r_bas  cpu=14 start=3.84  finish=1609.11\n      471057) sh               cpu=14 start=3.83  finish=1611.68\n        471063) bash             cpu=15 start=3.84  finish=1611.68\n          471079) fotonik3d_r_bas  cpu=15 start=3.84  finish=1611.57\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>fotonik3d is a SPEC CPU(R) benchmark described here and written in Fortran. The workload runs on all logical cores. Topdown profile shows a backend-bound workload. 
AMD metrics on the 7840 show high memory stalls, low branching and ~137 L2 accesses per <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/cpu2017\/549-fotonik3d_r\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":2297,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-2360","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2360","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=2360"}],"version-history":[{"count":2,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2360\/revisions"}],"predecessor-version":[{"id":2429,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2360\/revisions\/2429"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2297"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=2360"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}