{"id":1925,"date":"2024-03-02T16:10:27","date_gmt":"2024-03-02T16:10:27","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=1925"},"modified":"2024-03-03T18:50:23","modified_gmt":"2024-03-03T18:50:23","slug":"daphne","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/daphne\/","title":{"rendered":"daphne"},"content":{"rendered":"\n<p>The Darmstadt Automotive Parallel Heterogeneous Benchmark Suite (DAPHNE) runs automotive workloads using OpenCL and OpenMP. The OpenCL tests do not run. Most of the remaining tests appear to be single-threaded despite the OpenMP indicator.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-12.png\" alt=\"\" class=\"wp-image-1945\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-12.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-12-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-12-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>The topdown profile shows a mix of retiring slots and backend stalls.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-13.png\" alt=\"\" class=\"wp-image-1947\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-13.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-13-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-13-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD 
metrics are a composite of the above showing not much floating point and low L2 access.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              647.523\non_cpu               0.143          # 2.29 \/ 16 cores\nutime                1195.537\nstime                284.194\nnvcsw                1342214        # 9.73%\nnivcsw               12450912       # 90.27%\ninblock              277614544      # 428733.21\/sec\nonblock              35936          # 55.50\/sec\ncpu-clock            1480594180040  # 1480.594 seconds\ntask-clock           1481141930675  # 1481.142 seconds\npage faults          84864057       # 57296.371\/sec\ncontext switches     13796149       # 9314.535\/sec\ncpu migrations       326886         # 220.699\/sec\nmajor page faults    150            # 0.101\/sec\nminor page faults    84863907       # 57296.269\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             1589971314596  # 177.095 branches per 1000 inst\nbranch misses        22901058854    # 1.44% branch miss\nconditional          1003628468174  # 111.787 conditional branches per 1000 inst\nindirect             161161234244   # 17.951 indirect branches per 1000 inst\ncpu-cycles           6094697416325  # 0.59 GHz\ninstructions         8898725499100  # 1.46 IPC\nslots                12168710168130 #\nretiring             3221769683765  # 26.5% (32.7%)\n-- ucode             25931388043    #     0.2%\n-- fastpath          3195838295722  #    26.3%\nfrontend             2229448759160  # 18.3% (22.6%)\n-- latency           1339052845638  #    11.0%\n-- bandwidth         890395913522   #     7.3%\nbackend              4376772220333  # 36.0% (44.4%)\n-- cpu               1570743126155  #    12.9%\n-- memory            2806029094178  #    23.1%\nspeculation          32943244673    #  0.3% ( 0.3%) low\n-- branch mispredict 32596017992    #     0.3%\n-- pipeline restart  347226681      #     0.0%\nsmt-contention       
2307718181346  # 19.0% ( 0.0%)\ncpu-cycles           6142429904000  # 0.59 GHz\ninstructions         8913235202843  # 1.45 IPC\ninstructions         2969698292677  # 9.421 l2 access per 1000 inst\nl2 hit from l1       19591167482    # 25.90% l2 miss\nl2 miss from l1      1643216687     #\nl2 hit from l2 pf    2783547056     #\nl3 hit from l2 pf    1420250892     #\nl3 miss from l2 pf   4182540997     #\ninstructions         2964104532232  # 58.608 float per 1000 inst\nfloat 512            73             # 0.000 AVX-512 per 1000 inst\nfloat 256            844            # 0.000 AVX-256 per 1000 inst\nfloat 128            173721135233   # 58.608 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\ninstructions         8915527065582  #\nopcache              1346487074649  # 151.027 opcache per 1000 inst\nopcache miss         134602372282   # 10.0% opcache miss rate\nl1 dTLB miss         2340482149     # 0.263 L1 dTLB per 1000 inst\nl2 dTLB miss         614271217      # 0.069 L2 dTLB per 1000 inst\ninstructions         8904278828800  #\nicache               310867012549   # 34.912 icache per 1000 inst\nicache miss          9574737682     #  3.1% icache miss rate\nl1 iTLB miss         60611547       # 0.007 L1 iTLB per 1000 inst\nl2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst\ntlb flush            239333         # 0.000 TLB flush per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              1436.480\non_cpu               0.128          # 2.04 \/ 16 cores\nutime                2636.012\nstime                300.586\nnvcsw                2152216        # 98.57%\nnivcsw               31297          # 1.43%\ninblock              554279288      # 385859.26\/sec\nonblock              2368           # 1.65\/sec\ncpu-clock            2930803533535  # 2930.804 seconds\ntask-clock           2931453232627 
 # 2931.453 seconds\npage faults          120740962      # 41188.091\/sec\ncontext switches     2190497        # 747.239\/sec\ncpu migrations       12032          # 4.104\/sec\nmajor page faults    76             # 0.026\/sec\nminor page faults    120740886      # 41188.065\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             2498902996927  # 176.736 branches per 1000 inst\nbranch misses        1064616276     # 0.04% branch miss\nconditional          2498903022559  # 176.736 conditional branches per 1000 inst\nindirect             399913304330   # 28.284 indirect branches per 1000 inst\nslots                41640024301754 #\nretiring             8457987971296  # 20.3% (20.3%)\n-- ucode             1333390304549  #     3.2%\n-- fastpath          7124597666747  #    17.1%\nfrontend             5693102775992  # 13.7% (13.7%)\n-- latency           3332728754447  #     8.0%\n-- bandwidth         2360374021545  #     5.7%\nbackend              26553295817784 # 63.8% (63.8%)\n-- cpu               22615742223392 #    54.3%\n-- memory            3937553594392  #     9.5%\nspeculation          617223955381   #  1.5% ( 1.5%)\n-- branch mispredict 342222801500   #     0.8%\n-- pipeline restart  275001153881   #     0.7%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           17570217701955 # 0.46 GHz\ninstructions         31250072553799 # 1.78 IPC\nl2 access            148699094152   # 7.007 l2 access per 1000 inst\nl2 miss              96697170846    # 65.03% l2 miss\ncpu-cycles           7587612284914  # 10.3% memory latency\nload stalls          566094003027   #  4.0% l1 bound\nl1 miss              261284310669   #  1.3% l2 bound\nl2 miss              164160337975   #  0.9% l3 bound\nl3 miss              93582661571    #  1.2% dram bound\nstore_stalls         217319060957   #  2.9% store bound\n<\/code><\/pre>\n\n\n\n<p>Process overview shows kernel is the primary 
driver<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>764 processes\n\t246 kernel               23220.31  7341.70\n\t204 clinfo                  50.68    21.18\n\t 38 vulkaninfo               1.31     1.33\n\t  4 vulkani:disk$0           0.14     0.14\n\t  6 php                      0.09     0.20\n\t  2 llvmpipe-0               0.07     0.07\n\t  2 llvmpipe-1               0.07     0.07\n\t  2 llvmpipe-10              0.07     0.07\n\t  2 llvmpipe-11              0.07     0.07\n\t  2 llvmpipe-12              0.07     0.07\n\t  2 llvmpipe-13              0.07     0.07\n\t  2 llvmpipe-14              0.07     0.07\n\t  2 llvmpipe-15              0.07     0.07\n\t  2 llvmpipe-2               0.07     0.07\n\t  2 llvmpipe-3               0.07     0.07\n\t  2 llvmpipe-4               0.07     0.07\n\t  2 llvmpipe-5               0.07     0.07\n\t  2 llvmpipe-6               0.07     0.07\n\t  2 llvmpipe-7               0.07     0.07\n\t  2 llvmpipe-8               0.07     0.07\n\t  2 llvmpipe-9               0.07     0.07\n\t  6 glxinfo:gdrv0            0.06     0.12\n\t  6 glxinfo:gl0              0.06     0.12\n\t  6 clang                    0.06     0.05\n\t  2 glxinfo                  0.04     0.04\n\t  2 glxinfo:cs0              0.04     0.04\n\t  2 glxinfo:disk$0           0.04     0.04\n\t  2 glxinfo:sh0              0.04     0.04\n\t  2 glxinfo:shlo0            0.04     0.04\n\t  3 rocminfo                 0.00     0.03\n\t  1 lspci                    0.00     0.02\n\t 92 sh                       0.00     0.00\n\t 24 daphne                   0.00     0.00\n\t 13 gcc                      0.00     0.00\n\t  9 gsettings                0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  4 gmain                    0.00     0.00\n\t  3 dconf worker             0.00     0.00\n\t  2 cc                       0.00     
0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 uname                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 ps                       0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n0 processes running\n47 maximum processes\n<\/code><\/pre>\n\n\n\n<p>Computation block<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      882112) daphne           cpu=4 start=24.47 finish=46.59\n        882113) kernel           cpu=5 start=24.47 finish=46.52\n          882114) kernel           cpu=14 start=35.86 finish=46.52\n          882115) kernel           cpu=0 start=35.86 finish=46.52\n          882116) kernel           cpu=8 start=35.86 finish=46.52\n          882117) kernel           cpu=1 start=35.86 finish=46.52\n          882118) kernel           cpu=2 start=35.86 finish=46.52\n          882119) kernel           cpu=15 start=35.86 finish=46.52\n          882120) kernel           cpu=10 start=35.86 finish=46.52\n          882121) kernel           cpu=12 start=35.86 finish=46.52\n          882122) kernel           cpu=6 start=35.86 finish=46.52\n          882123) 
kernel           cpu=7 start=35.86 finish=46.52\n          882124) kernel           cpu=11 start=35.86 finish=46.52\n          882125) kernel           cpu=9 start=35.86 finish=46.52\n          882126) kernel           cpu=4 start=35.86 finish=46.52\n          882127) kernel           cpu=3 start=35.86 finish=46.52\n          882128) kernel           cpu=13 start=35.86 finish=46.52\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>The Darmstadt Automotive Parallel Heterogeneous Benchmark Suite that tries benchmarks with OpenCL and OpenMP for automotive benchmarks. The OpenCL ones do not run. Most of these appear to be single-threaded despite the OpenMP indicator. Topdown profile shows mix of retiring <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/daphne\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-1925","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1925","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=1925"}],"version-history":[{"count":2,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1925\/revisions"}],"predecessor-version":[{"id":1948,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1925\/revisions\/1948"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:at
tachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=1925"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}