{"id":1880,"date":"2024-03-01T02:32:51","date_gmt":"2024-03-01T02:32:51","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=1880"},"modified":"2024-03-01T12:22:29","modified_gmt":"2024-03-01T12:22:29","slug":"core-latency","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/core-latency\/","title":{"rendered":"core-latency"},"content":{"rendered":"\n<p>Measures the latency between various core combinations on the CPU. It runs for a while before reporting a single number (86.5 ns on AMD and 146.2 ns on Intel). The run uses mostly three threads, with a spiky usage profile and a moderate level of interrupts.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-5.png\" alt=\"\" class=\"wp-image-1884\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-5.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-5-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-5-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>The topdown profile shows more backend constraints, with distinct phases.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-5.png\" alt=\"\" class=\"wp-image-1885\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-5.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-5-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-5-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" 
\/><\/figure>\n\n\n\n<p>AMD metrics show almost no floating point, many branches, and sizable backend stalls (about 50% of slots), most of them memory-related (45% of slots).<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              347.624\non_cpu               0.068          # 1.09 \/ 16 cores\nutime                377.052\nstime                1.097\nnvcsw                16947          # 34.74%\nnivcsw               31834          # 65.26%\ninblock              0              # 0.00\/sec\nonblock              12744          # 36.66\/sec\ncpu-clock            381605191602   # 381.605 seconds\ntask-clock           381632595433   # 381.633 seconds\npage faults          366889         # 961.367\/sec\ncontext switches     50347          # 131.925\/sec\ncpu migrations       15123          # 39.627\/sec\nmajor page faults    2              # 0.005\/sec\nminor page faults    366887         # 961.362\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             1290084247883  # 317.431 branches per 1000 inst\nbranch misses        2376229821     # 0.18% branch miss\nconditional          1262738703916  # 310.702 conditional branches per 1000 inst\nindirect             7684465944     # 1.891 indirect branches per 1000 inst\ncpu-cycles           1686373241655  # 0.30 GHz\ninstructions         4047814184336  # 2.40 IPC\nslots                3570759217254  #\nretiring             992216232198   # 27.8% (27.8%)\n-- ucode             862926843      #     0.0%\n-- fastpath          991353305355   #    27.8%\nfrontend             706977537808   # 19.8% (19.8%)\n-- latency           398876251902   #    11.2%\n-- bandwidth         308101285906   #     8.6%\nbackend              1798308407946  # 50.4% (50.4%)\n-- cpu               187178390007   #     5.2%\n-- memory            1611130017939  #    45.1%\nspeculation          69693310609    #  2.0% ( 2.0%)\n-- branch mispredict 48349557225    #     1.4%\n-- pipeline restart  21343753384   
 #     0.6%\nsmt-contention       3562760504     #  0.1% ( 0.0%)\ncpu-cycles           1707892632149  # 0.30 GHz\ninstructions         4094199228615  # 2.40 IPC\ninstructions         1350541887660  # 0.776 l2 access per 1000 inst\nl2 hit from l1       1004371938     # 67.46% l2 miss\nl2 miss from l1      682022950      #\nl2 hit from l2 pf    18820367       #\nl3 hit from l2 pf    10453741       #\nl3 miss from l2 pf   14761656       #\ninstructions         1353828518705  # 3.712 float per 1000 inst\nfloat 512            46             # 0.000 AVX-512 per 1000 inst\nfloat 256            384            # 0.000 AVX-256 per 1000 inst\nfloat 128            5025323444     # 3.712 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         13             # 0.000 scalar per 1000 inst\ninstructions         4069008521799  #\nopcache              259340881921   # 63.736 opcache per 1000 inst\nopcache miss         2342524413     #  0.9% opcache miss rate\nl1 dTLB miss         33936466       # 0.008 L1 dTLB per 1000 inst\nl2 dTLB miss         6530737        # 0.002 L2 dTLB per 1000 inst\ninstructions         4089180366000  #\nicache               4014925148     # 0.982 icache per 1000 inst\nicache miss          330099304      #  8.2% icache miss rate\nl1 iTLB miss         8820947        # 0.002 L1 iTLB per 1000 inst\nl2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics show this fits in L3 and interestingly is more cpu-bound than memory-bound in contrast to the AMD processor.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              338.064\non_cpu               0.068          # 1.09 \/ 16 cores\nutime                366.068\nstime                0.844\nnvcsw                16599          # 34.41%\nnivcsw               31636          # 65.59%\ninblock              8              # 0.02\/sec\nonblock              1496           # 4.43\/sec\ncpu-clock            
369575801564   # 369.576 seconds\ntask-clock           369591458164   # 369.591 seconds\npage faults          557191         # 1507.586\/sec\ncontext switches     49764          # 134.646\/sec\ncpu migrations       14929          # 40.393\/sec\nmajor page faults    0              # 0.000\/sec\nminor page faults    557191         # 1507.586\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             1488546243170  # 320.204 branches per 1000 inst\nbranch misses        2105739450     # 0.14% branch miss\nconditional          1488546728162  # 320.204 conditional branches per 1000 inst\nindirect             170666966807   # 36.713 indirect branches per 1000 inst\nslots                5976172192742  #\nretiring             3048896025037  # 51.0% (51.0%)\n-- ucode             73313850447    #     1.2%\n-- fastpath          2975582174590  #    49.8%\nfrontend             415667801674   #  7.0% ( 7.0%)\n-- latency           134077085418   #     2.2%\n-- bandwidth         281590716256   #     4.7%\nbackend              2284255214220  # 38.2% (38.2%)\n-- cpu               1853685353998  #    31.0%\n-- memory            430569860222   #     7.2%\nspeculation          229582167842   #  3.8% ( 3.8%)\n-- branch mispredict 208351532527   #     3.5%\n-- pipeline restart  21230635315    #     0.4%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           999791035793   # 0.18 GHz\ninstructions         4168346020829  # 4.17 IPC high\nl2 access            1082001631     # 0.260 l2 access per 1000 inst\nl2 miss              736658180      # 68.08% l2 miss\ncpu-cycles           997555573602   #  9.8% memory latency\nload stalls          97736959756    #  0.3% l1 bound\nl1 miss              94948090517    #  1.4% l2 bound\nl2 miss              81224075073    #  8.1% l3 bound\nl3 miss              330804862      #  0.0% dram bound\nstore_stalls         509402552      #  0.1% store 
bound\n<\/code><\/pre>\n\n\n\n<p>The process overview shows ~15k instances of the core-latency process, each of which runs for a very short time.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>15322 processes\n\t14976 core-latency         1535299.29  6920.82\n\t 68 clinfo                  15.87     6.98\n\t 38 vulkaninfo               1.71     0.95\n\t  4 vulkani:disk$0           0.18     0.10\n\t  6 glxinfo:gdrv0            0.11     0.07\n\t  6 glxinfo:gl0              0.11     0.07\n\t  2 llvmpipe-0               0.09     0.05\n\t  2 llvmpipe-1               0.09     0.05\n\t  2 llvmpipe-10              0.09     0.05\n\t  2 llvmpipe-11              0.09     0.05\n\t  2 llvmpipe-12              0.09     0.05\n\t  2 llvmpipe-13              0.09     0.05\n\t  2 llvmpipe-14              0.09     0.05\n\t  2 llvmpipe-15              0.09     0.05\n\t  2 llvmpipe-2               0.09     0.05\n\t  2 llvmpipe-3               0.09     0.05\n\t  2 llvmpipe-4               0.09     0.05\n\t  2 llvmpipe-5               0.09     0.05\n\t  2 llvmpipe-6               0.09     0.05\n\t  2 llvmpipe-7               0.09     0.05\n\t  2 llvmpipe-8               0.09     0.05\n\t  2 llvmpipe-9               0.09     0.05\n\t  6 php                      0.08     0.06\n\t  6 clang                    0.06     0.06\n\t  2 glxinfo                  0.05     0.03\n\t  2 glxinfo:cs0              0.05     0.03\n\t  2 glxinfo:disk$0           0.05     0.03\n\t  2 glxinfo:sh0              0.05     0.03\n\t  2 glxinfo:shlo0            0.05     0.03\n\t  3 rocminfo                 0.03     0.00\n\t  1 lspci                    0.00     0.03\n\t 80 sh                       0.00     0.00\n\t 13 gcc                      0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  7 gsettings                0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 gmain                    0.00     0.00\n\t  5 phoronix-test-s          0.00     
0.00\n\t  4 dconf worker             0.00     0.00\n\t  2 cc                       0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 uname                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 ps                       0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n0 processes running\n47 maximum processes\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Measures the latency between various core combinations on the CPU. It runs for a while before reporting a single number (86.5 ns on AMD and 146.2 ns on Intel). The run uses mostly three threads, with a spiky usage profile and a moderate level of interrupts. 
<span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/core-latency\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-1880","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1880","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=1880"}],"version-history":[{"count":2,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1880\/revisions"}],"predecessor-version":[{"id":1886,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1880\/revisions\/1886"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=1880"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}