{"id":2196,"date":"2024-03-24T20:25:27","date_gmt":"2024-03-24T20:25:27","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=2196"},"modified":"2024-03-29T10:37:21","modified_gmt":"2024-03-29T10:37:21","slug":"schbench","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/schbench\/","title":{"rendered":"schbench"},"content":{"rendered":"\n<p>A Linux kernel scheduler benchmark. There are nine workloads measuring latency with increasing numbers of threads. The plot below reflects both the increased number of runnable threads and usage.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-48.png\" alt=\"\" class=\"wp-image-2231\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-48.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-48-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-48-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>Topdown profile shows mostly memory bound with some frontend stalls.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-51.png\" alt=\"\" class=\"wp-image-2233\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-51.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-51-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-51-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics show backend stalls. 
There is little floating point code and little L2 access.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              1556.050\non_cpu               0.645          # 10.32 \/ 16 cores\nutime                16042.943\nstime                11.160\nnvcsw                1661935        # 45.05%\nnivcsw               2026894        # 54.95%\ninblock              0              # 0.00\/sec\nonblock              14568          # 9.36\/sec\ncpu-clock            16053848559152 # 16053.849 seconds\ntask-clock           16054114142431 # 16054.114 seconds\npage faults          242257         # 15.090\/sec\ncontext switches     3696293        # 230.240\/sec\ncpu migrations       790049         # 49.212\/sec\nmajor page faults    57             # 0.004\/sec\nminor page faults    242200         # 15.086\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             8363015807661  # 181.011 branches per 1000 inst\nbranch misses        1751506043     # 0.02% branch miss\nconditional          6129106510907  # 132.659 conditional branches per 1000 inst\nindirect             555221078517   # 12.017 indirect branches per 1000 inst\ncpu-cycles           73140054874996 # 2.54 GHz\ninstructions         46237826856780 # 0.63 IPC low\nslots                146432173835976 #\nretiring             24129100695098 # 16.5% (24.2%)\n-- ucode             370520955370   #     0.3%\n-- fastpath          23758579739728 #    16.2%\nfrontend             29102512339367 # 19.9% (29.2%)\n-- latency           24587855340780 #    16.8%\n-- bandwidth         4514656998587  #     3.1%\nbackend              46483315674898 # 31.7% (46.6%)\n-- cpu               10093079433097 #     6.9%\n-- memory            36390236241801 #    24.9%\nspeculation          2955527432     #  0.0% ( 0.0%) low\n-- branch mispredict 2946103259     #     0.0%\n-- pipeline restart  9424173        #     0.0%\nsmt-contention       46714117571170 # 31.9% ( 
0.0%)\ncpu-cycles           67822818429201 # 2.35 GHz\ninstructions         43003680601254 # 0.63 IPC low\ninstructions         14340971853956 # 0.122 l2 access per 1000 inst\nl2 hit from l1       1599372168     # 15.28% l2 miss\nl2 miss from l1      197105566      #\nl2 hit from l2 pf    77075632       #\nl3 hit from l2 pf    55117623       #\nl3 miss from l2 pf   14657946       #\ninstructions         14335290807670 # 4.475 float per 1000 inst\nfloat 512            108            # 0.000 AVX-512 per 1000 inst\nfloat 256            480            # 0.000 AVX-256 per 1000 inst\nfloat 128            64152822075    # 4.475 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\ninstructions         26647384037721 #\nopcache              3861767750251  # 144.921 opcache per 1000 inst\nopcache miss         7442788516     #  0.2% opcache miss rate\nl1 dTLB miss         139727119      # 0.005 L1 dTLB per 1000 inst\nl2 dTLB miss         18498881       # 0.001 L2 dTLB per 1000 inst\ninstructions         52883862220061 #\nicache               25470270754    # 0.482 icache per 1000 inst\nicache miss          3143824182     # 12.3% icache miss rate\nl1 iTLB miss         9913687        # 0.000 L1 iTLB per 1000 inst\nl2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst\ntlb flush            146204         # 0.000 TLB flush per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics show backend stalls as more CPU-based<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              2139.355\non_cpu               0.565          # 9.04 \/ 16 cores\nutime                19329.698\nstime                10.853\nnvcsw                1642438        # 46.31%\nnivcsw               1904138        # 53.69%\ninblock              288            # 0.13\/sec\nonblock              3760           # 1.76\/sec\ncpu-clock            19342627391967 # 19342.627 seconds\ntask-clock           
19343100659394 # 19343.101 seconds\npage faults          180818         # 9.348\/sec\ncontext switches     3556927        # 183.886\/sec\ncpu migrations       785457         # 40.607\/sec\nmajor page faults    66             # 0.003\/sec\nminor page faults    180752         # 9.345\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             4562313428524  # 180.964 branches per 1000 inst\nbranch misses        259291716      # 0.01% branch miss\nconditional          4562313732684  # 180.964 conditional branches per 1000 inst\nindirect             1397283320799  # 55.423 indirect branches per 1000 inst\nslots                89163987381800 #\nretiring             12228322389427 # 13.7% (13.7%) low\n-- ucode             4221981936017  #     4.7%\n-- fastpath          8006340453410  #     9.0%\nfrontend             6991394912521  #  7.8% ( 7.8%)\n-- latency           5126137258625  #     5.7%\n-- bandwidth         1865257653896  #     2.1%\nbackend              69975878002186 # 78.5% (78.5%) high\n-- cpu               67849553454305 #    76.1%\n-- memory            2126324547881  #     2.4%\nspeculation          2900478872     #  0.0% ( 0.0%) low\n-- branch mispredict 2772785413     #     0.0%\n-- pipeline restart  127693459      #     0.0%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           61862121549034 # 1.75 GHz\ninstructions         20391443056364 # 0.33 IPC low\nl2 access            2429707648     # 0.224 l2 access per 1000 inst\nl2 miss              786985989      # 32.39% l2 miss\ncpu-cycles           23582529091664 #  3.7% memory latency\nload stalls          864295341452   #  3.6% l1 bound\nl1 miss              9987194866     #  0.0% l2 bound\nl2 miss              6326008394     #  0.0% l3 bound\nl3 miss              926737004      #  0.0% dram bound\nstore_stalls         276316533      #  0.0% store bound\n<\/code><\/pre>\n\n\n\n<p>Process summary<\/p>\n\n\n\n<pre 
class=\"wp-block-code\"><code>8083 processes\n\t7664 schbench             3752749.00  1085.68\n\t 68 clinfo                  16.21     5.99\n\t 38 vulkaninfo               1.52     0.96\n\t  4 vulkani:disk$0           0.16     0.10\n\t  6 php                      0.14     0.23\n\t  6 glxinfo:gdrv0            0.11     0.05\n\t  6 glxinfo:gl0              0.11     0.05\n\t  2 llvmpipe-0               0.08     0.05\n\t  2 llvmpipe-1               0.08     0.05\n\t  2 llvmpipe-10              0.08     0.05\n\t  2 llvmpipe-11              0.08     0.05\n\t  2 llvmpipe-12              0.08     0.05\n\t  2 llvmpipe-13              0.08     0.05\n\t  2 llvmpipe-14              0.08     0.05\n\t  2 llvmpipe-15              0.08     0.05\n\t  2 llvmpipe-2               0.08     0.05\n\t  2 llvmpipe-3               0.08     0.05\n\t  2 llvmpipe-4               0.08     0.05\n\t  2 llvmpipe-5               0.08     0.05\n\t  2 llvmpipe-6               0.08     0.05\n\t  2 llvmpipe-7               0.08     0.05\n\t  2 llvmpipe-8               0.08     0.05\n\t  2 llvmpipe-9               0.08     0.05\n\t  6 clang                    0.06     0.04\n\t  2 glxinfo                  0.05     0.03\n\t  2 glxinfo:cs0              0.05     0.03\n\t  2 glxinfo:disk$0           0.05     0.03\n\t  2 glxinfo:sh0              0.05     0.03\n\t  2 glxinfo:shlo0            0.05     0.03\n\t  3 rocminfo                 0.03     0.00\n\t  1 lspci                    0.00     0.02\n\t  1 ps                       0.00     0.01\n\t 98 sh                       0.00     0.00\n\t 13 gcc                      0.00     0.00\n\t 11 gsettings                0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  3 gmain                    0.00     0.00\n\t  2 cc                       0.00     0.00\n\t  2 dconf worker             0.00     0.00\n\t  2 lscpu        
            0.00     0.00\n\t  2 uname                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n0 processes running\n1291 maximum processes\n<\/code><\/pre>\n\n\n\n<p>Computation structures<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      87468) schbench         cpu=6 start=148.37 finish=178.43\n        87469) schbench         cpu=7 start=148.37 finish=178.43\n          87470) schbench         cpu=2 start=148.37 finish=178.43\n            87472) schbench         cpu=10 start=148.37 finish=178.43\n            87474) schbench         cpu=12 start=148.37 finish=178.43\n            87476) schbench         cpu=15 start=148.37 finish=178.43\n            87477) schbench         cpu=6 start=148.37 finish=178.43\n          87471) schbench         cpu=1 start=148.37 finish=178.43\n            87473) schbench         cpu=4 start=148.37 finish=178.43\n            87475) schbench         cpu=5 start=148.37 finish=178.43\n            87478) schbench         cpu=11 start=148.37 finish=178.43\n            87479) schbench         cpu=9 start=148.37 
finish=178.43\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>A Linux kernel scheduler benchmark. There are nine workloads measuring latency with increasing numbers of threads. The plot below reflects both the increased number of runnable threads and usage. Topdown profile shows mostly memory bound with some frontend stalls. AMD metrics show backend stalls. <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/schbench\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-2196","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2196","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=2196"}],"version-history":[{"count":2,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2196\/revisions"}],"predecessor-version":[{"id":2234,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2196\/revisions\/2234"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=2196"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}