{"id":1137,"date":"2024-01-31T01:11:56","date_gmt":"2024-01-31T01:11:56","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=1137"},"modified":"2024-01-31T14:25:55","modified_gmt":"2024-01-31T14:25:55","slug":"hackbench","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/hackbench\/","title":{"rendered":"hackbench"},"content":{"rendered":"\n<p>A test of the kernel scheduler. It runs with varying numbers of threads and processes, as shown in the progression below.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-93.png\" alt=\"\" class=\"wp-image-1189\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-93.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-93-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-93-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>Topdown profile is dominated by frontend stalls, with relatively few backend stalls.  
Retirement rate is consistent across different numbers of threads\/processes.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-130.png\" alt=\"\" class=\"wp-image-1191\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-130.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-130-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-130-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics show little floating point or L2 access.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              3103.839\non_cpu               0.881          # 14.10 \/ 16 cores\nutime                2834.004\nstime                40925.639\nnvcsw                685247592      # 76.91%\nnivcsw               205745161      # 23.09%\ninblock              0              # 0.00\/sec\nonblock              15128          # 4.87\/sec\ncpu-clock            43755689957976 # 43755.690 seconds\ntask-clock           43759705281147 # 43759.705 seconds\npage faults          689472         # 15.756\/sec\ncontext switches     890984992      # 20360.854\/sec\ncpu migrations       82144359       # 1877.169\/sec\nmajor page faults    46             # 0.001\/sec\nminor page faults    689426         # 15.755\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             23501436824532 # 203.654 branches per 1000 inst\nbranch misses        3032522200893  # 12.90% branch miss\nconditional          10698516228519 # 92.709 conditional branches per 1000 inst\nindirect             239505493842   # 2.075 indirect branches per 1000 inst\ncpu-cycles           91347428654789 # 3.38 GHz\ninstructions     
    60895739342932 # 0.67 IPC low\nslots                181699842904956 #\nretiring             24675428706414 # 13.6% (15.3%)\n-- ucode             217739037199   #     0.1%\n-- fastpath          24457689669215 #    13.5%\nfrontend             105015576147955 # 57.8% (65.2%) high\n-- latency           88990652940450 #    49.0%\n-- bandwidth         16024923207505 #     8.8%\nbackend              30804556660061 # 17.0% (19.1%)\n-- cpu               7074311060381  #     3.9%\n-- memory            23730245599680 #    13.1%\nspeculation          515827690784   #  0.3% ( 0.3%) low\n-- branch mispredict 515481404909   #     0.3%\n-- pipeline restart  346285875      #     0.0%\nsmt-contention       20687259190202 # 11.4% ( 0.0%)\ncpu-cycles           90504608662935 # 3.21 GHz\ninstructions         61341111308824 # 0.68 IPC low\ninstructions         20314482395291 # 35.427 l2 access per 1000 inst\nl2 hit from l1       575062511723   # 16.81% l2 miss\nl2 miss from l1      70863633140    #\nl2 hit from l2 pf    94475719768    #\nl3 hit from l2 pf    35652924877    #\nl3 miss from l2 pf   14483398129    #\ninstructions         20307336146846 # 21.070 float per 1000 inst\nfloat 512            103            # 0.000 AVX-512 per 1000 inst\nfloat 256            498            # 0.000 AVX-256 per 1000 inst\nfloat 128            427884882765   # 21.070 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              1286.246\non_cpu               0.786          # 12.58 \/ 16 cores\nutime                1244.302\nstime                14940.373\nnvcsw                276770340      # 81.39%\nnivcsw               63265868       # 18.61%\ninblock              600            # 0.47\/sec\nonblock              3176           # 2.47\/sec\ncpu-clock            16176082544622 # 16176.083 
seconds\ntask-clock           16179007940539 # 16179.008 seconds\npage faults          431645         # 26.679\/sec\ncontext switches     340031467      # 21016.830\/sec\ncpu migrations       41546880       # 2567.950\/sec\nmajor page faults    48             # 0.003\/sec\nminor page faults    431597         # 26.676\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             9419060412929  # 164.638 branches per 1000 inst\nbranch misses        44100542658    # 0.47% branch miss\nconditional          9419061039425  # 164.638 conditional branches per 1000 inst\nindirect             2849081581423  # 49.800 indirect branches per 1000 inst\nslots                93413724946706 #\nretiring             39970625421293 # 42.8% (42.8%)\n-- ucode             7326098617840  #     7.8%\n-- fastpath          32644526803453 #    34.9%\nfrontend             40717041467788 # 43.6% (43.6%)\n-- latency           18873031629963 #    20.2%\n-- bandwidth         21844009837825 #    23.4%\nbackend              9785979087670  # 10.5% (10.5%) low\n-- cpu               3709347493414  #     4.0%\n-- memory            6076631594256  #     6.5%\nspeculation          2982448418427  #  3.2% ( 3.2%)\n-- branch mispredict 2689553597192  #     2.9%\n-- pipeline restart  292894821235   #     0.3%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           48091671451085 # 2.48 GHz\ninstructions         53251990904221 # 1.11 IPC\nl2 access            909727906941   # 33.186 l2 access per 1000 inst\nl2 miss              327415399029   # 35.99% l2 miss\n<\/code><\/pre>\n\n\n\n<p>Process overview gives 1291 as maximum number of active processes.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>28052 processes\n\t27628 hackbench_bin        163474.49 2273324.74\n\t 68 clinfo                  17.20     5.65\n\t 38 vulkaninfo               0.76     1.52\n\t  6 php                      0.18     0.41\n\t  6 glxinfo:gdrv0           
 0.09     0.10\n\t  6 glxinfo:gl0              0.09     0.10\n\t  6 clang                    0.09     0.03\n\t  4 vulkani:disk$0           0.08     0.16\n\t  2 glxinfo                  0.05     0.04\n\t  2 glxinfo:cs0              0.05     0.04\n\t  2 glxinfo:disk$0           0.05     0.04\n\t  2 glxinfo:sh0              0.05     0.04\n\t  2 glxinfo:shlo0            0.05     0.04\n\t  2 llvmpipe-0               0.04     0.08\n\t  2 llvmpipe-1               0.04     0.08\n\t  2 llvmpipe-10              0.04     0.08\n\t  2 llvmpipe-11              0.04     0.08\n\t  2 llvmpipe-12              0.04     0.08\n\t  2 llvmpipe-13              0.04     0.08\n\t  2 llvmpipe-14              0.04     0.08\n\t  2 llvmpipe-15              0.04     0.08\n\t  2 llvmpipe-2               0.04     0.08\n\t  2 llvmpipe-3               0.04     0.08\n\t  2 llvmpipe-4               0.04     0.08\n\t  2 llvmpipe-5               0.04     0.08\n\t  2 llvmpipe-6               0.04     0.08\n\t  2 llvmpipe-7               0.04     0.08\n\t  2 llvmpipe-8               0.04     0.08\n\t  2 llvmpipe-9               0.04     0.08\n\t  3 rocminfo                 0.03     0.00\n\t  1 lspci                    0.00     0.03\n\t  1 ps                       0.00     0.01\n\t102 sh                       0.00     0.00\n\t 56 hackbench                0.00     0.00\n\t 13 gcc                      0.00     0.00\n\t  8 gsettings                0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 gmain                    0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  3 dconf worker             0.00     0.00\n\t  2 cc                       0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 uname                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 
dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n0 processes running\n1291 maximum processes\n<\/code><\/pre>\n\n\n\n<p>Computation structure is straightforward<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      2544825) hackbench        cpu=7 start=6.59  finish=10.49\n        2544826) hackbench_bin    cpu=8 start=6.60  finish=10.49\n          2544827) hackbench_bin    cpu=3 start=6.60  finish=10.49\n          2544828) hackbench_bin    cpu=6 start=6.60  finish=10.49\n          2544829) hackbench_bin    cpu=0 start=6.60  finish=10.49\n          2544830) hackbench_bin    cpu=12 start=6.60  finish=10.49\n          2544831) hackbench_bin    cpu=13 start=6.60  finish=10.49\n          2544832) hackbench_bin    cpu=4 start=6.60  finish=10.49\n          2544833) hackbench_bin    cpu=9 start=6.60  finish=10.49\n          2544834) hackbench_bin    cpu=5 start=6.60  finish=10.49\n          2544835) hackbench_bin    cpu=8 start=6.60  finish=10.49\n          2544836) hackbench_bin    cpu=14 start=6.60  finish=10.49\n          2544837) hackbench_bin    cpu=15 start=6.60  finish=10.49\n          2544838) hackbench_bin    cpu=12 start=6.60  finish=10.49\n          2544839) hackbench_bin    cpu=0 start=6.60  
finish=10.49\n          2544840) hackbench_bin    cpu=4 start=6.60  finish=10.49\n          2544841) hackbench_bin    cpu=6 start=6.60  finish=10.49\n          2544842) hackbench_bin    cpu=1 start=6.60  finish=10.49\n          2544843) hackbench_bin    cpu=11 start=6.60  finish=10.49\n          2544844) hackbench_bin    cpu=5 start=6.60  finish=10.49\n          2544845) hackbench_bin    cpu=12 start=6.60  finish=10.49\n          2544846) hackbench_bin    cpu=8 start=6.60  finish=10.49\n          2544847) hackbench_bin    cpu=7 start=6.60  finish=10.49\n          2544848) hackbench_bin    cpu=13 start=6.60  finish=10.46\n          2544849) hackbench_bin    cpu=2 start=6.60  finish=10.45\n          2544850) hackbench_bin    cpu=5 start=6.60  finish=10.48\n          2544851) hackbench_bin    cpu=2 start=6.60  finish=10.45\n          2544852) hackbench_bin    cpu=10 start=6.60  finish=10.48\n          2544853) hackbench_bin    cpu=8 start=6.60  finish=10.41\n          2544854) hackbench_bin    cpu=14 start=6.60  finish=10.44\n          2544855) hackbench_bin    cpu=11 start=6.60  finish=10.46\n          2544856) hackbench_bin    cpu=8 start=6.60  finish=10.46\n          2544857) hackbench_bin    cpu=7 start=6.60  finish=10.42\n          2544858) hackbench_bin    cpu=11 start=6.60  finish=10.48\n          2544859) hackbench_bin    cpu=0 start=6.60  finish=10.40\n          2544860) hackbench_bin    cpu=8 start=6.60  finish=10.43\n          2544861) hackbench_bin    cpu=14 start=6.60  finish=10.45\n          2544862) hackbench_bin    cpu=0 start=6.60  finish=10.44\n          2544863) hackbench_bin    cpu=12 start=6.60  finish=10.48\n          2544864) hackbench_bin    cpu=7 start=6.60  finish=10.47\n          2544865) hackbench_bin    cpu=9 start=6.60  finish=10.49\n          2544866) hackbench_bin    cpu=1 start=6.60  finish=10.45\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>A test of the kernel scheduler. 
This uses varying amounts of threads and processes as shown in the progression below. Topdown profile is dominated by frontend stalls with not many backend stalls. Retirement rate is consistent through different numbers of <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/hackbench\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-1137","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1137","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=1137"}],"version-history":[{"count":3,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1137\/revisions"}],"predecessor-version":[{"id":1192,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1137\/revisions\/1192"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=1137"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}