{"id":2056,"date":"2024-03-07T13:01:15","date_gmt":"2024-03-07T13:01:15","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=2056"},"modified":"2024-03-07T21:16:40","modified_gmt":"2024-03-07T21:16:40","slug":"mutex","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/mutex\/","title":{"rendered":"mutex"},"content":{"rendered":"\n<p>A test of eight different mutex operations. These look to be single-threaded.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-28.png\" alt=\"\" class=\"wp-image-2064\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-28.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-28-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-28-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>Topdown profile shows a high retirement rate overall and some backend stalls depending on the operator.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-28.png\" alt=\"\" class=\"wp-image-2066\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-28.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-28-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-28-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics confirm ~0.8 cores of activity and a high retirement rate. There is no floating point or l2 access. The opcache miss rate is very low and the icache access is also low.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              768.246\non_cpu               0.051          # 0.81 \/ 16 cores\nutime                624.890\nstime                0.979\nnvcsw                2235           # 46.24%\nnivcsw               2598           # 53.76%\ninblock              0              # 0.00\/sec\nonblock              13848          # 18.03\/sec\ncpu-clock            625977324320   # 625.977 seconds\ntask-clock           625986101516   # 625.986 seconds\npage faults          164393         # 262.614\/sec\ncontext switches     8443           # 13.488\/sec\ncpu migrations       304            # 0.486\/sec\nmajor page faults    2              # 0.003\/sec\nminor page faults    164391         # 262.611\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             1592406098562  # 184.609 branches per 1000 inst\nbranch misses        241102970      # 0.02% branch miss\nconditional          651535715124   # 75.533 conditional branches per 1000 inst\nindirect             46735009277    # 5.418 indirect branches per 1000 inst\ncpu-cycles           2897158648882  # 0.24 GHz\ninstructions         8622339610405  # 2.98 IPC\nslots                5800859735712  #\nretiring             3253046943596  # 56.1% (56.1%) high\n-- ucode             24462969586    #     0.4%\n-- fastpath          3228583974010  #    55.7%\nfrontend             873341239219   # 15.1% (15.1%)\n-- latency           433757472954   #     7.5%\n-- bandwidth         439583766265   #     7.6%\nbackend              1577324717331  # 27.2% (27.2%)\n-- cpu               150783304857   #     2.6%\n-- memory            1426541412474  #    24.6%\nspeculation          96796569505    #  1.7% ( 1.7%)\n-- branch mispredict 12546909414    #     0.2%\n-- pipeline restart  84249660091    #     1.5%\nsmt-contention       349845427      #  0.0% ( 0.0%)\ncpu-cycles           2895380779245  # 0.24 GHz\ninstructions         8625321930524  # 2.98 IPC\ninstructions         2877430346634  # 0.054 l2 access per 1000 inst\nl2 hit from l1       135717532      # 17.98% l2 miss\nl2 miss from l1      17906204       #\nl2 hit from l2 pf    8493747        #\nl3 hit from l2 pf    4915787        #\nl3 miss from l2 pf   4856894        #\ninstructions         2875652905685  # 0.018 float per 1000 inst\nfloat 512            92             # 0.000 AVX-512 per 1000 inst\nfloat 256            596            # 0.000 AVX-256 per 1000 inst\nfloat 128            50502963       # 0.018 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\ninstructions         8625751799175  #\nopcache              1754552237838  # 203.409 opcache per 1000 inst\nopcache miss         3140782157     #  0.2% opcache miss rate\nl1 dTLB miss         30325200       # 0.004 L1 dTLB per 1000 inst\nl2 dTLB miss         5222148        # 0.001 L2 dTLB per 1000 inst\ninstructions         8976029807411  #\nicache               4197245078     # 0.468 icache per 1000 inst\nicache miss          220014730      #  5.2% icache miss rate\nl1 iTLB miss         9439540        # 0.001 L1 iTLB per 1000 inst\nl2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst\ntlb flush            20437          # 0.000 TLB flush per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics show backend stalls as store-bound.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              1009.781\non_cpu               0.054          # 0.86 \/ 16 cores\nutime                867.884\nstime                0.576\nnvcsw                2071           # 33.11%\nnivcsw               4184           # 66.89%\ninblock              8              # 0.01\/sec\nonblock              2496           # 2.47\/sec\ncpu-clock            868580465423   # 868.580 seconds\ntask-clock           868590350103   # 868.590 seconds\npage faults          156145         # 179.768\/sec\ncontext switches     11080          # 12.756\/sec\ncpu migrations       440            # 0.507\/sec\nmajor page faults    0              # 0.000\/sec\nminor page faults    156145         # 179.768\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             1592194973756  # 184.595 branches per 1000 inst\nbranch misses        22779649       # 0.00% branch miss\nconditional          1592194987388  # 184.595 conditional branches per 1000 inst\nindirect             46747423365    # 5.420 indirect branches per 1000 inst\nslots                27528185511464 #\nretiring             13766928686759 # 50.0% (50.0%)\n-- ucode             2564852441369  #     9.3%\n-- fastpath          11202076245390 #    40.7%\nfrontend             1454939362848  #  5.3% ( 5.3%)\n-- latency           628410579732   #     2.3%\n-- bandwidth         826528783116   #     3.0%\nbackend              12177694657167 # 44.2% (44.2%)\n-- cpu               2595455796043  #     9.4%\n-- memory            9582238861124  #    34.8%\nspeculation          5619303554     #  0.0% ( 0.0%) low\n-- branch mispredict 3012286657     #     0.0%\n-- pipeline restart  2607016897     #     0.0%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           3362261398526  # 0.20 GHz\ninstructions         8780623521865  # 2.61 IPC\nl2 access            348782437      # 0.040 l2 access per 1000 inst\nl2 miss              111403109      # 31.94% l2 miss\ncpu-cycles           3289745122103  # 38.2% memory latency\nload stalls          649290423321   # 19.7% l1 bound\nl1 miss              1067015600     #  0.0% l2 bound\nl2 miss              459914057      #  0.0% l3 bound\nl3 miss              254176849      #  0.0% dram bound\nstore_stalls         608692073488   # 18.5% store bound\n<\/code><\/pre>\n\n\n\n<p>Process overview<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>410 processes\n\t 24 BenchmarkMutex         622.88     0.00\n\t 68 clinfo                  17.12     5.66\n\t 38 vulkaninfo               1.12     1.52\n\t  4 vulkani:disk$0           0.11     0.16\n\t  6 php                      0.10     0.24\n\t  6 glxinfo:gdrv0            0.07     0.10\n\t  6 glxinfo:gl0              0.07     0.10\n\t  2 llvmpipe-0               0.06     0.08\n\t  2 llvmpipe-1               0.06     0.08\n\t  2 llvmpipe-10              0.06     0.08\n\t  2 llvmpipe-11              0.06     0.08\n\t  2 llvmpipe-12              0.06     0.08\n\t  2 llvmpipe-13              0.06     0.08\n\t  2 llvmpipe-14              0.06     0.08\n\t  2 llvmpipe-15              0.06     0.08\n\t  2 llvmpipe-2               0.06     0.08\n\t  2 llvmpipe-3               0.06     0.08\n\t  2 llvmpipe-4               0.06     0.08\n\t  2 llvmpipe-5               0.06     0.08\n\t  2 llvmpipe-6               0.06     0.08\n\t  2 llvmpipe-7               0.06     0.08\n\t  2 llvmpipe-8               0.06     0.08\n\t  2 llvmpipe-9               0.06     0.08\n\t  2 glxinfo                  0.05     0.04\n\t  2 glxinfo:cs0              0.05     0.04\n\t  2 glxinfo:disk$0           0.05     0.04\n\t  2 glxinfo:sh0              0.05     0.04\n\t  2 glxinfo:shlo0            0.05     0.04\n\t  6 clang                    0.04     0.08\n\t  1 lspci                    0.00     0.02\n\t  1 ps                       0.00     0.01\n\t 96 sh                       0.00     0.00\n\t 24 mutex                    0.00     0.00\n\t 13 gcc                      0.00     0.00\n\t 12 gsettings                0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  3 gmain                    0.00     0.00\n\t  3 rocminfo                 0.00     0.00\n\t  2 cc                       0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 uname                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dconf worker             0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n0 processes running\n47 maximum processes\n<\/code><\/pre>\n\n\n\n<p>Computation blocks<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      1044464) mutex            cpu=6 start=6.49  finish=22.02\n        1044465) BenchmarkMutex   cpu=1 start=6.50  finish=22.02\n      1044468) mutex            cpu=6 start=26.03 finish=41.57\n        1044469) BenchmarkMutex   cpu=7 start=26.03 finish=41.57\n      1044470) mutex            cpu=7 start=45.57 finish=61.15\n        1044471) BenchmarkMutex   cpu=8 start=45.58 finish=61.15\n      1044472) sh               cpu=14 start=61.15 finish=61.15\n        1044473) sh               cpu=7 start=61.15 finish=61.15\n      1044474) mutex            cpu=0 start=71.35 finish=170.56\n        1044475) BenchmarkMutex   cpu=1 start=71.35 finish=170.56\n      1044476) mutex            cpu=0 start=174.57 finish=273.25\n        1044477) BenchmarkMutex   cpu=10 start=174.57 finish=273.25\n      1044612) mutex            cpu=0 start=277.25 finish=374.00\n        1044613) BenchmarkMutex   cpu=1 start=277.26 finish=374.00\n      1044616) sh               cpu=1 start=374.00 finish=374.00\n        1044617) sh               cpu=2 start=374.00 finish=374.00\n      1044618) mutex            cpu=6 start=384.42 finish=397.09\n        1044619) BenchmarkMutex   cpu=7 start=384.42 finish=397.09\n      1044620) mutex            cpu=15 start=401.10 finish=413.73\n        1044621) BenchmarkMutex   cpu=0 start=401.10 finish=413.73\n      1044624) mutex            cpu=6 start=417.73 finish=430.36\n        1044625) BenchmarkMutex   cpu=15 start=417.73 finish=430.36\n      1044626) sh               cpu=15 start=430.36 finish=430.36\n        1044627) sh               cpu=3 start=430.36 finish=430.36\n      1044628) mutex            cpu=1 start=440.56 finish=455.14\n        1044629) BenchmarkMutex   cpu=10 start=440.57 finish=455.14\n      1044630) mutex            cpu=9 start=459.14 finish=473.68\n        1044631) BenchmarkMutex   cpu=2 start=459.14 finish=473.68\n      1044632) mutex            cpu=1 start=477.68 finish=492.46\n        1044633) BenchmarkMutex   cpu=2 start=477.69 finish=492.46\n      1044634) sh               cpu=2 start=492.46 finish=492.46\n        1044635) sh               cpu=3 start=492.46 finish=492.46\n      1044636) mutex            cpu=1 start=503.44 finish=511.86\n        1044637) BenchmarkMutex   cpu=2 start=503.45 finish=511.85\n      1044640) mutex            cpu=9 start=515.86 finish=524.24\n        1044641) BenchmarkMutex   cpu=2 start=515.86 finish=524.24\n      1044642) mutex            cpu=9 start=528.24 finish=536.62\n        1044643) BenchmarkMutex   cpu=10 start=528.24 finish=536.62\n      1044644) sh               cpu=1 start=536.62 finish=536.62\n        1044645      1044646) mutex            cpu=1 start=547.36 finish=568.53\n        1044647) BenchmarkMutex   cpu=2 start=547.36 finish=568.53\n      1044650) mutex            cpu=1 start=572.53 finish=593.61\n        1044651) BenchmarkMutex   cpu=2 start=572.53 finish=593.61\n      1044652) mutex            cpu=9 start=597.62 finish=618.78\n        1044653) BenchmarkMutex   cpu=2 start=597.62 finish=618.78\n      1044654) sh               cpu=9 start=618.79 finish=618.79\n        1044655) sh               cpu=3 start=618.79 finish=618.79\n      1044656) mutex            cpu=1 start=629.03 finish=656.63\n        1044657) BenchmarkMutex   cpu=2 start=629.03 finish=656.63\n      1044658) mutex            cpu=1 start=660.64 finish=688.22\n        1044659) BenchmarkMutex   cpu=2 start=660.64 finish=688.21\n      1044663) mutex            cpu=9 start=692.22 finish=719.79\n        1044664) BenchmarkMutex   cpu=10 start=692.22 finish=719.79\n      1044665) sh               cpu=9 start=719.79 finish=719.79\n        1044666) sh               cpu=10 start=719.79 finish=719.79\n      1044667) mutex            cpu=1 start=730.00 finish=739.59\n        1044668) BenchmarkMutex   cpu=10 start=730.00 finish=739.59\n      1044669) mutex            cpu=1 start=743.60 finish=753.15\n        1044670) BenchmarkMutex   cpu=2 start=743.60 finish=753.15\n      1044671) mutex            cpu=2 start=757.16 finish=766.73\n        1044672) BenchmarkMutex   cpu=3 start=757.16 finish=766.73\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>A test of eight different mutex operations. These look to be single-threaded. Topdown profile shows a high retirement rate overall and some backend stalls depending on the operator. AMD metrics confirm ~0.8 cores of activity and a high retirement rate. <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/mutex\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-2056","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2056","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=2056"}],"version-history":[{"count":2,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2056\/revisions"}],"predecessor-version":[{"id":2067,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2056\/revisions\/2067"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=2056"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}