{"id":632,"date":"2024-01-16T22:04:05","date_gmt":"2024-01-16T22:04:05","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=632"},"modified":"2024-01-16T22:04:06","modified_gmt":"2024-01-16T22:04:06","slug":"quicksilver","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/quicksilver\/","title":{"rendered":"quicksilver"},"content":{"rendered":"\n<p>quicksilver is a proxy app developed by LLNL. The source is <a href=\"https:\/\/github.com\/LLNL\/Quicksilver\">here<\/a>. There are three workloads. Somehow the AMD processor is much more stable with the workload while the Intel processor needs more iterations to reduce the deviation, particularly on the third workload. The system overview shows full usage of the CPU with a constant set of runnable processes.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-27.png\" alt=\"\" class=\"wp-image-633\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-27.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-27-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-27-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>The topdown metrics show a reasonable retirement rate, limited mainly by backend stalls, with low frontend stalls.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-64.png\" alt=\"\" class=\"wp-image-634\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-64.png 1280w, 
https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-64-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-64-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics show floating point code with a low branch miss rate and a small amount of L2 access.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              2243.218\non_cpu               0.962          # 15.40 \/ 16 cores\nutime                34535.396\nstime                4.859\nnvcsw                47991          # 14.52%\nnivcsw               282447         # 85.48%\ninblock              0              # 0.00\/sec\nonblock              14152          # 6.31\/sec\ncpu-clock            34541446980854 # 34541.447 seconds\ntask-clock           34541661660234 # 34541.662 seconds\npage faults          928862         # 26.891\/sec\ncontext switches     341458         # 9.885\/sec\ncpu migrations       748            # 0.022\/sec\nmajor page faults    2              # 0.000\/sec\nminor page faults    928860         # 26.891\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             33807731553674 # 139.022 branches per 1000 inst\nbranch misses        289158326921   # 0.86% branch miss\nconditional          24835473487220 # 102.127 conditional branches per 1000 inst\nindirect             93366335290    # 0.384 indirect branches per 1000 inst\ncpu-cycles           301810730461036 # 3.82 GHz\ninstructions         557270879640021 # 1.85 IPC\nslots                603566322131106 #\nretiring             200786255732776 # 33.3% (53.8%)\n-- ucode             119011016871   #     0.0%\n-- fastpath          200667244715905 #    33.2%\nfrontend             25741282012149 #  4.3% ( 6.9%)\n-- latency           14193596857908 #     2.4%\n-- bandwidth         11547685154241 #     1.9%\nbackend              131803639022099 # 
21.8% (35.3%)\n-- cpu               65456513788431 #    10.8%\n-- memory            66347125233668 #    11.0%\nspeculation          15093195866873 #  2.5% ( 4.0%)\n-- branch mispredict 14790109025347 #     2.5%\n-- pipeline restart  303086841526   #     0.1%\nsmt-contention       230141589715974 # 38.1% ( 0.0%)\ncpu-cycles           136527883836719 # 3.79 GHz\ninstructions         243228511528062 # 1.78 IPC\ninstructions         81079749463259 # 22.298 l2 access per 1000 inst\nl2 hit from l1       1388992535076  # 6.77% l2 miss\nl2 miss from l1      58670581772    #\nl2 hit from l2 pf    355183403163   #\nl3 hit from l2 pf    54030832873    #\nl3 miss from l2 pf   9713216835     #\ninstructions         81035206592148 # 181.052 float per 1000 inst\nfloat 512            63             # 0.000 AVX-512 per 1000 inst\nfloat 256            600            # 0.000 AVX-256 per 1000 inst\nfloat 128            14671559696429 # 181.052 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         5              # 0.000 scalar per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics also reflect the longer runtime.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              5771.167\non_cpu               0.966          # 15.45 \/ 16 cores\nutime                89151.589\nstime                8.509\nnvcsw                200322         # 25.08%\nnivcsw               598411         # 74.92%\ninblock              0              # 0.00\/sec\nonblock              3976           # 0.69\/sec\ncpu-clock            89157847905491 # 89157.848 seconds\ntask-clock           89158226755683 # 89158.227 seconds\npage faults          1424428        # 15.976\/sec\ncontext switches     827367         # 9.280\/sec\ncpu migrations       60452          # 0.678\/sec\nmajor page faults    0              # 0.000\/sec\nminor page faults    1424428        # 15.976\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              
# 0.000\/sec\nbranches             50932134417907 # 143.312 branches per 1000 inst\nbranch misses        698619389499   # 1.37% branch miss\nconditional          50932134437843 # 143.312 conditional branches per 1000 inst\nindirect             12618507409368 # 35.506 indirect branches per 1000 inst\nslots                440838563740580 #\nretiring             173635425896210 # 39.4% (39.4%)\n-- ucode             7860615540594  #     1.8%\n-- fastpath          165774810355616 #    37.6%\nfrontend             60186415101796 # 13.7% (13.7%)\n-- latency           36475838824133 #     8.3%\n-- bandwidth         23710576277663 #     5.4%\nbackend              168243321796642 # 38.2% (38.2%)\n-- cpu               122576607097747 #    27.8%\n-- memory            45666714698895 #    10.4%\nspeculation          39460472909226 #  9.0% ( 9.0%)\n-- branch mispredict 39056742849763 #     8.9%\n-- pipeline restart  403730059463   #     0.1%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           285841270645200 # 3.06 GHz\ninstructions         351599636598954 # 1.23 IPC\nl2 access            1962502029154  # 11.053 l2 access per 1000 inst\nl2 miss              448311834186   # 22.84% l2 miss\n<\/code><\/pre>\n\n\n\n<p>The process overview is straightforward.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>499 processes\n\t144 qs                   552276.80    51.29\n\t 68 clinfo                  15.87     7.65\n\t 38 vulkaninfo               0.76     1.52\n\t  6 php                      0.24     0.12\n\t  6 glxinfo:gdrv0            0.12     0.12\n\t  4 vulkani:disk$0           0.08     0.16\n\t  6 clang                    0.07     0.05\n\t  2 glxinfo                  0.06     0.06\n\t  2 glxinfo:cs0              0.06     0.06\n\t  2 glxinfo:disk$0           0.06     0.05\n\t  2 glxinfo:sh0              0.06     0.04\n\t  2 glxinfo:shlo0            0.06     0.04\n\t  2 llvmpipe-0               0.04     0.08\n\t  2 llvmpipe-1               0.04     0.08\n\t  2 
llvmpipe-10              0.04     0.08\n\t  2 llvmpipe-11              0.04     0.08\n\t  2 llvmpipe-12              0.04     0.08\n\t  2 llvmpipe-13              0.04     0.08\n\t  2 llvmpipe-14              0.04     0.08\n\t  2 llvmpipe-15              0.04     0.08\n\t  2 llvmpipe-2               0.04     0.08\n\t  2 llvmpipe-3               0.04     0.08\n\t  2 llvmpipe-4               0.04     0.08\n\t  2 llvmpipe-5               0.04     0.08\n\t  2 llvmpipe-6               0.04     0.08\n\t  2 llvmpipe-7               0.04     0.08\n\t  2 llvmpipe-8               0.04     0.08\n\t  2 llvmpipe-9               0.04     0.08\n\t  1 lspci                    0.01     0.02\n\t  3 rocminfo                 0.00     0.03\n\t  1 ps                       0.00     0.01\n\t 86 sh                       0.00     0.00\n\t 13 gcc                      0.00     0.00\n\t 11 gsettings                0.00     0.00\n\t  9 quicksilver              0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  4 gmain                    0.00     0.00\n\t  2 cc                       0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 uname                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dconf worker             0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath              
   0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n0 processes running\n47 maximum processes\n<\/code><\/pre>\n\n\n\n<p>Following is the compute structure<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      105523) quicksilver      cpu=5 start=6.09  finish=441.82\n        105524) qs               cpu=6 start=6.10  finish=441.82\n          105525) qs               cpu=5 start=6.29  finish=441.82\n          105526) qs               cpu=8 start=6.29  finish=441.82\n          105527) qs               cpu=3 start=6.29  finish=441.82\n          105528) qs               cpu=7 start=6.29  finish=441.82\n          105529) qs               cpu=9 start=6.29  finish=441.82\n          105530) qs               cpu=4 start=6.29  finish=441.82\n          105531) qs               cpu=14 start=6.29  finish=441.82\n          105532) qs               cpu=2 start=6.29  finish=441.82\n          105533) qs               cpu=15 start=6.29  finish=441.82\n          105534) qs               cpu=0 start=6.29  finish=441.82\n          105535) qs               cpu=1 start=6.29  finish=441.82\n          105536) qs               cpu=10 start=6.29  finish=441.82\n          105537) qs               cpu=11 start=6.29  finish=441.82\n          105538) qs               cpu=12 start=6.29  finish=441.82\n          105539) qs               cpu=13 start=6.29  finish=441.82\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>quicksilver is a proxy app developed by LLNL. The source is here. There are three workloads. 
Somehow the AMD processor is much more stable with the workload while the Intel processor needs more iterations to reduce the deviation, particularly on <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/quicksilver\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-632","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/632","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=632"}],"version-history":[{"count":1,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/632\/revisions"}],"predecessor-version":[{"id":635,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/632\/revisions\/635"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=632"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}