{"id":735,"date":"2024-01-20T12:52:33","date_gmt":"2024-01-20T12:52:33","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=735"},"modified":"2024-01-20T18:53:45","modified_gmt":"2024-01-20T18:53:45","slug":"simdjson","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/simdjson\/","title":{"rendered":"simdjson"},"content":{"rendered":"\n<p>JSON parsing workload with five test cases. There is not much variation between the cases at this level, and all are single threaded.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-43.png\" alt=\"\" class=\"wp-image-759\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-43.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-43-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-43-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>Topdown metrics show some variation in the first workload vs. the other three. 
Otherwise, the retirement rate is high and limited mainly by backend stalls.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-81.png\" alt=\"\" class=\"wp-image-761\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-81.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-81-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-81-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics show many branches (about 209 per 1000 instructions) but a low mispredict ratio (0.42%).<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              1167.221\non_cpu               0.058          # 0.93 \/ 16 cores\nutime                947.249\nstime                132.789\nnvcsw                2144           # 31.39%\nnivcsw               4686           # 68.61%\ninblock              0              # 0.00\/sec\nonblock              13520          # 11.58\/sec\ncpu-clock            1060764404934  # 1060.764 seconds\ntask-clock           1043565962599  # 1043.566 seconds\npage faults          4074935        # 3904.818\/sec\ncontext switches     12459          # 11.939\/sec\ncpu migrations       384            # 0.368\/sec\nmajor page faults    2              # 0.002\/sec\nminor page faults    4074933        # 3904.816\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             1623251344069  # 208.543 branches per 1000 inst\nbranch misses        6830289396     # 0.42% branch miss\nconditional          1555765693583  # 199.873 conditional branches per 1000 inst\nindirect             1109680660     # 0.143 indirect branches per 1000 inst\ncpu-cycles           4597218192476  # 0.25 GHz\ninstructions         
14599846846048 # 3.18 IPC\nslots                7265075693532  #\nretiring             3534025371404  # 48.6% (48.6%)\n-- ucode             19825160717    #     0.3%\n-- fastpath          3514200210687  #    48.4%\nfrontend             885850299239   # 12.2% (12.2%)\n-- latency           352808287308   #     4.9%\n-- bandwidth         533042011931   #     7.3%\nbackend              2637646765616  # 36.3% (36.3%)\n-- cpu               1449201042303  #    19.9%\n-- memory            1188445723313  #    16.4%\nspeculation          207095188915   #  2.9% ( 2.9%)\n-- branch mispredict 202244588303   #     2.8%\n-- pipeline restart  4850600612     #     0.1%\nsmt-contention       457507570      #  0.0% ( 0.0%)\ncpu-cycles           4580022863126  # 0.25 GHz\ninstructions         14573909729059 # 3.18 IPC\ninstructions         3860925654216  # 48.096 l2 access per 1000 inst\nl2 hit from l1       100834500364   # 32.20% l2 miss\nl2 miss from l1      7616180873     #\nl2 hit from l2 pf    32681720010    #\nl3 hit from l2 pf    31643262792    #\nl3 miss from l2 pf   20537034976    #\ninstructions         3859524617727  # 61.757 float per 1000 inst\nfloat 512            49             # 0.000 AVX-512 per 1000 inst\nfloat 256            20825894754    # 5.396 AVX-256 per 1000 inst\nfloat 128            217526082164   # 56.361 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         7              # 0.000 scalar per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              1064.589\non_cpu               0.057          # 0.92 \/ 16 cores\nutime                900.027\nstime                76.835\nnvcsw                2024           # 31.70%\nnivcsw               4361           # 68.30%\ninblock              5824           # 5.47\/sec\nonblock              2256           # 2.12\/sec\ncpu-clock            958565278780   # 958.565 seconds\ntask-clock           958030477330 
  # 958.030 seconds\npage faults          3644797        # 3804.469\/sec\ncontext switches     11494          # 11.998\/sec\ncpu migrations       715            # 0.746\/sec\nmajor page faults    21             # 0.022\/sec\nminor page faults    3644776        # 3804.447\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             2034933733227  # 151.323 branches per 1000 inst\nbranch misses        6610203847     # 0.32% branch miss\nconditional          2034933746187  # 151.323 conditional branches per 1000 inst\nindirect             1353675464     # 0.101 indirect branches per 1000 inst\nslots                20711401748834 #\nretiring             9582714603191  # 46.3% (46.3%)\n-- ucode             833592945195   #     4.0%\n-- fastpath          8749121657996  #    42.2%\nfrontend             3817548477815  # 18.4% (18.4%)\n-- latency           1765649812090  #     8.5%\n-- bandwidth         2051898665725  #     9.9%\nbackend              6769475286303  # 32.7% (32.7%)\n-- cpu               2797925769978  #    13.5%\n-- memory            3971549516325  #    19.2%\nspeculation          1539595442119  #  7.4% ( 7.4%)\n-- branch mispredict 1277383990632  #     6.2%\n-- pipeline restart  262211451487   #     1.3%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           3457915967322  # 0.21 GHz\ninstructions         13453460153942 # 3.89 IPC\nl2 access            386536318482   # 28.736 l2 access per 1000 inst\nl2 miss              145523023280   # 37.65% l2 miss\n<\/code><\/pre>\n\n\n\n<p>Process overview<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>380 processes\n\t 15 bench_ondemand         947.56   144.01\n\t 68 clinfo                  16.20     6.66\n\t 38 vulkaninfo               1.31     0.95\n\t  4 vulkani:disk$0           0.14     0.10\n\t  6 glxinfo:gdrv0            0.14     0.09\n\t  6 php                      0.09     0.27\n\t  2 llvmpipe-0               0.07     
0.05\n\t  2 llvmpipe-1               0.07     0.05\n\t  2 llvmpipe-10              0.07     0.05\n\t  2 llvmpipe-11              0.07     0.05\n\t  2 llvmpipe-12              0.07     0.05\n\t  2 llvmpipe-13              0.07     0.05\n\t  2 llvmpipe-14              0.07     0.05\n\t  2 llvmpipe-15              0.07     0.05\n\t  2 llvmpipe-2               0.07     0.05\n\t  2 llvmpipe-3               0.07     0.05\n\t  2 llvmpipe-4               0.07     0.05\n\t  2 llvmpipe-5               0.07     0.05\n\t  2 llvmpipe-6               0.07     0.05\n\t  2 llvmpipe-7               0.07     0.05\n\t  2 llvmpipe-8               0.07     0.05\n\t  2 llvmpipe-9               0.07     0.05\n\t  6 clang                    0.06     0.06\n\t  2 glxinfo                  0.06     0.04\n\t  2 glxinfo:cs0              0.06     0.03\n\t  2 glxinfo:disk$0           0.06     0.03\n\t  2 glxinfo:sh0              0.06     0.03\n\t  2 glxinfo:shlo0            0.06     0.03\n\t  3 rocminfo                 0.03     0.00\n\t  1 lspci                    0.00     0.02\n\t  1 ps                       0.00     0.01\n\t 90 sh                       0.00     0.00\n\t 15 simdjson                 0.00     0.00\n\t 13 gcc                      0.00     0.00\n\t 10 gsettings                0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  4 gmain                    0.00     0.00\n\t  2 cc                       0.00     0.00\n\t  2 dconf worker             0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 uname                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 grep      
               0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n0 processes running\n47 maximum processes\n<\/code><\/pre>\n\n\n\n<p>Computation blocks are straightforward: each of the five test cases runs three times in sequence.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      2823492) simdjson         cpu=15 start=5.84  finish=98.89\n        2823493) bench_ondemand   cpu=14 start=5.84  finish=98.89\n      2823497) simdjson         cpu=14 start=102.89 finish=196.91\n        2823498) bench_ondemand   cpu=7 start=102.89 finish=196.91\n      2823501) simdjson         cpu=6 start=200.92 finish=294.99\n        2823502) bench_ondemand   cpu=7 start=200.92 finish=294.99\n      2823504) sh               cpu=8 start=294.99 finish=294.99\n        2823505) sh               cpu=1 start=294.99 finish=294.99\n      2823506) simdjson         cpu=7 start=305.31 finish=378.69\n        2823507) bench_ondemand   cpu=8 start=305.32 finish=378.69\n      2823508) simdjson         cpu=14 start=382.69 finish=455.96\n        2823509) bench_ondemand   cpu=15 start=382.69 finish=455.96\n      2823544) simdjson         cpu=14 start=459.97 finish=533.54\n        2823545) bench_ondemand   cpu=0 start=459.97 finish=533.54\n      2823546) sh               cpu=7 start=533.54 finish=533.54\n        2823547) sh               cpu=0 start=533.54 finish=533.54\n      2823548) simdjson         cpu=6 start=543.72 finish=594.32\n        2823549) bench_ondemand 
  cpu=7 start=543.73 finish=594.32\n      2823550) simdjson         cpu=6 start=598.32 finish=649.21\n        2823551) bench_ondemand   cpu=15 start=598.32 finish=649.21\n      2823552) simdjson         cpu=6 start=653.21 finish=703.82\n        2823553) bench_ondemand   cpu=15 start=653.21 finish=703.82\n      2823555) sh               cpu=15 start=703.82 finish=703.82\n        2823556) sh               cpu=0 start=703.82 finish=703.82\n      2823558) simdjson         cpu=14 start=714.00 finish=786.58\n        2823559) bench_ondemand   cpu=7 start=714.00 finish=786.58\n      2823631) simdjson         cpu=6 start=790.58 finish=862.67\n        2823632) bench_ondemand   cpu=7 start=790.59 finish=862.67\n      2823634) simdjson         cpu=14 start=866.68 finish=938.66\n        2823635) bench_ondemand   cpu=7 start=866.68 finish=938.66\n      2823639) sh               cpu=0 start=938.66 finish=938.66\n        2823640) sh               cpu=9 start=938.66 finish=938.66\n      2823641) simdjson         cpu=6 start=949.02 finish=1022.77\n        2823642) bench_ondemand   cpu=7 start=949.02 finish=1022.77\n      2823644) simdjson         cpu=14 start=1026.78 finish=1101.20\n        2823645) bench_ondemand   cpu=7 start=1026.78 finish=1101.20\n      2823648) simdjson         cpu=14 start=1105.21 finish=1178.76\n        2823649) bench_ondemand   cpu=7 start=1105.21 finish=1178.75\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Json parsing workload with five test cases. Not much variation between the cases at this level, all single threaded. Topdown metrics show some variation in first workload vs. other three. Otherwise a higher retirement rate limited by backend stalls. 
AMD <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/simdjson\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-735","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/735","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=735"}],"version-history":[{"count":2,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/735\/revisions"}],"predecessor-version":[{"id":762,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/735\/revisions\/762"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=735"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}