{"id":1082,"date":"2024-01-29T11:27:09","date_gmt":"2024-01-29T11:27:09","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=1082"},"modified":"2024-01-30T01:54:32","modified_gmt":"2024-01-30T01:54:32","slug":"pyperformance","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/pyperformance\/","title":{"rendered":"pyperformance"},"content":{"rendered":"\n<p>PyPerformance is the reference Python performance benchmark suite. It takes approximately 20x longer to run than the pybench benchmark. It is, however, still single-threaded. There are 13 subtests below with slightly different profiles, and a few that are interrupt-driven.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-88.png\" alt=\"\" class=\"wp-image-1118\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-88.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-88-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-88-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>The topdown profile shows slight differences among the 13 workloads: most have high retirement rates, with some backend and frontend stalls.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-125.png\" alt=\"\" class=\"wp-image-1120\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-125.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-125-1024x768.png 1024w, 
https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-125-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics are a composite. This is still a workload with little floating point.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              1353.957\non_cpu               0.052          # 0.83 \/ 16 cores\nutime                973.990\nstime                147.482\nnvcsw                94747          # 89.19%\nnivcsw               11484          # 10.81%\ninblock              0              # 0.00\/sec\nonblock              2035664        # 1503.49\/sec\ncpu-clock            1119830961587  # 1119.831 seconds\ntask-clock           1119904626937  # 1119.905 seconds\npage faults          29943174       # 26737.254\/sec\ncontext switches     84047          # 75.048\/sec\ncpu migrations       3839           # 3.428\/sec\nmajor page faults    2              # 0.002\/sec\nminor page faults    29943172       # 26737.252\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             2994321577851  # 195.311 branches per 1000 inst\nbranch misses        28649603561    # 0.96% branch miss\nconditional          2425795071846  # 158.228 conditional branches per 1000 inst\nindirect             227999745660   # 14.872 indirect branches per 1000 inst\ncpu-cycles           4689348425248  # 0.22 GHz\ninstructions         14736906286871 # 3.14 IPC high\nslots                9924238921752  #\nretiring             4920694875718  # 49.6% (49.6%)\n-- ucode             11882730629    #     0.1%\n-- fastpath          4908812145089  #    49.5%\nfrontend             2414939339008  # 24.3% (24.3%)\n-- latency           1660300722108  #    16.7%\n-- bandwidth         754638616900   #     7.6%\nbackend              2004798551184  # 20.2% (20.2%)\n-- cpu               309889045759   #     3.1%\n-- memory            1694909505425  #    
17.1%\nspeculation          582435602064   #  5.9% ( 5.9%)\n-- branch mispredict 521857008302   #     5.3%\n-- pipeline restart  60578593762    #     0.6%\nsmt-contention       1369112675     #  0.0% ( 0.0%)\ncpu-cycles           4558877170353  # 0.22 GHz\ninstructions         14265709084002 # 3.13 IPC high\ninstructions         4920908278314  # 22.496 l2 access per 1000 inst\nl2 hit from l1       103761201510   # 8.27% l2 miss\nl2 miss from l1      5618730997     #\nl2 hit from l2 pf    3397082098     #\nl3 hit from l2 pf    2981238540     #\nl3 miss from l2 pf   560005053      #\ninstructions         4916752155881  # 10.649 float per 1000 inst\nfloat 512            8949           # 0.000 AVX-512 per 1000 inst\nfloat 256            58794          # 0.000 AVX-256 per 1000 inst\nfloat 128            52357282574    # 10.649 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         7035           # 0.000 scalar per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              1643.309\non_cpu               0.052          # 0.83 \/ 16 cores\nutime                1284.497\nstime                87.323\nnvcsw                119014         # 85.69%\nnivcsw               19872          # 14.31%\ninblock              99088          # 60.30\/sec\nonblock              2667160        # 1623.04\/sec\ncpu-clock            1368265116188  # 1368.265 seconds\ntask-clock           1368486534913  # 1368.487 seconds\npage faults          30647479       # 22395.163\/sec\ncontext switches     118152         # 86.338\/sec\ncpu migrations       2880           # 2.105\/sec\nmajor page faults    426            # 0.311\/sec\nminor page faults    30647053       # 22394.852\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             3333799503012  # 193.531 branches per 1000 inst\nbranch misses        17891763254    # 0.54% 
branch miss\nconditional          3333800440068  # 193.531 conditional branches per 1000 inst\nindirect             263760873573   # 15.312 indirect branches per 1000 inst\nslots                29643189116246 #\nretiring             16016698426036 # 54.0% (54.0%) high\n-- ucode             951310250035   #     3.2%\n-- fastpath          15065388176001 #    50.8%\nfrontend             6104719705607  # 20.6% (20.6%)\n-- latency           2443560383669  #     8.2%\n-- bandwidth         3661159321938  #    12.4%\nbackend              4195516677023  # 14.2% (14.2%) low\n-- cpu               3110800077212  #    10.5%\n-- memory            1084716599811  #     3.7%\nspeculation          3297960168706  # 11.1% (11.1%) high\n-- branch mispredict 2388958032173  #     8.1%\n-- pipeline restart  909002136533   #     3.1%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           4884162551636  # 0.19 GHz\ninstructions         16802905156491 # 3.44 IPC high\nl2 access            272007867054   # 16.198 l2 access per 1000 inst\nl2 miss              29076273170    # 10.69% l2 miss\n<\/code><\/pre>\n\n\n\n<p>Process overview shows a large number of python processes started<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>29122 processes\n\t25720 python                 856.28    58.98\n\t 68 clinfo                  16.87     5.99\n\t 38 vulkaninfo               1.12     1.31\n\t 39 python3.10               0.43     0.05\n\t  6 php                      0.13     0.42\n\t  4 vulkani:disk$0           0.11     0.13\n\t  6 glxinfo:gdrv0            0.09     0.09\n\t  6 glxinfo:gl0              0.09     0.09\n\t  2 llvmpipe-0               0.06     0.07\n\t  2 llvmpipe-1               0.06     0.07\n\t  2 llvmpipe-10              0.06     0.07\n\t  2 llvmpipe-11              0.06     0.07\n\t  2 llvmpipe-12              0.06     0.07\n\t  2 llvmpipe-13              0.06     0.07\n\t  2 llvmpipe-14              0.06     0.07\n\t  2 llvmpipe-15              0.06     0.07\n\t  2 
llvmpipe-2               0.06     0.07\n\t  2 llvmpipe-3               0.06     0.07\n\t  2 llvmpipe-4               0.06     0.07\n\t  2 llvmpipe-5               0.06     0.07\n\t  2 llvmpipe-6               0.06     0.07\n\t  2 llvmpipe-7               0.06     0.07\n\t  2 llvmpipe-8               0.06     0.07\n\t  2 llvmpipe-9               0.06     0.07\n\t  6 clang                    0.06     0.06\n\t  2 glxinfo                  0.05     0.04\n\t  2 glxinfo:cs0              0.05     0.04\n\t  2 glxinfo:disk$0           0.05     0.04\n\t  2 glxinfo:sh0              0.05     0.04\n\t  2 glxinfo:shlo0            0.05     0.04\n\t  3 rocminfo                 0.04     0.00\n\t  1 lspci                    0.00     0.02\n\t  1 ps                       0.00     0.01\n\t1601 uname                    0.00     0.00\n\t1353 file                     0.00     0.00\n\t107 sh                       0.00     0.00\n\t 39 pyperformance            0.00     0.00\n\t 12 gcc                      0.00     0.00\n\t  8 gsettings                0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 gmain                    0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  3 dconf worker             0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 cc                       0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 python3                  0.00     0.00\n\t  1 qdbus               
     0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n0 processes running\n47 maximum processes\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>PyPerformance is the reference Python performance benchmark suite. It takes approximately 20x longer to run than the pybench benchmark. It is, however, still single-threaded. There are 13 subtests below with slightly different profiles, and a few that are interrupt-driven. Topdown <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/pyperformance\/\"><span class=\"more-msg\">Continue reading 
&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-1082","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1082","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=1082"}],"version-history":[{"count":2,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1082\/revisions"}],"predecessor-version":[{"id":1121,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1082\/revisions\/1121"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=1082"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}