{"id":1776,"date":"2024-02-23T01:40:13","date_gmt":"2024-02-23T01:40:13","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=1776"},"modified":"2024-02-27T00:42:02","modified_gmt":"2024-02-27T00:42:02","slug":"pyhpc","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/pyhpc\/","title":{"rendered":"pyhpc"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">A suite of Python HPC benchmarks that can run on both the CPU and the GPU. It supports multiple backends, but on my system only NumPy seemed to be available: JAX, Numba and Aesara were missing, and TensorFlow and PyTorch were not selected. That left two of the twelve workloads to run. The workload looks single-threaded in the middle of the chart; the activity at the edges comes from the JAX, Numba and Aesara attempts.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-58.png\" alt=\"\" class=\"wp-image-1791\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-58.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-58-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-58-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">The topdown profile shows a mix across the workload attempts, with frontend stalls in the failing cases and backend stalls in the passing ones.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-60.png\" alt=\"\" class=\"wp-image-1792\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-60.png 1280w, 
https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-60-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-60-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">The AMD metrics show some floating-point work and a balance of frontend and backend stalls. I expect this to vary depending on the backends chosen.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              214.590\non_cpu               0.029          # 0.46 \/ 16 cores\nutime                65.401\nstime                33.947\nnvcsw                2373           # 70.79%\nnivcsw               979            # 29.21%\ninblock              0              # 0.00\/sec\nonblock              2256           # 10.51\/sec\ncpu-clock            99401250819    # 99.401 seconds\ntask-clock           99408371231    # 99.408 seconds\npage faults          9395772        # 94516.909\/sec\ncontext switches     4242           # 42.672\/sec\ncpu migrations       222            # 2.233\/sec\nmajor page faults    0              # 0.000\/sec\nminor page faults    9395772        # 94516.909\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             103193296965   # 153.981 branches per 1000 inst\nbranch misses        4321031643     # 4.19% branch miss\nconditional          61129993121    # 91.216 conditional branches per 1000 inst\nindirect             10014830714    # 14.944 indirect branches per 1000 inst\ncpu-cycles           421731065390   # 0.12 GHz\ninstructions         662316153608   # 1.57 IPC\nslots                852041413506   #\nretiring             239696482165   # 28.1% (29.0%)\n-- ucode             1192405255     #     0.1%\n-- fastpath          238504076910   #    28.0%\nfrontend             235886914660   # 27.7% (28.6%)\n-- latency           182114380362   #    21.4%\n-- bandwidth         
53772534298    #     6.3%\nbackend              321872190110   # 37.8% (39.0%)\n-- cpu               119350501661   #    14.0%\n-- memory            202521688449   #    23.8%\nspeculation          28443856449    #  3.3% ( 3.4%)\n-- branch mispredict 28180477317    #     3.3%\n-- pipeline restart  263379132      #     0.0%\nsmt-contention       26141421008    #  3.1% ( 0.0%)\ncpu-cycles           420627774223   # 0.12 GHz\ninstructions         667713898623   # 1.59 IPC\ninstructions         221072463900   # 81.519 l2 access per 1000 inst\nl2 hit from l1       9622992118     # 31.34% l2 miss\nl2 miss from l1      512043469      #\nl2 hit from l2 pf    3262559785     #\nl3 hit from l2 pf    3743710843     #\nl3 miss from l2 pf   1392296327     #\ninstructions         222193823037   # 108.414 float per 1000 inst\nfloat 512            58             # 0.000 AVX-512 per 1000 inst\nfloat 256            95640867       # 0.430 AVX-256 per 1000 inst\nfloat 128            23993315635    # 107.984 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         40             # 0.000 scalar per 1000 inst\ninstructions         2376781        #\nopcache              893250         # 375.823 opcache per 1000 inst\nopcache miss         473918         # 53.1% opcache miss rate\nl1 dTLB miss         4172           # 1.755 L1 dTLB per 1000 inst\nl2 dTLB miss         1012           # 0.426 L2 dTLB per 1000 inst\ninstructions         2402871        #\nicache               1176420        # 489.589 icache per 1000 inst\nicache miss          108062         #  9.2% icache miss rate\nl1 iTLB miss         10             # 0.004 L1 iTLB per 1000 inst\nl2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst\ntlb flush            20             # 0.008 TLB flush per 1000 inst\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Intel metrics<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              237.303\non_cpu               
0.032          # 0.51 \/ 16 cores\nutime                91.367\nstime                29.632\nnvcsw                2776           # 4.58%\nnivcsw               57898          # 95.42%\ninblock              30536          # 128.68\/sec\nonblock              2200           # 9.27\/sec\ncpu-clock            121049310913   # 121.049 seconds\ntask-clock           121056968018   # 121.057 seconds\npage faults          9552684        # 78910.650\/sec\ncontext switches     61654          # 509.297\/sec\ncpu migrations       388            # 3.205\/sec\nmajor page faults    240            # 1.983\/sec\nminor page faults    9552444        # 78908.667\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             124270109121   # 151.128 branches per 1000 inst\nbranch misses        1427179445     # 1.15% branch miss\nconditional          124270133921   # 151.128 conditional branches per 1000 inst\nindirect             18073500647    # 21.980 indirect branches per 1000 inst\nslots                2221669248344  #\nretiring             747027275446   # 33.6% (33.6%)\n-- ucode             73452005367    #     3.3%\n-- fastpath          673575270079   #    30.3%\nfrontend             279668953965   # 12.6% (12.6%)\n-- latency           122976164207   #     5.5%\n-- bandwidth         156692789758   #     7.1%\nbackend              996700033973   # 44.9% (44.9%)\n-- cpu               314848326319   #    14.2%\n-- memory            681851707654   #    30.7%\nspeculation          205285930989   #  9.2% ( 9.2%)\n-- branch mispredict 193227678611   #     8.7%\n-- pipeline restart  12058252378    #     0.5%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           398681714399   # 0.11 GHz\ninstructions         745549574653   # 1.87 IPC\nl2 access            45007184463    # 60.675 l2 access per 1000 inst\nl2 miss              25658347372    # 57.01% l2 miss\ncpu-cycles           397444517269   # 35.9% memory 
latency\nload stalls          106883580882   #  0.0% l1 bound\nl1 miss              123115394313   # 14.8% l2 bound\nl2 miss              64305341449    #  6.9% l3 bound\nl3 miss              36690735023    #  9.2% dram bound\nstore_stalls         35742162866    #  9.0% store bound\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">The process summary highlights that these are Python-driven tests.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>661 processes\n\t385 python3               1031.50   534.99\n\t 38 vulkaninfo               1.51     1.34\n\t  6 glxinfo:gdrv0            0.19     0.07\n\t  6 glxinfo:gl0              0.19     0.07\n\t  4 vulkani:disk$0           0.16     0.15\n\t  6 php                      0.11     0.14\n\t  2 glxinfo                  0.09     0.03\n\t  2 glxinfo:cs0              0.09     0.03\n\t  2 glxinfo:disk$0           0.09     0.03\n\t  2 glxinfo:sh0              0.09     0.03\n\t  2 glxinfo:shlo0            0.09     0.03\n\t  2 llvmpipe-0               0.08     0.07\n\t  2 llvmpipe-1               0.08     0.07\n\t  2 llvmpipe-10              0.08     0.07\n\t  2 llvmpipe-11              0.08     0.07\n\t  2 llvmpipe-12              0.08     0.07\n\t  2 llvmpipe-13              0.08     0.07\n\t  2 llvmpipe-14              0.08     0.07\n\t  2 llvmpipe-15              0.08     0.07\n\t  2 llvmpipe-2               0.08     0.07\n\t  2 llvmpipe-3               0.08     0.07\n\t  2 llvmpipe-4               0.08     0.07\n\t  2 llvmpipe-5               0.08     0.07\n\t  2 llvmpipe-6               0.08     0.07\n\t  2 llvmpipe-7               0.08     0.07\n\t  2 llvmpipe-8               0.08     0.07\n\t  2 llvmpipe-9               0.08     0.07\n\t  1 lspci                    0.01     0.02\n\t  1 ps                       0.00     0.01\n\t 70 sh                       0.00     0.00\n\t 24 pyhpc                    0.00     0.00\n\t 12 gcc                      0.00     0.00\n\t 10 gsettings                0.00     0.00\n\t  8 stat             
        0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  3 dconf worker             0.00     0.00\n\t  3 gmain                    0.00     0.00\n\t  2 clinfo                   0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 uname                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 cc                       0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n0 processes running\n47 maximum processes\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Example execution block<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      130168) pyhpc            cpu=3 start=78.30 finish=84.94\n        130169) python3          cpu=12 start=78.30 finish=84.94\n          130170) python3          cpu=13 start=78.33 finish=84.93\n          130171) python3          cpu=6 start=78.33 finish=84.93\n          130172) python3          cpu=15 start=78.33 finish=84.93\n          130173) python3          cpu=0 start=78.33 finish=84.93\n          130174) python3          cpu=9 start=78.33 finish=84.93\n          130175) python3          cpu=2 start=78.33 
finish=84.93\n          130176) python3          cpu=11 start=78.33 finish=84.93\n          130177) python3          cpu=4 start=78.33 finish=84.93\n          130178) python3          cpu=14 start=78.34 finish=84.93\n          130179) python3          cpu=7 start=78.34 finish=84.93\n          130180) python3          cpu=8 start=78.34 finish=84.93\n          130181) python3          cpu=1 start=78.34 finish=84.93\n          130182) python3          cpu=5 start=78.34 finish=84.93\n          130183) python3          cpu=10 start=78.34 finish=84.93\n          130184) python3          cpu=3 start=78.34 finish=84.93\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Overall, this is a test that could be extended to really exercise particular backends, though it is also not long-running.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A suite of Python HPC benchmarks that can run on both the CPU and the GPU. It supports multiple backends, but on my system only NumPy seemed to be available: JAX, Numba and Aesara were missing, and TensorFlow and PyTorch were not selected. 
<span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/pyhpc\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-1776","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1776","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=1776"}],"version-history":[{"count":3,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1776\/revisions"}],"predecessor-version":[{"id":1830,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1776\/revisions\/1830"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=1776"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}