{"id":686,"date":"2024-01-19T10:44:20","date_gmt":"2024-01-19T10:44:20","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=686"},"modified":"2024-01-19T10:44:20","modified_gmt":"2024-01-19T10:44:20","slug":"numpy","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/numpy\/","title":{"rendered":"numpy"},"content":{"rendered":"\n<p>A single-threaded test of the NumPy library.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-35.png\" alt=\"\" class=\"wp-image-687\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-35.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-35-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-35-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>Topdown shows more frontend-bound slots than backend-bound ones. 
Also looks like the test has a few phases.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-72.png\" alt=\"\" class=\"wp-image-688\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-72.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-72-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-72-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics show a moderate amount of floating point and some L2 access.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              361.124\non_cpu               0.060          # 0.96 \/ 16 cores\nutime                323.026\nstime                23.631\nnvcsw                2254           # 52.37%\nnivcsw               2050           # 47.63%\ninblock              0              # 0.00\/sec\nonblock              14272          # 39.52\/sec\ncpu-clock            346708703424   # 346.709 seconds\ntask-clock           346714252381   # 346.714 seconds\npage faults          13793915       # 39784.678\/sec\ncontext switches     5835           # 16.829\/sec\ncpu migrations       342            # 0.986\/sec\nmajor page faults    2              # 0.006\/sec\nminor page faults    13793913       # 39784.673\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             753903919180   # 192.310 branches per 1000 inst\nbranch misses        3223543724     # 0.43% branch miss\nconditional          548538917674   # 139.924 conditional branches per 1000 inst\nindirect             50159713959    # 12.795 indirect branches per 1000 inst\ncpu-cycles           1631214371880  # 0.28 GHz\ninstructions         3910901560480  # 2.40 
IPC\nslots                3270777827046  #\nretiring             1318275595981  # 40.3% (40.3%)\n-- ucode             4213617091     #     0.1%\n-- fastpath          1314061978890  #    40.2%\nfrontend             1316680239451  # 40.3% (40.3%)\n-- latency           692593480368   #    21.2%\n-- bandwidth         624086759083   #    19.1%\nbackend              574041611376   # 17.6% (17.6%)\n-- cpu               115146122764   #     3.5%\n-- memory            458895488612   #    14.0%\nspeculation          61567308765    #  1.9% ( 1.9%)\n-- branch mispredict 50900498632    #     1.6%\n-- pipeline restart  10666810133    #     0.3%\nsmt-contention       212699336      #  0.0% ( 0.0%)\ncpu-cycles           1614316899243  # 0.28 GHz\ninstructions         3910468849113  # 2.42 IPC\ninstructions         1305538175558  # 81.788 l2 access per 1000 inst\nl2 hit from l1       84691082021    # 8.38% l2 miss\nl2 miss from l1      1604500894     #\nl2 hit from l2 pf    14740210877    #\nl3 hit from l2 pf    5624516569     #\nl3 miss from l2 pf   1721642011     #\ninstructions         1303230480870  # 62.065 float per 1000 inst\nfloat 512            104            # 0.000 AVX-512 per 1000 inst\nfloat 256            118848212      # 0.091 AVX-256 per 1000 inst\nfloat 128            80766728853    # 61.974 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         338            # 0.000 scalar per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics also show many branches<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              396.775\non_cpu               0.060          # 0.97 \/ 16 cores\nutime                361.739\nstime                21.166\nnvcsw                2203           # 51.17%\nnivcsw               2102           # 48.83%\ninblock              1344           # 3.39\/sec\nonblock              2976           # 7.50\/sec\ncpu-clock            382925715780   # 382.926 seconds\ntask-clock           382930356389 
  # 382.930 seconds\npage faults          14274949       # 37278.186\/sec\ncontext switches     6020           # 15.721\/sec\ncpu migrations       430            # 1.123\/sec\nmajor page faults    6              # 0.016\/sec\nminor page faults    14274943       # 37278.170\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             759460727990   # 189.477 branches per 1000 inst\nbranch misses        1094303570     # 0.14% branch miss\nconditional          759460743158   # 189.477 conditional branches per 1000 inst\nindirect             50988694116    # 12.721 indirect branches per 1000 inst\nslots                8587423249148  #\nretiring             3904258871034  # 45.5% (45.5%)\n-- ucode             352967547446   #     4.1%\n-- fastpath          3551291323588  #    41.4%\nfrontend             2737042962078  # 31.9% (31.9%)\n-- latency           611472582951   #     7.1%\n-- bandwidth         2125570379127  #    24.8%\nbackend              1721573209956  # 20.0% (20.0%)\n-- cpu               506032738355   #     5.9%\n-- memory            1215540471601  #    14.2%\nspeculation          235846863912   #  2.7% ( 2.7%)\n-- branch mispredict 160947414957   #     1.9%\n-- pipeline restart  74899448955    #     0.9%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           1433601238554  # 0.23 GHz\ninstructions         4003580216381  # 2.79 IPC\nl2 access            270788452542   # 67.664 l2 access per 1000 inst\nl2 miss              34377572620    # 12.70% l2 miss\n<\/code><\/pre>\n\n\n\n<p>Process overview shows this is a Python-driven workload.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>448 processes\n\t 94 python3                322.40    21.76\n\t 68 clinfo                  15.91     6.33\n\t 38 vulkaninfo               1.13     1.14\n\t  4 vulkani:disk$0           0.12     0.12\n\t  6 glxinfo:gdrv0            0.12     0.09\n\t  6 clang                    0.07     0.05\n\t  6 
php                      0.06     0.07\n\t  2 llvmpipe-0               0.06     0.06\n\t  2 llvmpipe-1               0.06     0.06\n\t  2 llvmpipe-10              0.06     0.06\n\t  2 llvmpipe-11              0.06     0.06\n\t  2 llvmpipe-12              0.06     0.06\n\t  2 llvmpipe-13              0.06     0.06\n\t  2 llvmpipe-14              0.06     0.06\n\t  2 llvmpipe-15              0.06     0.06\n\t  2 llvmpipe-2               0.06     0.06\n\t  2 llvmpipe-3               0.06     0.06\n\t  2 llvmpipe-4               0.06     0.06\n\t  2 llvmpipe-5               0.06     0.06\n\t  2 llvmpipe-6               0.06     0.06\n\t  2 llvmpipe-7               0.06     0.06\n\t  2 llvmpipe-8               0.06     0.06\n\t  2 llvmpipe-9               0.06     0.06\n\t  2 glxinfo                  0.06     0.04\n\t  2 glxinfo:cs0              0.06     0.04\n\t  2 glxinfo:disk$0           0.06     0.04\n\t  2 glxinfo:sh0              0.06     0.03\n\t  2 glxinfo:shlo0            0.06     0.03\n\t  3 run.sh                   0.03     0.03\n\t  3 rocminfo                 0.00     0.03\n\t  1 lspci                    0.00     0.02\n\t  1 ps                       0.00     0.01\n\t 83 sh                       0.00     0.00\n\t 12 gcc                      0.00     0.00\n\t 11 gsettings                0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  4 dirname                  0.00     0.00\n\t  4 gmain                    0.00     0.00\n\t  3 cat                      0.00     0.00\n\t  3 numpy                    0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 uname                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 cc                       0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dconf worker          
   0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 python                   0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n0 processes running\n47 maximum processes\n<\/code><\/pre>\n\n\n\n<p>Looks like a set of small python tests run in sequence.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      2560738) numpy            cpu=9 start=5.76  finish=119.95\n        2560739) run.sh           cpu=2 start=5.76  finish=119.94\n          2560740) dirname          cpu=12 start=5.79  finish=5.79 \n          2560741) python3          cpu=3 start=5.79  finish=6.55 \n          2560742) python3          cpu=5 start=6.55  finish=7.90 \n          2560743) python3          cpu=11 start=7.90  finish=8.75 \n          2560744) python3          cpu=11 start=8.75  finish=11.64\n          2560745) python3          cpu=4 start=11.65 finish=11.80\n          2560746) python3          cpu=6 start=11.80 finish=15.29\n          2560748) python3          cpu=3 start=15.30 finish=15.64\n          2560749) python3          cpu=4 start=15.65 finish=16.14\n          2560750) python3          cpu=3 start=16.14 finish=16.39\n          2560751) python3          cpu=4 start=16.39 finish=16.66\n          2560752) python3          cpu=3 start=16.66 finish=18.28\n          2560753) python3     
     cpu=4 start=18.28 finish=25.17\n          2560754) python3          cpu=5 start=25.17 finish=30.43\n          2560755) python3          cpu=3 start=30.43 finish=32.69\n          2560756) python3          cpu=4 start=32.69 finish=40.84\n          2560757) python3          cpu=5 start=40.85 finish=43.38\n          2560758) python3          cpu=11 start=43.38 finish=44.33\n          2560759) python3          cpu=4 start=44.33 finish=45.02\n          2560760) python3          cpu=5 start=45.02 finish=46.48\n          2560761) python3          cpu=3 start=46.49 finish=47.16\n          2560762) python3          cpu=4 start=47.16 finish=49.62\n          2560763) python3          cpu=11 start=49.62 finish=49.70\n          2560764) python3          cpu=4 start=49.70 finish=50.16\n          2560765) python3          cpu=11 start=50.16 finish=50.22\n          2560766) python3          cpu=4 start=50.22 finish=50.44\n          2560767) python3          cpu=11 start=50.45 finish=56.22\n          2560768) python3          cpu=4 start=56.22 finish=116.88\n          2560769) python3          cpu=11 start=116.89 finish=117.22\n          2560770) python3          cpu=4 start=117.22 finish=118.51\n          2560771) python3          cpu=3 start=118.52 finish=119.94\n        2560772) cat              cpu=2 start=119.94 finish=119.94\n        2560773) python3          cpu=3 start=119.94 finish=119.95\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>A single threaded test of the numpy library Topdown shows more frontend activity and not much backend. Also looks like the test has a few phases. AMD metrics show a moderate amount of floating point and some L2 access. 
Intel <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/numpy\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-686","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/686","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=686"}],"version-history":[{"count":1,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/686\/revisions"}],"predecessor-version":[{"id":689,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/686\/revisions\/689"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=686"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}