{"id":1320,"date":"2024-02-03T01:32:07","date_gmt":"2024-02-03T01:32:07","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=1320"},"modified":"2024-02-04T10:40:17","modified_gmt":"2024-02-04T10:40:17","slug":"ai-benchmark","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/ai-benchmark\/","title":{"rendered":"ai-benchmark"},"content":{"rendered":"\n<p>A Python library that uses TensorFlow to measure various AI models. A single test reports both training and inference scores. The number of running processes seems to bounce between the number of cores, a single thread, and levels in between. The pattern also appears to vary with the sub-workloads.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-13.png\" alt=\"\" class=\"wp-image-1340\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-13.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-13-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-13-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>Topdown metrics are similar to the &#8220;tensorflow&#8221; test in being heavily backend bound. 
That test shows more regular patterns, while this one moves through different sub-tests.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-13.png\" alt=\"\" class=\"wp-image-1341\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-13.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-13-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-13-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics show little floating-point code, high backend stalls, and low retiring and frontend stalls. This report is one of the first to include opcache, TLB and icache statistics.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              1051.310\non_cpu               0.735          # 11.76 \/ 16 cores\nutime                12130.094\nstime                232.507\nnvcsw                15457321       # 97.25%\nnivcsw               436597         # 2.75%\ninblock              0              # 0.00\/sec\nonblock              12584          # 11.97\/sec\ncpu-clock            12368006983271 # 12368.007 seconds\ntask-clock           12373062836640 # 12373.063 seconds\npage faults          26680062       # 2156.302\/sec\ncontext switches     15898993       # 1284.968\/sec\ncpu migrations       3022862        # 244.310\/sec\nmajor page faults    7              # 0.001\/sec\nminor page faults    26680055       # 2156.302\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             957490729882   # 31.389 branches per 1000 inst\nbranch misses        13287236219    # 1.39% branch miss\nconditional          801168210108   # 26.264 conditional branches per 1000 inst\nindirect        
     23166096714    # 0.759 indirect branches per 1000 inst\ncpu-cycles           49761243136903 # 2.95 GHz\ninstructions         30637757626442 # 0.62 IPC low\nslots                99453847169220 #\nretiring             10179875202861 # 10.2% (12.1%) low\n-- ucode             2580640907     #     0.0%\n-- fastpath          10177294561954 #    10.2%\nfrontend             3462384584201  #  3.5% ( 4.1%) low\n-- latency           2663176605084  #     2.7%\n-- bandwidth         799207979117   #     0.8%\nbackend              70000221724395 # 70.4% (83.5%) high\n-- cpu               36281460856251 #    36.5%\n-- memory            33718760868144 #    33.9%\nspeculation          141920805302   #  0.1% ( 0.2%) low\n-- branch mispredict 124024703083   #     0.1%\n-- pipeline restart  17896102219    #     0.0%\nsmt-contention       15668164812391 # 15.8% ( 0.0%)\ncpu-cycles           49849520997432 # 2.96 GHz\ninstructions         30646476066015 # 0.61 IPC low\ninstructions         10204482310837 # 131.096 l2 access per 1000 inst\nl2 hit from l1       992671755236   # 12.84% l2 miss\nl2 miss from l1      56974474740    #\nl2 hit from l2 pf    230317005078   #\nl3 hit from l2 pf    74512435238    #\nl3 miss from l2 pf   40261901613    #\ninstructions         10210314170078 # 31.654 float per 1000 inst\nfloat 512            68             # 0.000 AVX-512 per 1000 inst\nfloat 256            15380490758    # 1.506 AVX-256 per 1000 inst\nfloat 128            307817367452   # 30.148 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         6              # 0.000 scalar per 1000 inst\ninstructions         2668409        #\nopcache              996007         # 373.259 opcache per 1000 inst\nopcache miss         537203         # 53.9% opcache miss rate\nl1 dTLB miss         4865           # 1.823 L1 dTLB per 1000 inst\nl2 dTLB miss         990            # 0.371 L2 dTLB per 1000 inst\ninstructions         2715789        #\nicache      
         1316573        # 484.785 icache per 1000 inst\nicache miss          112561         #  8.5% icache miss rate\nl1 iTLB miss         13             # 0.005 L1 iTLB per 1000 inst\nl2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst\ntlb flush            18             # 0.007 TLB flush per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              1337.941\non_cpu               0.494          # 7.91 \/ 16 cores\nutime                10366.642\nstime                215.507\nnvcsw                8160500        # 93.34%\nnivcsw               582164         # 6.66%\ninblock              148904         # 111.29\/sec\nonblock              1344           # 1.00\/sec\ncpu-clock            10557710888684 # 10557.711 seconds\ntask-clock           10563300378839 # 10563.300 seconds\npage faults          22965931       # 2174.125\/sec\ncontext switches     8749217        # 828.265\/sec\ncpu migrations       2136601        # 202.266\/sec\nmajor page faults    724            # 0.069\/sec\nminor page faults    22965207       # 2174.056\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             1798252958796  # 41.581 branches per 1000 inst\nbranch misses        8481393140     # 0.47% branch miss\nconditional          1798252972524  # 41.581 conditional branches per 1000 inst\nindirect             548815196792   # 12.690 indirect branches per 1000 inst\nslots                58533044377064 #\nretiring             23888560940405 # 40.8% (40.8%)\n-- ucode             1622686986715  #     2.8%\n-- fastpath          22265873953690 #    38.0%\nfrontend             7616630872137  # 13.0% (13.0%)\n-- latency           5497768921797  #     9.4%\n-- bandwidth         2118861950340  #     3.6%\nbackend              25098776551734 # 42.9% (42.9%)\n-- cpu               11916166002514 #    20.4%\n-- memory            13182610549220 #    
22.5%\nspeculation          2054328069108  #  3.5% ( 3.5%)\n-- branch mispredict 1869991804975  #     3.2%\n-- pipeline restart  184336264133   #     0.3%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           31128010635019 # 1.46 GHz\ninstructions         57746411381473 # 1.86 IPC\nl2 access            1262468579018  # 48.640 l2 access per 1000 inst\nl2 miss              297475919726   # 23.56% l2 miss\n<\/code><\/pre>\n\n\n\n<p>Process overview shows mostly invocations of python<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>410 processes\n\t 54 python3              600264.00  9015.82\n\t 68 clinfo                  17.19     5.00\n\t 38 vulkaninfo               1.15     0.96\n\t  4 vulkani:disk$0           0.13     0.11\n\t  6 php                      0.11     0.23\n\t  6 glxinfo:gdrv0            0.08     0.09\n\t  6 glxinfo:gl0              0.08     0.09\n\t  2 llvmpipe-0               0.07     0.06\n\t  2 llvmpipe-1               0.07     0.06\n\t  2 llvmpipe-10              0.07     0.06\n\t  2 llvmpipe-11              0.07     0.06\n\t  2 llvmpipe-12              0.07     0.06\n\t  2 llvmpipe-13              0.07     0.06\n\t  2 llvmpipe-14              0.07     0.06\n\t  2 llvmpipe-15              0.07     0.06\n\t  2 llvmpipe-2               0.07     0.06\n\t  2 llvmpipe-3               0.07     0.06\n\t  2 llvmpipe-4               0.07     0.06\n\t  2 llvmpipe-5               0.07     0.06\n\t  2 llvmpipe-6               0.07     0.06\n\t  2 llvmpipe-7               0.07     0.06\n\t  2 llvmpipe-8               0.07     0.06\n\t  2 llvmpipe-9               0.07     0.06\n\t  2 glxinfo                  0.05     0.04\n\t  6 clang                    0.04     0.08\n\t  2 glxinfo:cs0              0.04     0.04\n\t  2 glxinfo:disk$0           0.04     0.04\n\t  2 glxinfo:sh0              0.04     0.04\n\t  2 glxinfo:shlo0            0.04     0.04\n\t  3 rocminfo                 0.03     0.00\n\t  1 lspci                    0.00     0.02\n\t 
81 sh                       0.00     0.00\n\t 12 gcc                      0.00     0.00\n\t 11 gsettings                0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  4 uname                    0.00     0.00\n\t  3 gmain                    0.00     0.00\n\t  3 lscpu                    0.00     0.00\n\t  2 dconf worker             0.00     0.00\n\t  2 dmesg                    0.00     0.00\n\t  2 file                     0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 ai-benchmark             0.00     0.00\n\t  1 cat                      0.00     0.00\n\t  1 cc                       0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 ps                       0.00     0.00\n\t  1 python                   0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 sysctl                   0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n1 processes running\n61 maximum processes\n<\/code><\/pre>\n\n\n\n<p>An example computation block<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      206953) ai-benchmark     cpu=12 start=5.67  
finish=1046.64\n        206954) python3          cpu=2 start=5.68  finish=1046.37\n          206955) python3          cpu=15 start=5.86  finish=1046.37\n          206956) python3          cpu=0 start=5.86  finish=1046.37\n          206957) python3          cpu=1 start=5.86  finish=1046.37\n          206958) python3          cpu=10 start=5.86  finish=1046.37\n          206959) python3          cpu=11 start=5.86  finish=1046.37\n          206960) python3          cpu=13 start=5.86  finish=1046.37\n          206961) python3          cpu=12 start=5.86  finish=1046.37\n          206962) python3          cpu=14 start=5.86  finish=1046.37\n          206963) python3          cpu=7 start=5.86  finish=1046.37\n          206964) python3          cpu=8 start=5.86  finish=1046.37\n          206965) python3          cpu=9 start=5.86  finish=1046.37\n          206966) python3          cpu=6 start=5.86  finish=1046.37\n          206967) python3          cpu=3 start=5.86  finish=1046.37\n          206968) python3          cpu=5 start=5.86  finish=1046.37\n          206969) python3          cpu=4 start=5.86  finish=1046.37\n          206970) file             cpu=7 start=6.72  finish=6.73 \n          206971) uname            cpu=1 start=6.73  finish=6.73 \n          206972) python3          cpu=15 start=6.73  finish=7.80 \n            206973) file             cpu=1 start=6.75  finish=6.76 \n            206974) uname            cpu=10 start=6.76  finish=6.76 \n            206975) cat              cpu=4 start=6.76  finish=6.76 \n            206976) lscpu            cpu=1 start=6.76  finish=6.77 \n            206977) sysctl           cpu=10 start=6.77  finish=6.77 \n            206978) dmesg            cpu=4 start=6.77  finish=6.78 \n            206979) python3          cpu=1 start=6.78  finish=7.79 \n              206981) ?? 
cpu=0 start=7.79  finish=0.00 \n          206982) python3          cpu=13 start=7.80  finish=1046.37\n          206983) python3          cpu=0 start=7.80  finish=1046.37\n          206984) python3          cpu=11 start=7.80  finish=1046.37\n          206985) python3          cpu=9 start=7.80  finish=1046.37\n          206986) python3          cpu=9 start=7.80  finish=1046.37\n          206987) python3          cpu=8 start=7.80  finish=1046.37\n          206988) python3          cpu=15 start=7.80  finish=1046.37\n          206989) python3          cpu=6 start=7.80  finish=1046.37\n          206990) python3          cpu=13 start=7.80  finish=1046.37\n          206991) python3          cpu=5 start=7.80  finish=1046.37\n          206992) python3          cpu=10 start=7.80  finish=1046.37\n          206993) python3          cpu=14 start=7.80  finish=1046.37\n          206994) python3          cpu=12 start=7.80  finish=1046.37\n          206995) python3          cpu=3 start=7.80  finish=1046.37\n          206996) python3          cpu=4 start=7.80  finish=1046.37\n          206997) python3          cpu=12 start=7.81  finish=1046.37\n          206998) python3          cpu=11 start=7.81  finish=7.81 \n          207000) python3          cpu=7 start=14.34 finish=1046.37\n          207001) python3          cpu=7 start=14.34 finish=1046.37\n          207002) python3          cpu=5 start=14.34 finish=1046.37\n          207003) python3          cpu=1 start=14.34 finish=1046.37\n          207004) python3          cpu=1 start=14.34 finish=1046.37\n          207005) python3          cpu=1 start=14.34 finish=1046.37\n          207006) python3          cpu=2 start=14.34 finish=1046.37\n          207007) python3          cpu=4 start=14.34 finish=1046.37\n          207008) python3          cpu=6 start=14.34 finish=1046.37\n          207009) python3          cpu=8 start=14.34 finish=1046.37\n          207010) python3          cpu=3 start=14.34 finish=1046.37\n          207011) python3    
      cpu=10 start=14.35 finish=1046.37\n          207012) python3          cpu=14 start=14.35 finish=1046.37\n          207013) python3          cpu=14 start=14.35 finish=1046.37\n          207014) python3          cpu=13 start=14.35 finish=1046.37\n          207015) python3          cpu=12 start=14.35 finish=1046.37\n          207016) python3          cpu=2 start=14.35 finish=1046.37\n          207017) python3          cpu=7 start=14.70 finish=14.70\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>A Python library that uses TensorFlow to measure various AI models. A single test reports both training and inference scores. The number of running processes seems to bounce between the number of cores, a single thread, and levels in between. <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/ai-benchmark\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-1320","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1320","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=1320"}],"version-history":[{"count":3,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1320\/revisions"}],"predecessor-version":[{"id":1342,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1320\/revisions\/1342"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\
/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=1320"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}