{"id":788,"date":"2024-01-21T13:52:05","date_gmt":"2024-01-21T13:52:05","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=788"},"modified":"2024-01-21T13:52:06","modified_gmt":"2024-01-21T13:52:06","slug":"aom-av1","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/aom-av1\/","title":{"rendered":"aom-av1"},"content":{"rendered":"\n<p>A test of the aomenc AV1 media encoder using 16 different test cases. The test cases appear to vary in the number of runnable processes and in how busy they keep the CPU cores.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-49.png\" alt=\"\" class=\"wp-image-789\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-49.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-49-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-49-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>Topdown profiles show occasional frontend stalls, but they are dominated more by backend memory stalls and a moderate retiring fraction.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-87.png\" alt=\"\" class=\"wp-image-790\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-87.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-87-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-87-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics show on 
average about half of the cores in use (7.88 of 16), some floating-point work (nearly all of it 128-bit vector), and relatively few branches.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              2276.295\non_cpu               0.493          # 7.88 \/ 16 cores\nutime                17665.690\nstime                280.753\nnvcsw                14114020       # 98.83%\nnivcsw               167320         # 1.17%\ninblock              0              # 0.00\/sec\nonblock              201704         # 88.61\/sec\ncpu-clock            17919079774968 # 17919.080 seconds\ntask-clock           17926988199725 # 17926.988 seconds\npage faults          43234643       # 2411.707\/sec\ncontext switches     14292008       # 797.234\/sec\ncpu migrations       34673          # 1.934\/sec\nmajor page faults    799            # 0.045\/sec\nminor page faults    43233844       # 2411.662\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             7098286347416  # 67.726 branches per 1000 inst\nbranch misses        104143888828   # 1.47% branch miss\nconditional          5790064231135  # 55.244 conditional branches per 1000 inst\nindirect             204125042808   # 1.948 indirect branches per 1000 inst\ncpu-cycles           69273323037251 # 1.79 GHz\ninstructions         109171124055305 # 1.58 IPC\nslots                138509791403472 #\nretiring             36282593854356 # 26.2% (35.2%)\n-- ucode             64945263255    #     0.0%\n-- fastpath          36217648591101 #    26.1%\nfrontend             12898774713475 #  9.3% (12.5%)\n-- latency           8858202001362  #     6.4%\n-- bandwidth         4040572712113  #     2.9%\nbackend              52192703052128 # 37.7% (50.6%)\n-- cpu               15916741610982 #    11.5%\n-- memory            36275961441146 #    26.2%\nspeculation          1744351594826  #  1.3% ( 1.7%)\n-- branch mispredict 1684874499188  #     1.2%\n-- pipeline restart  59477095638    #     0.0%\nsmt-contention       35389948976400 # 
25.6% ( 0.0%)\ncpu-cycles           66976690835536 # 1.77 GHz\ninstructions         106537986397909 # 1.59 IPC\ninstructions         35497402711557 # 71.862 l2 access per 1000 inst\nl2 hit from l1       1983397983497  # 9.58% l2 miss\nl2 miss from l1      130076666250   #\nl2 hit from l2 pf    453115676932   #\nl3 hit from l2 pf    79419555362    #\nl3 miss from l2 pf   34973391671    #\ninstructions         35484242372575 # 113.131 float per 1000 inst\nfloat 512            180            # 0.000 AVX-512 per 1000 inst\nfloat 256            588            # 0.000 AVX-256 per 1000 inst\nfloat 128            4014364969945  # 113.131 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         61848          # 0.000 scalar per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              3545.058\non_cpu               0.597          # 9.54 \/ 16 cores\nutime                33458.439\nstime                378.988\nnvcsw                29791383       # 98.64%\nnivcsw               410241         # 1.36%\ninblock              8928           # 2.52\/sec\nonblock              159936         # 45.12\/sec\ncpu-clock            33776356152864 # 33776.356 seconds\ntask-clock           33787174021112 # 33787.174 seconds\npage faults          46536742       # 1377.349\/sec\ncontext switches     30218803       # 894.387\/sec\ncpu migrations       435720         # 12.896\/sec\nmajor page faults    308            # 0.009\/sec\nminor page faults    46536434       # 1377.340\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             9846368555244  # 61.950 branches per 1000 inst\nbranch misses        134683159152   # 1.37% branch miss\nconditional          9846368644140  # 61.950 conditional branches per 1000 inst\nindirect             2925897507424  # 18.409 indirect branches per 1000 inst\nslots                
198700353218444 #\nretiring             111274731064663 # 56.0% (56.0%)\n-- ucode             8502464763869  #     4.3%\n-- fastpath          102772266300794 #    51.7%\nfrontend             31289436066663 # 15.7% (15.7%)\n-- latency           19819385118484 #    10.0%\n-- bandwidth         11470050948179 #     5.8%\nbackend              47506138189700 # 23.9% (23.9%)\n-- cpu               23218262742886 #    11.7%\n-- memory            24287875446814 #    12.2%\nspeculation          11676409983912 #  5.9% ( 5.9%)\n-- branch mispredict 11341721304326 #     5.7%\n-- pipeline restart  334688679586   #     0.2%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           118239467131762 # 1.75 GHz\ninstructions         222294493255214 # 1.88 IPC\nl2 access            6755722248527  # 60.866 l2 access per 1000 inst\nl2 miss              1024804494949  # 15.17% l2 miss\n<\/code><\/pre>\n\n\n\n<p>Process overview<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>2409 processes\n\t 93 aomenc               16959.58   203.65\n\t1665 aom enc worker         117.00     0.00\n\t 68 clinfo                  16.53     6.08\n\t 38 vulkaninfo               0.95     1.33\n\t  6 php                      0.24     0.73\n\t  6 glxinfo:gdrv0            0.11     0.10\n\t  4 vulkani:disk$0           0.10     0.14\n\t  2 llvmpipe-0               0.05     0.07\n\t  2 llvmpipe-1               0.05     0.07\n\t  2 llvmpipe-10              0.05     0.07\n\t  2 llvmpipe-11              0.05     0.07\n\t  2 llvmpipe-12              0.05     0.07\n\t  2 llvmpipe-13              0.05     0.07\n\t  2 llvmpipe-14              0.05     0.07\n\t  2 llvmpipe-15              0.05     0.07\n\t  2 llvmpipe-2               0.05     0.07\n\t  2 llvmpipe-3               0.05     0.07\n\t  2 llvmpipe-4               0.05     0.07\n\t  2 llvmpipe-5               0.05     0.07\n\t  2 llvmpipe-6               0.05     0.07\n\t  2 llvmpipe-7               0.05     0.07\n\t  2 llvmpipe-8               
0.05     0.07\n\t  2 llvmpipe-9               0.05     0.07\n\t  6 clang                    0.05     0.06\n\t  2 glxinfo                  0.05     0.04\n\t  2 glxinfo:cs0              0.05     0.04\n\t  2 glxinfo:disk$0           0.05     0.04\n\t  2 glxinfo:sh0              0.05     0.04\n\t  2 glxinfo:shlo0            0.05     0.04\n\t  3 rocminfo                 0.03     0.00\n\t  1 lspci                    0.00     0.02\n\t  1 ps                       0.00     0.01\n\t112 sh                       0.00     0.00\n\t 94 sed                      0.00     0.00\n\t 93 aom-av1                  0.00     0.00\n\t 93 rm                       0.00     0.00\n\t 13 gcc                      0.00     0.00\n\t 10 gsettings                0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 gmain                    0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  2 cc                       0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 uname                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dconf worker             0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 
wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n0 processes running\n47 maximum processes\n<\/code><\/pre>\n\n\n\n<p>The aom enc worker threads all appear to run on a single CPU (cpu=0), though this is judged only from the last CPU each thread ran on.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      34201) aom-av1          cpu=3 start=5.69  finish=105.89\n        34202) aomenc           cpu=5 start=5.69  finish=105.81\n          34203) aom enc worker   cpu=0 start=5.96  finish=6.87 \n          34204) aom enc worker   cpu=0 start=5.96  finish=6.87 \n          34205) aom enc worker   cpu=0 start=5.96  finish=6.87 \n          34206) aom enc worker   cpu=0 start=5.96  finish=6.87 \n          34207) aom enc worker   cpu=0 start=5.96  finish=6.87 \n          34208) aom enc worker   cpu=0 start=5.96  finish=6.87 \n          34209) aom enc worker   cpu=0 start=5.96  finish=6.87 \n          34210) aom enc worker   cpu=0 start=5.96  finish=6.87 \n          34211) aom enc worker   cpu=0 start=5.96  finish=6.87 \n          34212) aom enc worker   cpu=0 start=5.96  finish=6.87 \n          34213) aom enc worker   cpu=0 start=5.96  finish=6.87 \n          34214) aom enc worker   cpu=0 start=5.96  finish=6.87 \n          34215) aom enc worker   cpu=0 start=5.96  finish=6.87 \n          34216) aom enc worker   cpu=0 start=5.96  finish=6.87 \n          34217) aom enc worker   cpu=0 start=5.96  finish=6.87 \n          34218) aom enc worker   cpu=0 start=7.07  finish=105.80\n          34219) aom enc worker   cpu=0 start=7.07  finish=105.80\n          34220) aom enc worker   cpu=0 start=7.07  finish=105.80\n          34221) aom enc worker   cpu=0 start=7.07  finish=105.80\n          34222) aom enc worker   cpu=0 start=7.07  finish=105.80\n          34223) aom enc worker   cpu=0 start=7.07  finish=105.81\n          34224) aom enc worker   cpu=0 start=7.07  finish=105.81\n          34225) aom enc worker   cpu=0 start=7.08  finish=105.81\n          34226) aom enc worker   cpu=0 start=7.08  
finish=105.81\n          34227) aom enc worker   cpu=0 start=7.08  finish=105.81\n          34228) aom enc worker   cpu=0 start=7.08  finish=105.81\n          34229) aom enc worker   cpu=0 start=7.08  finish=105.81\n          34230) aom enc worker   cpu=0 start=7.08  finish=105.81\n          34231) aom enc worker   cpu=0 start=7.08  finish=105.81\n          34232) aom enc worker   cpu=0 start=7.08  finish=105.81\n        34236) sed              cpu=4 start=105.88 finish=105.89\n        34237) rm               cpu=14 start=105.89 finish=105.89\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Test of a media encoder for AV1 format using 16 different test cases. These seem to vary on number of runnable processes as well as how busy the CPU cores are kept. Topdown profiles show occasional frontend stalls but more <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/aom-av1\/\"><span class=\"more-msg\">Continue reading 
&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-788","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/788","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=788"}],"version-history":[{"count":1,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/788\/revisions"}],"predecessor-version":[{"id":791,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/788\/revisions\/791"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=788"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}