{"id":1167,"date":"2024-01-31T11:50:39","date_gmt":"2024-01-31T11:50:39","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=1167"},"modified":"2024-02-01T01:05:57","modified_gmt":"2024-02-01T01:05:57","slug":"dav1d","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/dav1d\/","title":{"rendered":"dav1d"},"content":{"rendered":"\n<p>Dav1d is an AV1 video decoder. The test profile decodes four test cases. The second appears to be the most parallel, while the other three are more varied.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime.png\" alt=\"\" class=\"wp-image-1208\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>The topdown profile shows variation among the four workloads, with backend stalls being the highest.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown.png\" alt=\"\" class=\"wp-image-1210\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics show a moderate amount of floating point and L2 access. 
Backend memory stalls are almost a third.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              213.115\non_cpu               0.549          # 8.78 \/ 16 cores\nutime                1805.063\nstime                66.539\nnvcsw                7292893        # 99.54%\nnivcsw               33533          # 0.46%\ninblock              8              # 0.04\/sec\nonblock              32120          # 150.72\/sec\ncpu-clock            1868875848465  # 1868.876 seconds\ntask-clock           1871235081315  # 1871.235 seconds\npage faults          1296792        # 693.014\/sec\ncontext switches     7327292        # 3915.752\/sec\ncpu migrations       259220         # 138.529\/sec\nmajor page faults    114            # 0.061\/sec\nminor page faults    1296678        # 692.953\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             797429608804   # 91.153 branches per 1000 inst\nbranch misses        31051314265    # 3.89% branch miss\nconditional          554521597992   # 63.387 conditional branches per 1000 inst\nindirect             31801373582    # 3.635 indirect branches per 1000 inst\ncpu-cycles           7341843796700  # 2.16 GHz\ninstructions         8777978628368  # 1.20 IPC\nslots                14651540873352 #\nretiring             3052095062857  # 20.8% (24.6%)\n-- ucode             9469121995     #     0.1%\n-- fastpath          3042625940862  #    20.8%\nfrontend             2509044102042  # 17.1% (20.2%)\n-- latency           1787348551950  #    12.2%\n-- bandwidth         721695550092   #     4.9%\nbackend              6416392439737  # 43.8% (51.8%)\n-- cpu               1756022735988  #    12.0%\n-- memory            4660369703749  #    31.8%\nspeculation          412129118255   #  2.8% ( 3.3%)\n-- branch mispredict 400255856656   #     2.7%\n-- pipeline restart  11873261599    #     0.1%\nsmt-contention       2260744776351  # 15.4% ( 0.0%)\ncpu-cycles           7322460098899 
 # 2.16 GHz\ninstructions         8771050990318  # 1.20 IPC\ninstructions         2918023070070  # 42.342 l2 access per 1000 inst\nl2 hit from l1       98802700440    # 20.41% l2 miss\nl2 miss from l1      16287763107    #\nl2 hit from l2 pf    15816267300    #\nl3 hit from l2 pf    5099496907     #\nl3 miss from l2 pf   3835063557     #\ninstructions         2907823574044  # 78.422 float per 1000 inst\nfloat 512            71             # 0.000 AVX-512 per 1000 inst\nfloat 256            780909299      # 0.269 AVX-256 per 1000 inst\nfloat 128            227256236913   # 78.153 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              310.147\non_cpu               0.595          # 9.52 \/ 16 cores\nutime                2887.159\nstime                64.256\nnvcsw                7951545        # 98.51%\nnivcsw               120051         # 1.49%\ninblock              1128           # 3.64\/sec\nonblock              21720          # 70.03\/sec\ncpu-clock            2944930192235  # 2944.930 seconds\ntask-clock           2947005189952  # 2947.005 seconds\npage faults          1327781        # 450.553\/sec\ncontext switches     8072952        # 2739.375\/sec\ncpu migrations       462799         # 157.040\/sec\nmajor page faults    51             # 0.017\/sec\nminor page faults    1327730        # 450.535\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             953131786197   # 82.134 branches per 1000 inst\nbranch misses        36337085889    # 3.81% branch miss\nconditional          953131806421   # 82.134 conditional branches per 1000 inst\nindirect             291031404595   # 25.079 indirect branches per 1000 inst\nslots                14767434767564 #\nretiring             6255263054310  # 42.4% 
(42.4%)\n-- ucode             555928141851   #     3.8%\n-- fastpath          5699334912459  #    38.6%\nfrontend             3164173402768  # 21.4% (21.4%)\n-- latency           1771171686807  #    12.0%\n-- bandwidth         1393001715961  #     9.4%\nbackend              3663495894120  # 24.8% (24.8%)\n-- cpu               1313061620123  #     8.9%\n-- memory            2350434273997  #    15.9%\nspeculation          1538479772866  # 10.4% (10.4%) high\n-- branch mispredict 1426065334132  #     9.7%\n-- pipeline restart  112414438734   #     0.8%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           10440337596098 # 1.94 GHz\ninstructions         14928314665212 # 1.43 IPC\nl2 access            244374955407   # 35.293 l2 access per 1000 inst\nl2 miss              70966808581    # 29.04% l2 miss\n<\/code><\/pre>\n\n\n\n<p>Process overview shows dav1d and dav1d-worker as taking the most time.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>570 processes\n\t192 dav1d-worker         28840.16   673.60\n\t 24 dav1d                 1802.51    42.24\n\t 68 clinfo                  17.19     5.66\n\t 38 vulkaninfo               0.95     1.15\n\t  6 php                      0.62     0.13\n\t  4 vulkani:disk$0           0.11     0.13\n\t  6 glxinfo:gdrv0            0.11     0.06\n\t  6 glxinfo:gl0              0.11     0.06\n\t  2 llvmpipe-0               0.06     0.07\n\t  2 llvmpipe-1               0.06     0.07\n\t  2 llvmpipe-10              0.06     0.07\n\t  2 llvmpipe-11              0.06     0.07\n\t  2 llvmpipe-12              0.06     0.07\n\t  2 llvmpipe-13              0.06     0.07\n\t  2 llvmpipe-14              0.06     0.07\n\t  2 llvmpipe-15              0.06     0.07\n\t  2 llvmpipe-2               0.06     0.07\n\t  2 llvmpipe-3               0.06     0.07\n\t  2 llvmpipe-4               0.06     0.07\n\t  2 llvmpipe-5               0.06     0.07\n\t  2 llvmpipe-6               0.06     0.07\n\t  2 llvmpipe-7               0.06     
0.07\n\t  2 llvmpipe-8               0.06     0.07\n\t  2 llvmpipe-9               0.06     0.07\n\t  2 glxinfo                  0.06     0.03\n\t  2 glxinfo:cs0              0.06     0.03\n\t  2 glxinfo:disk$0           0.06     0.03\n\t  2 glxinfo:sh0              0.06     0.03\n\t  2 glxinfo:shlo0            0.06     0.03\n\t  6 clang                    0.05     0.07\n\t  3 rocminfo                 0.03     0.00\n\t  1 lspci                    0.00     0.02\n\t  1 ps                       0.00     0.01\n\t 88 sh                       0.00     0.00\n\t 13 gcc                      0.00     0.00\n\t 12 gsettings                0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  3 gmain                    0.00     0.00\n\t  2 cc                       0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 uname                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dconf worker             0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr    
               0.00     0.00\n0 processes running\n47 maximum processes\n<\/code><\/pre>\n\n\n\n<p>Computation blocks are regular: in the sample below, all sixteen workers start and finish within about 0.01 seconds of each other.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      2637319) dav1d            cpu=0 start=5.52  finish=18.44\n        2637320) dav1d            cpu=2 start=5.52  finish=18.44\n          2637321) dav1d-worker     cpu=4 start=5.53  finish=18.44\n          2637322) dav1d-worker     cpu=10 start=5.53  finish=18.44\n          2637323) dav1d-worker     cpu=2 start=5.53  finish=18.44\n          2637324) dav1d-worker     cpu=5 start=5.53  finish=18.44\n          2637325) dav1d-worker     cpu=8 start=5.53  finish=18.43\n          2637326) dav1d-worker     cpu=12 start=5.53  finish=18.43\n          2637327) dav1d-worker     cpu=11 start=5.53  finish=18.43\n          2637328) dav1d-worker     cpu=9 start=5.53  finish=18.43\n          2637329) dav1d-worker     cpu=6 start=5.53  finish=18.43\n          2637330) dav1d-worker     cpu=3 start=5.53  finish=18.43\n          2637331) dav1d-worker     cpu=1 start=5.53  finish=18.43\n          2637332) dav1d-worker     cpu=15 start=5.53  finish=18.43\n          2637333) dav1d-worker     cpu=0 start=5.54  finish=18.43\n          2637334) dav1d-worker     cpu=14 start=5.54  finish=18.43\n          2637335) dav1d-worker     cpu=15 start=5.54  finish=18.43\n          2637336) dav1d-worker     cpu=13 start=5.54  finish=18.43\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Dav1d is an AV1 video decoder. The test profile decodes four test cases. The second appears to be the most parallel, while the other three are more varied. 
Topdown profile shows variation among the four workloads with backend stalls being the <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/dav1d\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-1167","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1167","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=1167"}],"version-history":[{"count":3,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1167\/revisions"}],"predecessor-version":[{"id":1211,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1167\/revisions\/1211"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=1167"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}