{"id":814,"date":"2024-01-22T11:00:12","date_gmt":"2024-01-22T11:00:12","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=814"},"modified":"2024-01-24T00:06:35","modified_gmt":"2024-01-24T00:06:35","slug":"rav1e","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/rav1e\/","title":{"rendered":"rav1e"},"content":{"rendered":"\n<p>A test of rav1e, an AV1 video encoder written in Rust. There are four workloads at different speed settings. The workloads also appear to differ in how much on-CPU time they use.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-55.png\" alt=\"\" class=\"wp-image-843\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-55.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-55-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-55-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>The topdown profile shows a consistent retirement rate, with backend stalls higher in the last workload and frontend stalls higher in the first.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-93.png\" alt=\"\" class=\"wp-image-845\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-93.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-93-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-93-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics show on average about six and a half 
cores in use. There is some floating-point work, almost entirely 128-bit AVX, and not much L2 activity.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              680.283\non_cpu               0.405          # 6.48 \/ 16 cores\nutime                4380.077\nstime                29.750\nnvcsw                529699         # 94.26%\nnivcsw               32249          # 5.74%\ninblock              0              # 0.00\/sec\nonblock              15808          # 23.24\/sec\ncpu-clock            4406070896173  # 4406.071 seconds\ntask-clock           4406708412361  # 4406.708 seconds\npage faults          6123612        # 1389.611\/sec\ncontext switches     565143         # 128.246\/sec\ncpu migrations       7223           # 1.639\/sec\nmajor page faults    2              # 0.000\/sec\nminor page faults    6123610        # 1389.611\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             4662711827592  # 111.593 branches per 1000 inst\nbranch misses        52847581667    # 1.13% branch miss\nconditional          3741973573338  # 89.557 conditional branches per 1000 inst\nindirect             184899584708   # 4.425 indirect branches per 1000 inst\ncpu-cycles           17439787503556 # 1.60 GHz\ninstructions         41786711974119 # 2.40 IPC\nslots                34865080874874 #\nretiring             13743579233199 # 39.4% (51.7%)\n-- ucode             30766579480    #     0.1%\n-- fastpath          13712812653719 #    39.3%\nfrontend             6301204183817  # 18.1% (23.7%)\n-- latency           3991486122378  #    11.4%\n-- bandwidth         2309718061439  #     6.6%\nbackend              5757168999901  # 16.5% (21.6%)\n-- cpu               1810773177827  #     5.2%\n-- memory            3946395822074  #    11.3%\nspeculation          804632596029   #  2.3% ( 3.0%)\n-- branch mispredict 783402978854   #     2.2%\n-- pipeline restart  21229617175    #     0.1%\nsmt-contention       8258433048876  # 23.7% ( 0.0%)\ncpu-cycles        
   17436364491114 # 1.60 GHz\ninstructions         41772250512279 # 2.40 IPC\ninstructions         13931405364622 # 30.075 l2 access per 1000 inst\nl2 hit from l1       399942908773   # 2.77% l2 miss\nl2 miss from l1      4259235520     #\nl2 hit from l2 pf    11699912704    #\nl3 hit from l2 pf    4059614458     #\nl3 miss from l2 pf   3278519298     #\ninstructions         13924366621450 # 107.589 float per 1000 inst\nfloat 512            70             # 0.000 AVX-512 per 1000 inst\nfloat 256            534            # 0.000 AVX-256 per 1000 inst\nfloat 128            1498110268831  # 107.589 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              856.175\non_cpu               0.403          # 6.45 \/ 16 cores\nutime                5497.713\nstime                22.929\nnvcsw                539159         # 89.84%\nnivcsw               60943          # 10.16%\ninblock              664            # 0.78\/sec\nonblock              4800           # 5.61\/sec\ncpu-clock            5513430812677  # 5513.431 seconds\ntask-clock           5514030198429  # 5514.030 seconds\npage faults          6400742        # 1160.810\/sec\ncontext switches     604186         # 109.572\/sec\ncpu migrations       45126          # 8.184\/sec\nmajor page faults    5              # 0.001\/sec\nminor page faults    6400733        # 1160.808\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             4723772752112  # 108.408 branches per 1000 inst\nbranch misses        55866941113    # 1.18% branch miss\nconditional          4723772771216  # 108.408 conditional branches per 1000 inst\nindirect             979987448967   # 22.490 indirect branches per 1000 inst\nslots                42426994010144 #\nretiring             
25181587491963 # 59.4% (59.4%)\n-- ucode             1276993347564  #     3.0%\n-- fastpath          23904594144399 #    56.3%\nfrontend             9465887818390  # 22.3% (22.3%)\n-- latency           3321055045324  #     7.8%\n-- bandwidth         6144832773066  #    14.5%\nbackend              4043327042134  #  9.5% ( 9.5%)\n-- cpu               2743315610137  #     6.5%\n-- memory            1300011431997  #     3.1%\nspeculation          3778862180966  #  8.9% ( 8.9%)\n-- branch mispredict 3656643627421  #     8.6%\n-- pipeline restart  122218553545   #     0.3%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           19850940272366 # 1.44 GHz\ninstructions         52275803783004 # 2.63 IPC\nl2 access            629155938709   # 24.142 l2 access per 1000 inst\nl2 miss              42062422802    # 6.69% l2 miss\n<\/code><\/pre>\n\n\n\n<p>The process overview shows that nearly all of the time is spent in rav1e processes.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>573 processes\n\t212 rav1e                73556.31   386.72\n\t 68 clinfo                  15.47     6.98\n\t 38 vulkaninfo               1.12     1.14\n\t  6 glxinfo:gdrv0            0.17     0.04\n\t  4 vulkani:disk$0           0.11     0.12\n\t  6 php                      0.10     0.14\n\t  2 glxinfo                  0.08     0.02\n\t  2 glxinfo:cs0              0.08     0.02\n\t  2 glxinfo:disk$0           0.08     0.02\n\t  2 glxinfo:sh0              0.08     0.02\n\t  2 glxinfo:shlo0            0.08     0.02\n\t  2 llvmpipe-0               0.06     0.06\n\t  2 llvmpipe-1               0.06     0.06\n\t  2 llvmpipe-10              0.06     0.06\n\t  2 llvmpipe-11              0.06     0.06\n\t  2 llvmpipe-12              0.06     0.06\n\t  2 llvmpipe-13              0.06     0.06\n\t  2 llvmpipe-14              0.06     0.06\n\t  2 llvmpipe-15              0.06     0.06\n\t  2 llvmpipe-2               0.06     0.06\n\t  2 llvmpipe-3               0.06     0.06\n\t  2 llvmpipe-4               0.06     
0.06\n\t  2 llvmpipe-5               0.06     0.06\n\t  2 llvmpipe-6               0.06     0.06\n\t  2 llvmpipe-7               0.06     0.06\n\t  2 llvmpipe-8               0.06     0.06\n\t  2 llvmpipe-9               0.06     0.06\n\t  6 clang                    0.03     0.09\n\t  3 rocminfo                 0.03     0.00\n\t  1 lspci                    0.00     0.02\n\t 87 sh                       0.00     0.00\n\t 12 gcc                      0.00     0.00\n\t 12 tr                       0.00     0.00\n\t 10 gsettings                0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  4 gmain                    0.00     0.00\n\t  2 dconf worker             0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 uname                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 cc                       0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 ps                       0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n4 processes 
running\n51 maximum processes\n<\/code><\/pre>\n\n\n\n<p>The rav1e threads appear to be started across nearly all of the CPUs.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      157481) rav1e            cpu=11 start=92.26 finish=175.52\n        157482) rav1e            cpu=7 start=92.26 finish=175.46\n          157483) rav1e            cpu=15 start=92.27 finish=175.46\n          157484) rav1e            cpu=12 start=92.27 finish=175.46\n          157485) rav1e            cpu=5 start=92.27 finish=175.47\n          157486) rav1e            cpu=13 start=92.27 finish=175.47\n          157487) rav1e            cpu=0 start=92.27 finish=175.47\n          157488) rav1e            cpu=14 start=92.27 finish=175.47\n          157489) rav1e            cpu=11 start=92.27 finish=175.47\n          157490) rav1e            cpu=2 start=92.27 finish=175.47\n          157491) rav1e            cpu=9 start=92.27 finish=175.46\n          157492) rav1e            cpu=10 start=92.27 finish=175.47\n          157493) rav1e            cpu=12 start=92.27 finish=175.47\n          157494) rav1e            cpu=4 start=92.27 finish=175.47\n          157495) rav1e            cpu=6 start=92.27 finish=175.47\n          157496) rav1e            cpu=3 start=92.27 finish=175.46\n          157497) rav1e            cpu=3 start=92.27 finish=175.47\n          157498) rav1e            cpu=8 start=92.27 finish=175.47\n        157500) tr               cpu=4 start=175.52 finish=175.52\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>A test of rav1e, an AV1 video encoder written in Rust. There are four workloads at different speed settings. The workloads also appear to differ in how much on-CPU time they use. 
Topdown profile shows a consistent retirement rate with backend stalls higher in the last <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/rav1e\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-814","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/814","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=814"}],"version-history":[{"count":2,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/814\/revisions"}],"predecessor-version":[{"id":846,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/814\/revisions\/846"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=814"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}