{"id":741,"date":"2024-01-20T12:55:16","date_gmt":"2024-01-20T12:55:16","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=741"},"modified":"2024-01-20T19:22:16","modified_gmt":"2024-01-20T19:22:16","slug":"vvenc","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/vvenc\/","title":{"rendered":"vvenc"},"content":{"rendered":"\n<p>An H.266 (VVC) video encoder with four test cases. It runs on all cores, and each test case has a slightly different CPU busy profile.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-47.png\" alt=\"\" class=\"wp-image-775\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-47.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-47-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-47-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>The AMD topdown profile shows a moderate retirement rate (29.2% of slots), limited more by backend stalls (27.5%, mostly memory) than by frontend stalls (14.3%).<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-85.png\" alt=\"\" class=\"wp-image-777\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-85.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-85-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-85-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              910.226\non_cpu       
        0.739          # 11.83 \/ 16 cores\nutime                10625.310\nstime                143.634\nnvcsw                1926068        # 82.44%\nnivcsw               410360         # 17.56%\ninblock              3896           # 4.28\/sec\nonblock              13280          # 14.59\/sec\ncpu-clock            10772646851924 # 10772.647 seconds\ntask-clock           10773516408795 # 10773.516 seconds\npage faults          22719430       # 2108.822\/sec\ncontext switches     2340781        # 217.272\/sec\ncpu migrations       164307         # 15.251\/sec\nmajor page faults    13             # 0.001\/sec\nminor page faults    22719417       # 2108.821\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             6857273267357  # 95.819 branches per 1000 inst\nbranch misses        95079953506    # 1.39% branch miss\nconditional          5898906790895  # 82.428 conditional branches per 1000 inst\nindirect             141469966782   # 1.977 indirect branches per 1000 inst\ncpu-cycles           41460858052026 # 2.81 GHz\ninstructions         71608739292263 # 1.73 IPC\nslots                82933942651956 #\nretiring             24177769564488 # 29.2% (40.2%)\n-- ucode             319921379423   #     0.4%\n-- fastpath          23857848185065 #    28.8%\nfrontend             11886006036051 # 14.3% (19.8%)\n-- latency           7575762751728  #     9.1%\n-- bandwidth         4310243284323  #     5.2%\nbackend              22821946777065 # 27.5% (38.0%)\n-- cpu               6695369364745  #     8.1%\n-- memory            16126577412320 #    19.4%\nspeculation          1202222269765  #  1.4% ( 2.0%)\n-- branch mispredict 1154699472637  #     1.4%\n-- pipeline restart  47522797128    #     0.1%\nsmt-contention       22845563644718 # 27.5% ( 0.0%)\ncpu-cycles           41405432666677 # 2.77 GHz\ninstructions         71526738202667 # 1.73 IPC\ninstructions         23847203050777 # 55.784 l2 access per 1000 
inst\nl2 hit from l1       1000350502604  # 8.78% l2 miss\nl2 miss from l1      59618749154    #\nl2 hit from l2 pf    272733137303   #\nl3 hit from l2 pf    40268494255    #\nl3 miss from l2 pf   16941099985    #\ninstructions         23835385703310 # 178.948 float per 1000 inst\nfloat 512            92             # 0.000 AVX-512 per 1000 inst\nfloat 256            714            # 0.000 AVX-256 per 1000 inst\nfloat 128            4265301205354  # 178.948 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              1142.666\non_cpu               0.751          # 12.02 \/ 16 cores\nutime                13631.248\nstime                104.240\nnvcsw                1304868        # 75.71%\nnivcsw               418550         # 24.29%\ninblock              18795744       # 16449.03\/sec\nonblock              2040           # 1.79\/sec\ncpu-clock            13737027683049 # 13737.028 seconds\ntask-clock           13737572064843 # 13737.572 seconds\npage faults          22921326       # 1668.514\/sec\ncontext switches     1728942        # 125.855\/sec\ncpu migrations       216303         # 15.745\/sec\nmajor page faults    111            # 0.008\/sec\nminor page faults    22921215       # 1668.506\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             6920566559508  # 96.451 branches per 1000 inst\nbranch misses        92751466267    # 1.34% branch miss\nconditional          6920566578324  # 96.451 conditional branches per 1000 inst\nindirect             1600611418708  # 22.307 indirect branches per 1000 inst\nslots                70202024331338 #\nretiring             39826875497438 # 56.7% (56.7%)\n-- ucode             3276105188693  #     4.7%\n-- fastpath          36550770308745 #    52.1%\nfrontend 
            14572791682931 # 20.8% (20.8%)\n-- latency           7132152801138  #    10.2%\n-- bandwidth         7440638881793  #    10.6%\nbackend              10333613723682 # 14.7% (14.7%)\n-- cpu               5828042496726  #     8.3%\n-- memory            4505571226956  #     6.4%\nspeculation          5686879274637  #  8.1% ( 8.1%)\n-- branch mispredict 5429543298347  #     7.7%\n-- pipeline restart  257335976290   #     0.4%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           42320701718846 # 2.31 GHz\ninstructions         83198283723517 # 1.97 IPC\nl2 access            1709736624260  # 41.157 l2 access per 1000 inst\nl2 miss              252358065728   # 14.76% l2 miss\n<\/code><\/pre>\n\n\n\n<p>Process overview<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>564 processes\n\t204 vvencapp             180330.99  2203.52\n\t 68 clinfo                  17.86     6.32\n\t 38 vulkaninfo               0.93     1.33\n\t  6 glxinfo:gdrv0            0.13     0.10\n\t  6 php                      0.11     0.18\n\t  4 vulkani:disk$0           0.10     0.14\n\t  2 glxinfo                  0.07     0.05\n\t  2 glxinfo:cs0              0.07     0.05\n\t  2 glxinfo:disk$0           0.07     0.05\n\t  2 glxinfo:sh0              0.07     0.04\n\t  2 glxinfo:shlo0            0.07     0.04\n\t  6 clang                    0.05     0.07\n\t  2 llvmpipe-0               0.05     0.07\n\t  2 llvmpipe-1               0.05     0.07\n\t  2 llvmpipe-10              0.05     0.07\n\t  2 llvmpipe-11              0.05     0.07\n\t  2 llvmpipe-12              0.05     0.07\n\t  2 llvmpipe-13              0.05     0.07\n\t  2 llvmpipe-14              0.05     0.07\n\t  2 llvmpipe-15              0.05     0.07\n\t  2 llvmpipe-2               0.05     0.07\n\t  2 llvmpipe-3               0.05     0.07\n\t  2 llvmpipe-4               0.05     0.07\n\t  2 llvmpipe-5               0.05     0.07\n\t  2 llvmpipe-6               0.05     0.07\n\t  2 llvmpipe-7               
0.05     0.07\n\t  2 llvmpipe-8               0.05     0.07\n\t  2 llvmpipe-9               0.05     0.07\n\t  3 rocminfo                 0.03     0.00\n\t  1 lspci                    0.01     0.01\n\t  1 ps                       0.00     0.01\n\t 88 sh                       0.00     0.00\n\t 13 gcc                      0.00     0.00\n\t 12 vvenc                    0.00     0.00\n\t 11 gsettings                0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  4 gmain                    0.00     0.00\n\t  2 cc                       0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 uname                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dconf worker             0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n0 processes running\n47 maximum processes\n<\/code><\/pre>\n\n\n\n<p>The process tree is straightforward: vvenc starts a single vvencapp, which runs 16 vvencapp children, roughly one per core.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>     
 2906507) vvenc            cpu=11 start=6.19  finish=148.80\n        2906508) vvencapp         cpu=5 start=6.19  finish=148.80\n          2906509) vvencapp         cpu=0 start=6.20  finish=148.54\n          2906510) vvencapp         cpu=9 start=6.20  finish=148.54\n          2906511) vvencapp         cpu=14 start=6.20  finish=148.54\n          2906512) vvencapp         cpu=12 start=6.20  finish=148.54\n          2906513) vvencapp         cpu=3 start=6.20  finish=148.54\n          2906514) vvencapp         cpu=8 start=6.20  finish=148.54\n          2906515) vvencapp         cpu=8 start=6.20  finish=148.54\n          2906516) vvencapp         cpu=2 start=6.20  finish=148.54\n          2906517) vvencapp         cpu=10 start=6.20  finish=148.54\n          2906518) vvencapp         cpu=1 start=6.20  finish=148.54\n          2906519) vvencapp         cpu=13 start=6.20  finish=148.54\n          2906520) vvencapp         cpu=4 start=6.20  finish=148.54\n          2906521) vvencapp         cpu=11 start=6.20  finish=148.54\n          2906522) vvencapp         cpu=7 start=6.20  finish=148.54\n          2906523) vvencapp         cpu=15 start=6.20  finish=148.54\n          2906524) vvencapp         cpu=12 start=6.20  finish=148.54\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>An H.266 (VVC) video encoder with four test cases. It runs on all cores, and each test case has a slightly different CPU busy profile. The AMD topdown profile shows a moderate retirement rate (29.2% of slots), limited more by backend stalls (27.5%, mostly memory) than by frontend stalls (14.3%). 
AMD metrics Intel metrics Process overview Process <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/vvenc\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-741","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/741","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=741"}],"version-history":[{"count":2,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/741\/revisions"}],"predecessor-version":[{"id":778,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/741\/revisions\/778"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=741"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}