{"id":1365,"date":"2024-02-03T15:15:21","date_gmt":"2024-02-03T15:15:21","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=1365"},"modified":"2024-02-03T20:32:00","modified_gmt":"2024-02-03T20:32:00","slug":"vpxenc","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/vpxenc\/","title":{"rendered":"vpxenc"},"content":{"rendered":"\n<p>Video encoding using the Google libvpx library. There are four workloads: each of two speed levels is run on both 4K and 1080p video. The number of processes appears to vary, though there seems to be one per physical core.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-20.png\" alt=\"\" class=\"wp-image-1396\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-20.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-20-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-20-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>The topdown profile differs slightly between workloads, though all show a fairly high retirement rate, with backend stalls being the largest limiter.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-20.png\" alt=\"\" class=\"wp-image-1398\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-20.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-20-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-20-768x576.png 768w\" sizes=\"auto, (max-width: 
1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics show floating point code with some memory bound stalls. There are few branches, though still branch misprediction.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              623.409\non_cpu               0.311          # 4.97 \/ 16 cores\nutime                3048.837\nstime                48.411\nnvcsw                3637285        # 99.86%\nnivcsw               5119           # 0.14%\ninblock              0              # 0.00\/sec\nonblock              15512          # 24.88\/sec\ncpu-clock            3089203964534  # 3089.204 seconds\ntask-clock           3091077405978  # 3091.077 seconds\npage faults          2123880        # 687.100\/sec\ncontext switches     3645325        # 1179.306\/sec\ncpu migrations       3420           # 1.106\/sec\nmajor page faults    2              # 0.001\/sec\nminor page faults    2123878        # 687.100\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             1721188727239  # 52.252 branches per 1000 inst\nbranch misses        26850172244    # 1.56% branch miss\nconditional          1385965910144  # 42.075 conditional branches per 1000 inst\nindirect             56654184709    # 1.720 indirect branches per 1000 inst\ncpu-cycles           12148882895870 # 1.24 GHz\ninstructions         32948907414886 # 2.71 IPC\nslots                24279285092832 #\nretiring             11013916606615 # 45.4% (45.6%)\n-- ucode             20225750600    #     0.1%\n-- fastpath          10993690856015 #    45.3%\nfrontend             3545739782724  # 14.6% (14.7%)\n-- latency           1451506674132  #     6.0%\n-- bandwidth         2094233108592  #     8.6%\nbackend              8701349022251  # 35.8% (36.0%)\n-- cpu               3205982477495  #    13.2%\n-- memory            5495366544756  #    22.6%\nspeculation          891925444057   #  3.7% ( 3.7%)\n-- branch mispredict 844250791145   #     3.5%\n-- 
pipeline restart  47674652912    #     0.2%\nsmt-contention       126338239440   #  0.5% ( 0.0%)\ncpu-cycles           12134150872890 # 1.24 GHz\ninstructions         32940470679251 # 2.71 IPC\ninstructions         10979290279131 # 40.403 l2 access per 1000 inst\nl2 hit from l1       383961375679   # 16.06% l2 miss\nl2 miss from l1      52653668633    #\nl2 hit from l2 pf    41052024947    #\nl3 hit from l2 pf    14381078758    #\nl3 miss from l2 pf   4198549610     #\ninstructions         10981114496824 # 243.225 float per 1000 inst\nfloat 512            58             # 0.000 AVX-512 per 1000 inst\nfloat 256            514            # 0.000 AVX-256 per 1000 inst\nfloat 128            2670880984964  # 243.225 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         2469           # 0.000 scalar per 1000 inst\ninstructions         2686249        #\nopcache              993457         # 369.831 opcache per 1000 inst\nopcache miss         533742         # 53.7% opcache miss rate\nl1 dTLB miss         6511           # 2.424 L1 dTLB per 1000 inst\nl2 dTLB miss         1193           # 0.444 L2 dTLB per 1000 inst\ninstructions         2738555        #\nicache               1323391        # 483.244 icache per 1000 inst\nicache miss          110104         #  8.3% icache miss rate\nl1 iTLB miss         6              # 0.002 L1 iTLB per 1000 inst\nl2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst\ntlb flush            19             # 0.007 TLB flush per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics show L2 cache as most active stalls for memory<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              1393.848\non_cpu               0.320          # 5.12 \/ 16 cores\nutime                7051.575\nstime                81.121\nnvcsw                5955297        # 98.67%\nnivcsw               80192          # 1.33%\ninblock              20984032       # 15054.75\/sec\nonblock              
5464           # 3.92\/sec\ncpu-clock            7093046110780  # 7093.046 seconds\ntask-clock           7096878999736  # 7096.879 seconds\npage faults          2495483        # 351.631\/sec\ncontext switches     6042276        # 851.399\/sec\ncpu migrations       163180         # 22.993\/sec\nmajor page faults    2861           # 0.403\/sec\nminor page faults    2492622        # 351.228\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             2459209749628  # 50.743 branches per 1000 inst\nbranch misses        44820326405    # 1.82% branch miss\nconditional          2459209765468  # 50.743 conditional branches per 1000 inst\nindirect             620204202405   # 12.797 indirect branches per 1000 inst\nslots                49219054467146 #\nretiring             23760356174046 # 48.3% (48.3%)\n-- ucode             1008550311918  #     2.0%\n-- fastpath          22751805862128 #    46.2%\nfrontend             6378237707292  # 13.0% (13.0%)\n-- latency           3046542845679  #     6.2%\n-- bandwidth         3331694861613  #     6.8%\nbackend              16273030130206 # 33.1% (33.1%)\n-- cpu               9088240401088  #    18.5%\n-- memory            7184789729118  #    14.6%\nspeculation          3130243686700  #  6.4% ( 6.4%)\n-- branch mispredict 2944870071324  #     6.0%\n-- pipeline restart  185373615376   #     0.4%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           16282458738408 # 1.01 GHz\ninstructions         44439229330649 # 2.73 IPC\nl2 access            902938123881   # 39.431 l2 access per 1000 inst\nl2 miss              264018120167   # 29.24% l2 miss\ncpu-cycles           8391353786130  # 21.0% memory latency\nload stalls          1601500730217  #  0.0% l1 bound\nl1 miss              1707146006108  # 12.4% l2 bound\nl2 miss              668404285722   #  4.3% l3 bound\nl3 miss              305217164369   #  3.6% dram bound\nstore_stalls         
158271209456   #  1.9% store bound\n<\/code><\/pre>\n\n\n\n<p>Process overview shows vpxenc as the primary process<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>450 processes\n\t 96 vpxenc               23832.39   254.91\n\t 68 clinfo                  17.20     5.33\n\t 38 vulkaninfo               1.14     1.14\n\t  4 vulkani:disk$0           0.12     0.12\n\t  6 glxinfo:gdrv0            0.10     0.04\n\t  6 glxinfo:gl0              0.10     0.04\n\t  6 clang                    0.10     0.01\n\t  6 php                      0.07     0.17\n\t  2 llvmpipe-0               0.06     0.06\n\t  2 llvmpipe-1               0.06     0.06\n\t  2 llvmpipe-10              0.06     0.06\n\t  2 llvmpipe-11              0.06     0.06\n\t  2 llvmpipe-12              0.06     0.06\n\t  2 llvmpipe-13              0.06     0.06\n\t  2 llvmpipe-14              0.06     0.06\n\t  2 llvmpipe-15              0.06     0.06\n\t  2 llvmpipe-2               0.06     0.06\n\t  2 llvmpipe-3               0.06     0.06\n\t  2 llvmpipe-4               0.06     0.06\n\t  2 llvmpipe-5               0.06     0.06\n\t  2 llvmpipe-6               0.06     0.06\n\t  2 llvmpipe-7               0.06     0.06\n\t  2 llvmpipe-8               0.06     0.06\n\t  2 llvmpipe-9               0.06     0.06\n\t  2 glxinfo                  0.06     0.02\n\t  2 glxinfo:cs0              0.06     0.02\n\t  2 glxinfo:disk$0           0.06     0.02\n\t  2 glxinfo:sh0              0.06     0.02\n\t  2 glxinfo:shlo0            0.06     0.02\n\t  3 rocminfo                 0.03     0.00\n\t  1 lspci                    0.00     0.02\n\t 88 sh                       0.00     0.00\n\t 13 gcc                      0.00     0.00\n\t 13 gsettings                0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  2 cc                       0.00     0.00\n\t  2 gmain                  
  0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 uname                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dconf worker             0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 ps                       0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n0 processes running\n47 maximum processes\n<\/code><\/pre>\n\n\n\n<p>Computation structure is straightforward.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      333831) vpxenc           cpu=11 start=6.65  finish=91.46\n        333832) vpxenc           cpu=11 start=6.66  finish=91.46\n          333833) vpxenc           cpu=2 start=7.15  finish=91.41\n          333834) vpxenc           cpu=12 start=7.15  finish=91.41\n          333835) vpxenc           cpu=7 start=7.16  finish=91.41\n          333836) vpxenc           cpu=1 start=7.16  finish=91.41\n          333837) vpxenc           cpu=5 start=7.16  finish=91.41\n          333838) vpxenc           cpu=6 start=7.16  finish=91.41\n          333839) vpxenc           cpu=0 start=7.17  finish=91.41\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Video 
encoding using the Google libvpx library. There are four workloads: each of two speed levels is run on both 4K and 1080p video. The number of processes appears to vary, though there seems to be one per physical core. The topdown profile differs slightly <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/vpxenc\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-1365","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1365","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=1365"}],"version-history":[{"count":2,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1365\/revisions"}],"predecessor-version":[{"id":1399,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1365\/revisions\/1399"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=1365"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}