{"id":349,"date":"2024-01-08T01:04:59","date_gmt":"2024-01-08T01:04:59","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=349"},"modified":"2024-01-08T13:27:06","modified_gmt":"2024-01-08T13:27:06","slug":"ffmpeg","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/ffmpeg\/","title":{"rendered":"ffmpeg"},"content":{"rendered":"\n<p>Benchmark for the ffmpeg multimedia framework, working with various video and image workloads. Eight workloads with slightly different characteristics, all with a moderate retirement rate.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-29.png\" alt=\"\" class=\"wp-image-353\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-29.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-29-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-29-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics show the run keeps only ~3.62 of the 16 cores busy on average. It is a floating point application, and some workloads have more branch prediction issues.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              3516.551\non_cpu               0.226          # 3.62 \/ 16 cores\nutime                12390.943\nstime                340.884\nnvcsw                15578181       # 97.24%\nnivcsw               442087         # 2.76%\ninblock              544            # 0.15\/sec\nonblock              2240680        # 637.18\/sec\ncpu-clock            12698964696026 # 12698.965 seconds\ntask-clock           12707598404585 # 12707.598 seconds\npage faults          79503821       # 6256.400\/sec\ncontext switches     16034984       # 1261.842\/sec\ncpu migrations       
1151669        # 90.628\/sec\nmajor page faults    209            # 0.016\/sec\nminor page faults    79503612       # 6256.384\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             7203248030886  # 75.632 branches per 1000 inst\nbranch misses        240049896958   # 3.33% branch miss\nconditional          4533988502930  # 47.606 conditional branches per 1000 inst\nindirect             523019511730   # 5.492 indirect branches per 1000 inst\ncpu-cycles           50141302219917 # 0.90 GHz\ninstructions         95031760041237 # 1.90 IPC\nslots                101058172691034 #\nretiring             33343174547595 # 33.0% (38.6%)\n-- ucode             369097077701   #     0.4%\n-- fastpath          32974077469894 #    32.6%\nfrontend             18316326101792 # 18.1% (21.2%)\n-- latency           11796656848290 #    11.7%\n-- bandwidth         6519669253502  #     6.5%\nbackend              28902503738660 # 28.6% (33.5%)\n-- cpu               9238344511430  #     9.1%\n-- memory            19664159227230 #    19.5%\nspeculation          5741462680659  #  5.7% ( 6.7%)\n-- branch mispredict 5493902043181  #     5.4%\n-- pipeline restart  247560637478   #     0.2%\nsmt-contention       14753369236746 # 14.6% ( 0.0%)\ncpu-cycles           50150039568465 # 0.90 GHz\ninstructions         95013999407440 # 1.89 IPC\ninstructions         31732577834943 # 38.418 l2 access per 1000 inst\nl2 hit from l1       1041329163186  # 10.60% l2 miss\nl2 miss from l1      77034115956    #\nl2 hit from l2 pf    125542444785   #\nl3 hit from l2 pf    32137285724    #\nl3 miss from l2 pf   20098536952    #\ninstructions         31712371266847 # 162.036 float per 1000 inst\nfloat 512            1031           # 0.000 AVX-512 per 1000 inst\nfloat 256            8708556624     # 0.275 AVX-256 per 1000 inst\nfloat 128            5129851498771  # 161.762 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 
1000 inst\nfloat scalar         431            # 0.000 scalar per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              4537.973\non_cpu               0.224          # 3.59 \/ 16 cores\nutime                15956.113\nstime                313.231\nnvcsw                17077668       # 93.39%\nnivcsw               1209684        # 6.61%\ninblock              872            # 0.19\/sec\nonblock              2240792        # 493.79\/sec\ncpu-clock            16173096823697 # 16173.097 seconds\ntask-clock           16184087266515 # 16184.087 seconds\npage faults          79247994       # 4896.661\/sec\ncontext switches     18307277       # 1131.190\/sec\ncpu migrations       2784557        # 172.055\/sec\nmajor page faults    105            # 0.006\/sec\nminor page faults    79247889       # 4896.655\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             7230167358835  # 74.859 branches per 1000 inst\nbranch misses        237818000973   # 3.29% branch miss\nconditional          7230168530419  # 74.859 conditional branches per 1000 inst\nindirect             2079543575777  # 21.531 indirect branches per 1000 inst\nslots                152043588937604 #\nretiring             67100901539543 # 44.1% (44.1%)\n-- ucode             4640783240152  #     3.1%\n-- fastpath          62460118299391 #    41.1%\nfrontend             27983850905441 # 18.4% (18.4%)\n-- latency           12653607703945 #     8.3%\n-- bandwidth         15330243201496 #    10.1%\nbackend              33676837503150 # 22.1% (22.1%)\n-- cpu               21959791561468 #    14.4%\n-- memory            11717045941682 #     7.7%\nspeculation          24216123543815 # 15.9% (15.9%)\n-- branch mispredict 23560593854272 #    15.5%\n-- pipeline restart  655529689543   #     0.4%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           57065942417550 # 0.79 
GHz\ninstructions         123132261252145 # 2.16 IPC\nl2 access            1879230062109  # 29.384 l2 access per 1000 inst\nl2 miss              324166645172   # 17.25% l2 miss\n<\/code><\/pre>\n\n\n\n<p>Process tree shows different av:: encoders being used for different tests. A lot of short-lived processes.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>36594 processes\n\t10914 ffmpeg               185063.52  3556.32\n\t1080 av:h264:df15         12462.15   276.74\n\t1080 av:h264:df14         12462.13   276.67\n\t1080 av:h264:df13         12462.08   276.64\n\t1080 av:h264:df12         12462.03   276.58\n\t1080 av:h264:df11         12461.98   276.51\n\t1080 av:h264:df10         12461.94   276.41\n\t1080 av:h264:df9          12461.87   276.33\n\t1080 av:h264:df8          12461.82   276.29\n\t1080 av:h264:df7          12461.71   276.27\n\t1080 av:h264:df6          12461.62   276.22\n\t1080 av:h264:df5          12461.59   276.11\n\t1080 av:h264:df4          12461.48   276.06\n\t1080 av:h264:df3          12461.42   275.99\n\t1080 av:h264:df2          12461.36   275.95\n\t1080 av:h264:df1          12461.30   275.91\n\t1080 av:h264:df0          12461.21   275.80\n\t900 dec0:0:h264           9524.23   211.29\n\t900 dmx0:matroska,w       9309.61   207.19\n\t360 mux0:matroska         7341.88   105.99\n\t540 mux0:null             4586.64   125.17\n\t360 dmx1:matroska,w       1036.13    65.92\n\t180 av:hevc:df15           610.90    38.27\n\t180 av:hevc:df14           610.90    38.23\n\t180 av:hevc:df12           610.89    38.23\n\t180 av:hevc:df13           610.89    38.23\n\t180 av:hevc:df11           610.88    38.22\n\t180 av:hevc:df10           610.87    38.22\n\t180 av:hevc:df9            610.87    38.21\n\t180 av:hevc:df7            610.87    38.19\n\t180 av:hevc:df8            610.87    38.19\n\t180 av:hevc:df4            610.87    38.17\n\t180 av:hevc:df5            610.87    38.17\n\t180 av:hevc:df6            610.87    38.17\n\t180 av:hevc:df2            610.86    
38.16\n\t180 av:hevc:df3            610.86    38.16\n\t180 av:hevc:df1            610.85    38.16\n\t180 av:hevc:df0            610.84    38.15\n\t180 dec1:0:hevc            607.72    36.37\n\t180 dec1:0:h264            522.94    34.05\n\t1350 ffprobe                427.14     4.24\n\t 64 clinfo                  10.88     3.52\n\t 25 python3                  1.38     0.75\n\t 38 vulkaninfo               0.95     0.77\n\t  6 php                      0.16     0.68\n\t  4 vulkani:disk$0           0.10     0.09\n\t  6 glxinfo:gdrv0            0.05     0.10\n\t  2 llvmpipe-0               0.05     0.05\n\t  2 llvmpipe-1               0.05     0.05\n\t  2 llvmpipe-10              0.05     0.05\n\t  2 llvmpipe-11              0.05     0.05\n\t  2 llvmpipe-12              0.05     0.05\n\t  2 llvmpipe-13              0.05     0.05\n\t  2 llvmpipe-14              0.05     0.05\n\t  2 llvmpipe-15              0.05     0.05\n\t  2 llvmpipe-2               0.05     0.05\n\t  2 llvmpipe-3               0.05     0.05\n\t  2 llvmpipe-4               0.05     0.05\n\t  2 llvmpipe-5               0.05     0.05\n\t  2 llvmpipe-6               0.05     0.05\n\t  2 llvmpipe-7               0.05     0.05\n\t  2 llvmpipe-8               0.05     0.05\n\t  2 llvmpipe-9               0.05     0.05\n\t  2 glxinfo                  0.04     0.04\n\t  2 glxinfo:cs0              0.04     0.04\n\t  2 glxinfo:disk$0           0.04     0.04\n\t  2 glxinfo:shlo0            0.04     0.04\n\t  2 glxinfo:sh0              0.03     0.04\n\t  6 clang                    0.02     0.04\n\t  1 lspci                    0.00     0.02\n\t464 sh                       0.00     0.00\n\t 13 gcc                      0.00     0.00\n\t  9 gsettings                0.00     0.00\n\t  9 stty                     0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  4 gmain   
                 0.00     0.00\n\t  3 dconf worker             0.00     0.00\n\t  2 cc                       0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 uname                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 ps                       0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n0 processes running\n65 maximum processes\n<\/code><\/pre>\n\n\n\n<p>An example test shows parallelism might be more limited by processes than by available cores.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>     238602) ffmpeg           cpu=7 start=5.64  finish=41.62\n        238603) python3          cpu=6 start=5.64  finish=41.62\n          238604) ffprobe          cpu=1 start=5.65  finish=5.67 \n          238605) ffprobe          cpu=2 start=5.67  finish=5.97 \n          238606) ffprobe          cpu=6 start=5.97  finish=5.99 \n          238607) ffmpeg           cpu=6 start=5.99  finish=7.02 \n            238608) av:h264:df0      cpu=9 start=6.00  finish=7.02 \n            238609) av:h264:df1      cpu=10 start=6.00  finish=7.02 \n            238610) av:h264:df2      cpu=11 start=6.00  finish=7.02 \n    
        238611) av:h264:df3      cpu=12 start=6.00  finish=7.02 \n            238612) av:h264:df4      cpu=14 start=6.00  finish=7.02 \n            238613) av:h264:df5      cpu=7 start=6.00  finish=7.02 \n            238614) av:h264:df6      cpu=5 start=6.00  finish=7.02 \n            238615) av:h264:df7      cpu=9 start=6.00  finish=7.02 \n            238616) av:h264:df8      cpu=10 start=6.00  finish=7.02 \n            238617) av:h264:df9      cpu=11 start=6.00  finish=7.02 \n            238618) av:h264:df10     cpu=12 start=6.00  finish=7.02 \n            238619) av:h264:df11     cpu=8 start=6.00  finish=7.02 \n            238620) av:h264:df12     cpu=9 start=6.00  finish=7.02 \n            238621) av:h264:df13     cpu=5 start=6.00  finish=7.02 \n            238622) av:h264:df14     cpu=9 start=6.00  finish=7.02 \n            238623) av:h264:df15     cpu=10 start=6.01  finish=7.02 \n            238624) dec0:0:h264      cpu=7 start=6.01  finish=6.98 \n            238625) dmx0:matroska,w  cpu=1 start=6.01  finish=6.97 \n            238626) mux0:matroska    cpu=5 start=6.04  finish=7.02 \n          238627) sh               cpu=9 start=7.03  finish=7.23 \n            238628) ffmpeg           cpu=13 start=7.03  finish=7.22 \n              238629) av:h264:df0      cpu=0 start=7.05  finish=7.22 \n              238630) av:h264:df1      cpu=15 start=7.05  finish=7.22 \n              238631) av:h264:df2      cpu=12 start=7.05  finish=7.22 \n              238632) av:h264:df3      cpu=2 start=7.05  finish=7.22 \n              238633) av:h264:df4      cpu=14 start=7.05  finish=7.22 \n              238634) av:h264:df5      cpu=5 start=7.06  finish=7.22 \n              238635) av:h264:df6      cpu=11 start=7.06  finish=7.22 \n              238636) av:h264:df7      cpu=15 start=7.06  finish=7.22 \n              238637) av:h264:df8      cpu=0 start=7.06  finish=7.22 \n              238638) av:h264:df9      cpu=11 start=7.06  finish=7.22 \n              238639) av:h264:df10     
cpu=7 start=7.06  finish=7.22 \n              238640) av:h264:df11     cpu=5 start=7.06  finish=7.22 \n              238641) av:h264:df12     cpu=3 start=7.06  finish=7.22 \n              238642) av:h264:df13     cpu=1 start=7.06  finish=7.22 \n              238643) av:h264:df14     cpu=12 start=7.06  finish=7.22 \n              238644) av:h264:df15     cpu=12 start=7.06  finish=7.22 \n              238645) dec0:0:h264      cpu=6 start=7.06  finish=7.21 \n              238646) av:h264:df0      cpu=3 start=7.06  finish=7.22 \n              238647) av:h264:df1      cpu=10 start=7.06  finish=7.22 \n              238648) av:h264:df2      cpu=4 start=7.06  finish=7.22 \n              238649) av:h264:df3      cpu=9 start=7.06  finish=7.22 \n              238650) av:h264:df4      cpu=3 start=7.06  finish=7.22 \n              238651) av:h264:df5      cpu=7 start=7.06  finish=7.22 \n              238652) av:h264:df6      cpu=2 start=7.06  finish=7.22 \n              238653) av:h264:df7      cpu=12 start=7.06  finish=7.22 \n              238654) av:h264:df8      cpu=10 start=7.06  finish=7.22 \n              238655) av:h264:df9      cpu=0 start=7.06  finish=7.22 \n              238656) av:h264:df10     cpu=14 start=7.06  finish=7.22 \n              238657) av:h264:df11     cpu=2 start=7.06  finish=7.22 \n              238658) av:h264:df12     cpu=10 start=7.06  finish=7.22 \n              238659) av:h264:df13     cpu=7 start=7.06  finish=7.22 \n              238660) av:h264:df14     cpu=11 start=7.06  finish=7.22 \n              238661) av:h264:df15     cpu=14 start=7.06  finish=7.22 \n              238662) dec1:0:h264      cpu=5 start=7.06  finish=7.21 \n              238663) dmx0:matroska,w  cpu=6 start=7.07  finish=7.20 \n              238664) dmx1:matroska,w  cpu=5 start=7.10  finish=7.20 \n              238665) ffmpeg           cpu=6 start=7.13  finish=7.22 \n              238666) ffmpeg           cpu=2 start=7.13  finish=7.22 \n              238667) ffmpeg           
cpu=8 start=7.13  finish=7.22 \n              238668) ffmpeg           cpu=11 start=7.13  finish=7.22 \n              238669) ffmpeg           cpu=9 start=7.13  finish=7.22 \n              238670) ffmpeg           cpu=12 start=7.13  finish=7.22 \n              238671) ffmpeg           cpu=5 start=7.13  finish=7.22 \n              238672) ffmpeg           cpu=0 start=7.13  finish=7.22 \n              238673) ffmpeg           cpu=1 start=7.13  finish=7.22 \n              238674) ffmpeg           cpu=14 start=7.13  finish=7.22 \n              238675) ffmpeg           cpu=10 start=7.13  finish=7.22 \n              238676) ffmpeg           cpu=15 start=7.13  finish=7.22 \n              238677) ffmpeg           cpu=4 start=7.13  finish=7.22 \n              238678) ffmpeg           cpu=3 start=7.13  finish=7.22 \n              238679) ffmpeg           cpu=7 start=7.13  finish=7.22 \n              238680) mux0:null        cpu=11 start=7.13  finish=7.21 \n          238681) ffprobe          cpu=12 start=7.23  finish=7.25 \n          238682) ffprobe          cpu=11 start=7.25  finish=7.25 \n          238683) ffprobe          cpu=12 start=7.25  finish=7.45 \n          238684) ffprobe          cpu=6 start=7.45  finish=7.46 \n          238685) ffmpeg           cpu=5 start=7.46  finish=7.98 \n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Benchmark for the ffmpeg multimedia framework, working with various video and image workloads. Eight different workloads with slightly different characteristics but moderate retirement rate. 
AMD metrics show we&#8217;re spending ~3.62 of cores, with a floating point app and some workloads <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/ffmpeg\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-349","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/349","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=349"}],"version-history":[{"count":2,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/349\/revisions"}],"predecessor-version":[{"id":354,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/349\/revisions\/354"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=349"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}