{"id":1300,"date":"2024-02-02T12:02:56","date_gmt":"2024-02-02T12:02:56","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=1300"},"modified":"2024-02-03T01:19:45","modified_gmt":"2024-02-03T01:19:45","slug":"x264","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/x264\/","title":{"rendered":"x264"},"content":{"rendered":"\n<p>Testing a multi-threaded x264 video encoder. This encodes a 4K image followed by a 1080p image. It seems to bounce between different numbers of runnable processes.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-12.png\" alt=\"\" class=\"wp-image-1315\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-12.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-12-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/systemtime-12-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>Topdown profile shows an almost even split between retirement and backend stalls.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-12.png\" alt=\"\" class=\"wp-image-1317\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-12.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-12-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/02\/amdtopdown-12-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics show an average of ~10 cores busy. 
This is floating-point code with comparatively few branches.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              109.681\non_cpu               0.608          # 9.72 \/ 16 cores\nutime                1053.842\nstime                12.299\nnvcsw                361218         # 81.01%\nnivcsw               84696          # 18.99%\ninblock              3256           # 29.69\/sec\nonblock              12888          # 117.50\/sec\ncpu-clock            1066288166468  # 1066.288 seconds\ntask-clock           1066416971546  # 1066.417 seconds\npage faults          721586         # 676.645\/sec\ncontext switches     446284         # 418.489\/sec\ncpu migrations       157506         # 147.696\/sec\nmajor page faults    26             # 0.024\/sec\nminor page faults    721560         # 676.621\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             472780029627   # 74.946 branches per 1000 inst\nbranch misses        12203448367    # 2.58% branch miss\nconditional          277428552179   # 43.978 conditional branches per 1000 inst\nindirect             57936525347    # 9.184 indirect branches per 1000 inst\ncpu-cycles           3999823451251  # 2.28 GHz\ninstructions         6305581355914  # 1.58 IPC\nslots                8002833887928  #\nretiring             2225805517930  # 27.8% (38.2%)\n-- ucode             18730169192    #     0.2%\n-- fastpath          2207075348738  #    27.6%\nfrontend             1162081889243  # 14.5% (20.0%)\n-- latency           844391783640   #    10.6%\n-- bandwidth         317690105603   #     4.0%\nbackend              2194344612641  # 27.4% (37.7%)\n-- cpu               776316315076   #     9.7%\n-- memory            1418028297565  #    17.7%\nspeculation          239678438231   #  3.0% ( 4.1%)\n-- branch mispredict 216097064427   #     2.7%\n-- pipeline restart  23581373804    #     0.3%\nsmt-contention       2180884225504  # 27.3% ( 0.0%)\ncpu-cycles           
4004643116766  # 2.28 GHz\ninstructions         6306441819098  # 1.57 IPC\ninstructions         2101933143976  # 53.068 l2 access per 1000 inst\nl2 hit from l1       80012154529    # 6.03% l2 miss\nl2 miss from l1      3004489202     #\nl2 hit from l2 pf    27808913292    #\nl3 hit from l2 pf    1203246849     #\nl3 miss from l2 pf   2520651155     #\ninstructions         2103726837096  # 164.465 float per 1000 inst\nfloat 512            60             # 0.000 AVX-512 per 1000 inst\nfloat 256            3715879476     # 1.766 AVX-256 per 1000 inst\nfloat 128            342273836220   # 162.699 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              575.678\non_cpu               0.693          # 11.08 \/ 16 cores\nutime                6331.656\nstime                47.690\nnvcsw                1493407        # 76.25%\nnivcsw               465204         # 23.75%\ninblock              4939192        # 8579.78\/sec\nonblock              2408           # 4.18\/sec\ncpu-clock            6378815768388  # 6378.816 seconds\ntask-clock           6379289177912  # 6379.289 seconds\npage faults          2716174        # 425.780\/sec\ncontext switches     1961241        # 307.439\/sec\ncpu migrations       703420         # 110.266\/sec\nmajor page faults    15671          # 2.457\/sec\nminor page faults    2700503        # 423.323\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             2289382131781  # 69.437 branches per 1000 inst\nbranch misses        59249358432    # 2.59% branch miss\nconditional          2289382171973  # 69.437 conditional branches per 1000 inst\nindirect             902145820391   # 27.362 indirect branches per 1000 inst\nslots                8778509383352  #\nretiring             
4708390332421  # 53.6% (53.6%)\n-- ucode             289611573500   #     3.3%\n-- fastpath          4418778758921  #    50.3%\nfrontend             2069165496342  # 23.6% (23.6%)\n-- latency           824618892139   #     9.4%\n-- bandwidth         1244546604203  #    14.2%\nbackend              1121944522091  # 12.8% (12.8%) low\n-- cpu               574049221261   #     6.5%\n-- memory            547895300830   #     6.2%\nspeculation          833840065989   #  9.5% ( 9.5%)\n-- branch mispredict 804921122278   #     9.2%\n-- pipeline restart  28918943711    #     0.3%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           6445844947005  # 2.15 GHz\ninstructions         11708266075332 # 1.82 IPC\nl2 access            170225224653   # 29.698 l2 access per 1000 inst\nl2 miss              16990061571    # 9.98% l2 miss\n<\/code><\/pre>\n\n\n\n<p>Process overview<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>542 processes\n\t192 x264                 32497.05   319.50\n\t 68 clinfo                  16.52     6.47\n\t 38 vulkaninfo               0.94     1.33\n\t  6 glxinfo:gdrv0            0.16     0.05\n\t  6 glxinfo:gl0              0.16     0.05\n\t  4 vulkani:disk$0           0.10     0.14\n\t  6 php                      0.08     0.07\n\t  2 glxinfo                  0.06     0.03\n\t  2 glxinfo:cs0              0.06     0.03\n\t  2 glxinfo:disk$0           0.06     0.03\n\t  2 glxinfo:sh0              0.06     0.03\n\t  2 glxinfo:shlo0            0.06     0.03\n\t  2 llvmpipe-0               0.05     0.07\n\t  2 llvmpipe-1               0.05     0.07\n\t  2 llvmpipe-10              0.05     0.07\n\t  2 llvmpipe-11              0.05     0.07\n\t  2 llvmpipe-12              0.05     0.07\n\t  2 llvmpipe-13              0.05     0.07\n\t  2 llvmpipe-14              0.05     0.07\n\t  2 llvmpipe-15              0.05     0.07\n\t  2 llvmpipe-2               0.05     0.07\n\t  2 llvmpipe-3               0.05     0.07\n\t  2 llvmpipe-4               
0.05     0.07\n\t  2 llvmpipe-5               0.05     0.07\n\t  2 llvmpipe-6               0.05     0.07\n\t  2 llvmpipe-7               0.05     0.07\n\t  2 llvmpipe-8               0.05     0.07\n\t  2 llvmpipe-9               0.05     0.07\n\t  6 clang                    0.04     0.08\n\t  3 rocminfo                 0.03     0.00\n\t  1 lspci                    0.00     0.02\n\t 84 sh                       0.00     0.00\n\t 13 gcc                      0.00     0.00\n\t 12 gsettings                0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  3 gmain                    0.00     0.00\n\t  2 cc                       0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 uname                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dconf worker             0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 ps                       0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n0 processes running\n47 maximum 
processes\n<\/code><\/pre>\n\n\n\n<p>Computation blocks are straightforward<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      190664) x264             cpu=15 start=5.57  finish=25.65\n        190665) x264             cpu=15 start=5.57  finish=25.65\n          190666) x264             cpu=12 start=5.59  finish=25.65\n          190667) x264             cpu=3 start=5.59  finish=25.63\n          190668) x264             cpu=12 start=5.59  finish=25.63\n          190669) x264             cpu=2 start=5.59  finish=25.63\n          190670) x264             cpu=9 start=5.59  finish=25.63\n          190671) x264             cpu=2 start=5.59  finish=25.63\n          190672) x264             cpu=5 start=5.59  finish=25.63\n          190673) x264             cpu=11 start=5.59  finish=25.63\n          190674) x264             cpu=15 start=5.59  finish=25.63\n          190675) x264             cpu=13 start=5.59  finish=25.63\n          190676) x264             cpu=0 start=5.59  finish=25.63\n          190677) x264             cpu=13 start=5.59  finish=25.63\n          190678) x264             cpu=7 start=5.59  finish=25.63\n          190679) x264             cpu=14 start=5.59  finish=25.63\n          190680) x264             cpu=9 start=5.59  finish=25.63\n          190681) x264             cpu=7 start=5.59  finish=25.63\n          190682) x264             cpu=8 start=5.59  finish=25.63\n          190683) x264             cpu=6 start=5.59  finish=25.63\n          190684) x264             cpu=6 start=5.59  finish=25.63\n          190685) x264             cpu=5 start=5.59  finish=25.63\n          190686) x264             cpu=0 start=5.59  finish=25.63\n          190687) x264             cpu=8 start=5.59  finish=25.63\n          190688) x264             cpu=10 start=5.59  finish=25.63\n          190689) x264             cpu=1 start=5.59  finish=25.63\n          190690) x264             cpu=4 start=5.59  finish=25.63\n          190691) x264             cpu=0 start=5.59  
finish=25.63\n          190692) x264             cpu=1 start=5.59  finish=25.63\n          190693) x264             cpu=11 start=5.59  finish=25.63\n          190694) x264             cpu=9 start=5.59  finish=25.63\n          190695) x264             cpu=11 start=5.60  finish=24.33\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Testing a multi-threaded x264 video encoder. This encodes a 4K image followed by a 1080p image. It seems to bounce between different numbers of runnable processes. Topdown profile shows an almost even split between retirement and backend stalls. AMD metrics <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/x264\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-1300","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1300","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=1300"}],"version-history":[{"count":2,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1300\/revisions"}],"predecessor-version":[{"id":1318,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1300\/revisions\/1318"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=1300"}],"curies":[{"name":"wp","href":"
https:\/\/api.w.org\/{rel}","templated":true}]}}