{"id":182,"date":"2024-01-03T02:50:34","date_gmt":"2024-01-03T02:50:34","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=182"},"modified":"2024-01-03T02:51:20","modified_gmt":"2024-01-03T02:51:20","slug":"x265","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/x265\/","title":{"rendered":"x265"},"content":{"rendered":"\n<p>x265 benchmark with standard input and two workloads<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-6.png\" alt=\"\" class=\"wp-image-184\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-6.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-6-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-6-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics show a 128-bit floating point program with some L2 misses and backend memory but otherwise medium range retiring. 
We spend only ~50% of the time on the CPU (about 8 of 16 cores), so the operation is threaded but not fully threaded?<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              713.905\non_cpu               0.502          # 8.03 \/ 16 cores\nutime                5668.028\nstime                66.483\nnvcsw                2807701        # 92.74%\nnivcsw               219710         # 7.26%\ninblock              6170832\nonblock              27632\ncpu-clock            5730360170277  # 5730.360 seconds\ntask-clock           5731795930958  # 5731.796 seconds\npage faults          6120568        # 1067.827\/sec\ncontext switches     3030769        # 528.764\/sec\ncpu migrations       176614         # 30.813\/sec\nmajor page faults    67             # 0.012\/sec\nminor page faults    6120501        # 1067.816\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             2298223476101  # 64.254 branches per 1000 inst\nbranch misses        38765597182    # 1.69% branch miss\nconditional          1447123909275  # 40.459 conditional branches per 1000 inst\nindirect             184907459078   # 5.170 indirect branches per 1000 inst\ncpu-cycles           6226498530298  # 1.98 GHz\ninstructions         9678813453851  # 1.55 IPC\nslots                12467027650776 #\nretiring             3419335741620  # 27.4% (35.4%)\n-- ucode             47510862855    #     0.4%\n-- fastpath          3371824878765  #    27.0%\nfrontend             1352978437698  # 10.9% (14.0%)\n-- latency           861177646632   #     6.9%\n-- bandwidth         491800791066   #     3.9%\nbackend              4618690961014  # 37.0% (47.8%)\n-- cpu               1412014283462  #    11.3%\n-- memory            3206676677552  #    25.7%\nspeculation          263046398653   #  2.1% ( 2.7%)\n-- branch mispredict 242439249427   #     1.9%\n-- pipeline restart  20607149226    #     0.2%\nsmt-contention       2812859705014  # 22.6% ( 0.0%)\ncpu-cycles           7936497399423  # 
2.00 GHz\ninstructions         12283162200513 # 1.55 IPC\ninstructions         4099519478946  # 39.677 l2 access per 1000 inst\nl2 hit from l1       134192432419   # 12.06% l2 miss\nl2 miss from l1      12511817133    #\nl2 hit from l2 pf    21366966534    #\nl3 hit from l2 pf    3874073579     #\nl3 miss from l2 pf   3222493799     #\ninstructions         4091045784055  # 190.915 float per 1000 inst\nfloat 512            71             # 0.000 AVX-512 per 1000 inst\nfloat 256            1060           # 0.000 AVX-256 per 1000 inst\nfloat 128            781040286672   # 190.915 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst<\/code><\/pre>\n\n\n\n<p>The corresponding Intel metrics show a >50% retiring rate.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              788.657\non_cpu               0.610          # 9.76 \/ 16 cores\nutime                7603.378\nstime                90.091\nnvcsw                3688077        # 90.89%\nnivcsw               369791         # 9.11%\ninblock              31549176\nonblock              26776\ncpu-clock            7679288766057  # 7679.289 seconds\ntask-clock           7681427192071  # 7681.427 seconds\npage faults          6542506        # 851.731\/sec\ncontext switches     4061655        # 528.763\/sec\ncpu migrations       594059         # 77.337\/sec\nmajor page faults    793            # 0.103\/sec\nminor page faults    6541713        # 851.627\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             2477806622104  # 64.408 branches per 1000 inst\nbranch misses        43684720181    # 1.76% branch miss\nconditional          2477806644216  # 64.408 conditional branches per 1000 inst\nindirect             880974334257   # 22.900 indirect branches per 1000 inst\nslots                10702457016518 #\nretiring             5872152202319  # 54.9% 
(54.9%)\n-- ucode             373372570662   #     3.5%\n-- fastpath          5498779631657  #    51.4%\nfrontend             1885584589158  # 17.6% (17.6%)\n-- latency           953145946637   #     8.9%\n-- bandwidth         932438642521   #     8.7%\nbackend              2105534559715  # 19.7% (19.7%)\n-- cpu               981934933436   #     9.2%\n-- memory            1123599626279  #    10.5%\nspeculation          939803939689   #  8.8% ( 8.8%)\n-- branch mispredict 884451371813   #     8.3%\n-- pipeline restart  55352567876    #     0.5%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           6467518831703  # 1.91 GHz\ninstructions         12325634319096 # 1.91 IPC\nl2 access            173564026661   # 31.787 l2 access per 1000 inst\nl2 miss              36974658876    # 21.30% l2 miss\n<\/code><\/pre>\n\n\n\n<p>No surprises from the process summary, all time spent in x265 process<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>434 processes\n\t150 x265                 36013.72   343.12\n\t 38 vulkaninfo               0.76     1.14\n\t  6 glxinfo:gdrv0            0.16     0.13\n\t  2 glxinfo                  0.12     0.05\n\t  2 glxinfo:cs0              0.12     0.05\n\t  2 glxinfo:disk$0           0.12     0.05\n\t  2 glxinfo:sh0              0.12     0.05\n\t  2 glxinfo:shlo0            0.12     0.05\n\t  4 vulkani:disk$0           0.08     0.12\n\t  2 llvmpipe-0               0.04     0.06\n\t  2 llvmpipe-1               0.04     0.06\n\t  2 llvmpipe-10              0.04     0.06\n\t  2 llvmpipe-11              0.04     0.06\n\t  2 llvmpipe-12              0.04     0.06\n\t  2 llvmpipe-13              0.04     0.06\n\t  2 llvmpipe-14              0.04     0.06\n\t  2 llvmpipe-15              0.04     0.06\n\t  2 llvmpipe-2               0.04     0.06\n\t  2 llvmpipe-3               0.04     0.06\n\t  2 llvmpipe-4               0.04     0.06\n\t  2 llvmpipe-5               0.04     0.06\n\t  2 llvmpipe-6               0.04     0.06\n\t  
2 llvmpipe-7               0.04     0.06\n\t  2 llvmpipe-8               0.04     0.06\n\t  2 llvmpipe-9               0.04     0.06\n\t  6 php                      0.03     0.14\n\t  6 clang                    0.02     0.06\n\t  1 lspci                    0.00     0.03\n\t 88 sh                       0.00     0.00\n\t 13 gcc                      0.00     0.00\n\t  9 gsettings                0.00     0.00\n\t  9 stty                     0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  4 gmain                    0.00     0.00\n\t  3 dconf worker             0.00     0.00\n\t  2 cc                       0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 uname                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 ps                       0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n<\/code><\/pre>\n\n\n\n<p>The core blocks start slightly more threads than cores<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      1666842) x265 start=4.92  finish=52.17\n        1666843) x265 start=4.92  
finish=52.16\n          1666844) x265 start=4.92  finish=50.25\n          1666845) x265 start=4.92  finish=52.06\n          1666846) x265 start=4.92  finish=52.06\n          1666847) x265 start=4.92  finish=52.06\n          1666848) x265 start=4.92  finish=52.06\n          1666849) x265 start=4.92  finish=52.06\n          1666850) x265 start=4.92  finish=52.06\n          1666851) x265 start=4.92  finish=52.06\n          1666852) x265 start=4.92  finish=52.06\n          1666853) x265 start=4.92  finish=52.06\n          1666854) x265 start=4.92  finish=52.06\n          1666855) x265 start=4.92  finish=52.06\n          1666856) x265 start=4.92  finish=52.06\n          1666857) x265 start=4.92  finish=52.06\n          1666858) x265 start=4.92  finish=52.06\n          1666859) x265 start=4.92  finish=52.06\n          1666860) x265 start=4.92  finish=52.06\n          1666861) x265 start=4.92  finish=52.06\n          1666862) x265 start=4.94  finish=52.06\n          1666863) x265 start=4.94  finish=52.06\n          1666864) x265 start=4.94  finish=52.06\n          1666865) x265 start=4.94  finish=52.16\n          1666866) x265 start=4.94  finish=50.58<\/code><\/pre>\n\n\n\n<p>What accounts for the relatively low on-core time? The AMD metrics provide one clue in the number of inblock\/outblock operations (the metrics need to be updated to report these as a rate).<\/p>\n","protected":false},"excerpt":{"rendered":"<p>x265 benchmark with standard input and two workloads AMD metrics show a 128-bit floating point program with some L2 misses and backend memory but otherwise medium range retiring. 
We spend ~50% of time on the CPU so threaded but not <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/x265\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-182","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/182","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=182"}],"version-history":[{"count":1,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/182\/revisions"}],"predecessor-version":[{"id":185,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/182\/revisions\/185"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=182"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}