{"id":1141,"date":"2024-01-31T01:14:23","date_gmt":"2024-01-31T01:14:23","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=1141"},"modified":"2024-01-31T14:41:07","modified_gmt":"2024-01-31T14:41:07","slug":"svt-hevc","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/svt-hevc\/","title":{"rendered":"svt-hevc"},"content":{"rendered":"\n<p>A multi-threaded CPU video encoder for the HEVC\/H.265 format. There are six workloads with varying image sizes and tuning levels; the first workload appears to take the majority of the time. The number of runnable threads exceeds the number of cores.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-95.png\" alt=\"\" class=\"wp-image-1198\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-95.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-95-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-95-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>The topdown profile shows a high retirement rate and moderate backend stalls.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-132.png\" alt=\"\" class=\"wp-image-1199\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-132.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-132-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-132-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 
1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics show heavy floating-point code on all cores, almost all of it 128-bit AVX (about 380 AVX-128 floating-point operations per 1000 instructions).<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              1259.755\non_cpu               0.894          # 14.31 \/ 16 cores\nutime                17926.743\nstime                96.360\nnvcsw                3023734        # 56.02%\nnivcsw               2373961        # 43.98%\ninblock              8              # 0.01\/sec\nonblock              16312          # 12.95\/sec\ncpu-clock            18024854334710 # 18024.854 seconds\ntask-clock           18025451003454 # 18025.451 seconds\npage faults          18145259       # 1006.647\/sec\ncontext switches     5403743        # 299.784\/sec\ncpu migrations       1035050        # 57.422\/sec\nmajor page faults    51             # 0.003\/sec\nminor page faults    18145208       # 1006.644\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             9413356258625  # 69.028 branches per 1000 inst\nbranch misses        111056551906   # 1.18% branch miss\nconditional          7408495346760  # 54.327 conditional branches per 1000 inst\nindirect             485701895939   # 3.562 indirect branches per 1000 inst\ncpu-cycles           68265816209878 # 3.37 GHz\ninstructions         136364879972170 # 2.00 IPC\nslots                136542772719132 #\nretiring             46223875343055 # 33.9% (52.9%)\n-- ucode             640590595277   #     0.5%\n-- fastpath          45583284747778 #    33.4%\nfrontend             11418441274150 #  8.4% (13.1%)\n-- latency           6554995901088  #     4.8%\n-- bandwidth         4863445373062  #     3.6%\nbackend              28191878898174 # 20.6% (32.3%)\n-- cpu               10511682562640 #     7.7%\n-- memory            17680196335534 #    12.9%\nspeculation          1574332691318  #  1.2% ( 1.8%)\n-- branch mispredict 1553011976213  #     1.1%\n-- pipeline restart  21320715105    #     0.0%\nsmt-contention       
49133984298112 # 36.0% ( 0.0%)\ncpu-cycles           68297851033396 # 3.38 GHz\ninstructions         136384828545239 # 2.00 IPC\ninstructions         45460259352217 # 31.719 l2 access per 1000 inst\nl2 hit from l1       1274189877825  # 8.19% l2 miss\nl2 miss from l1      64145422231    #\nl2 hit from l2 pf    113751844215   #\nl3 hit from l2 pf    36331843006    #\nl3 miss from l2 pf   17683150389    #\ninstructions         45445500425374 # 379.998 float per 1000 inst\nfloat 512            83             # 0.000 AVX-512 per 1000 inst\nfloat 256            624            # 0.000 AVX-256 per 1000 inst\nfloat 128            17269177369087 # 379.998 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              1714.719\non_cpu               0.897          # 14.35 \/ 16 cores\nutime                24468.892\nstime                141.290\nnvcsw                5243479        # 57.73%\nnivcsw               3839473        # 42.27%\ninblock              18455616       # 10763.06\/sec\nonblock              4168           # 2.43\/sec\ncpu-clock            24613592672295 # 24613.593 seconds\ntask-clock           24614542400908 # 24614.542 seconds\npage faults          29127127       # 1183.330\/sec\ncontext switches     9091236        # 369.344\/sec\ncpu migrations       2204341        # 89.554\/sec\nmajor page faults    1241           # 0.050\/sec\nminor page faults    29125886       # 1183.280\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             10015279821062 # 69.321 branches per 1000 inst\nbranch misses        136094215113   # 1.36% branch miss\nconditional          10015279994502 # 69.321 conditional branches per 1000 inst\nindirect             3865799930109  # 26.757 indirect branches per 1000 
inst\nslots                106053798475136 #\nretiring             75182747859237 # 70.9% (70.9%) high\n-- ucode             4695907390746  #     4.4%\n-- fastpath          70486840468491 #    66.5%\nfrontend             17435557928218 # 16.4% (16.4%)\n-- latency           9242914843772  #     8.7%\n-- bandwidth         8192643084446  #     7.7%\nbackend              7672329946790  #  7.2% ( 7.2%) low\n-- cpu               2660849750913  #     2.5%\n-- memory            5011480195877  #     4.7%\nspeculation          5907766413758  #  5.6% ( 5.6%)\n-- branch mispredict 5653188720079  #     5.3%\n-- pipeline restart  254577693679   #     0.2%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           71536986817867 # 2.55 GHz\ninstructions         152525685987104 # 2.13 IPC\nl2 access            1950062386528  # 25.591 l2 access per 1000 inst\nl2 miss              370887655155   # 19.02% l2 miss\n<\/code><\/pre>\n\n\n\n<p>Process overview<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>3850 processes\n\t3474 SvtHevcEncApp        1732270.96  9158.11\n\t 68 clinfo                  16.85     5.80\n\t 38 vulkaninfo               1.14     0.96\n\t  6 php                      0.13     0.23\n\t  4 vulkani:disk$0           0.12     0.10\n\t  6 glxinfo:gdrv0            0.11     0.07\n\t  6 glxinfo:gl0              0.11     0.07\n\t  6 clang                    0.06     0.06\n\t  2 llvmpipe-0               0.06     0.05\n\t  2 llvmpipe-1               0.06     0.05\n\t  2 llvmpipe-10              0.06     0.05\n\t  2 llvmpipe-11              0.06     0.05\n\t  2 llvmpipe-12              0.06     0.05\n\t  2 llvmpipe-13              0.06     0.05\n\t  2 llvmpipe-14              0.06     0.05\n\t  2 llvmpipe-15              0.06     0.05\n\t  2 llvmpipe-2               0.06     0.05\n\t  2 llvmpipe-3               0.06     0.05\n\t  2 llvmpipe-4               0.06     0.05\n\t  2 llvmpipe-5               0.06     0.05\n\t  2 llvmpipe-6               0.06     
0.05\n\t  2 llvmpipe-7               0.06     0.05\n\t  2 llvmpipe-8               0.06     0.05\n\t  2 llvmpipe-9               0.06     0.05\n\t  2 glxinfo                  0.05     0.03\n\t  2 glxinfo:cs0              0.05     0.03\n\t  2 glxinfo:disk$0           0.05     0.03\n\t  2 glxinfo:sh0              0.05     0.03\n\t  2 glxinfo:shlo0            0.05     0.03\n\t  3 rocminfo                 0.03     0.00\n\t  1 lspci                    0.00     0.02\n\t 92 sh                       0.00     0.00\n\t 18 svt-hevc                 0.00     0.00\n\t 13 gcc                      0.00     0.00\n\t 11 gsettings                0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  3 gmain                    0.00     0.00\n\t  2 cc                       0.00     0.00\n\t  2 dconf worker             0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 uname                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 ps                       0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc        
               0.00     0.00\n\t  1 xrandr                   0.00     0.00\n0 processes running\n107 maximum processes\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>A multi-threaded CPU video encoder for the HEVC\/H.265 format. There are six workloads with varying image sizes and tuning levels; the first workload appears to take the majority of the time. The number of runnable threads exceeds the <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/svt-hevc\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-1141","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1141","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=1141"}],"version-history":[{"count":3,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1141\/revisions"}],"predecessor-version":[{"id":1200,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/1141\/revisions\/1200"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=1141"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}