{"id":2153,"date":"2024-03-22T10:15:41","date_gmt":"2024-03-22T10:15:41","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=2153"},"modified":"2024-03-23T09:58:34","modified_gmt":"2024-03-23T09:58:34","slug":"libraw","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/libraw\/","title":{"rendered":"libraw"},"content":{"rendered":"\n<p>A RAW image decoder library, benchmarked with a single test that runs quickly on about half the cores.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-40.png\" alt=\"\" class=\"wp-image-2161\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-40.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-40-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-40-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>The topdown profile shows a higher retirement rate, followed by some backend stalls and then frontend stalls.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-42.png\" alt=\"\" class=\"wp-image-2163\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-42.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-42-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-42-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics show some floating-point code, almost all of it 128-bit vector (AVX-128), and not many L2 accesses.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed       
       92.773\non_cpu               0.245          # 3.92 \/ 16 cores\nutime                344.601\nstime                18.720\nnvcsw                8707           # 74.14%\nnivcsw               3037           # 25.86%\ninblock              0              # 0.00\/sec\nonblock              12976          # 139.87\/sec\ncpu-clock            363253670075   # 363.254 seconds\ntask-clock           363275710684   # 363.276 seconds\npage faults          10061284       # 27696.000\/sec\ncontext switches     11996          # 33.022\/sec\ncpu migrations       1849           # 5.090\/sec\nmajor page faults    4              # 0.011\/sec\nminor page faults    10061280       # 27695.989\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             371921917436   # 95.516 branches per 1000 inst\nbranch misses        11533532006    # 3.10% branch miss\nconditional          288187005963   # 74.011 conditional branches per 1000 inst\nindirect             11398445179    # 2.927 indirect branches per 1000 inst\ncpu-cycles           1510864437836  # 1.08 GHz\ninstructions         3884284187340  # 2.57 IPC\nslots                3024212308902  #\nretiring             1355578218166  # 44.8% (51.1%)\n-- ucode             886801671      #     0.0%\n-- fastpath          1354691416495  #    44.8%\nfrontend             403473766023   # 13.3% (15.2%)\n-- latency           276939449082   #     9.2%\n-- bandwidth         126534316941   #     4.2%\nbackend              667421081203   # 22.1% (25.2%)\n-- cpu               287734107756   #     9.5%\n-- memory            379686973447   #    12.6%\nspeculation          225291888102   #  7.4% ( 8.5%)\n-- branch mispredict 223143227858   #     7.4%\n-- pipeline restart  2148660244     #     0.1%\nsmt-contention       372445243153   # 12.3% ( 0.0%)\ncpu-cycles           1518221305031  # 1.09 GHz\ninstructions         3875946618563  # 2.55 IPC\ninstructions         1297207670686  # 11.784 
l2 access per 1000 inst\nl2 hit from l1       8328408905     # 18.65% l2 miss\nl2 miss from l1      185058158      #\nl2 hit from l2 pf    4292229866     #\nl3 hit from l2 pf    342343388      #\nl3 miss from l2 pf   2322779786     #\ninstructions         1299139440391  # 73.167 float per 1000 inst\nfloat 512            53             # 0.000 AVX-512 per 1000 inst\nfloat 256            454            # 0.000 AVX-256 per 1000 inst\nfloat 128            95054465502    # 73.167 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\ninstructions         3890132649249  #\nopcache              546431242470   # 140.466 opcache per 1000 inst\nopcache miss         8678068146     #  1.6% opcache miss rate\nl1 dTLB miss         483224034      # 0.124 L1 dTLB per 1000 inst\nl2 dTLB miss         85475382       # 0.022 L2 dTLB per 1000 inst\ninstructions         3889800003726  #\nicache               19586198023    # 5.035 icache per 1000 inst\nicache miss          511702557      #  2.6% icache miss rate\nl1 iTLB miss         8762679        # 0.002 L1 iTLB per 1000 inst\nl2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst\ntlb flush            20375          # 0.000 TLB flush per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics confirm that there is little memory activity.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              110.052\non_cpu               0.276          # 4.41 \/ 16 cores\nutime                467.837\nstime                17.337\nnvcsw                8661           # 65.86%\nnivcsw               4490           # 34.14%\ninblock              624            # 5.67\/sec\nonblock              1688           # 15.34\/sec\ncpu-clock            485020107325   # 485.020 seconds\ntask-clock           485036129248   # 485.036 seconds\npage faults          10056637       # 20733.789\/sec\ncontext switches     13484          # 27.800\/sec\ncpu migrations       
3667           # 7.560\/sec\nmajor page faults    0              # 0.000\/sec\nminor page faults    10056637       # 20733.789\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             368733785557   # 94.829 branches per 1000 inst\nbranch misses        10172337462    # 2.76% branch miss\nconditional          368733800853   # 94.829 conditional branches per 1000 inst\nindirect             76920101111    # 19.782 indirect branches per 1000 inst\nslots                4936643839022  #\nretiring             2524863572358  # 51.1% (51.1%)\n-- ucode             155834186802   #     3.2%\n-- fastpath          2369029385556  #    48.0%\nfrontend             547568286220   # 11.1% (11.1%)\n-- latency           258935604686   #     5.2%\n-- bandwidth         288632681534   #     5.8%\nbackend              1136515909022  # 23.0% (23.0%)\n-- cpu               990322802368   #    20.1%\n-- memory            146193106654   #     3.0%\nspeculation          858612251070   # 17.4% (17.4%) high\n-- branch mispredict 849565895371   #    17.2%\n-- pipeline restart  9046355699     #     0.2%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           1729873292296  # 0.83 GHz\ninstructions         4434644648060  # 2.56 IPC\nl2 access            21287223434    # 8.591 l2 access per 1000 inst\nl2 miss              10391816082    # 48.82% l2 miss\ncpu-cycles           967524598650   #  7.7% memory latency\nload stalls          65110295869    #  1.5% l1 bound\nl1 miss              50619430068    #  3.6% l2 bound\nl2 miss              16022667734    #  0.4% l3 bound\nl3 miss              12272259351    #  1.3% dram bound\nstore_stalls         8999117992     #  0.9% store bound\n<\/code><\/pre>\n\n\n\n<p>The process overview lists the benchmark under the name postprocessing_ (the kernel truncates process names to 15 characters).<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>438 processes\n\t 75 postprocessing_       5494.88   281.28\n\t 68 clinfo                  17.84     7.98\n\t 38 
vulkaninfo               1.52     1.32\n\t  4 vulkani:disk$0           0.16     0.14\n\t  6 glxinfo:gdrv0            0.12     0.08\n\t  6 glxinfo:gl0              0.12     0.08\n\t  2 llvmpipe-0               0.08     0.08\n\t  2 llvmpipe-1               0.08     0.08\n\t  2 llvmpipe-10              0.08     0.08\n\t  2 llvmpipe-11              0.08     0.08\n\t  2 llvmpipe-12              0.08     0.08\n\t  2 llvmpipe-13              0.08     0.08\n\t  2 llvmpipe-14              0.08     0.08\n\t  2 llvmpipe-15              0.08     0.08\n\t  2 llvmpipe-2               0.08     0.08\n\t  2 llvmpipe-3               0.08     0.08\n\t  2 llvmpipe-4               0.08     0.08\n\t  2 llvmpipe-5               0.08     0.08\n\t  2 llvmpipe-6               0.08     0.08\n\t  2 llvmpipe-7               0.08     0.08\n\t  2 llvmpipe-8               0.08     0.08\n\t  2 llvmpipe-9               0.08     0.08\n\t  6 php                      0.07     0.06\n\t  2 glxinfo                  0.06     0.04\n\t  2 glxinfo:cs0              0.06     0.04\n\t  2 glxinfo:disk$0           0.06     0.04\n\t  2 glxinfo:sh0              0.06     0.04\n\t  2 glxinfo:shlo0            0.06     0.04\n\t  6 clang                    0.04     0.08\n\t  3 rocminfo                 0.03     0.00\n\t  1 lspci                    0.01     0.02\n\t  1 ps                       0.00     0.01\n\t 82 sh                       0.00     0.00\n\t 13 gcc                      0.00     0.00\n\t 12 gsettings                0.00     0.00\n\t 10 sed                      0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  3 gmain                    0.00     0.00\n\t  3 libraw                   0.00     0.00\n\t  3 ls                       0.00     0.00\n\t  2 cc                       0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 uname                 
   0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dconf worker             0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n0 processes running\n47 maximum processes\n<\/code><\/pre>\n\n\n\n<p>Computation blocks<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      57005) libraw           cpu=5 start=5.88  finish=29.81\n        57006) postprocessing_  cpu=0 start=5.88  finish=29.80\n          57007) postprocessing_  cpu=3 start=5.88  finish=5.88 \n            57008) postprocessing_  cpu=4 start=5.88  finish=5.88 \n          57009) postprocessing_  cpu=15 start=5.89  finish=5.89 \n          57010) postprocessing_  cpu=2 start=5.89  finish=5.89 \n            57011) postprocessing_  cpu=14 start=5.89  finish=5.89 \n            57012) sed              cpu=0 start=5.89  finish=5.89 \n          57013) postprocessing_  cpu=14 start=5.89  finish=5.89 \n            57014) ls               cpu=3 start=5.89  finish=5.89 \n            57015) sed              cpu=4 start=5.89  finish=5.89 \n          57016) postprocessing_  cpu=2 start=5.89  finish=5.90 \n          57017) postprocessing_  cpu=15 start=5.90  finish=5.90 \n            
57018) postprocessing_  cpu=0 start=5.90  finish=5.90 \n            57019) sed              cpu=10 start=5.90  finish=5.90 \n          57020) postprocessing_  cpu=9 start=6.36  finish=29.80\n          57021) postprocessing_  cpu=7 start=6.36  finish=29.80\n          57022) postprocessing_  cpu=5 start=6.36  finish=29.80\n          57023) postprocessing_  cpu=2 start=6.36  finish=29.80\n          57024) postprocessing_  cpu=13 start=6.36  finish=29.80\n          57025) postprocessing_  cpu=11 start=6.36  finish=29.80\n          57026) postprocessing_  cpu=3 start=6.36  finish=29.80\n          57027) postprocessing_  cpu=1 start=6.36  finish=29.80\n          57028) postprocessing_  cpu=4 start=6.36  finish=29.80\n          57029) postprocessing_  cpu=14 start=6.36  finish=29.80\n          57030) postprocessing_  cpu=6 start=6.36  finish=29.80\n          57031) postprocessing_  cpu=8 start=6.36  finish=29.80\n          57032) postprocessing_  cpu=15 start=6.36  finish=29.80\n          57033) postprocessing_  cpu=12 start=6.36  finish=29.80\n          57034) postprocessing_  cpu=10 start=6.36  finish=29.80\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>A RAW image decoder library, benchmarked with a single test that runs quickly on about half the cores. The topdown profile shows a higher retirement rate, followed by some backend stalls and then frontend stalls. 
AMD metrics show some floating point code and not many <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/libraw\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-2153","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2153","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=2153"}],"version-history":[{"count":2,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2153\/revisions"}],"predecessor-version":[{"id":2164,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2153\/revisions\/2164"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=2153"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}