{"id":196,"date":"2024-01-04T01:00:33","date_gmt":"2024-01-04T01:00:33","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=196"},"modified":"2024-01-04T01:00:34","modified_gmt":"2024-01-04T01:00:34","slug":"gimp","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/gimp\/","title":{"rendered":"gimp"},"content":{"rendered":"\n<p>Benchmarking GIMP, an open-source image manipulation program. The plot below shows four different tests with slightly different profiles.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-9.png\" alt=\"\" class=\"wp-image-197\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-9.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-9-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-9-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>This looks like a mostly single-threaded program, which accounts for the low on-CPU time. The code is somewhat branchy, with a moderate amount of floating point. 
Backend stalls, and memory access in particular, take the largest share of time.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              262.274\non_cpu               0.121          # 1.94 \/ 16 cores\nutime                366.606\nstime                141.072\nnvcsw                4489040        # 99.83%\nnivcsw               7840           # 0.17%\ninblock              262848         # 1002.19\/sec\nonblock              2932088        # 11179.48\/sec\ncpu-clock            504118685148   # 504.119 seconds\ntask-clock           505485321404   # 505.485 seconds\npage faults          50883528       # 100662.721\/sec\ncontext switches     4497375        # 8897.143\/sec\ncpu migrations       10959          # 21.680\/sec\nmajor page faults    729            # 1.442\/sec\nminor page faults    50882799       # 100661.279\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             361656385246   # 143.514 branches per 1000 inst\nbranch misses        13952901330    # 3.86% branch miss\nconditional          267322482059   # 106.080 conditional branches per 1000 inst\nindirect             7525360183     # 2.986 indirect branches per 1000 inst\ncpu-cycles           2188314398268  # 0.53 GHz\ninstructions         2509486134649  # 1.15 IPC\nslots                4384834911768  #\nretiring             844543083364   # 19.3% (20.6%)\n-- ucode             2508221097     #     0.1%\n-- fastpath          842034862267   #    19.2%\nfrontend             985978905389   # 22.5% (24.0%)\n-- latency           717473756388   #    16.4%\n-- bandwidth         268505149001   #     6.1%\nbackend              2164748146131  # 49.4% (52.7%)\n-- cpu               542851638476   #    12.4%\n-- memory            1621896507655  #    37.0%\nspeculation          113713583073   #  2.6% ( 2.8%)\n-- branch mispredict 107823545927   #     2.5%\n-- pipeline restart  5890037146     #     0.1%\nsmt-contention       275354088418   #  6.3% 
( 0.0%)\ncpu-cycles           2177473596639  # 0.53 GHz\ninstructions         2498720982907  # 1.15 IPC\ninstructions         833123582177   # 23.471 l2 access per 1000 inst\nl2 hit from l1       14295187339    # 21.63% l2 miss\nl2 miss from l1      1897261648     #\nl2 hit from l2 pf    2926997800     #\nl3 hit from l2 pf    808079008      #\nl3 miss from l2 pf   1523752563     #\ninstructions         835136271658   # 168.563 float per 1000 inst\nfloat 512            293            # 0.000 AVX-512 per 1000 inst\nfloat 256            276            # 0.000 AVX-256 per 1000 inst\nfloat 128            140773377324   # 168.563 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              323.082\non_cpu               0.131          # 2.10 \/ 16 cores\nutime                531.315\nstime                146.991\nnvcsw                10503543       # 99.85%\nnivcsw               15479          # 0.15%\ninblock              8              # 0.02\/sec\nonblock              2931888        # 9074.76\/sec\ncpu-clock            670147573846   # 670.148 seconds\ntask-clock           671777403316   # 671.777 seconds\npage faults          50351069       # 74952.013\/sec\ncontext switches     10519619       # 15659.382\/sec\ncpu migrations       30361          # 45.195\/sec\nmajor page faults    1              # 0.001\/sec\nminor page faults    50351068       # 74952.012\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             342678860344   # 135.988 branches per 1000 inst\nbranch misses        3385141914     # 0.99% branch miss\nconditional          342679096184   # 135.988 conditional branches per 1000 inst\nindirect             49044340836    # 19.463 indirect branches per 1000 inst\nslots                
6042017136368  #\nretiring             2057791788255  # 34.1% (34.1%)\n-- ucode             206973543335   #     3.4%\n-- fastpath          1850818244920  #    30.6%\nfrontend             869596524785   # 14.4% (14.4%)\n-- latency           441086010050   #     7.3%\n-- bandwidth         428510514735   #     7.1%\nbackend              2531687347647  # 41.9% (41.9%)\n-- cpu               625305575545   #    10.3%\n-- memory            1906381772102  #    31.6%\nspeculation          609068467938   # 10.1% (10.1%)\n-- branch mispredict 488700891743   #     8.1%\n-- pipeline restart  120367576195   #     2.0%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           3264854609930  # 0.61 GHz\ninstructions         4152760303612  # 1.27 IPC\nl2 access            47561932109    # 20.313 l2 access per 1000 inst\nl2 miss              20055987975    # 42.17% l2 miss\n<\/code><\/pre>\n\n\n\n<p>The process profile has a large number of &#8220;worker&#8221; threads. These seem to be launched in parallel but not on the CPU at the same time.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>7389 processes\n\t5580 worker                5202.05  1937.17\n\t420 gdbus                  348.48   129.47\n\t425 gmain                  348.39   129.44\n\t 36 gimp                   292.85    67.87\n\t 12 async                  292.80    67.83\n\t360 file-jpeg               53.88    61.30\n\t 12 bzip2                   11.24     0.14\n\t 12 async-ind                1.99     1.98\n\t 12 xz                       1.94     0.15\n\t 36 script-fu                1.62     0.24\n\t 38 vulkaninfo               0.75     1.14\n\t  6 glxinfo:gdrv0            0.09     0.07\n\t  6 php                      0.08     0.13\n\t  4 vulkani:disk$0           0.08     0.12\n\t  2 glxinfo                  0.06     0.03\n\t  2 glxinfo:cs0              0.05     0.03\n\t  2 glxinfo:disk$0           0.05     0.03\n\t  2 glxinfo:sh0              0.05     0.03\n\t  2 glxinfo:shlo0            0.05     
0.03\n\t  2 llvmpipe-0               0.04     0.06\n\t  2 llvmpipe-1               0.04     0.06\n\t  2 llvmpipe-10              0.04     0.06\n\t  2 llvmpipe-11              0.04     0.06\n\t  2 llvmpipe-12              0.04     0.06\n\t  2 llvmpipe-13              0.04     0.06\n\t  2 llvmpipe-14              0.04     0.06\n\t  2 llvmpipe-15              0.04     0.06\n\t  2 llvmpipe-2               0.04     0.06\n\t  2 llvmpipe-3               0.04     0.06\n\t  2 llvmpipe-4               0.04     0.06\n\t  2 llvmpipe-5               0.04     0.06\n\t  2 llvmpipe-6               0.04     0.06\n\t  2 llvmpipe-7               0.04     0.06\n\t  2 llvmpipe-8               0.04     0.06\n\t  2 llvmpipe-9               0.04     0.06\n\t  6 clang                    0.03     0.07\n\t 12 rawtherapee              0.03     0.02\n\t 24 tar                      0.02     1.72\n\t 12 swap writer              0.00   292.80\n\t 12 &#91;pango] FcInit           0.00     2.09\n\t  1 lspci                    0.00     0.03\n\t107 sh                       0.00     0.00\n\t 36 file-darktable           0.00     0.00\n\t 16 bash                     0.00     0.00\n\t 16 rm                       0.00     0.00\n\t 12 awk                      0.00     0.00\n\t 12 file-glob                0.00     0.00\n\t 12 file-heif                0.00     0.00\n\t 12 file-rawtherape          0.00     0.00\n\t 12 gcc                      0.00     0.00\n\t 12 head                     0.00     0.00\n\t  9 stty                     0.00     0.00\n\t  8 gsettings                0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  3 dconf worker             0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 uname                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 cc    
                   0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 ps                       0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Benchmarking an open source image manipulation program. You can see four different tests with slightly different profiles. This looks like mostly a single-threaded program. This accounts for a low amount of on-cpu time. 
Somewhat branchy code with a moderate amount <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/gimp\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-196","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/196","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=196"}],"version-history":[{"count":1,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/196\/revisions"}],"predecessor-version":[{"id":198,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/196\/revisions\/198"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=196"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}