{"id":2264,"date":"2024-06-01T08:27:06","date_gmt":"2024-06-01T08:27:06","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=2264"},"modified":"2024-06-01T08:45:33","modified_gmt":"2024-06-01T08:45:33","slug":"gegl","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/gegl\/","title":{"rendered":"gegl"},"content":{"rendered":"\n<p>GEGL is a generic graphics library used by GIMP and by applications such as GNOME Photos. The test exercises nine different operations. It looks mostly single-threaded, with small regions of parallel operation.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-1.png\" alt=\"\" class=\"wp-image-2265\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-1.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-1-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/systemtime-1-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>The topdown profile differs between the workloads, and some show a surprisingly high share of branch stalls.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-1.png\" alt=\"\" class=\"wp-image-2266\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-1.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-1-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/06\/amdtopdown-1-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics show only ~2 cores, 
moderate floating point and some memory stalls and frontend latency.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              1299.526\non_cpu               0.123          # 1.96 \/ 16 cores\nutime                1684.132\nstime                865.189\nnvcsw                14504168       # 99.47%\nnivcsw               76810          # 0.53%\ninblock              472            # 0.36\/sec\nonblock              25580472       # 19684.47\/sec\ncpu-clock            2542914192252  # 2542.914 seconds\ntask-clock           2547801640165  # 2547.802 seconds\npage faults          69417327       # 27245.970\/sec\ncontext switches     14586272       # 5725.042\/sec\ncpu migrations       28916          # 11.349\/sec\nmajor page faults    2220           # 0.871\/sec\nminor page faults    69415107       # 27245.099\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             2308099026992  # 154.155 branches per 1000 inst\nbranch misses        132254282252   # 5.73% branch miss\nconditional          1758200020186  # 117.428 conditional branches per 1000 inst\nindirect             41308870165    # 2.759 indirect branches per 1000 inst\ncpu-cycles           10462124659789 # 0.50 GHz\ninstructions         14863121118375 # 1.42 IPC\nslots                21295959429906 #\nretiring             5069339259379  # 23.8% (25.6%)\n-- ucode             13818688365    #     0.1%\n-- fastpath          5055520571014  #    23.7%\nfrontend             6552525067104  # 30.8% (33.0%)\n-- latency           4872044219046  #    22.9%\n-- bandwidth         1680480848058  #     7.9%\nbackend              6746815575381  # 31.7% (34.0%)\n-- cpu               1716964005492  #     8.1%\n-- memory            5029851569889  #    23.6%\nspeculation          1456392663684  #  6.8% ( 7.3%)\n-- branch mispredict 1436077332204  #     6.7%\n-- pipeline restart  20315331480    #     0.1%\nsmt-contention       1468796759278  #  6.9% ( 
0.0%)\ncpu-cycles           10463773934761 # 0.50 GHz\ninstructions         14873614224957 # 1.42 IPC\ninstructions         4977429685254  # 20.651 l2 access per 1000 inst\nl2 hit from l1       69830506371    # 6.94% l2 miss\nl2 miss from l1      4094424247     #\nl2 hit from l2 pf    29917464683    #\nl3 hit from l2 pf    1040649343     #\nl3 miss from l2 pf   1999101580     #\ninstructions         4983393093725  # 68.287 float per 1000 inst\nfloat 512            236            # 0.000 AVX-512 per 1000 inst\nfloat 256            588            # 0.000 AVX-256 per 1000 inst\nfloat 128            340302305646   # 68.287 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\ninstructions         11764565952558 #\nopcache              2579613405100  # 219.270 opcache per 1000 inst\nopcache miss         298925598456   # 11.6% opcache miss rate\nl1 dTLB miss         13219511975    # 1.124 L1 dTLB per 1000 inst\nl2 dTLB miss         844253850      # 0.072 L2 dTLB per 1000 inst\ninstructions         14934018704737 #\nicache               850810850683   # 56.971 icache per 1000 inst\nicache miss          32271720763    #  3.8% icache miss rate\nl1 iTLB miss         2431998815     # 0.163 L1 iTLB per 1000 inst\nl2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst\ntlb flush            11032788       # 0.001 TLB flush per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics show higher branch misprediction.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              1401.041\non_cpu               0.128          # 2.05 \/ 16 cores\nutime                1972.594\nstime                906.348\nnvcsw                36522641       # 98.90%\nnivcsw               406132         # 1.10%\ninblock              71376          # 50.94\/sec\nonblock              24186208       # 17263.02\/sec\ncpu-clock            2856527054590  # 2856.527 seconds\ntask-clock           
2862881732391  # 2862.882 seconds\npage faults          66834847       # 23345.305\/sec\ncontext switches     36934938       # 12901.315\/sec\ncpu migrations       68489          # 23.923\/sec\nmajor page faults    1735           # 0.606\/sec\nminor page faults    66833112       # 23344.699\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             2440547647375  # 150.034 branches per 1000 inst\nbranch misses        50612278990    # 2.07% branch miss\nconditional          2440548597519  # 150.034 conditional branches per 1000 inst\nindirect             346999416185   # 21.332 indirect branches per 1000 inst\nslots                32977774931438 #\nretiring             12575828621225 # 38.1% (38.1%)\n-- ucode             933363109249   #     2.8%\n-- fastpath          11642465511976 #    35.3%\nfrontend             5511408969044  # 16.7% (16.7%)\n-- latency           2281620609717  #     6.9%\n-- bandwidth         3229788359327  #     9.8%\nbackend              7993444608111  # 24.2% (24.2%)\n-- cpu               3686834556596  #    11.2%\n-- memory            4306610051515  #    13.1%\nspeculation          6983503569301  # 21.2% (21.2%) high\n-- branch mispredict 6736445909989  #    20.4%\n-- pipeline restart  247057659312   #     0.7%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           9444859932545  # 0.39 GHz\ninstructions         17878506070144 # 1.89 IPC\nl2 access            182618165893   # 12.583 l2 access per 1000 inst\nl2 miss              26117313118    # 14.30% l2 miss\ncpu-cycles           7566300983596  # 24.7% memory latency\nload stalls          1710512533932  #  7.7% l1 bound\nl1 miss              1130422828617  #  6.4% l2 bound\nl2 miss              643690694173   #  7.4% l3 bound\nl3 miss              80953651660    #  1.1% dram bound\nstore_stalls         155326487647   #  2.1% store bound\n<\/code><\/pre>\n\n\n\n<p>Process summary shows time spent in both 
gegl and worker processes.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>23701 processes\n\t15147 gegl                 23527.85 11046.31\n\t6480 worker               21431.51  9087.18\n\t434 gmain                 1429.63   607.01\n\t432 gdbus                 1429.63   607.01\n\t 68 clinfo                  15.87     6.66\n\t 38 vulkaninfo               1.33     1.14\n\t  4 vulkani:disk$0           0.14     0.12\n\t  6 php                      0.12     0.24\n\t  2 llvmpipe-0               0.07     0.06\n\t  2 llvmpipe-1               0.07     0.06\n\t  2 llvmpipe-10              0.07     0.06\n\t  2 llvmpipe-11              0.07     0.06\n\t  2 llvmpipe-12              0.07     0.06\n\t  2 llvmpipe-13              0.07     0.06\n\t  2 llvmpipe-14              0.07     0.06\n\t  2 llvmpipe-15              0.07     0.06\n\t  2 llvmpipe-2               0.07     0.06\n\t  2 llvmpipe-3               0.07     0.06\n\t  2 llvmpipe-4               0.07     0.06\n\t  2 llvmpipe-5               0.07     0.06\n\t  2 llvmpipe-6               0.07     0.06\n\t  2 llvmpipe-7               0.07     0.06\n\t  2 llvmpipe-8               0.07     0.06\n\t  2 llvmpipe-9               0.07     0.06\n\t  6 clang                    0.04     0.08\n\t  3 rocminfo                 0.03     0.00\n\t432 swap writer              0.00  1428.11\n\t432 &#91;pango] FcInit           0.00    98.85\n\t  1 lspci                    0.00     0.02\n\t 99 sh                       0.00     0.00\n\t 13 gsettings                0.00     0.00\n\t 12 gcc                      0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 glxinfo                  0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  2 grep                     0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 setterm                  0.00     0.00\n\t  2 uname                    0.00     0.00\n\t  2 which     
               0.00     0.00\n\t  1 cc                       0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dconf worker             0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 ps                       0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n0 processes running\n47 maximum processes\n<\/code><\/pre>\n\n\n\n<p>Example of a computation block with many short-run processes.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>        470964) gegl             cpu=0 start=6.48  finish=6.91 \n          470966) worker           cpu=11 start=6.51  finish=6.91 \n          470967) worker           cpu=6 start=6.51  finish=6.91 \n          470968) worker           cpu=7 start=6.51  finish=6.91 \n          470969) worker           cpu=2 start=6.51  finish=6.91 \n          470970) worker           cpu=5 start=6.51  finish=6.91 \n          470971) worker           cpu=1 start=6.51  finish=6.91 \n          470972) worker           cpu=3 start=6.51  finish=6.91 \n          470973) worker           cpu=4 start=6.51  finish=6.91 \n          470974) worker           cpu=8 start=6.51  finish=6.91 \n          470975) worker           cpu=15 start=6.51  finish=6.91 \n          470976) worker           cpu=10 start=6.51  finish=6.91 \n          470977) worker           cpu=13 start=6.51  finish=6.91 
\n          470978) worker           cpu=14 start=6.51  finish=6.91 \n          470979) worker           cpu=12 start=6.51  finish=6.91 \n          470980) worker           cpu=9 start=6.51  finish=6.91 \n          470981) gegl             cpu=4 start=6.53  finish=6.53 \n          470982) gegl             cpu=0 start=6.53  finish=6.53 \n          470983) gegl             cpu=5 start=6.53  finish=6.53 \n          470984) gegl             cpu=3 start=6.53  finish=6.53 \n          470985) gegl             cpu=15 start=6.53  finish=6.53 \n          470986) gegl             cpu=1 start=6.53  finish=6.53 \n          470987) gegl             cpu=10 start=6.53  finish=6.53 \n          470988) gegl             cpu=14 start=6.53  finish=6.53 \n          470989) gegl             cpu=12 start=6.53  finish=6.53 \n          470990) gegl             cpu=13 start=6.53  finish=6.53 \n          470991) gegl             cpu=8 start=6.53  finish=6.53 \n          470992) gegl             cpu=9 start=6.53  finish=6.53 \n          470993) gegl             cpu=11 start=6.53  finish=6.53 \n          470994) gegl             cpu=6 start=6.53  finish=6.53 \n          470995) gegl             cpu=7 start=6.53  finish=6.53 \n          470996) gegl             cpu=1 start=6.60  finish=6.91 \n          470997) gegl             cpu=12 start=6.60  finish=6.91 \n          470998) gegl             cpu=13 start=6.60  finish=6.91 \n          470999) gegl             cpu=3 start=6.60  finish=6.91 \n          471000) gegl             cpu=6 start=6.60  finish=6.91 \n          471001) gegl             cpu=7 start=6.60  finish=6.91 \n          471002) gegl             cpu=2 start=6.60  finish=6.91 \n          471003) gegl             cpu=8 start=6.60  finish=6.91 \n          471004) gegl             cpu=9 start=6.60  finish=6.91 \n          471005) gegl             cpu=4 start=6.60  finish=6.91 \n          471006) gegl             cpu=5 start=6.60  finish=6.91 \n          471007) gegl             cpu=11 
start=6.60  finish=6.91 \n          471008) gegl             cpu=15 start=6.60  finish=6.91 \n          471009) gegl             cpu=14 start=6.60  finish=6.91 \n          471010) gegl             cpu=10 start=6.60  finish=6.91 \n          471011) &#91;pango] FcInit   cpu=-1 start=6.64  finish=6.66 \n          471012) gegl             cpu=0 start=6.66  finish=6.66 \n          471013) gegl             cpu=0 start=6.66  finish=6.66 \n          471014) gegl             cpu=0 start=6.66  finish=6.66 \n          471015) gegl             cpu=9 start=6.66  finish=6.66 \n          471016) gmain            cpu=1 start=6.66  finish=6.91 \n          471017) gdbus            cpu=2 start=6.66  finish=6.91 \n          471018) swap writer      cpu=-1 start=6.67  finish=6.91 \n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Generic graphics library, used by GIMP and applications like GNOME photos with nine different operations. Looks mostly single-threaded with small regions of parallel operation. Topdown profile shows differences with workloads including some with surprising numbers of branch stalls. 
AMD metrics <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/gegl\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-2264","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2264","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=2264"}],"version-history":[{"count":2,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2264\/revisions"}],"predecessor-version":[{"id":2276,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2264\/revisions\/2276"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=2264"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}