{"id":510,"date":"2024-01-13T21:43:54","date_gmt":"2024-01-13T21:43:54","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=510"},"modified":"2024-01-14T00:21:08","modified_gmt":"2024-01-14T00:21:08","slug":"graphics-magick","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/graphics-magick\/","title":{"rendered":"graphics-magick"},"content":{"rendered":"\n<p>An OpenMP-parallelized image-processing workload built on GraphicsMagick (the gm program) that runs seven tests. This is a case where Intel does better on two of the tests and worse on the other five.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-11.png\" alt=\"\" class=\"wp-image-514\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-11.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-11-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-11-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>The overall metrics show a fairly high retirement rate, limited by backend CPU stalls to a degree that varies by test. 
Backend memory stalls are less of an issue.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-49.png\" alt=\"\" class=\"wp-image-516\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-49.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-49-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-49-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>The AMD metrics show a reasonable amount of floating-point work (nearly all of it 128-bit AVX) and fewer branches per 1000 instructions than on Intel.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              1644.398\non_cpu               0.678          # 10.85 \/ 16 cores\nutime                17308.374\nstime                533.184\nnvcsw                210800         # 47.67%\nnivcsw               231435         # 52.33%\ninblock              0              # 0.00\/sec\nonblock              16448          # 10.00\/sec\ncpu-clock            17840403316898 # 17840.403 seconds\ntask-clock           17840962416578 # 17840.962 seconds\npage faults          252232595      # 14137.836\/sec\ncontext switches     450222         # 25.235\/sec\ncpu migrations       1361           # 0.076\/sec\nmajor page faults    7744           # 0.434\/sec\nminor page faults    252224851      # 14137.402\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             12792964509200 # 102.333 branches per 1000 inst\nbranch misses        150169852450   # 1.17% branch miss\nconditional          9860363999725  # 78.875 conditional branches per 1000 inst\nindirect             568740984297   # 4.549 indirect branches per 1000 inst\ncpu-cycles           61383485116311 # 2.76 GHz\ninstructions         104201051892088 
# 1.70 IPC\nslots                122743660326036 #\nretiring             37571862517062 # 30.6% (49.6%)\n-- ucode             28373669760    #     0.0%\n-- fastpath          37543488847302 #    30.6%\nfrontend             4144904451957  #  3.4% ( 5.5%)\n-- latency           3064929197046  #     2.5%\n-- bandwidth         1079975254911  #     0.9%\nbackend              32869577134221 # 26.8% (43.4%)\n-- cpu               27447483662627 #    22.4%\n-- memory            5422093471594  #     4.4%\nspeculation          1234412203853  #  1.0% ( 1.6%)\n-- branch mispredict 1212655803309  #     1.0%\n-- pipeline restart  21756400544    #     0.0%\nsmt-contention       46922793804825 # 38.2% ( 0.0%)\ncpu-cycles           61361382873177 # 2.76 GHz\ninstructions         104032576646593 # 1.70 IPC\ninstructions         34674101421249 # 5.491 l2 access per 1000 inst\nl2 hit from l1       115093256997   # 24.08% l2 miss\nl2 miss from l1      20719036688    #\nl2 hit from l2 pf    50157748862    #\nl3 hit from l2 pf    15765270503    #\nl3 miss from l2 pf   9361897652     #\ninstructions         34673981783248 # 297.334 float per 1000 inst\nfloat 512            72             # 0.000 AVX-512 per 1000 inst\nfloat 256            366            # 0.000 AVX-256 per 1000 inst\nfloat 128            10309748438666 # 297.334 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              2541.963\non_cpu               0.664          # 10.62 \/ 16 cores\nutime                26094.262\nstime                908.881\nnvcsw                393269         # 53.97%\nnivcsw               335418         # 46.03%\ninblock              194600         # 76.55\/sec\nonblock              5536           # 2.18\/sec\ncpu-clock            27000229963741 # 27000.230 seconds\ntask-clock           27000708582783 # 
27000.709 seconds\npage faults          560961765      # 20775.816\/sec\ncontext switches     741142         # 27.449\/sec\ncpu migrations       93087          # 3.448\/sec\nmajor page faults    11609          # 0.430\/sec\nminor page faults    560950155      # 20775.386\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             20986332656634 # 124.239 branches per 1000 inst\nbranch misses        60224093511    # 0.29% branch miss\nconditional          20986332688570 # 124.239 conditional branches per 1000 inst\nindirect             4793102645941  # 28.375 indirect branches per 1000 inst\nslots                91805315111198 #\nretiring             58807660233086 # 64.1% (64.1%)\n-- ucode             6235444695905  #     6.8%\n-- fastpath          52572215537181 #    57.3%\nfrontend             12103639365990 # 13.2% (13.2%)\n-- latency           10394033274777 #    11.3%\n-- bandwidth         1709606091213  #     1.9%\nbackend              17605664593201 # 19.2% (19.2%)\n-- cpu               13055563780626 #    14.2%\n-- memory            4550100812575  #     5.0%\nspeculation          3311541093858  #  3.6% ( 3.6%)\n-- branch mispredict 3145341811481  #     3.4%\n-- pipeline restart  166199282377   #     0.2%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           53809972855771 # 2.22 GHz\ninstructions         104620904496198 # 1.94 IPC\nl2 access            405500583142   # 7.270 l2 access per 1000 inst\nl2 miss              186008133845   # 45.87% l2 miss\n<\/code><\/pre>\n\n\n\n<p>Process structure shows the gm process is the workhorse.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>666 processes\n\t291 gm                   228017.81  6656.12\n\t 68 clinfo                  16.54     5.99\n\t 38 vulkaninfo               0.95     1.33\n\t  6 glxinfo:gdrv0            0.16     0.07\n\t  6 php                      0.15     0.22\n\t  4 vulkani:disk$0           0.10     0.14\n\t  
2 glxinfo                  0.07     0.03\n\t  2 glxinfo:cs0              0.07     0.03\n\t  2 glxinfo:disk$0           0.07     0.03\n\t  2 glxinfo:sh0              0.07     0.03\n\t  2 glxinfo:shlo0            0.07     0.03\n\t  6 clang                    0.06     0.05\n\t  2 llvmpipe-0               0.05     0.07\n\t  2 llvmpipe-1               0.05     0.07\n\t  2 llvmpipe-10              0.05     0.07\n\t  2 llvmpipe-11              0.05     0.07\n\t  2 llvmpipe-12              0.05     0.07\n\t  2 llvmpipe-13              0.05     0.07\n\t  2 llvmpipe-14              0.05     0.07\n\t  2 llvmpipe-15              0.05     0.07\n\t  2 llvmpipe-2               0.05     0.07\n\t  2 llvmpipe-3               0.05     0.07\n\t  2 llvmpipe-4               0.05     0.07\n\t  2 llvmpipe-5               0.05     0.07\n\t  2 llvmpipe-6               0.05     0.07\n\t  2 llvmpipe-7               0.05     0.07\n\t  2 llvmpipe-8               0.05     0.07\n\t  2 llvmpipe-9               0.05     0.07\n\t  3 rocminfo                 0.00     0.03\n\t  1 lspci                    0.00     0.02\n\t 94 sh                       0.00     0.00\n\t 21 graphics-magick          0.00     0.00\n\t 13 gcc                      0.00     0.00\n\t  9 gsettings                0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  4 gmain                    0.00     0.00\n\t  3 dconf worker             0.00     0.00\n\t  2 cc                       0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 uname                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 grep                
     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 ps                       0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n0 processes running\n47 maximum processes\n<\/code><\/pre>\n\n\n\n<p>One gm thread appears to be started on each of the 16 cores.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      98515) graphics-magick  cpu=14 start=6.18  finish=66.22\n        98516) gm               cpu=7 start=6.18  finish=66.22\n          98517) gm               cpu=15 start=6.19  finish=66.22\n          98518) gm               cpu=2 start=6.19  finish=66.22\n          98519) gm               cpu=12 start=6.19  finish=66.22\n          98520) gm               cpu=5 start=6.19  finish=66.22\n          98521) gm               cpu=4 start=6.19  finish=66.22\n          98522) gm               cpu=1 start=6.19  finish=66.22\n          98523) gm               cpu=0 start=6.19  finish=66.22\n          98524) gm               cpu=14 start=6.19  finish=66.22\n          98525) gm               cpu=10 start=6.19  finish=66.22\n          98526) gm               cpu=11 start=6.19  finish=66.22\n          98527) gm               cpu=8 start=6.19  finish=66.22\n          98528) gm               cpu=3 start=6.19  finish=66.22\n          98529) gm               cpu=13 start=6.19  finish=66.22\n          98530) gm               cpu=9 start=6.19  finish=66.22\n          98531) gm               cpu=6 start=6.19  
finish=66.22<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>OpenMP implementation that performs imaging tests. This workload has seven tests. Here is a case where Intel does better on two tests and worse on the other five. Overall metrics shows a fairly high retirement rate that gets limited by <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/graphics-magick\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-510","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/510","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=510"}],"version-history":[{"count":2,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/510\/revisions"}],"predecessor-version":[{"id":517,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/510\/revisions\/517"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=510"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}