{"id":651,"date":"2024-01-17T11:37:17","date_gmt":"2024-01-17T11:37:17","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=651"},"modified":"2024-01-17T11:37:17","modified_gmt":"2024-01-17T11:37:17","slug":"opencv","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/opencv\/","title":{"rendered":"opencv"},"content":{"rendered":"\n<p>Eight different built-in benchmarks for OpenCV operations. They look like a mixture of single-threaded and multi-core tests. Some take a while to stabilize, so they get run more than three times.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-30.png\" alt=\"\" class=\"wp-image-652\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-30.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-30-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/systemtime-30-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>Topdown shows a mixture of profiles depending on the test.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-67.png\" alt=\"\" class=\"wp-image-653\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-67.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-67-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/01\/amdtopdown-67-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics reflect a mixture of on-core behavior. 
Generally floating point code with a small amount of L2 access and moderate branches.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              4652.674\non_cpu               0.387          # 6.20 \/ 16 cores\nutime                22478.001\nstime                6365.386\nnvcsw                4855478        # 73.46%\nnivcsw               1753786        # 26.54%\ninblock              8              # 0.00\/sec\nonblock              117920         # 25.34\/sec\ncpu-clock            28815667699825 # 28815.668 seconds\ntask-clock           28821382684850 # 28821.383 seconds\npage faults          307497285      # 10669.068\/sec\ncontext switches     6632219        # 230.115\/sec\ncpu migrations       37169          # 1.290\/sec\nmajor page faults    294            # 0.010\/sec\nminor page faults    307496991      # 10669.058\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             16403069524758 # 106.832 branches per 1000 inst\nbranch misses        828455317656   # 5.05% branch miss\nconditional          11923539266664 # 77.657 conditional branches per 1000 inst\nindirect             422732816500   # 2.753 indirect branches per 1000 inst\ncpu-cycles           134335559445133 # 1.71 GHz\ninstructions         177139632100759 # 1.32 IPC\nslots                268758248314950 #\nretiring             61647781137976 # 22.9% (29.1%)\n-- ucode             124149216346   #     0.0%\n-- fastpath          61523631921630 #    22.9%\nfrontend             54787629853380 # 20.4% (25.9%)\n-- latency           41293315615422 #    15.4%\n-- bandwidth         13494314237958 #     5.0%\nbackend              91911217413741 # 34.2% (43.4%)\n-- cpu               34999936609993 #    13.0%\n-- memory            56911280803748 #    21.2%\nspeculation          3311643758552  #  1.2% ( 1.6%)\n-- branch mispredict 3291586145649  #     1.2%\n-- pipeline restart  20057612903    #     0.0%\nsmt-contention       
57099414604412 # 21.2% ( 0.0%)\ncpu-cycles           91565832999745 # 1.60 GHz\ninstructions         125015233884237 # 1.37 IPC\ninstructions         41670344100784 # 34.699 l2 access per 1000 inst\nl2 hit from l1       886460797566   # 20.52% l2 miss\nl2 miss from l1      64299878440    #\nl2 hit from l2 pf    327036437902   #\nl3 hit from l2 pf    175255051987   #\nl3 miss from l2 pf   57158618370    #\ninstructions         41599072769052 # 265.827 float per 1000 inst\nfloat 512            93             # 0.000 AVX-512 per 1000 inst\nfloat 256            4009590064     # 0.096 AVX-256 per 1000 inst\nfloat 128            11054155455194 # 265.731 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              8929.748\non_cpu               0.302          # 4.83 \/ 16 cores\nutime                35037.226\nstime                8114.568\nnvcsw                16144279       # 65.79%\nnivcsw               8396621        # 34.21%\ninblock              92520          # 10.36\/sec\nonblock              84432          # 9.46\/sec\ncpu-clock            42918505907879 # 42918.506 seconds\ntask-clock           42933876988368 # 42933.877 seconds\npage faults          789930954      # 18398.780\/sec\ncontext switches     24585261       # 572.631\/sec\ncpu migrations       714157         # 16.634\/sec\nmajor page faults    810            # 0.019\/sec\nminor page faults    789930144      # 18398.761\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             26769265019370 # 123.191 branches per 1000 inst\nbranch misses        171790003420   # 0.64% branch miss\nconditional          26769265226570 # 123.191 conditional branches per 1000 inst\nindirect             6811153187756  # 31.345 indirect branches per 1000 
inst\nslots                316190860328552 #\nretiring             152939542929502 # 48.4% (48.4%)\n-- ucode             15947069922576 #     5.0%\n-- fastpath          136992473006926 #    43.3%\nfrontend             41215234315610 # 13.0% (13.0%)\n-- latency           23077500486912 #     7.3%\n-- bandwidth         18137733828698 #     5.7%\nbackend              105108579100965 # 33.2% (33.2%)\n-- cpu               48161090633115 #    15.2%\n-- memory            56947488467850 #    18.0%\nspeculation          20264503084589 #  6.4% ( 6.4%)\n-- branch mispredict 19084866553039 #     6.0%\n-- pipeline restart  1179636531550  #     0.4%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           169947932831273 # 0.98 GHz\ninstructions         307478565821240 # 1.81 IPC\nl2 access            4221508301051  # 24.988 l2 access per 1000 inst\nl2 miss              1732467749070  # 41.04% l2 miss\n<\/code><\/pre>\n\n\n\n<p>Process summary, this crashed partway through so not a full reflection. 
From the names, looks like a different executable for each test.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>5843 processes\n\t1200 opencv_perf_gap      1427508.34 348263.04\n\t3840 opencv_perf_vid      348468.47 88338.60\n\t 48 opencv_perf_img      67596.32 17327.20\n\t240 opencv_perf_obj      27428.01 12202.04\n\t 48 opencv_perf_dnn      18839.04  1405.28\n\t 48 opencv_perf_fea      18809.28  9131.84\n\t 48 opencv_perf_sti      14596.96  8618.88\n\t 64 opencv_perf_cor       8730.40  6199.36\n\t 34 clinfo                   9.59     3.66\n\t 19 vulkaninfo               0.95     0.38\n\t  2 vulkani:disk$0           0.10     0.04\n\t  6 clang                    0.09     0.03\n\t  3 glxinfo:gdrv0            0.07     0.06\n\t  1 llvmpipe-0               0.05     0.02\n\t  1 llvmpipe-1               0.05     0.02\n\t  1 llvmpipe-10              0.05     0.02\n\t  1 llvmpipe-11              0.05     0.02\n\t  1 llvmpipe-12              0.05     0.02\n\t  1 llvmpipe-13              0.05     0.02\n\t  1 llvmpipe-14              0.05     0.02\n\t  1 llvmpipe-15              0.05     0.02\n\t  1 llvmpipe-2               0.05     0.02\n\t  1 llvmpipe-3               0.05     0.02\n\t  1 llvmpipe-4               0.05     0.02\n\t  1 llvmpipe-5               0.05     0.02\n\t  1 llvmpipe-6               0.05     0.02\n\t  1 llvmpipe-7               0.05     0.02\n\t  1 llvmpipe-8               0.05     0.02\n\t  1 llvmpipe-9               0.05     0.02\n\t  1 glxinfo                  0.03     0.02\n\t  1 glxinfo:cs0              0.03     0.02\n\t  1 glxinfo:disk$0           0.03     0.02\n\t  1 glxinfo:sh0              0.03     0.02\n\t  1 glxinfo:shlo0            0.03     0.02\n\t  1 ps                       0.00     0.01\n\t 72 sh                       0.00     0.00\n\t 49 opencv                   0.00     0.00\n\t 13 gcc                      0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  7 stat                     0.00     0.00\n\t  6 gsettings           
     0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 gmain                    0.00     0.00\n\t  4 dconf worker             0.00     0.00\n\t  4 phoronix-test-s          0.00     0.00\n\t  2 which                    0.00     0.00\n\t  1 cc                       0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lscpu                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n\t  1 xset                     0.00     0.00\n26 processes running\n47 maximum processes\n<\/code><\/pre>\n\n\n\n<p>A typical launch pattern with one process per core.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      1945253) opencv           cpu=0 start=142.15 finish=208.66\n        1945254) opencv_perf_cor  cpu=9 start=142.15 finish=208.65\n          1945255) opencv_perf_cor  cpu=11 start=142.17 finish=208.65\n          1945256) opencv_perf_cor  cpu=12 start=142.17 finish=208.65\n          1945257) opencv_perf_cor  cpu=4 start=142.17 finish=208.65\n          1945258) opencv_perf_cor  cpu=5 start=142.17 finish=208.65\n          1945259) opencv_perf_cor  cpu=8 start=142.17 finish=208.65\n          1945260) opencv_perf_cor  cpu=14 start=142.17 finish=208.65\n          1945261) opencv_perf_cor  cpu=10 start=142.17 finish=208.65\n          1945262) opencv_perf_cor  cpu=0 start=142.17 finish=208.65\n         
 1945263) opencv_perf_cor  cpu=13 start=142.17 finish=208.65\n          1945264) opencv_perf_cor  cpu=6 start=142.17 finish=208.65\n          1945265) opencv_perf_cor  cpu=15 start=142.17 finish=208.65\n          1945266) opencv_perf_cor  cpu=7 start=142.17 finish=208.65\n          1945267) opencv_perf_cor  cpu=2 start=142.17 finish=208.65\n          1945268) opencv_perf_cor  cpu=3 start=142.17 finish=208.65\n          1945269) opencv_perf_cor  cpu=1 start=142.17 finish=208.65\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Eight different built-in benchmarks for opencv operations. Looks like a mixture of single-threaded and multi-core tests. Some of them take some time to stabilize so get run more than three times. Topdown shows a mixture of profiles depending on the <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/opencv\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-651","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/651","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=651"}],"version-history":[{"count":1,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/651\/revisions"}],"predecessor-version":[{"id":654,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/651\/revisions\/654"}],"up":[{"embeddable":true,"href":"http
s:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=651"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}