{"id":2491,"date":"2024-07-13T20:12:07","date_gmt":"2024-07-13T20:12:07","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=2491"},"modified":"2024-07-13T21:38:13","modified_gmt":"2024-07-13T21:38:13","slug":"tjbench","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/tjbench\/","title":{"rendered":"tjbench"},"content":{"rendered":"\n<p>A JPEG compression\/decompression benchmark. It reports a single number, decompression throughput, from a run of about 30 seconds. The test appears to be single-threaded.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/07\/systemtime-1.png\" alt=\"\" class=\"wp-image-2503\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/07\/systemtime-1.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/07\/systemtime-1-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/07\/systemtime-1-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>Topdown metrics show a moderately high retirement rate with some backend stalls. There are also higher-than-average branch mis-predictions.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/07\/amdtopdown-1.png\" alt=\"\" class=\"wp-image-2505\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/07\/amdtopdown-1.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/07\/amdtopdown-1-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/07\/amdtopdown-1-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics confirm that it runs on about one core (0.85 of 16 cores). There is little floating-point code. There are only ~90 branches per 1000 instructions, but the branch mis-prediction rate is still about 3%.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              90.610\non_cpu               0.053          # 0.85 \/ 16 cores\nutime                75.875\nstime                0.862\nnvcsw                1770           # 76.92%\nnivcsw               531            # 23.08%\ninblock              0              # 0.00\/sec\nonblock              12648          # 139.59\/sec\ncpu-clock            76766383142    # 76.766 seconds\ntask-clock           76769375887    # 76.769 seconds\npage faults          190680         # 2483.803\/sec\ncontext switches     2586           # 33.685\/sec\ncpu migrations       265            # 3.452\/sec\nmajor page faults    2              # 0.026\/sec\nminor page faults    190678         # 2483.777\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             93460735704    # 89.021 branches per 1000 inst\nbranch misses        3008416876     # 3.22% branch miss\nconditional          86580033243    # 82.467 conditional branches per 1000 inst\nindirect             1032761588     # 0.984 indirect branches per 1000 
inst\ncpu-cycles           351394771396   # 0.24 GHz\ninstructions         1050118967313  # 2.99 IPC\nslots                704878734078   #\nretiring             334650976850   # 47.5% (47.5%)\n-- ucode             70653896       #     0.0%\n-- fastpath          334580322954   #    47.5%\nfrontend             61203322757    #  8.7% ( 8.7%)\n-- latency           47074633296    #     6.7%\n-- bandwidth         14128689461    #     2.0%\nbackend              209972253736   # 29.8% (29.8%)\n-- cpu               73322992101    #    10.4%\n-- memory            136649261635   #    19.4%\nspeculation          98809174500    # 14.0% (14.0%) high\n-- branch mispredict 98683006907    #    14.0%\n-- pipeline restart  126167593      #     0.0%\nsmt-contention       242650350      #  0.0% ( 0.0%)\ncpu-cycles           351908233616   # 0.24 GHz\ninstructions         1046603000920  # 2.97 IPC\ninstructions         349033167457   # 5.604 l2 access per 1000 inst\nl2 hit from l1       1278995515     # 15.64% l2 miss\nl2 miss from l1      34123235       #\nl2 hit from l2 pf    405098336      #\nl3 hit from l2 pf    24702013       #\nl3 miss from l2 pf   247064594      #\ninstructions         349083448347   # 9.194 float per 1000 inst\nfloat 512            80             # 0.000 AVX-512 per 1000 inst\nfloat 256            674            # 0.000 AVX-256 per 1000 inst\nfloat 128            3209459937     # 9.194 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         2              # 0.000 scalar per 1000 inst\ninstructions         1047363530640  #\nopcache              175933785516   # 167.978 opcache per 1000 inst\nopcache miss         686228671      #  0.4% opcache miss rate\nl1 dTLB miss         39189917       # 0.037 L1 dTLB per 1000 inst\nl2 dTLB miss         19063155       # 0.018 L2 dTLB per 1000 inst\ninstructions         1053405213672  #\nicache               1421453866     # 1.349 icache per 1000 inst\nicache miss          
141139750      #  9.9% icache miss rate\nl1 iTLB miss         904104         # 0.001 L1 iTLB per 1000 inst\nl2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst\ntlb flush            20701          # 0.000 TLB flush per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>Intel metrics also show high branch mis-prediction.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              90.210\non_cpu               0.053          # 0.85 \/ 16 cores\nutime                75.892\nstime                0.420\nnvcsw                1477           # 78.02%\nnivcsw               416            # 21.98%\ninblock              24             # 0.27\/sec\nonblock              1368           # 15.16\/sec\ncpu-clock            76324507443    # 76.325 seconds\ntask-clock           76326910376    # 76.327 seconds\npage faults          147820         # 1936.670\/sec\ncontext switches     2174           # 28.483\/sec\ncpu migrations       189            # 2.476\/sec\nmajor page faults    0              # 0.000\/sec\nminor page faults    147820         # 1936.670\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             69225074899    # 88.832 branches per 1000 inst\nbranch misses        2127831269     # 3.07% branch miss\nconditional          69225084755    # 88.832 conditional branches per 1000 inst\nindirect             767701560      # 0.985 indirect branches per 1000 inst\nslots                1729062295772  #\nretiring             757616713936   # 43.8% (43.8%)\n-- ucode             49787088500    #     2.9%\n-- fastpath          707829625436   #    40.9%\nfrontend             84350730758    #  4.9% ( 4.9%) low\n-- latency           44595436672    #     2.6%\n-- bandwidth         39755294086    #     2.3%\nbackend              533296092080   # 30.8% (30.8%)\n-- cpu               459462063373   #    26.6%\n-- memory            73834028707    #     4.3%\nspeculation          362832053071   # 21.0% (21.0%) 
high\n-- branch mispredict 362770144112   #    21.0%\n-- pipeline restart  61908959       #     0.0%\nsmt-contention       0              #  0.0% ( 0.0%)\ncpu-cycles           288157381870   # 0.20 GHz\ninstructions         779238641030   # 2.70 IPC\nl2 access            3709656569     # 4.761 l2 access per 1000 inst\nl2 miss              1548267647     # 41.74% l2 miss\ncpu-cycles           288650034648   #  8.9% memory latency\nload stalls          25393223674    #  8.2% l1 bound\nl1 miss              1812571872     #  0.5% l2 bound\nl2 miss              500138858      #  0.1% l3 bound\nl3 miss              350603520      #  0.1% dram bound\nstore_stalls         325944469      #  0.1% store bound\n<\/code><\/pre>\n\n\n\n<p>The process overview shows nearly all of the time spent in single invocations of tjbench; the other processes are short-lived system-information tools such as clinfo, vulkaninfo and glxinfo.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>322 processes\n\t  6 tjbench                 75.07     0.14\n\t 36 clinfo                   3.94     2.08\n\t 38 vulkaninfo               1.33     1.15\n\t  4 vulkani:disk$0           0.14     0.13\n\t  6 glxinfo:gdrv0            0.09     0.07\n\t  6 glxinfo:gl0              0.09     0.07\n\t  6 php                      0.08     0.05\n\t  2 llvmpipe-0               0.07     0.07\n\t  2 llvmpipe-1               0.07     0.07\n\t  2 llvmpipe-10              0.07     0.07\n\t  2 llvmpipe-11              0.07     0.07\n\t  2 llvmpipe-12              0.07     0.07\n\t  2 llvmpipe-13              0.07     0.07\n\t  2 llvmpipe-14              0.07     0.07\n\t  2 llvmpipe-15              0.07     0.07\n\t  2 llvmpipe-2               0.07     0.07\n\t  2 llvmpipe-3               0.07     0.07\n\t  2 llvmpipe-4               0.07     0.07\n\t  2 llvmpipe-5               0.07     0.07\n\t  2 llvmpipe-6               0.07     0.07\n\t  2 llvmpipe-7               0.07     0.07\n\t  2 llvmpipe-8               0.07     0.07\n\t  2 llvmpipe-9               0.07     0.07\n\t  6 clang                    0.06     0.06\n\t  2 glxinfo                  0.05     0.04\n\t  2 glxinfo:cs0              0.05     0.04\n\t  2 glxinfo:disk$0           0.05     0.04\n\t  2 glxinfo:sh0              0.05     0.04\n\t  2 glxinfo:shlo0            0.05     0.04\n\t  3 rocminfo                 0.00     0.03\n\t  1 lspci                    0.00     0.02\n\t 82 sh                       0.00     0.00\n\t 13 gcc                      0.00     0.00\n\t 12 gsettings                0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  3 gmain                    0.00     0.00\n\t  2 cc                       0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 uname                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dconf worker             0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                       0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 ps                       0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n0 processes running\n47 maximum processes\n<\/code><\/pre>\n\n\n\n<p>The computation blocks are very simple: three consecutive tjbench runs of about 25 seconds each, each shown as a pair of tjbench processes.<\/p>\n\n\n\n<pre 
class=\"wp-block-code\"><code>      64403) tjbench          cpu=3 start=5.45  finish=30.51\n        64404) tjbench          cpu=13 start=5.45  finish=30.51\n      64409) tjbench          cpu=0 start=34.51 finish=59.62\n        64410) tjbench          cpu=1 start=34.52 finish=59.62\n      64412) tjbench          cpu=0 start=63.62 finish=88.71\n        64413) tjbench          cpu=1 start=63.62 finish=88.71\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>A JPEG compression\/decompression benchmark. This reports as a single number of decompression throughput that runs in ~30 seconds. This test looks to be single-threaded. Topdown metrics have a moderately high retirement rate with some backend stalls. There are also higher <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/tjbench\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-2491","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2491","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=2491"}],"version-history":[{"count":4,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2491\/revisions"}],"predecessor-version":[{"id":2511,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2491\/revisions\/2511"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v
2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=2491"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}