{"id":2082,"date":"2024-03-16T09:44:03","date_gmt":"2024-03-16T09:44:03","guid":{"rendered":"https:\/\/mvermeulen.org\/perf\/?page_id=2082"},"modified":"2024-03-17T19:40:00","modified_gmt":"2024-03-17T19:40:00","slug":"srsran","status":"publish","type":"page","link":"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/srsran\/","title":{"rendered":"srsran"},"content":{"rendered":"\n<p>srsRAN is an Open Radio Access Network (ORAN) solution for building software-defined radio. There are four workloads. The first two look to be parallel; the last two are sequential.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-33.png\" alt=\"\" class=\"wp-image-2097\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-33.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-33-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/systemtime-33-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>The topdown profile shows each workload with a slightly different profile. The first two spend about half of their time in backend stalls and have a ~40% retirement rate. 
The third has a higher retirement rate, and the last is closer to a 50% retirement rate.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"960\" src=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-33.png\" alt=\"\" class=\"wp-image-2099\" srcset=\"https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-33.png 1280w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-33-1024x768.png 1024w, https:\/\/mvermeulen.org\/perf\/wp-content\/uploads\/sites\/7\/2024\/03\/amdtopdown-33-768x576.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>AMD metrics confirm a ~40% retirement rate overall and a higher share of backend stalls. There is some floating point, but not much.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>elapsed              184.315\non_cpu               0.304          # 4.86 \/ 16 cores\nutime                858.389\nstime                37.924\nnvcsw                7080326        # 97.64%\nnivcsw               171397         # 2.36%\ninblock              16             # 0.09\/sec\nonblock              18120          # 98.31\/sec\ncpu-clock            888938811651   # 888.939 seconds\ntask-clock           891136487519   # 891.136 seconds\npage faults          325548         # 365.318\/sec\ncontext switches     7252444        # 8138.421\/sec\ncpu migrations       3043963        # 3415.821\/sec\nmajor page faults    20             # 0.022\/sec\nminor page faults    325528         # 365.295\/sec\nalignment faults     0              # 0.000\/sec\nemulation faults     0              # 0.000\/sec\nbranches             1115058070415  # 168.313 branches per 1000 inst\nbranch misses        2159669589     # 0.19% branch miss\nconditional          1031833810350  # 155.751 conditional branches per 1000 inst\nindirect             12837134731    # 1.938 indirect branches per 1000 
inst\ncpu-cycles           3384434849412  # 1.15 GHz\ninstructions         6634137352435  # 1.96 IPC\nslots                6739184752614  #\nretiring             2175581102386  # 32.3% (42.1%)\n-- ucode             3289331883     #     0.0%\n-- fastpath          2172291770503  #    32.2%\nfrontend             517226906119   #  7.7% (10.0%)\n-- latency           238154875566   #     3.5%\n-- bandwidth         279072030553   #     4.1%\nbackend              2445323400774  # 36.3% (47.4%)\n-- cpu               881640170675   #    13.1%\n-- memory            1563683230099  #    23.2%\nspeculation          25160402672    #  0.4% ( 0.5%) low\n-- branch mispredict 22567719323    #     0.3%\n-- pipeline restart  2592683349     #     0.0%\nsmt-contention       1575202293052  # 23.4% ( 0.0%)\ncpu-cycles           3373182492320  # 1.15 GHz\ninstructions         6633293431705  # 1.97 IPC\ninstructions         2207163897959  # 37.149 l2 access per 1000 inst\nl2 hit from l1       57435278307    # 7.04% l2 miss\nl2 miss from l1      1601444587     #\nl2 hit from l2 pf    20391479329    #\nl3 hit from l2 pf    2534642588     #\nl3 miss from l2 pf   1633034321     #\ninstructions         2203789809437  # 39.588 float per 1000 inst\nfloat 512            53             # 0.000 AVX-512 per 1000 inst\nfloat 256            2743661914     # 1.245 AVX-256 per 1000 inst\nfloat 128            84500288655    # 38.343 AVX-128 per 1000 inst\nfloat MMX            0              # 0.000 MMX per 1000 inst\nfloat scalar         0              # 0.000 scalar per 1000 inst\ninstructions         6624926556106  #\nopcache              666508147179   # 100.606 opcache per 1000 inst\nopcache miss         19798144485    #  3.0% opcache miss rate\nl1 dTLB miss         1809004813     # 0.273 L1 dTLB per 1000 inst\nl2 dTLB miss         149171427      # 0.023 L2 dTLB per 1000 inst\ninstructions         6804331263951  #\nicache               34604771673    # 5.086 icache per 1000 inst\nicache miss          
7194374108     # 20.8% icache miss rate\nl1 iTLB miss         82777390       # 0.012 L1 iTLB per 1000 inst\nl2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst\ntlb flush            23454          # 0.000 TLB flush per 1000 inst\n<\/code><\/pre>\n\n\n\n<p>The Intel version appears to hang, and it is unclear why. There are no entries in syslog, and there is enough memory:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>mev@hobart:~$ free\n               total        used        free      shared  buff\/cache   available\nMem:        16128408     1411212    10187172      868684     4530024    13537592\nSwap:        2097148           0     2097148\n<\/code><\/pre>\n\n\n\n<p>The hang is in the second workload, after a number of runs have already been started:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>srsRAN Project 23.10.1-20240219:\n    pts\/srsran-2.2.0 &#91;Test: PUSCH Processor Benchmark, Throughput Total]\n    Test 2 of 4\n    Estimated Trial Run Count:    3                     \n    Estimated Test Run-Time:      1 Minute              \n    Estimated Time To Completion: 3 Minutes &#91;14:14 CDT] \n        Started Run 1 @ 14:12:10\n        Started Run 2 @ 14:12:30\n        Started Run 3 @ 14:12:51\n        Started Run 4 @ 14:13:12 *\n        Started Run 5 @ 14:13:34 *\n        Started Run 6 @ 14:13:55 *\n        Started Run 7 @ 14:14:17 *\n        Started Run 8 @ 14:14:38 *\n        Started Run 9 @ 14:15:00 *\n        Started Run 10 @ 14:15:21 *\n        Started Run 11 @ 14:15:43 *\n        Started Run 12 @ 14:16:04 *\n        Started Run 13 @ 14:16:25 *\n        Started Run 14 @ 14:16:47 *\n\n<\/code><\/pre>\n\n\n\n<p>Nothing is immediately obvious from the thread profile &#8211; all the children of pusch_processor_benchmark appear to have exited, but the parent still appears to be hung.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>mev         5831    3627  0 14:11 pts\/0    00:00:00           \/bin\/bash .\/run_test.sh\nmev         5833    5831  0 14:11 pts\/0    00:00:00     
        \/home\/mev\/source\/wspy\/wspy -o software.branch.txt --rusage --software --branch --no-ipc phoronix-test-suite batch-run srsran\nmev         5835    5833  0 14:11 pts\/0    00:00:00               \/bin\/sh \/usr\/bin\/phoronix-test-suite batch-run srsran\nmev         5848    5835  0 14:11 pts\/0    00:00:00                 sh -c  php \/usr\/share\/phoronix-test-suite\/\/pts-core\/phoronix-test-suite.php batch-run srsran\nmev         5849    5848  0 14:11 pts\/0    00:00:00                   Phoronix Test Suite\nmev         5880    5849  0 14:11 pts\/0    00:00:00                     sh -c php -S localhost:8211 -t \/usr\/share\/phoronix-test-suite\/pts-core\/static\/dynamic-result-viewer\/ \nmev         5881    5880  0 14:11 pts\/0    00:00:00                       php -S localhost:8211 -t \/usr\/share\/phoronix-test-suite\/pts-core\/static\/dynamic-result-viewer\/\nmev         5882    5881  0 14:11 pts\/0    00:00:00                         php -S localhost:8211 -t \/usr\/share\/phoronix-test-suite\/pts-core\/static\/dynamic-result-viewer\/\nmev         5883    5881  0 14:11 pts\/0    00:00:00                         php -S localhost:8211 -t \/usr\/share\/phoronix-test-suite\/pts-core\/static\/dynamic-result-viewer\/\nmev         5884    5881  0 14:11 pts\/0    00:00:00                         php -S localhost:8211 -t \/usr\/share\/phoronix-test-suite\/pts-core\/static\/dynamic-result-viewer\/\nmev         5885    5881  0 14:11 pts\/0    00:00:00                         php -S localhost:8211 -t \/usr\/share\/phoronix-test-suite\/pts-core\/static\/dynamic-result-viewer\/\nmev         6686    5849  0 14:16 pts\/0    00:00:00                     \/bin\/sh .\/srsran tests\/benchmarks\/phy\/upper\/channel_processors\/pusch\/pusch_processor_benchmark -m throughput_total -R 100 -B 10 -P pusch_scs30_100MHz_256qam_max\nmev         6687    6686 46 14:16 pts\/0    00:07:14                       
.\/tests\/benchmarks\/phy\/upper\/channel_processors\/pusch\/pusch_processor_benchmark -m throughput_total -R 100 -B 10 -P pusch_scs30_100MHz_256qam_max\nmev         5834    5831  0 14:11 pts\/0    00:00:00             tee intel.srsran.out\n<\/code><\/pre>\n\n\n\n<p>Process overview gives explicitly named threads<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>55 processes\n\t 18 thread_0               959.10    34.70\n\t  9 thread_1               898.78    33.68\n\t  9 thread_11              898.78    33.68\n\t  9 thread_12              898.78    33.68\n\t  9 thread_13              898.78    33.68\n\t  9 thread_14              898.78    33.68\n\t  9 thread_2               898.78    33.68\n\t  9 thread_3               898.78    33.68\n\t  9 thread_4               898.78    33.68\n\t  9 thread_5               898.78    33.68\n\t  9 thread_6               898.78    33.68\n\t  9 thread_7               898.78    33.68\n\t  9 thread_8               898.78    33.68\n\t  9 thread_9               898.78    33.68\n\t  9 thread_10              898.77    33.68\n\t  9 thread_15              898.77    33.68\n\t  6 pdsch_processor        471.26     4.86\n\t  6 pusch_processor        370.21    23.42\n\t  3 decoder#0              332.26    22.85\n\t  3 decoder#1              332.26    22.85\n\t  3 decoder#2              332.26    22.85\n\t  3 decoder#3              332.26    22.85\n\t  3 decoder#4              332.26    22.85\n\t  3 decoder#5              332.26    22.85\n\t  3 decoder#6              332.26    22.85\n\t  3 decoder#7              332.26    22.85\n\t 68 clinfo                  15.86     6.99\n\t 38 vulkaninfo               0.94     1.52\n\t  4 vulkani:disk$0           0.10     0.16\n\t  6 glxinfo:gdrv0            0.09     0.06\n\t  6 glxinfo:gl0              0.09     0.06\n\t  6 php                      0.07     0.11\n\t  2 glxinfo                  0.06     0.02\n\t  2 llvmpipe-0               0.05     0.08\n\t  2 llvmpipe-1               0.05     0.08\n\t  2 
llvmpipe-10              0.05     0.08\n\t  2 llvmpipe-11              0.05     0.08\n\t  2 llvmpipe-12              0.05     0.08\n\t  2 llvmpipe-13              0.05     0.08\n\t  2 llvmpipe-14              0.05     0.08\n\t  2 llvmpipe-15              0.05     0.08\n\t  2 llvmpipe-2               0.05     0.08\n\t  2 llvmpipe-3               0.05     0.08\n\t  2 llvmpipe-4               0.05     0.08\n\t  2 llvmpipe-5               0.05     0.08\n\t  2 llvmpipe-6               0.05     0.08\n\t  2 llvmpipe-7               0.05     0.08\n\t  2 llvmpipe-8               0.05     0.08\n\t  2 llvmpipe-9               0.05     0.08\n\t  2 glxinfo:cs0              0.05     0.02\n\t  2 glxinfo:disk$0           0.05     0.02\n\t  2 glxinfo:sh0              0.05     0.02\n\t  2 glxinfo:shlo0            0.05     0.02\n\t  6 clang                    0.04     0.08\n\t  3 rocminfo                 0.03     0.00\n\t  1 lspci                    0.01     0.02\n\t  1 ps                       0.00     0.01\n\t 88 sh                       0.00     0.00\n\t 13 gcc                      0.00     0.00\n\t 12 srsran                   0.00     0.00\n\t 10 gsettings                0.00     0.00\n\t  8 stat                     0.00     0.00\n\t  8 systemd-detect-          0.00     0.00\n\t  6 llvm-link                0.00     0.00\n\t  5 phoronix-test-s          0.00     0.00\n\t  4 gmain                    0.00     0.00\n\t  2 cc                       0.00     0.00\n\t  2 dconf worker             0.00     0.00\n\t  2 lscpu                    0.00     0.00\n\t  2 uname                    0.00     0.00\n\t  2 which                    0.00     0.00\n\t  2 xset                     0.00     0.00\n\t  1 date                     0.00     0.00\n\t  1 dirname                  0.00     0.00\n\t  1 dmesg                    0.00     0.00\n\t  1 dmidecode                0.00     0.00\n\t  1 grep                     0.00     0.00\n\t  1 ifconfig                 0.00     0.00\n\t  1 ip                    
   0.00     0.00\n\t  1 lsmod                    0.00     0.00\n\t  1 mktemp                   0.00     0.00\n\t  1 qdbus                    0.00     0.00\n\t  1 readlink                 0.00     0.00\n\t  1 realpath                 0.00     0.00\n\t  1 sed                      0.00     0.00\n\t  1 sort                     0.00     0.00\n\t  1 stty                     0.00     0.00\n\t  1 systemctl                0.00     0.00\n\t  1 template.sh              0.00     0.00\n\t  1 wc                       0.00     0.00\n\t  1 xrandr                   0.00     0.00\n0 processes running\n47 maximum processes\n<\/code><\/pre>\n\n\n\n<p>Computation blocks<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>      64692) srsran           cpu=2 start=5.62  finish=15.68\n        64693) pdsch_processor  cpu=1 start=5.63  finish=15.68\n          64694) thread_0         cpu=3 start=5.65  finish=15.68\n          64695) thread_1         cpu=15 start=5.65  finish=15.68\n          64696) thread_2         cpu=14 start=5.65  finish=15.68\n          64697) thread_3         cpu=4 start=5.65  finish=15.68\n          64698) thread_4         cpu=9 start=5.65  finish=15.68\n          64699) thread_5         cpu=12 start=5.66  finish=15.68\n          64700) thread_6         cpu=7 start=5.66  finish=15.68\n          64701) thread_7         cpu=11 start=5.66  finish=15.68\n          64702) thread_8         cpu=5 start=5.66  finish=15.68\n          64703) thread_9         cpu=6 start=5.66  finish=15.68\n          64704) thread_10        cpu=8 start=5.66  finish=15.68\n          64705) thread_11        cpu=13 start=5.66  finish=15.68\n          64706) thread_12        cpu=11 start=5.66  finish=15.68\n          64707) thread_13        cpu=2 start=5.66  finish=15.68\n          64708) thread_14        cpu=0 start=5.66  finish=15.68\n          64709) thread_15        cpu=10 start=5.66  finish=15.68\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Open Radio Access Network (ORAN) solution 
to build software-defined radio. There are four workloads. The first two look to be parallel; the last two are sequential. The topdown profile shows each workload with a slightly different profile. The first two have <span class=\"excerpt-dots\">&hellip;<\/span> <a class=\"more-link\" href=\"https:\/\/mvermeulen.org\/perf\/workloads\/phoronix\/srsran\/\"><span class=\"more-msg\">Continue reading &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":58,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-2082","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2082","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/comments?post=2082"}],"version-history":[{"count":4,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2082\/revisions"}],"predecessor-version":[{"id":2104,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/2082\/revisions\/2104"}],"up":[{"embeddable":true,"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/pages\/58"}],"wp:attachment":[{"href":"https:\/\/mvermeulen.org\/perf\/wp-json\/wp\/v2\/media?parent=2082"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}