This is an Open Radio Access Network (ORAN) solution for building software-defined radio. There are four workloads. The first two look to be parallel; the last two are sequential.

The topdown profile shows each workload with a slightly different profile. The first two spend about half of their time in backend stalls and have a ~40% retirement rate. The third has a higher retirement rate, and the last is closer to a 50% retirement rate.

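AMD metrics confirm a ~40% retirement rate overall (42.1% in the normalized column) and higher backend stalls (47.4% normalized, with more of it from memory than from CPU). There is some floating point but not much: about 40 per 1000 instructions, nearly all of it AVX-128.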
elapsed 184.315
on_cpu 0.304 # 4.86 / 16 cores
utime 858.389
stime 37.924
nvcsw 7080326 # 97.64%
nivcsw 171397 # 2.36%
inblock 16 # 0.09/sec
onblock 18120 # 98.31/sec
cpu-clock 888938811651 # 888.939 seconds
task-clock 891136487519 # 891.136 seconds
page faults 325548 # 365.318/sec
context switches 7252444 # 8138.421/sec
cpu migrations 3043963 # 3415.821/sec
major page faults 20 # 0.022/sec
minor page faults 325528 # 365.295/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 1115058070415 # 168.313 branches per 1000 inst
branch misses 2159669589 # 0.19% branch miss
conditional 1031833810350 # 155.751 conditional branches per 1000 inst
indirect 12837134731 # 1.938 indirect branches per 1000 inst
cpu-cycles 3384434849412 # 1.15 GHz
instructions 6634137352435 # 1.96 IPC
slots 6739184752614 #
retiring 2175581102386 # 32.3% (42.1%)
-- ucode 3289331883 # 0.0%
-- fastpath 2172291770503 # 32.2%
frontend 517226906119 # 7.7% (10.0%)
-- latency 238154875566 # 3.5%
-- bandwidth 279072030553 # 4.1%
backend 2445323400774 # 36.3% (47.4%)
-- cpu 881640170675 # 13.1%
-- memory 1563683230099 # 23.2%
speculation 25160402672 # 0.4% ( 0.5%) low
-- branch mispredict 22567719323 # 0.3%
-- pipeline restart 2592683349 # 0.0%
smt-contention 1575202293052 # 23.4% ( 0.0%)
cpu-cycles 3373182492320 # 1.15 GHz
instructions 6633293431705 # 1.97 IPC
instructions 2207163897959 # 37.149 l2 access per 1000 inst
l2 hit from l1 57435278307 # 7.04% l2 miss
l2 miss from l1 1601444587 #
l2 hit from l2 pf 20391479329 #
l3 hit from l2 pf 2534642588 #
l3 miss from l2 pf 1633034321 #
instructions 2203789809437 # 39.588 float per 1000 inst
float 512 53 # 0.000 AVX-512 per 1000 inst
float 256 2743661914 # 1.245 AVX-256 per 1000 inst
float 128 84500288655 # 38.343 AVX-128 per 1000 inst
float MMX 0 # 0.000 MMX per 1000 inst
float scalar 0 # 0.000 scalar per 1000 inst
instructions 6624926556106 #
opcache 666508147179 # 100.606 opcache per 1000 inst
opcache miss 19798144485 # 3.0% opcache miss rate
l1 dTLB miss 1809004813 # 0.273 L1 dTLB per 1000 inst
l2 dTLB miss 149171427 # 0.023 L2 dTLB per 1000 inst
instructions 6804331263951 #
icache 34604771673 # 5.086 icache per 1000 inst
icache miss 7194374108 # 20.8% icache miss rate
l1 iTLB miss 82777390 # 0.012 L1 iTLB per 1000 inst
l2 iTLB miss 0 # 0.000 L2 iTLB per 1000 inst
tlb flush 23454 # 0.000 TLB flush per 1000 inst
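The derived figures in the listing above can be reproduced from the raw counters. The sketch below is a reading of the numbers, not taken from the wspy source: it assumes on_cpu is (utime + stime) / elapsed / cores, and that the parenthesized topdown column divides by slots minus smt-contention. Both assumptions match the printed values, but the function names are hypothetical.

```python
# Counter values copied from the AMD run above.
ELAPSED, UTIME, STIME, CORES = 184.315, 858.389, 37.924, 16
CYCLES, INSTRUCTIONS = 3384434849412, 6634137352435
SLOTS = 6739184752614
SMT_CONTENTION = 1575202293052
CATEGORIES = {
    "retiring": 2175581102386,
    "frontend": 517226906119,
    "backend": 2445323400774,
    "speculation": 25160402672,
}


def on_cpu(utime, stime, elapsed, cores):
    """Fraction of total core time spent running (assumed formula)."""
    return (utime + stime) / elapsed / cores


def ipc(instructions, cycles):
    """Instructions retired per cycle."""
    return instructions / cycles


def topdown(categories, slots, smt_contention):
    """Return {name: (raw %, normalized %)} for each topdown category.

    raw        = share of all issue slots
    normalized = share of slots left after removing SMT contention
                 (assumption inferred from the printed numbers)
    """
    usable = slots - smt_contention
    return {
        name: (100.0 * count / slots, 100.0 * count / usable)
        for name, count in categories.items()
    }


utilization = on_cpu(UTIME, STIME, ELAPSED, CORES)
overall_ipc = ipc(INSTRUCTIONS, CYCLES)
shares = topdown(CATEGORIES, SLOTS, SMT_CONTENTION)

print(f"on_cpu {utilization:.3f}  ({utilization * CORES:.2f} / {CORES} cores)")
print(f"IPC    {overall_ipc:.2f}")
for name, (raw, normalized) in shares.items():
    print(f"{name:12s} {raw:5.1f}% ({normalized:5.1f}%)")
```

Read this way, the ~40% retirement rate quoted in the text corresponds to the normalized column, while the raw column also counts the slots lost to the sibling SMT thread.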
The Intel version appears to hang, and it is unclear why. There are no entries in syslog and there is enough memory:
mev@hobart:~$ free
               total        used        free      shared  buff/cache   available
Mem:        16128408     1411212    10187172      868684     4530024    13537592
Swap:        2097148           0     2097148
The hang is in the second workload, after multiple versions have already run:
srsRAN Project 23.10.1-20240219:
pts/srsran-2.2.0 [Test: PUSCH Processor Benchmark, Throughput Total]
Test 2 of 4
Estimated Trial Run Count: 3
Estimated Test Run-Time: 1 Minute
Estimated Time To Completion: 3 Minutes [14:14 CDT]
Started Run 1 @ 14:12:10
Started Run 2 @ 14:12:30
Started Run 3 @ 14:12:51
Started Run 4 @ 14:13:12 *
Started Run 5 @ 14:13:34 *
Started Run 6 @ 14:13:55 *
Started Run 7 @ 14:14:17 *
Started Run 8 @ 14:14:38 *
Started Run 9 @ 14:15:00 *
Started Run 10 @ 14:15:21 *
Started Run 11 @ 14:15:43 *
Started Run 12 @ 14:16:04 *
Started Run 13 @ 14:16:25 *
Started Run 14 @ 14:16:47 *
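One way to read the log above is to look at the gap between consecutive run starts. The sketch below parses the "Started Run" lines copied from the log and prints those gaps; the regular expression and function name are illustrative choices, not part of the Phoronix Test Suite.

```python
import re
from datetime import datetime

# "Started Run" lines copied from the log above.
LOG = """\
Started Run 1 @ 14:12:10
Started Run 2 @ 14:12:30
Started Run 3 @ 14:12:51
Started Run 4 @ 14:13:12 *
Started Run 5 @ 14:13:34 *
Started Run 6 @ 14:13:55 *
Started Run 7 @ 14:14:17 *
Started Run 8 @ 14:14:38 *
Started Run 9 @ 14:15:00 *
Started Run 10 @ 14:15:21 *
Started Run 11 @ 14:15:43 *
Started Run 12 @ 14:16:04 *
Started Run 13 @ 14:16:25 *
Started Run 14 @ 14:16:47 *
"""

RUN_RE = re.compile(r"Started Run (\d+) @ (\d{2}:\d{2}:\d{2})")


def run_intervals(log):
    """Return [(run_number, seconds_since_previous_start), ...]."""
    starts = [
        (int(m.group(1)), datetime.strptime(m.group(2), "%H:%M:%S"))
        for m in RUN_RE.finditer(log)
    ]
    return [
        (number, int((start - previous_start).total_seconds()))
        for (_, previous_start), (number, start) in zip(starts, starts[1:])
    ]


intervals = run_intervals(LOG)
for number, seconds in intervals:
    print(f"run {number:2d} started {seconds}s after the previous run")
```

Every gap falls between 20 and 22 seconds, so runs 1 through 13 each appear to have finished in about that time, and only the run started at 14:16:47 failed to complete.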
Nothing is immediately obvious from the thread profile: all the children of pusch_processor_benchmark appear to have exited, but the parent still appears to be hung.
mev 5831 3627 0 14:11 pts/0 00:00:00 /bin/bash ./run_test.sh
mev 5833 5831 0 14:11 pts/0 00:00:00 /home/mev/source/wspy/wspy -o software.branch.txt --rusage --software --branch --no-ipc phoronix-test-suite batch-run srsran
mev 5835 5833 0 14:11 pts/0 00:00:00 /bin/sh /usr/bin/phoronix-test-suite batch-run srsran
mev 5848 5835 0 14:11 pts/0 00:00:00 sh -c php /usr/share/phoronix-test-suite//pts-core/phoronix-test-suite.php batch-run srsran
mev 5849 5848 0 14:11 pts/0 00:00:00 Phoronix Test Suite
mev 5880 5849 0 14:11 pts/0 00:00:00 sh -c php -S localhost:8211 -t /usr/share/phoronix-test-suite/pts-core/static/dynamic-result-viewer/
mev 5881 5880 0 14:11 pts/0 00:00:00 php -S localhost:8211 -t /usr/share/phoronix-test-suite/pts-core/static/dynamic-result-viewer/
mev 5882 5881 0 14:11 pts/0 00:00:00 php -S localhost:8211 -t /usr/share/phoronix-test-suite/pts-core/static/dynamic-result-viewer/
mev 5883 5881 0 14:11 pts/0 00:00:00 php -S localhost:8211 -t /usr/share/phoronix-test-suite/pts-core/static/dynamic-result-viewer/
mev 5884 5881 0 14:11 pts/0 00:00:00 php -S localhost:8211 -t /usr/share/phoronix-test-suite/pts-core/static/dynamic-result-viewer/
mev 5885 5881 0 14:11 pts/0 00:00:00 php -S localhost:8211 -t /usr/share/phoronix-test-suite/pts-core/static/dynamic-result-viewer/
mev 6686 5849 0 14:16 pts/0 00:00:00 /bin/sh ./srsran tests/benchmarks/phy/upper/channel_processors/pusch/pusch_processor_benchmark -m throughput_total -R 100 -B 10 -P pusch_scs30_100MHz_256qam_max
mev 6687 6686 46 14:16 pts/0 00:07:14 ./tests/benchmarks/phy/upper/channel_processors/pusch/pusch_processor_benchmark -m throughput_total -R 100 -B 10 -P pusch_scs30_100MHz_256qam_max
mev 5834 5831 0 14:11 pts/0 00:00:00 tee intel.srsran.out
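To make the parent/child chain in the listing above easier to follow, the sketch below parses a subset of those `ps -ef` lines (UID, PID, PPID, C, STIME, TTY, TIME, CMD) and walks from the benchmark process back up to the test script. The helper names are hypothetical and the lines are copied verbatim from the listing.

```python
# Subset of the ps -ef listing above, copied verbatim.
PS_OUTPUT = """\
mev 5831 3627 0 14:11 pts/0 00:00:00 /bin/bash ./run_test.sh
mev 5833 5831 0 14:11 pts/0 00:00:00 /home/mev/source/wspy/wspy -o software.branch.txt --rusage --software --branch --no-ipc phoronix-test-suite batch-run srsran
mev 5835 5833 0 14:11 pts/0 00:00:00 /bin/sh /usr/bin/phoronix-test-suite batch-run srsran
mev 5848 5835 0 14:11 pts/0 00:00:00 sh -c php /usr/share/phoronix-test-suite//pts-core/phoronix-test-suite.php batch-run srsran
mev 5849 5848 0 14:11 pts/0 00:00:00 Phoronix Test Suite
mev 6686 5849 0 14:16 pts/0 00:00:00 /bin/sh ./srsran tests/benchmarks/phy/upper/channel_processors/pusch/pusch_processor_benchmark -m throughput_total -R 100 -B 10 -P pusch_scs30_100MHz_256qam_max
mev 6687 6686 46 14:16 pts/0 00:07:14 ./tests/benchmarks/phy/upper/channel_processors/pusch/pusch_processor_benchmark -m throughput_total -R 100 -B 10 -P pusch_scs30_100MHz_256qam_max
mev 5834 5831 0 14:11 pts/0 00:00:00 tee intel.srsran.out
"""


def parse_ps(text):
    """Map pid -> dict(ppid, cpu_time, cmd) from `ps -ef` style lines."""
    procs = {}
    for line in text.splitlines():
        # The command is the 8th field and may itself contain spaces.
        _uid, pid, ppid, _c, _stime, _tty, cpu_time, cmd = line.split(None, 7)
        procs[int(pid)] = {"ppid": int(ppid), "cpu_time": cpu_time, "cmd": cmd}
    return procs


def ancestry(procs, pid):
    """Return [pid, parent, grandparent, ...] for as long as parents are listed."""
    chain = []
    while pid in procs:
        chain.append(pid)
        pid = procs[pid]["ppid"]
    return chain


procs = parse_ps(PS_OUTPUT)
chain = ancestry(procs, 6687)
busy = [pid for pid, p in procs.items() if p["cpu_time"] != "00:00:00"]

for depth, pid in enumerate(reversed(chain)):
    print("  " * depth + f"{pid} [{procs[pid]['cpu_time']}] {procs[pid]['cmd'][:60]}")
print("processes with accumulated CPU time:", busy)
```

In this subset, only PID 6687 (the benchmark binary itself) has accumulated any CPU time; every process above it in the chain shows 00:00:00.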
The process overview gives explicitly named threads:
55 processes
18 thread_0 959.10 34.70
9 thread_1 898.78 33.68
9 thread_11 898.78 33.68
9 thread_12 898.78 33.68
9 thread_13 898.78 33.68
9 thread_14 898.78 33.68
9 thread_2 898.78 33.68
9 thread_3 898.78 33.68
9 thread_4 898.78 33.68
9 thread_5 898.78 33.68
9 thread_6 898.78 33.68
9 thread_7 898.78 33.68
9 thread_8 898.78 33.68
9 thread_9 898.78 33.68
9 thread_10 898.77 33.68
9 thread_15 898.77 33.68
6 pdsch_processor 471.26 4.86
6 pusch_processor 370.21 23.42
3 decoder#0 332.26 22.85
3 decoder#1 332.26 22.85
3 decoder#2 332.26 22.85
3 decoder#3 332.26 22.85
3 decoder#4 332.26 22.85
3 decoder#5 332.26 22.85
3 decoder#6 332.26 22.85
3 decoder#7 332.26 22.85
68 clinfo 15.86 6.99
38 vulkaninfo 0.94 1.52
4 vulkani:disk$0 0.10 0.16
6 glxinfo:gdrv0 0.09 0.06
6 glxinfo:gl0 0.09 0.06
6 php 0.07 0.11
2 glxinfo 0.06 0.02
2 llvmpipe-0 0.05 0.08
2 llvmpipe-1 0.05 0.08
2 llvmpipe-10 0.05 0.08
2 llvmpipe-11 0.05 0.08
2 llvmpipe-12 0.05 0.08
2 llvmpipe-13 0.05 0.08
2 llvmpipe-14 0.05 0.08
2 llvmpipe-15 0.05 0.08
2 llvmpipe-2 0.05 0.08
2 llvmpipe-3 0.05 0.08
2 llvmpipe-4 0.05 0.08
2 llvmpipe-5 0.05 0.08
2 llvmpipe-6 0.05 0.08
2 llvmpipe-7 0.05 0.08
2 llvmpipe-8 0.05 0.08
2 llvmpipe-9 0.05 0.08
2 glxinfo:cs0 0.05 0.02
2 glxinfo:disk$0 0.05 0.02
2 glxinfo:sh0 0.05 0.02
2 glxinfo:shlo0 0.05 0.02
6 clang 0.04 0.08
3 rocminfo 0.03 0.00
1 lspci 0.01 0.02
1 ps 0.00 0.01
88 sh 0.00 0.00
13 gcc 0.00 0.00
12 srsran 0.00 0.00
10 gsettings 0.00 0.00
8 stat 0.00 0.00
8 systemd-detect- 0.00 0.00
6 llvm-link 0.00 0.00
5 phoronix-test-s 0.00 0.00
4 gmain 0.00 0.00
2 cc 0.00 0.00
2 dconf worker 0.00 0.00
2 lscpu 0.00 0.00
2 uname 0.00 0.00
2 which 0.00 0.00
2 xset 0.00 0.00
1 date 0.00 0.00
1 dirname 0.00 0.00
1 dmesg 0.00 0.00
1 dmidecode 0.00 0.00
1 grep 0.00 0.00
1 ifconfig 0.00 0.00
1 ip 0.00 0.00
1 lsmod 0.00 0.00
1 mktemp 0.00 0.00
1 qdbus 0.00 0.00
1 readlink 0.00 0.00
1 realpath 0.00 0.00
1 sed 0.00 0.00
1 sort 0.00 0.00
1 stty 0.00 0.00
1 systemctl 0.00 0.00
1 template.sh 0.00 0.00
1 wc 0.00 0.00
1 xrandr 0.00 0.00
0 processes running
47 maximum processes
Computation blocks
64692) srsran cpu=2 start=5.62 finish=15.68
64693) pdsch_processor cpu=1 start=5.63 finish=15.68
64694) thread_0 cpu=3 start=5.65 finish=15.68
64695) thread_1 cpu=15 start=5.65 finish=15.68
64696) thread_2 cpu=14 start=5.65 finish=15.68
64697) thread_3 cpu=4 start=5.65 finish=15.68
64698) thread_4 cpu=9 start=5.65 finish=15.68
64699) thread_5 cpu=12 start=5.66 finish=15.68
64700) thread_6 cpu=7 start=5.66 finish=15.68
64701) thread_7 cpu=11 start=5.66 finish=15.68
64702) thread_8 cpu=5 start=5.66 finish=15.68
64703) thread_9 cpu=6 start=5.66 finish=15.68
64704) thread_10 cpu=8 start=5.66 finish=15.68
64705) thread_11 cpu=13 start=5.66 finish=15.68
64706) thread_12 cpu=11 start=5.66 finish=15.68
64707) thread_13 cpu=2 start=5.66 finish=15.68
64708) thread_14 cpu=0 start=5.66 finish=15.68
64709) thread_15 cpu=10 start=5.66 finish=15.68
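The listing above can be checked for CPU placement with a short script. The sketch below parses the computation-block lines, groups tasks by the reported `cpu` value, and lists any CPU that appears more than once. It assumes the `cpu` field is the CPU recorded for that task at some sample point, which the listing does not state; the regular expression and names are illustrative.

```python
import re
from collections import defaultdict

# Computation-block lines copied from the listing above.
BLOCKS = """\
64692) srsran cpu=2 start=5.62 finish=15.68
64693) pdsch_processor cpu=1 start=5.63 finish=15.68
64694) thread_0 cpu=3 start=5.65 finish=15.68
64695) thread_1 cpu=15 start=5.65 finish=15.68
64696) thread_2 cpu=14 start=5.65 finish=15.68
64697) thread_3 cpu=4 start=5.65 finish=15.68
64698) thread_4 cpu=9 start=5.65 finish=15.68
64699) thread_5 cpu=12 start=5.66 finish=15.68
64700) thread_6 cpu=7 start=5.66 finish=15.68
64701) thread_7 cpu=11 start=5.66 finish=15.68
64702) thread_8 cpu=5 start=5.66 finish=15.68
64703) thread_9 cpu=6 start=5.66 finish=15.68
64704) thread_10 cpu=8 start=5.66 finish=15.68
64705) thread_11 cpu=13 start=5.66 finish=15.68
64706) thread_12 cpu=11 start=5.66 finish=15.68
64707) thread_13 cpu=2 start=5.66 finish=15.68
64708) thread_14 cpu=0 start=5.66 finish=15.68
64709) thread_15 cpu=10 start=5.66 finish=15.68
"""

BLOCK_RE = re.compile(
    r"(\d+)\)\s+(\S+)\s+cpu=(\d+)\s+start=([\d.]+)\s+finish=([\d.]+)"
)


def tasks_by_cpu(text):
    """Group task names by their reported CPU number."""
    by_cpu = defaultdict(list)
    for match in BLOCK_RE.finditer(text):
        _pid, name, cpu, _start, _finish = match.groups()
        by_cpu[int(cpu)].append(name)
    return dict(by_cpu)


by_cpu = tasks_by_cpu(BLOCKS)
shared = {cpu: names for cpu, names in by_cpu.items() if len(names) > 1}

print(f"{sum(len(n) for n in by_cpu.values())} tasks on {len(by_cpu)} distinct CPUs")
for cpu in sorted(shared):
    print(f"cpu {cpu:2d} reported by: {', '.join(shared[cpu])}")
```

For this block, 18 tasks cover all 16 CPUs, and the only repeated CPUs are 2 (srsran and thread_13) and 11 (thread_7 and thread_12), which is consistent with one worker thread per core plus the two parent tasks.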
