Tests the fast Fourier transform library with FFTs in 32 different sizes and dimensions. Overall the benchmark looks single-threaded, though how busy the CPU cores are varies.

The topdown overview varies by benchmark, but most show few frontend stalls and more backend stalls. The profile also seems to vary with backend retirement. This is a case where I would expect contrasts if you pulled apart the different FFT sizes.

AMD topdown metrics show almost 40% floating-point instructions (about 385 per 1000) with few branches (about 40 per 1000). The L2 miss rate is moderate (19.92%), and memory (44.3%) dominates backend stalls over CPU/floating point (15.5%).
elapsed 6035.507
on_cpu 0.052 # 0.84 / 16 cores
utime 5057.912
stime 4.091
nvcsw 3674 # 13.24%
nivcsw 24069 # 86.76%
inblock 2608 # 0.43/sec
onblock 132160 # 21.90/sec
cpu-clock 5063022032326 # 5063.022 seconds
task-clock 5063105505005 # 5063.106 seconds
page faults 1403691 # 277.239/sec
context switches 57281 # 11.313/sec
cpu migrations 1619 # 0.320/sec
major page faults 2 # 0.000/sec
minor page faults 1403689 # 277.239/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 1965580397202 # 39.916 branches per 1000 inst
branch misses 3298066028 # 0.17% branch miss
conditional 1656507779012 # 33.640 conditional branches per 1000 inst
indirect 91167644504 # 1.851 indirect branches per 1000 inst
cpu-cycles 19871548371102 # 0.23 GHz
instructions 41895510041240 # 2.11 IPC
slots 39757752428328 #
retiring 14486190179724 # 36.4% (36.4%)
-- ucode 16153741583 # 0.0%
-- fastpath 14470036438141 # 36.4%
frontend 1299222028903 # 3.3% ( 3.3%)
-- latency 488647318608 # 1.2%
-- bandwidth 810574710295 # 2.0%
backend 23765685993487 # 59.8% (59.8%)
-- cpu 6143636408133 # 15.5%
-- memory 17622049585354 # 44.3%
speculation 205999054100 # 0.5% ( 0.5%)
-- branch mispredict 115970837458 # 0.3%
-- pipeline restart 90028216642 # 0.2%
smt-contention 654137508 # 0.0% ( 0.0%)
cpu-cycles 22621150909378 # 0.23 GHz
instructions 46306295646487 # 2.05 IPC
instructions 15440006308981 # 54.750 l2 access per 1000 inst
l2 hit from l1 551234904807 # 19.92% l2 miss
l2 miss from l1 60719432531 #
l2 hit from l2 pf 186431059131 #
l3 hit from l2 pf 48964245685 #
l3 miss from l2 pf 58708583119 #
instructions 15435614877141 # 384.957 float per 1000 inst
float 512 230 # 0.000 AVX-512 per 1000 inst
float 256 390 # 0.000 AVX-256 per 1000 inst
float 128 5942046309998 # 384.957 AVX-128 per 1000 inst
float MMX 0 # 0.000 MMX per 1000 inst
float scalar 0 # 0.000 scalar per 1000 inst
Intel metrics
elapsed 3010.244
on_cpu 0.049 # 0.78 / 16 cores
utime 2335.260
stime 2.102
nvcsw 2964 # 20.89%
nivcsw 11223 # 79.11%
inblock 24 # 0.01/sec
onblock 105536 # 35.06/sec
cpu-clock 2337833277758 # 2337.833 seconds
task-clock 2337870126196 # 2337.870 seconds
page faults 683541 # 292.378/sec
context switches 28760 # 12.302/sec
cpu migrations 651 # 0.278/sec
major page faults 0 # 0.000/sec
minor page faults 683541 # 292.378/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 827918918657 # 38.741 branches per 1000 inst
branch misses 2461654182 # 0.30% branch miss
conditional 827918940033 # 38.741 conditional branches per 1000 inst
indirect 41023014118 # 1.920 indirect branches per 1000 inst
slots 86916464289776 #
retiring 39007424491498 # 44.9% (44.9%)
-- ucode 1317253774803 # 1.5%
-- fastpath 37690170716695 # 43.4%
frontend 3309488424692 # 3.8% ( 3.8%)
-- latency 1285710019402 # 1.5%
-- bandwidth 2023778405290 # 2.3%
backend 46132471585328 # 53.1% (53.1%)
-- cpu 10736832716887 # 12.4%
-- memory 35395638868441 # 40.7%
speculation 1999383497821 # 2.3% ( 2.3%)
-- branch mispredict 1441628404359 # 1.7%
-- pipeline restart 557755093462 # 0.6%
smt-contention 0 # 0.0% ( 0.0%)
cpu-cycles 8626088451177 # 0.17 GHz
instructions 22951385088498 # 2.66 IPC
l2 access 725794661607 # 31.630 l2 access per 1000 inst
l2 miss 323765912211 # 44.61% l2 miss
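The headline ratios in the two listings can be re-derived from the raw counters, which makes the AMD and Intel runs easier to compare side by side. The following is a minimal sketch, not the profiling tool's own code: the dictionary layout and the function names are made up for illustration, and the `on_cpu` formula, (utime + stime) / elapsed / 16 cores, is an inference that happens to match the printed values rather than documented tool behavior.

```python
# Raw values copied from the AMD and Intel listings above.
COUNTERS = {
    "AMD": {
        "elapsed": 6035.507, "utime": 5057.912, "stime": 4.091,
        "slots": 39757752428328,
        "retiring": 14486190179724,
        "frontend": 1299222028903,
        "backend": 23765685993487,
        "speculation": 205999054100,
        "cpu-cycles": 19871548371102,
        "instructions": 41895510041240,
    },
    "Intel": {
        "elapsed": 3010.244, "utime": 2335.260, "stime": 2.102,
        "slots": 86916464289776,
        "retiring": 39007424491498,
        "frontend": 3309488424692,
        "backend": 46132471585328,
        "speculation": 1999383497821,
        "cpu-cycles": 8626088451177,
        "instructions": 22951385088498,
    },
}

LEVEL1 = ("retiring", "frontend", "backend", "speculation")
CORES = 16  # both listings report "/ 16 cores"


def topdown_percent(c):
    """Each level-1 topdown category as a percentage of all slots."""
    return {k: 100.0 * c[k] / c["slots"] for k in LEVEL1}


def ipc(c):
    """Instructions retired per CPU cycle."""
    return c["instructions"] / c["cpu-cycles"]


def on_cpu(c):
    """Assumed formula: CPU time over wall time, averaged over all cores."""
    return (c["utime"] + c["stime"]) / c["elapsed"] / CORES


for name, c in COUNTERS.items():
    pct = topdown_percent(c)
    cols = "  ".join(f"{k} {pct[k]:5.1f}%" for k in LEVEL1)
    print(f"{name:5s}  {cols}  IPC {ipc(c):.2f}  on_cpu {on_cpu(c):.3f}")
```

Under these assumptions the sketch reproduces the percentages printed in the listings (for example 59.8% backend on AMD versus 53.1% on Intel) and the IPC values paired with those cycle counts (2.11 and 2.66).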
The process overview shows few distinct processes, with the work done by an internal bench program. The benchmark did crash towards the end of the first run.
613 processes
161 bench 2461.12 0.84
34 clinfo 9.74 3.33
19 vulkaninfo 0.76 0.57
2 vulkani:disk$0 0.08 0.06
3 glxinfo:gdrv0 0.07 0.06
6 clang 0.05 0.07
1 llvmpipe-0 0.04 0.03
1 llvmpipe-1 0.04 0.03
1 llvmpipe-10 0.04 0.03
1 llvmpipe-11 0.04 0.03
1 llvmpipe-12 0.04 0.03
1 llvmpipe-13 0.04 0.03
1 llvmpipe-14 0.04 0.03
1 llvmpipe-15 0.04 0.03
1 llvmpipe-2 0.04 0.03
1 llvmpipe-3 0.04 0.03
1 llvmpipe-4 0.04 0.03
1 llvmpipe-5 0.04 0.03
1 llvmpipe-6 0.04 0.03
1 llvmpipe-7 0.04 0.03
1 llvmpipe-8 0.04 0.03
1 llvmpipe-9 0.04 0.03
1 glxinfo 0.04 0.02
1 glxinfo:cs0 0.04 0.02
1 glxinfo:disk$0 0.03 0.02
1 glxinfo:sh0 0.03 0.02
1 glxinfo:shlo0 0.03 0.02
1 ps 0.00 0.01
281 sh 0.00 0.00
13 gcc 0.00 0.00
8 gsettings 0.00 0.00
8 systemd-detect- 0.00 0.00
7 stat 0.00 0.00
6 llvm-link 0.00 0.00
5 gmain 0.00 0.00
4 phoronix-test-s 0.00 0.00
2 dconf worker 0.00 0.00
2 which 0.00 0.00
1 cc 0.00 0.00
1 date 0.00 0.00
1 dirname 0.00 0.00
1 grep 0.00 0.00
1 ifconfig 0.00 0.00
1 ip 0.00 0.00
1 lscpu 0.00 0.00
1 mktemp 0.00 0.00
1 qdbus 0.00 0.00
1 readlink 0.00 0.00
1 realpath 0.00 0.00
1 sed 0.00 0.00
1 sort 0.00 0.00
1 stty 0.00 0.00
1 systemctl 0.00 0.00
1 template.sh 0.00 0.00
1 wc 0.00 0.00
1 xrandr 0.00 0.00
1 xset 0.00 0.00
11 processes running
47 maximum processes
The computation consists of repeated invocations of bench, e.g.
294703) sh cpu=0 start=29.18 finish=32.71
294704) bench cpu=4 start=29.18 finish=32.71
294705) sh cpu=0 start=36.72 finish=40.27
294706) bench cpu=9 start=36.72 finish=40.27
294707) sh cpu=0 start=44.27 finish=47.81
294708) bench cpu=1 start=44.27 finish=47.81
294709) sh cpu=0 start=47.81 finish=47.82
294710) sh cpu=1 start=47.81 finish=47.82
294711) sh cpu=1 start=58.22 finish=60.70
294712) bench cpu=2 start=58.23 finish=60.70
294713) sh cpu=8 start=64.70 finish=67.14
294714) bench cpu=9 start=64.70 finish=67.14
294715) sh cpu=8 start=71.14 finish=73.58
294716) bench cpu=1 start=71.14 finish=73.58
294717) sh cpu=10 start=73.58 finish=73.58
294718) sh cpu=11 start=73.58 finish=73.58
294719) sh cpu=10 start=92.97 finish=96.45
294720) bench cpu=3 start=92.97 finish=96.45
294721) sh cpu=10 start=100.45 finish=103.90
294722) bench cpu=11 start=100.45 finish=103.90
294723) sh cpu=2 start=107.90 finish=111.35
294724) bench cpu=3 start=107.91 finish=111.35
294725) sh cpu=3 start=111.35 finish=111.35
294726) sh cpu=12 start=111.35 finish=111.35
294727) sh cpu=2 start=123.78 finish=126.36
294728) bench cpu=3 start=123.78 finish=126.36
294729) sh cpu=2 start=130.37 finish=132.90
294730) bench cpu=11 start=130.37 finish=132.89
294731) sh cpu=10 start=136.90 finish=139.41
294732) bench cpu=3 start=136.90 finish=139.41
294733) sh cpu=12 start=139.41 finish=139.42
294734) sh cpu=5 start=139.41 finish=139.41
294735) sh cpu=2 start=155.84 finish=158.60
294736) bench cpu=3 start=155.84 finish=158.60
294737) sh cpu=10 start=162.61 finish=165.38
294738) bench cpu=3 start=162.61 finish=165.38
294740) sh cpu=10 start=169.38 finish=172.19
294741) bench cpu=11 start=169.38 finish=172.19
294742) sh cpu=10 start=172.19 finish=172.19
294743) sh cpu=11 start=172.19 finish=172.19
294745) sh cpu=2 start=183.37 finish=186.57
294746) bench cpu=11 start=183.37 finish=186.57
294747) sh cpu=2 start=190.57 finish=193.77
294748) bench cpu=3 start=190.57 finish=193.76
294750) sh cpu=2 start=197.77 finish=200.99
294751) bench cpu=3 start=197.77 finish=200.98
294752) sh cpu=2 start=200.99 finish=200.99
294753) sh cpu=3 start=200.99 finish=200.99
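To look at the invocation pattern more closely, the trace lines above can be parsed to get the duration of each bench run, the idle gap between runs, and the CPUs used. This is a rough sketch that assumes the line format shown in the excerpt; the names `parse_trace` and `bench_runs` are hypothetical, and the leading number is treated as an opaque identifier since the trace does not say what it is.

```python
import re

# First few lines of the trace excerpt above.
TRACE = """\
294703) sh cpu=0 start=29.18 finish=32.71
294704) bench cpu=4 start=29.18 finish=32.71
294705) sh cpu=0 start=36.72 finish=40.27
294706) bench cpu=9 start=36.72 finish=40.27
294707) sh cpu=0 start=44.27 finish=47.81
294708) bench cpu=1 start=44.27 finish=47.81
294709) sh cpu=0 start=47.81 finish=47.82
"""

LINE_RE = re.compile(
    r"^(\d+)\)\s+(\S+)\s+cpu=(\d+)\s+start=([\d.]+)\s+finish=([\d.]+)$"
)


def parse_trace(text):
    """Turn trace lines into dicts; lines that do not match are skipped."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            ident, name, cpu, start, finish = m.groups()
            rows.append({
                "ident": int(ident),
                "name": name,
                "cpu": int(cpu),
                "start": float(start),
                "finish": float(finish),
            })
    return rows


def bench_runs(rows):
    """Keep only the bench invocations, in trace order."""
    return [r for r in rows if r["name"] == "bench"]


runs = bench_runs(parse_trace(TRACE))
# Seconds each bench run took, and seconds between consecutive runs.
durations = [round(r["finish"] - r["start"], 2) for r in runs]
gaps = [round(b["start"] - a["finish"], 2) for a, b in zip(runs, runs[1:])]
cpus = sorted({r["cpu"] for r in runs})

print("durations:", durations)
print("gaps:", gaps)
print("cpus:", cpus)
```

For the first three runs in the excerpt, each bench invocation lasts about 3.5 seconds, runs are separated by gaps of about 4 seconds, and each lands on a different CPU.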
