This benchmark generates prime numbers with primesieve, a sieve of Eratosthenes implementation. There are two tests: the 1e12 test runs in ~20 seconds and the 1e13 test runs in ~140 seconds. The workload keeps all cores busy almost continuously.
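
As a rough illustration of the technique (not primesieve's actual code, which is far more optimized), a segmented sieve of Eratosthenes can be sketched in Python. The function names and the 32768-entry segment size are assumptions chosen for illustration. Sieving one small segment at a time is one reason a workload like this can keep its working set in cache.

```python
import math


def simple_sieve(limit):
    """Return all primes <= limit using a plain sieve of Eratosthenes."""
    if limit < 2:
        return []
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    for p in range(2, math.isqrt(limit) + 1):
        if is_prime[p]:
            # Cross off multiples of p, starting at p*p.
            is_prime[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [i for i, flag in enumerate(is_prime) if flag]


def count_primes_segmented(n, segment_size=32768):
    """Count primes <= n, sieving one fixed-size segment at a time."""
    if n < 2:
        return 0
    # Only primes up to sqrt(n) are needed to cross off composites <= n.
    base_primes = simple_sieve(math.isqrt(n))
    count = 0
    for low in range(2, n + 1, segment_size):
        high = min(low + segment_size - 1, n)
        segment = bytearray([1]) * (high - low + 1)
        for p in base_primes:
            if p * p > high:
                break
            # First multiple of p inside [low, high], never p itself.
            start = max(p * p, ((low + p - 1) // p) * p)
            for multiple in range(start, high + 1, p):
                segment[multiple - low] = 0
        count += sum(segment)
    return count
```

For example, `count_primes_segmented(100)` counts the 25 primes below 100. The benchmark limits of 1e12 and 1e13 are far beyond what this pure-Python sketch can handle in reasonable time.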

The topdown profile shows some backend stalls, but overall a moderate retirement rate and relatively few frontend stalls.

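AMD metrics show floating-point code, all of it 128-bit (about 122 operations per 1000 instructions), and a high rate of L2 accesses (about 236 per 1000 instructions) but few L2 misses (2.26%).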
elapsed 730.330
on_cpu 0.951 # 15.22 / 16 cores
utime 11103.872
stime 11.206
nvcsw 2644 # 2.39%
nivcsw 107828 # 97.61%
inblock 2320 # 3.18/sec
onblock 12712 # 17.41/sec
cpu-clock 11116198275008 # 11116.198 seconds
task-clock 11116292624354 # 11116.293 seconds
page faults 3680743 # 331.112/sec
context switches 113946 # 10.250/sec
cpu migrations 345 # 0.031/sec
major page faults 2 # 0.000/sec
minor page faults 3680741 # 331.112/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 6467736098155 # 121.405 branches per 1000 inst
branch misses 294969256552 # 4.56% branch miss
conditional 6085289151533 # 114.226 conditional branches per 1000 inst
indirect 192854270631 # 3.620 indirect branches per 1000 inst
cpu-cycles 41078877047663 # 3.48 GHz
instructions 53292434507748 # 1.30 IPC
slots 82146024099060 #
retiring 19963907311493 # 24.3% (36.0%)
-- ucode 113680648 # 0.0%
-- fastpath 19963793630845 # 24.3%
frontend 7928985516467 # 9.7% (14.3%)
-- latency 5791287028344 # 7.0%
-- bandwidth 2137698488123 # 2.6%
backend 24231602760213 # 29.5% (43.7%)
-- cpu 3875531411835 # 4.7%
-- memory 20356071348378 # 24.8%
speculation 3299885239007 # 4.0% ( 6.0%)
-- branch mispredict 3215528826507 # 3.9%
-- pipeline restart 84356412500 # 0.1%
smt-contention 26721582492463 # 32.5% ( 0.0%)
cpu-cycles 41078257306170 # 3.48 GHz
instructions 53280411545354 # 1.30 IPC
instructions 17760253441937 # 236.184 l2 access per 1000 inst
l2 hit from l1 2541215747844 # 2.26% l2 miss
l2 miss from l1 21608662125 #
l2 hit from l2 pf 1580442451810 #
l3 hit from l2 pf 55955821749 #
l3 miss from l2 pf 17075963566 #
instructions 17754714307026 # 121.509 float per 1000 inst
float 512 60 # 0.000 AVX-512 per 1000 inst
float 256 388 # 0.000 AVX-256 per 1000 inst
float 128 2157353809593 # 121.509 AVX-128 per 1000 inst
float MMX 0 # 0.000 MMX per 1000 inst
float scalar 0 # 0.000 scalar per 1000 inst
instructions 2662617 #
opcache 992622 # 372.799 opcache per 1000 inst
opcache miss 532809 # 53.7% opcache miss rate
l1 dTLB miss 5747 # 2.158 L1 dTLB per 1000 inst
l2 dTLB miss 1222 # 0.459 L2 dTLB per 1000 inst
instructions 2694295 #
icache 1310298 # 486.323 icache per 1000 inst
icache miss 112841 # 8.6% icache miss rate
l1 iTLB miss 19 # 0.007 L1 iTLB per 1000 inst
l2 iTLB miss 0 # 0.000 L2 iTLB per 1000 inst
tlb flush 20 # 0.007 TLB flush per 1000 inst
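
The derived columns in the AMD dump can be reproduced from the raw counters. The sketch below is a reconstruction, not the profiler's own code: the on_cpu formula and the reading of the parenthesized topdown percentages as shares of the slots left after removing smt-contention are assumptions that happen to match the printed values. Variable names are illustrative.

```python
# Raw values copied from the AMD dump.
elapsed, utime, stime, cores = 730.330, 11103.872, 11.206, 16
cycles, instructions = 41078877047663, 53292434507748
branches, branch_misses = 6467736098155, 294969256552
slots = 82146024099060
smt_contention = 26721582492463
topdown = {
    "retiring": 19963907311493,
    "frontend": 7928985516467,
    "backend": 24231602760213,
    "speculation": 3299885239007,
}

# Assumed formula: CPU time divided by the wall time available on all cores.
on_cpu = (utime + stime) / (elapsed * cores)
ipc = instructions / cycles
branch_miss_pct = 100 * branch_misses / branches

# Share of all slots (first percentage column).
share_of_all = {k: 100 * v / slots for k, v in topdown.items()}
# Assumed meaning of the parenthesized column: share of non-SMT-contended slots.
usable_slots = slots - smt_contention
share_excluding_smt = {k: 100 * v / usable_slots for k, v in topdown.items()}

if __name__ == "__main__":
    print(f"on_cpu {on_cpu:.3f}  IPC {ipc:.2f}  branch miss {branch_miss_pct:.2f}%")
    for name in topdown:
        print(f"{name:12s} {share_of_all[name]:5.1f}% ({share_excluding_smt[name]:5.1f}%)")
```

Read this way, about a third of the slots are attributed to SMT contention, and backend stalls make up the largest share of the remainder.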
Intel metrics confirm that the working set mostly fits in L2: the L2 miss rate is 2.75%, and very few cycles are L3 bound or DRAM bound.
elapsed 1734.703
on_cpu 0.954 # 15.27 / 16 cores
utime 26481.930
stime 8.946
nvcsw 4448 # 1.94%
nivcsw 224452 # 98.06%
inblock 232192 # 133.85/sec
onblock 1672 # 0.96/sec
cpu-clock 26492114005862 # 26492.114 seconds
task-clock 26492213728024 # 26492.214 seconds
page faults 4176737 # 157.659/sec
context switches 237392 # 8.961/sec
cpu migrations 720 # 0.027/sec
major page faults 1185 # 0.045/sec
minor page faults 4175552 # 157.614/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 7993213284230 # 125.831 branches per 1000 inst
branch misses 317680959033 # 3.97% branch miss
conditional 7993213304070 # 125.831 conditional branches per 1000 inst
indirect 957180953650 # 15.068 indirect branches per 1000 inst
slots 101458979008178 #
retiring 38305050542440 # 37.8% (37.8%)
-- ucode 15622965302918 # 15.4%
-- fastpath 22682085239522 # 22.4%
frontend 15044634792875 # 14.8% (14.8%)
-- latency 9756766671882 # 9.6%
-- bandwidth 5287868120993 # 5.2%
backend 36285486754358 # 35.8% (35.8%)
-- cpu 19056232806245 # 18.8%
-- memory 17229253948113 # 17.0%
speculation 12508840876154 # 12.3% (12.3%) high
-- branch mispredict 12503313301386 # 12.3%
-- pipeline restart 5527574768 # 0.0%
smt-contention 0 # 0.0% ( 0.0%)
cpu-cycles 58413973424562 # 2.63 GHz
instructions 66873400693973 # 1.14 IPC
l2 access 4534993088315 # 117.338 l2 access per 1000 inst
l2 miss 124937358537 # 2.75% l2 miss
cpu-cycles 42799687469272 # 32.2% memory latency
load stalls 12514227325963 # 0.0% l1 bound
l1 miss 24697721254255 # 57.1% l2 bound
l2 miss 245017596355 # 0.4% l3 bound
l3 miss 85590861371 # 0.2% dram bound
store_stalls 1285007532453 # 3.0% store bound
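
The memory-bound breakdown at the end of the Intel dump appears to come from differences between successive stall counters. The sketch below is an inference from the printed numbers, not the profiler's documented method: it assumes each level's share is the stall cycles at that level minus those at the next level, divided by cycles, with negative values clamped to zero. Variable names are illustrative.

```python
# Raw values copied from the last group of the Intel dump.
cycles = 42799687469272
load_stalls = 12514227325963
l1_miss_stalls = 24697721254255
l2_miss_stalls = 245017596355
l3_miss_stalls = 85590861371
store_stalls = 1285007532453


def pct(stall_cycles):
    """Percentage of cycles, clamping negative differences to zero (assumed)."""
    return 100 * max(stall_cycles, 0) / cycles


memory_latency = pct(load_stalls + store_stalls)
breakdown = {
    "l1 bound": pct(load_stalls - l1_miss_stalls),
    "l2 bound": pct(l1_miss_stalls - l2_miss_stalls),
    "l3 bound": pct(l2_miss_stalls - l3_miss_stalls),
    "dram bound": pct(l3_miss_stalls),
    "store bound": pct(store_stalls),
}

if __name__ == "__main__":
    print(f"memory latency {memory_latency:.1f}%")
    for name, value in breakdown.items():
        print(f"{name:12s} {value:.1f}%")
```

Under these assumptions the l1 miss counter is larger than the load stall counter, which is why l1 bound clamps to 0.0% and the l2 bound figure exceeds the overall memory latency figure. The L3 and DRAM shares are small either way.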
The process overview shows primesieve taking nearly all of the CPU time.
458 processes
102 primesieve 188624.58 174.53
68 clinfo 16.53 6.32
38 vulkaninfo 1.15 0.96
4 vulkani:disk$0 0.13 0.11
6 glxinfo:gdrv0 0.13 0.06
6 glxinfo:gl0 0.13 0.06
6 php 0.08 0.15
2 llvmpipe-0 0.07 0.06
2 llvmpipe-1 0.07 0.06
2 llvmpipe-2 0.07 0.06
2 llvmpipe-3 0.07 0.06
2 glxinfo 0.07 0.02
2 glxinfo:cs0 0.07 0.02
2 glxinfo:disk$0 0.07 0.02
2 glxinfo:sh0 0.07 0.02
2 glxinfo:shlo0 0.07 0.02
6 clang 0.06 0.06
2 llvmpipe-10 0.06 0.05
2 llvmpipe-11 0.06 0.05
2 llvmpipe-12 0.06 0.05
2 llvmpipe-13 0.06 0.05
2 llvmpipe-14 0.06 0.05
2 llvmpipe-15 0.06 0.05
2 llvmpipe-4 0.06 0.05
2 llvmpipe-5 0.06 0.05
2 llvmpipe-6 0.06 0.05
2 llvmpipe-7 0.06 0.05
2 llvmpipe-8 0.06 0.05
2 llvmpipe-9 0.06 0.05
3 rocminfo 0.03 0.00
1 lspci 0.00 0.02
1 ps 0.00 0.01
84 sh 0.00 0.00
13 gcc 0.00 0.00
10 gsettings 0.00 0.00
8 stat 0.00 0.00
8 systemd-detect- 0.00 0.00
6 llvm-link 0.00 0.00
6 primesieve-test 0.00 0.00
5 phoronix-test-s 0.00 0.00
4 gmain 0.00 0.00
2 cc 0.00 0.00
2 dconf worker 0.00 0.00
2 lscpu 0.00 0.00
2 uname 0.00 0.00
2 which 0.00 0.00
2 xset 0.00 0.00
1 date 0.00 0.00
1 dirname 0.00 0.00
1 dmesg 0.00 0.00
1 dmidecode 0.00 0.00
1 grep 0.00 0.00
1 ifconfig 0.00 0.00
1 ip 0.00 0.00
1 lsmod 0.00 0.00
1 mktemp 0.00 0.00
1 qdbus 0.00 0.00
1 readlink 0.00 0.00
1 realpath 0.00 0.00
1 sed 0.00 0.00
1 sort 0.00 0.00
1 stty 0.00 0.00
1 systemctl 0.00 0.00
1 template.sh 0.00 0.00
1 wc 0.00 0.00
1 xrandr 0.00 0.00
0 processes running
47 maximum processes
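The computation structure is straightforward: primesieve-test launches a set of primesieve tasks that all start together and finish within about 0.1 seconds of each other.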
405387) primesieve-test cpu=4 start=6.71 finish=23.67
405388) primesieve cpu=0 start=6.71 finish=23.67
405389) primesieve cpu=6 start=6.71 finish=23.60
405390) primesieve cpu=1 start=6.71 finish=23.59
405391) primesieve cpu=15 start=6.71 finish=23.60
405392) primesieve cpu=8 start=6.71 finish=23.57
405393) primesieve cpu=9 start=6.71 finish=23.57
405394) primesieve cpu=3 start=6.71 finish=23.57
405395) primesieve cpu=4 start=6.72 finish=23.62
405396) primesieve cpu=0 start=6.72 finish=23.58
405397) primesieve cpu=11 start=6.72 finish=23.67
405398) primesieve cpu=10 start=6.72 finish=23.64
405399) primesieve cpu=1 start=6.72 finish=23.65
405400) primesieve cpu=7 start=6.72 finish=23.63
405401) primesieve cpu=11 start=6.72 finish=23.57
405402) primesieve cpu=14 start=6.72 finish=23.60
405403) primesieve cpu=12 start=6.72 finish=23.62
405404) primesieve cpu=13 start=6.72 finish=23.67
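
The timeline can be summarized with a small parser. This is a minimal sketch that assumes the line layout shown above. The regular expression and helper names are illustrative, and only a few of the lines are embedded as sample input.

```python
import re

# A subset of the timeline lines above, used as sample input.
SAMPLE = """\
405387) primesieve-test cpu=4 start=6.71 finish=23.67
405388) primesieve cpu=0 start=6.71 finish=23.67
405392) primesieve cpu=8 start=6.71 finish=23.57
405395) primesieve cpu=4 start=6.72 finish=23.62
405404) primesieve cpu=13 start=6.72 finish=23.67
"""

LINE = re.compile(
    r"(?P<pid>\d+)\)\s+(?P<name>\S+)\s+cpu=(?P<cpu>\d+)"
    r"\s+start=(?P<start>[\d.]+)\s+finish=(?P<finish>[\d.]+)"
)


def parse_timeline(text):
    """Return one dict per timeline line that matches the assumed layout."""
    rows = []
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            rows.append({
                "pid": int(match["pid"]),
                "name": match["name"],
                "cpu": int(match["cpu"]),
                "start": float(match["start"]),
                "finish": float(match["finish"]),
            })
    return rows


def summarize(rows, name="primesieve"):
    """Return (shortest runtime, longest runtime, finish spread) for one task name."""
    selected = [r for r in rows if r["name"] == name]
    runtimes = [r["finish"] - r["start"] for r in selected]
    finishes = [r["finish"] for r in selected]
    return min(runtimes), max(runtimes), max(finishes) - min(finishes)


if __name__ == "__main__":
    shortest, longest, spread = summarize(parse_timeline(SAMPLE))
    print(f"runtime {shortest:.2f}..{longest:.2f} s, finish spread {spread:.2f} s")
```

A small finish spread relative to the runtime indicates that the work is evenly balanced across the tasks.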
