A Mandelbrot fractal generator with four workloads, each using a different parallelism method: TBB, OpenMP, C++ tasks, and C++ threads. The workloads also run at different levels of parallelism: C++ threads creates the most threads, C++ tasks creates moderately more threads than there are cores, and OpenMP and TBB match the number of cores.

The topdown profile is surprisingly similar across the four methods, despite their different levels of parallelism.

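The AMD metrics confirm that this is floating-point code with very little L2 access: 363.504 float per 1000 instructions, essentially all 128-bit, against 0.039 L2 accesses per 1000 instructions. The retirement rate is high, at 64.9% of slots once SMT contention is excluded.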
elapsed 534.380
on_cpu 0.863 # 13.81 / 16 cores
utime 7377.820
stime 2.399
nvcsw 3143 # 0.44%
nivcsw 716953 # 99.56%
inblock 0 # 0.00/sec
onblock 14408 # 26.96/sec
cpu-clock 7382597325204 # 7382.597 seconds
task-clock 7382631028362 # 7382.631 seconds
page faults 345620 # 46.815/sec
context switches 722570 # 97.874/sec
cpu migrations 2794 # 0.378/sec
major page faults 2 # 0.000/sec
minor page faults 345618 # 46.815/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 8083876110088 # 132.798 branches per 1000 inst
branch misses 80137273285 # 0.99% branch miss
conditional 7915372159838 # 130.030 conditional branches per 1000 inst
indirect 64437985819 # 1.059 indirect branches per 1000 inst
cpu-cycles 31071237393434 # 3.63 GHz
instructions 60879442564782 # 1.96 IPC
slots 62176734680100 #
retiring 22110032250570 # 35.6% (64.9%) high
-- ucode 10771596099 # 0.0%
-- fastpath 22099260654471 # 35.5%
frontend 1744368334608 # 2.8% ( 5.1%)
-- latency 894253211706 # 1.4%
-- bandwidth 850115122902 # 1.4%
backend 9010849789479 # 14.5% (26.5%)
-- cpu 8877823608547 # 14.3%
-- memory 133026180932 # 0.2%
speculation 1192503018886 # 1.9% ( 3.5%)
-- branch mispredict 1192479459812 # 1.9%
-- pipeline restart 23559074 # 0.0%
smt-contention 28118884517461 # 45.2% ( 0.0%)
cpu-cycles 31064294398784 # 3.62 GHz
instructions 60860623406151 # 1.96 IPC
instructions 20295654465016 # 0.039 l2 access per 1000 inst
l2 hit from l1 717271512 # 9.91% l2 miss
l2 miss from l1 39658005 #
l2 hit from l2 pf 31850550 #
l3 hit from l2 pf 24623451 #
l3 miss from l2 pf 13728835 #
instructions 20287877027354 # 363.504 float per 1000 inst
float 512 59 # 0.000 AVX-512 per 1000 inst
float 256 616 # 0.000 AVX-256 per 1000 inst
float 128 7374729455923 # 363.504 AVX-128 per 1000 inst
float MMX 0 # 0.000 MMX per 1000 inst
float scalar 1 # 0.000 scalar per 1000 inst
instructions 60874438174522 #
opcache 5446273322626 # 89.467 opcache per 1000 inst
opcache miss 3328964533 # 0.1% opcache miss rate
l1 dTLB miss 111328090 # 0.002 L1 dTLB per 1000 inst
l2 dTLB miss 23225462 # 0.000 L2 dTLB per 1000 inst
instructions 60875026269815 #
icache 7052704318 # 0.116 icache per 1000 inst
icache miss 652940617 # 9.3% icache miss rate
l1 iTLB miss 9231663 # 0.000 L1 iTLB per 1000 inst
l2 iTLB miss 0 # 0.000 L2 iTLB per 1000 inst
tlb flush 38525 # 0.000 TLB flush per 1000 inst
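
The two percentages printed on each AMD topdown line can be reproduced from the raw slot counts. The sketch below assumes that the first figure is the share of all slots and that the parenthesised figure is the share of the slots left after removing smt-contention. That reading matches the numbers in the listing, but the profiling tool does not state it. The function name `topdown_percentages` and the constants are illustrative only.

```python
# Raw AMD topdown counters copied from the listing above.
AMD_SLOTS = 62176734680100
AMD_TOPDOWN = {
    "retiring": 22110032250570,
    "frontend": 1744368334608,
    "backend": 9010849789479,
    "speculation": 1192503018886,
    "smt-contention": 28118884517461,
}


def topdown_percentages(slots, counters, excluded="smt-contention"):
    """Return {name: (pct_of_all_slots, pct_of_slots_without_excluded)}.

    Assumption: the parenthesised column in the report is normalised to
    the slots that remain after the `excluded` category is subtracted.
    """
    usable = slots - counters.get(excluded, 0)
    result = {}
    for name, value in counters.items():
        raw = 100.0 * value / slots
        # The excluded category has no share of the remaining slots.
        adjusted = 0.0 if name == excluded else 100.0 * value / usable
        result[name] = (round(raw, 1), round(adjusted, 1))
    return result


if __name__ == "__main__":
    for name, (raw, adjusted) in topdown_percentages(AMD_SLOTS, AMD_TOPDOWN).items():
        print(f"{name:15s} {raw:5.1f}% ({adjusted:5.1f}%)")
```

Under this assumption, almost half of the slots are attributed to SMT contention, which is why the raw retiring figure (35.6%) looks much lower than the adjusted one (64.9%).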

The Intel metrics also show a high retirement rate (66.9%) and low backend stalls (6.8%).
elapsed 638.535
on_cpu 0.858 # 13.72 / 16 cores
utime 8759.240
stime 1.467
nvcsw 4116 # 0.49%
nivcsw 842516 # 99.51%
inblock 12336 # 19.32/sec
onblock 3176 # 4.97/sec
cpu-clock 8763097608450 # 8763.098 seconds
task-clock 8763121289241 # 8763.121 seconds
page faults 332556 # 37.949/sec
context switches 849614 # 96.953/sec
cpu migrations 3215 # 0.367/sec
major page faults 93 # 0.011/sec
minor page faults 332463 # 37.939/sec
alignment faults 0 # 0.000/sec
emulation faults 0 # 0.000/sec
branches 8083416801050 # 132.788 branches per 1000 inst
branch misses 56866033035 # 0.70% branch miss
conditional 8083416852346 # 132.788 conditional branches per 1000 inst
indirect 1384972721313 # 22.751 indirect branches per 1000 inst
slots 47671596043970 #
retiring 31887711190056 # 66.9% (66.9%) high
-- ucode 15614157267 # 0.0%
-- fastpath 31872097032789 # 66.9%
frontend 9325764960018 # 19.6% (19.6%)
-- latency 8097339304921 # 17.0%
-- bandwidth 1228425655097 # 2.6%
backend 3227716735681 # 6.8% ( 6.8%) low
-- cpu 2892801822478 # 6.1%
-- memory 334914913203 # 0.7%
speculation 3365729603445 # 7.1% ( 7.1%)
-- branch mispredict 3363574411730 # 7.1%
-- pipeline restart 2155191715 # 0.0%
smt-contention 0 # 0.0% ( 0.0%)
cpu-cycles 25692550556350 # 2.33 GHz
instructions 54368912295309 # 2.12 IPC
l2 access 626319059 # 0.019 l2 access per 1000 inst
l2 miss 189667419 # 30.28% l2 miss
cpu-cycles 16897128544726 # 7.4% memory latency
load stalls 1244125349928 # 7.4% l1 bound
l1 miss 1996402438 # 0.0% l2 bound
l2 miss 1032469931 # 0.0% l3 bound
l3 miss 322943920 # 0.0% dram bound
store_stalls 377235985 # 0.0% store bound
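
The summary figures at the top of each listing (on_cpu, IPC, and the context-switch split) follow from a few raw counters. The sketch below recomputes them for both systems, assuming that on_cpu is user plus system time divided by elapsed time and by the 16 cores named in the report. The `RunCounters` class and the helper names are hypothetical, introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class RunCounters:
    """A handful of counters copied from one listing above."""
    elapsed: float      # wall-clock seconds
    utime: float        # user CPU seconds
    stime: float        # system CPU seconds
    nvcsw: int          # voluntary context switches
    nivcsw: int         # involuntary context switches
    cycles: int         # cpu-cycles
    instructions: int   # retired instructions


AMD = RunCounters(534.380, 7377.820, 2.399, 3143, 716953,
                  31071237393434, 60879442564782)
INTEL = RunCounters(638.535, 8759.240, 1.467, 4116, 842516,
                    25692550556350, 54368912295309)


def busy_cores(run):
    """Average number of cores kept busy over the run."""
    return (run.utime + run.stime) / run.elapsed


def on_cpu(run, cores=16):
    """Fraction of the machine in use (assumes 16 cores, as in the report)."""
    return busy_cores(run) / cores


def ipc(run):
    """Instructions retired per cycle."""
    return run.instructions / run.cycles


def involuntary_share(run):
    """Percentage of context switches that were involuntary."""
    return 100.0 * run.nivcsw / (run.nvcsw + run.nivcsw)


if __name__ == "__main__":
    for label, run in (("AMD", AMD), ("Intel", INTEL)):
        print(f"{label}: {busy_cores(run):.2f} cores busy, "
              f"on_cpu {on_cpu(run):.3f}, IPC {ipc(run):.2f}, "
              f"{involuntary_share(run):.2f}% involuntary switches")
```

The very high involuntary share on both systems is consistent with compute-bound threads that are preempted rather than blocking on their own.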

The process overview shows one “rm*” entry for each type of parallelism:
1602 processes
963 rmSTD_THREADS 526068.01 145.63
195 rmSTD_TASKS 117381.36 19.48
48 rmOpenMP 29588.00 2.56
30 rmTBB 18339.68 4.18
68 clinfo 16.52 6.33
38 vulkaninfo 1.34 1.14
4 vulkani:disk$0 0.15 0.12
6 php 0.09 0.09
2 llvmpipe-0 0.07 0.06
2 llvmpipe-1 0.07 0.06
2 llvmpipe-10 0.07 0.06
2 llvmpipe-11 0.07 0.06
2 llvmpipe-12 0.07 0.06
2 llvmpipe-13 0.07 0.06
2 llvmpipe-14 0.07 0.06
2 llvmpipe-15 0.07 0.06
2 llvmpipe-2 0.07 0.06
2 llvmpipe-3 0.07 0.06
2 llvmpipe-4 0.07 0.06
2 llvmpipe-5 0.07 0.06
2 llvmpipe-6 0.07 0.06
2 llvmpipe-7 0.07 0.06
2 llvmpipe-8 0.07 0.06
2 llvmpipe-9 0.07 0.06
6 clang 0.04 0.08
1 lspci 0.00 0.02
3 rocminfo 0.00 0.01
1 ps 0.00 0.01
89 sh 0.00 0.00
14 gsettings 0.00 0.00
13 gcc 0.00 0.00
12 toybrot 0.00 0.00
8 stat 0.00 0.00
8 systemd-detect- 0.00 0.00
6 llvm-link 0.00 0.00
5 phoronix-test-s 0.00 0.00
4 glxinfo 0.00 0.00
2 cc 0.00 0.00
2 lscpu 0.00 0.00
2 setterm 0.00 0.00
2 uname 0.00 0.00
2 which 0.00 0.00
1 date 0.00 0.00
1 dconf worker 0.00 0.00
1 dirname 0.00 0.00
1 dmesg 0.00 0.00
1 dmidecode 0.00 0.00
1 gmain 0.00 0.00
1 grep 0.00 0.00
1 ifconfig 0.00 0.00
1 ip 0.00 0.00
1 lsmod 0.00 0.00
1 mktemp 0.00 0.00
1 qdbus 0.00 0.00
1 readlink 0.00 0.00
1 realpath 0.00 0.00
1 sed 0.00 0.00
1 sort 0.00 0.00
1 stty 0.00 0.00
1 systemctl 0.00 0.00
1 template.sh 0.00 0.00
1 wc 0.00 0.00
18 processes running
349 maximum processes

In the TBB and OpenMP sections, the worker threads all start together and finish at the same time:
189095) toybrot cpu=8 start=90.09 finish=128.74
189096) rmTBB cpu=9 start=90.10 finish=128.74
189097) ?? cpu=0 start=90.13 finish=0.00
189099) rmTBB cpu=5 start=90.13 finish=128.74
189102) rmTBB cpu=4 start=90.13 finish=128.74
189109) ?? cpu=0 start=90.13 finish=0.00
189105) rmTBB cpu=14 start=90.13 finish=128.74
189100) rmTBB cpu=8 start=90.13 finish=128.74
189108) rmTBB cpu=12 start=90.13 finish=128.74
189111) rmTBB cpu=9 start=90.13 finish=128.74
189098) rmTBB cpu=3 start=90.13 finish=128.74
189101) rmTBB cpu=6 start=90.13 finish=128.74
189104) rmTBB cpu=13 start=90.13 finish=128.74
189107) rmTBB cpu=15 start=90.13 finish=128.74
189103) rmTBB cpu=7 start=90.13 finish=128.74
189106) rmTBB cpu=0 start=90.13 finish=128.74
189110) rmTBB cpu=11 start=90.13 finish=128.74
189112) sh cpu=5 start=128.74 finish=128.74
189113) sh cpu=2 start=128.74 finish=128.74
189114) toybrot cpu=12 start=138.95 finish=177.74
189115) rmOpenMP cpu=6 start=138.96 finish=177.74
189116) rmOpenMP cpu=9 start=138.99 finish=177.74
189117) rmOpenMP cpu=15 start=138.99 finish=177.74
189118) rmOpenMP cpu=8 start=138.99 finish=177.74
189119) rmOpenMP cpu=5 start=138.99 finish=177.74
189120) rmOpenMP cpu=14 start=138.99 finish=177.74
189121) rmOpenMP cpu=3 start=138.99 finish=177.74
189122) rmOpenMP cpu=4 start=138.99 finish=177.74
189123) rmOpenMP cpu=13 start=138.99 finish=177.74
189124) rmOpenMP cpu=7 start=138.99 finish=177.74
189125) rmOpenMP cpu=0 start=138.99 finish=177.74
189126) rmOpenMP cpu=2 start=138.99 finish=177.74
189127) rmOpenMP cpu=10 start=138.99 finish=177.74
189128) rmOpenMP cpu=1 start=138.99 finish=177.74
189129) rmOpenMP cpu=11 start=138.99 finish=177.74
189130) rmOpenMP cpu=12 start=138.99 finish=177.74

The tasks (and threads) sections look different, with many more threads than cores and staggered finish times:
189167) toybrot cpu=12 start=273.86 finish=312.40
189168) rmSTD_TASKS cpu=7 start=273.86 finish=312.40
189169) rmSTD_TASKS cpu=5 start=273.90 finish=311.35
189170) rmSTD_TASKS cpu=5 start=273.90 finish=311.38
189171) rmSTD_TASKS cpu=10 start=273.90 finish=310.99
189172) rmSTD_TASKS cpu=1 start=273.90 finish=310.67
189173) rmSTD_TASKS cpu=2 start=273.90 finish=310.94
189174) rmSTD_TASKS cpu=11 start=273.90 finish=310.64
189175) rmSTD_TASKS cpu=4 start=273.90 finish=310.25
189176) rmSTD_TASKS cpu=13 start=273.90 finish=309.85
189177) rmSTD_TASKS cpu=8 start=273.90 finish=309.94
189178) rmSTD_TASKS cpu=9 start=273.90 finish=310.13
189179) rmSTD_TASKS cpu=10 start=273.90 finish=310.34
189180) rmSTD_TASKS cpu=5 start=273.90 finish=310.51
189181) rmSTD_TASKS cpu=11 start=273.90 finish=311.01
189182) rmSTD_TASKS cpu=6 start=273.90 finish=310.93
189183) rmSTD_TASKS cpu=4 start=273.90 finish=310.99
189184) rmSTD_TASKS cpu=5 start=273.90 finish=310.71
189185) rmSTD_TASKS cpu=5 start=273.90 finish=310.55
189186) rmSTD_TASKS cpu=13 start=273.90 finish=310.43
189187) rmSTD_TASKS cpu=13 start=273.90 finish=310.40
189188) rmSTD_TASKS cpu=5 start=273.90 finish=310.53
189189) rmSTD_TASKS cpu=0 start=273.90 finish=311.49
189190) rmSTD_TASKS cpu=7 start=273.90 finish=311.45
189191) rmSTD_TASKS cpu=2 start=273.90 finish=311.15
189192) rmSTD_TASKS cpu=11 start=273.90 finish=311.49
189193) rmSTD_TASKS cpu=14 start=273.90 finish=311.55
189194) rmSTD_TASKS cpu=5 start=273.90 finish=311.49
189195) rmSTD_TASKS cpu=12 start=273.90 finish=311.50
189196) rmSTD_TASKS cpu=8 start=273.90 finish=311.93
189197) rmSTD_TASKS cpu=9 start=273.90 finish=311.96
189198) rmSTD_TASKS cpu=10 start=273.90 finish=311.89
189199) rmSTD_TASKS cpu=14 start=273.90 finish=312.27
189200) rmSTD_TASKS cpu=6 start=273.90 finish=312.13
189201) rmSTD_TASKS cpu=13 start=273.90 finish=312.13
189202) rmSTD_TASKS cpu=10 start=273.90 finish=312.14
189203) rmSTD_TASKS cpu=1 start=273.90 finish=312.21
189204) rmSTD_TASKS cpu=2 start=273.90 finish=312.27
189205) rmSTD_TASKS cpu=5 start=273.90 finish=311.94
189206) rmSTD_TASKS cpu=5 start=273.90 finish=312.32
189207) rmSTD_TASKS cpu=0 start=273.90 finish=312.33
189208) rmSTD_TASKS cpu=15 start=273.90 finish=312.39
189209) rmSTD_TASKS cpu=2 start=273.90 finish=312.08
189210) rmSTD_TASKS cpu=2 start=273.90 finish=312.16
189211) rmSTD_TASKS cpu=9 start=273.90 finish=312.29
189212) rmSTD_TASKS cpu=4 start=273.90 finish=312.31
189213) rmSTD_TASKS cpu=11 start=273.90 finish=312.29
189214) rmSTD_TASKS cpu=8 start=273.90 finish=312.25
189215) rmSTD_TASKS cpu=10 start=273.90 finish=311.96
189216) rmSTD_TASKS cpu=4 start=273.90 finish=312.10
189217) rmSTD_TASKS cpu=14 start=273.90 finish=312.01
189218) rmSTD_TASKS cpu=7 start=273.90 finish=312.13
189219) rmSTD_TASKS cpu=0 start=273.90 finish=312.11
189220) rmSTD_TASKS cpu=12 start=273.90 finish=312.16
189221) rmSTD_TASKS cpu=12 start=273.90 finish=312.03
189222) rmSTD_TASKS cpu=14 start=273.90 finish=312.03
189223) rmSTD_TASKS cpu=10 start=273.90 finish=311.37
189224) rmSTD_TASKS cpu=13 start=273.90 finish=311.55
189225) rmSTD_TASKS cpu=9 start=273.90 finish=311.38
189226) rmSTD_TASKS cpu=0 start=273.90 finish=311.51
189227) rmSTD_TASKS cpu=3 start=273.90 finish=311.26
189228) rmSTD_TASKS cpu=1 start=273.90 finish=311.36
189229) rmSTD_TASKS cpu=11 start=273.90 finish=311.53
189230) rmSTD_TASKS cpu=2 start=273.90 finish=310.92
189231) rmSTD_TASKS cpu=14 start=273.90 finish=310.84
189232) rmSTD_TASKS cpu=6 start=273.90 finish=310.83
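
The difference between the sections can be quantified by counting the entries per workload and measuring how far apart their finish times are. The sketch below parses lines in the format shown above. The regular expression, the `summarize` helper, and the choice to skip entries with finish=0.00 (treated here as threads whose exit was not recorded, which is an assumption) are illustrative, not part of the profiling tool.

```python
import re
from collections import defaultdict

# Matches lines such as "189169) rmSTD_TASKS cpu=5 start=273.90 finish=311.35".
LINE_RE = re.compile(
    r"^(\d+)\)\s+(\S+)\s+cpu=(\d+)\s+start=([\d.]+)\s+finish=([\d.]+)$"
)

# A few lines copied from the listings above, used as sample input.
SAMPLE = """\
189096) rmTBB cpu=9 start=90.10 finish=128.74
189097) ?? cpu=0 start=90.13 finish=0.00
189099) rmTBB cpu=5 start=90.13 finish=128.74
189168) rmSTD_TASKS cpu=7 start=273.86 finish=312.40
189176) rmSTD_TASKS cpu=13 start=273.90 finish=309.85
189208) rmSTD_TASKS cpu=15 start=273.90 finish=312.39
""".splitlines()


def summarize(lines):
    """Return {name: (entry_count, finish_spread_in_seconds)}."""
    finishes = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # ignore anything that is not a trace line
        _pid, name, _cpu, _start, finish = match.groups()
        finish = float(finish)
        if finish == 0.0:
            # Assumption: finish=0.00 means the exit was never recorded.
            continue
        finishes[name].append(finish)
    return {
        name: (len(values), round(max(values) - min(values), 2))
        for name, values in finishes.items()
    }


if __name__ == "__main__":
    for name, (count, spread) in summarize(SAMPLE).items():
        print(f"{name}: {count} entries, finish spread {spread:.2f}s")
```

Applied to the full listings, the same function would count 16 rmOpenMP entries with a finish spread of zero and 65 rmSTD_TASKS entries whose finish times range from 309.85 to 312.40.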
