QuantLib is an open source library for quantitative finance written in C++. There are two workloads: the first runs on all 16 cores, and the second uses only two cores even though it is listed as single-threaded.

This code has a high retirement rate and few frontend stalls.
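The retirement and frontend figures come from a top-down breakdown of issue slots. Below is a minimal sketch of that arithmetic, using the AMD counts from the listing below. Each category is divided by the total slots, and the parenthesized figure divides by the slots left after removing smt-contention. That second formula is an inference: it reproduces the listed 63.0%, 9.6%, 26.3% and 1.1%, but it is not taken from the tool's documentation. The function name `topdown_level1` is hypothetical.

```python
# Level-1 top-down breakdown from raw slot counts (AMD run).
# Assumption: percent = category / slots, and the parenthesized figure
# divides by the slots that were not lost to SMT contention.

AMD_SLOTS = 16294264396806
AMD_SMT_CONTENTION = 6125810500089
AMD_CATEGORIES = {
    "retiring": 6411101162579,
    "frontend": 973719191859,
    "backend": 2670978253953,
    "speculation": 112642574434,
}


def topdown_level1(slots, smt_contention, categories):
    """Return {category: (percent_of_all_slots, percent_excluding_smt)}."""
    usable = slots - smt_contention
    return {
        name: (100.0 * count / slots, 100.0 * count / usable)
        for name, count in categories.items()
    }


if __name__ == "__main__":
    breakdown = topdown_level1(AMD_SLOTS, AMD_SMT_CONTENTION, AMD_CATEGORIES)
    for name, (raw, adjusted) in breakdown.items():
        print(f"{name:12s} {raw:5.1f}% ({adjusted:5.1f}%)")
```

Once the SMT-contention slots are removed, the four categories add up to essentially 100% of the remaining slots. This supports reading the parenthesized column as the breakdown of non-contended slots.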

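The AMD metrics show that on average we run on about half the cores (8.66 of 16). They also show a moderate amount of floating point (131.7 AVX-128 per 1000 instructions), few L2 misses (6.74%) and a high retirement rate (63.0% of slots once SMT contention is excluded).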
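The "half the cores" figure can be checked against the times in the listing that follows. This is a sketch that assumes on_cpu is total CPU time (utime + stime) divided by elapsed time, then divided by the 16 cores. That assumption reproduces both the AMD 0.541 and the Intel 0.778. The nvcsw and nivcsw percentages are each count's share of all context switches. The helper names are hypothetical.

```python
# Average core utilization and context-switch mix.
# Assumption: on_cpu = (utime + stime) / (elapsed * cores); inferred because
# it reproduces the listed values, not taken from the tool's source.


def cores_busy(utime, stime, elapsed):
    """Average number of cores kept busy over the wall-clock run."""
    return (utime + stime) / elapsed


def on_cpu(utime, stime, elapsed, cores):
    """Fraction of the machine's cores busy on average."""
    return cores_busy(utime, stime, elapsed) / cores


def switch_mix(voluntary, involuntary):
    """Percent of voluntary (nvcsw) and involuntary (nivcsw) switches."""
    total = voluntary + involuntary
    return 100.0 * voluntary / total, 100.0 * involuntary / total


if __name__ == "__main__":
    busy = cores_busy(2065.423, 7.805, 239.451)
    print(f"AMD: {busy:.2f} / 16 cores, on_cpu {on_cpu(2065.423, 7.805, 239.451, 16):.3f}")
    vol, invol = switch_mix(4633, 27581)
    print(f"AMD: voluntary {vol:.2f}%, involuntary {invol:.2f}%")
```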

elapsed              239.451
on_cpu               0.541          # 8.66 / 16 cores
utime                2065.423
stime                7.805
nvcsw                4633           # 14.38%
nivcsw               27581          # 85.62%
inblock              0              # 0.00/sec
onblock              13160          # 54.96/sec
cpu-clock            2073306190419  # 2073.306 seconds
task-clock           2073321140694  # 2073.321 seconds
page faults          2947742        # 1421.749/sec
context switches     33177          # 16.002/sec
cpu migrations       1346           # 0.649/sec
major page faults    82             # 0.040/sec
minor page faults    2947660        # 1421.709/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             2559158486088  # 137.454 branches per 1000 inst
branch misses        4546709879     # 0.18% branch miss
conditional          1792036393402  # 96.251 conditional branches per 1000 inst
indirect             296331568425   # 15.916 indirect branches per 1000 inst
cpu-cycles           8147071650405  # 2.12 GHz
instructions         18623586468548 # 2.29 IPC
slots                16294264396806 #
retiring             6411101162579  # 39.3% (63.0%)
-- ucode             51614158594    #     0.3%
-- fastpath          6359487003985  #    39.0%
frontend             973719191859   #  6.0% ( 9.6%)
-- latency           661631081958   #     4.1%
-- bandwidth         312088109901   #     1.9%
backend              2670978253953  # 16.4% (26.3%)
-- cpu               1852113077242  #    11.4%
-- memory            818865176711   #     5.0%
speculation          112642574434   #  0.7% ( 1.1%)
-- branch mispredict 89165824746    #     0.5%
-- pipeline restart  23476749688    #     0.1%
smt-contention       6125810500089  # 37.6% ( 0.0%)
cpu-cycles           8153806283473  # 2.12 GHz
instructions         18614267485562 # 2.28 IPC
instructions         6204973623074  # 13.313 l2 access per 1000 inst
l2 hit from l1       65552599571    # 6.74% l2 miss
l2 miss from l1      1533617239     #
l2 hit from l2 pf    13019558189    #
l3 hit from l2 pf    3987750228     #
l3 miss from l2 pf   45252945       #
instructions         6207650287054  # 131.654 float per 1000 inst
float 512            107            # 0.000 AVX-512 per 1000 inst
float 256            77880          # 0.000 AVX-256 per 1000 inst
float 128            817261944049   # 131.654 AVX-128 per 1000 inst
float MMX            0              # 0.000 MMX per 1000 inst
float scalar         32             # 0.000 scalar per 1000 inst
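The listing repeats an instructions line for each counter group, and the counts differ between groups. The cycle groups show about 18.6 trillion instructions, while the L2 and float groups show about 6.2 trillion. Each ratio should therefore use the instruction count from its own group. The small sketch below reproduces the listed IPC values, branch-miss rate and floating-point rates. The helper names `ipc`, `miss_percent` and `per_1000` are hypothetical.

```python
# Per-group derived ratios for the AMD listing above. Each ratio uses the
# instruction count printed in the same counter group.


def ipc(instructions, cycles):
    return instructions / cycles


def miss_percent(misses, total):
    return 100.0 * misses / total


def per_1000(events, instructions):
    return 1000.0 * events / instructions


if __name__ == "__main__":
    print(f"IPC (group 1)  {ipc(18623586468548, 8147071650405):.2f}")
    print(f"IPC (group 2)  {ipc(18614267485562, 8153806283473):.2f}")
    print(f"branch miss    {miss_percent(4546709879, 2559158486088):.2f}%")
    float_inst = 6207650287054  # instructions line of the float group
    float_ops = {"512": 107, "256": 77880, "128": 817261944049,
                 "MMX": 0, "scalar": 32}
    for width, count in float_ops.items():
        print(f"float {width:6s} {per_1000(count, float_inst):8.3f} per 1000 inst")
```

Nearly all of the floating-point work counted here is 128-bit. The 256-bit, 512-bit and scalar counts are negligible by comparison.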

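The Intel metrics show more of the cores in use on average (12.44 of 16) and a higher retirement rate (80.2% of slots).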

elapsed              976.488
on_cpu               0.778          # 12.44 / 16 cores
utime                12129.067
stime                19.617
nvcsw                12945          # 9.55%
nivcsw               122641         # 90.45%
inblock              52656          # 53.92/sec
onblock              2240           # 2.29/sec
cpu-clock            12148886067864 # 12148.886 seconds
task-clock           12148922790298 # 12148.923 seconds
page faults          11001944       # 905.590/sec
context switches     140037         # 11.527/sec
cpu migrations       3387           # 0.279/sec
major page faults    594            # 0.049/sec
minor page faults    11001350       # 905.541/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             12200124995178 # 137.798 branches per 1000 inst
branch misses        25542654236    # 0.21% branch miss
conditional          12200125023466 # 137.798 conditional branches per 1000 inst
indirect             3350072945374  # 37.838 indirect branches per 1000 inst
slots                60608212198784 #
retiring             48586807424801 # 80.2% (80.2%)
-- ucode             3956469197097  #     6.5%
-- fastpath          44630338227704 #    73.6%
frontend             9060412040521  # 14.9% (14.9%)
-- latency           3984241307781  #     6.6%
-- bandwidth         5076170732740  #     8.4%
backend              2034236705457  #  3.4% ( 3.4%)
-- cpu               1227430698305  #     2.0%
-- memory            806806007152   #     1.3%
speculation          1657739725109  #  2.7% ( 2.7%)
-- branch mispredict 1471736909095  #     2.4%
-- pipeline restart  186002816014   #     0.3%
smt-contention       0              #  0.0% ( 0.0%)
cpu-cycles           16929758726185 # 1.70 GHz
instructions         44769453310654 # 2.64 IPC
l2 access            241331604127   # 8.271 l2 access per 1000 inst
l2 miss              17781280151    # 7.37% l2 miss
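On the Intel run smt-contention is zero, so each top-down percentage is simply a count divided by slots. The sketch below recomputes the level-2 split and the L2 miss ratio from the raw counts. Dotted names such as `retiring.ucode` are hypothetical labels for the indented `--` lines, not names used by the tool.

```python
# Level-2 top-down split and L2 miss ratio for the Intel run above.
# smt-contention is 0 here, so percent = count / slots.

SLOTS = 60608212198784
LEVEL2 = {
    "retiring.ucode": 3956469197097,
    "retiring.fastpath": 44630338227704,
    "frontend.latency": 3984241307781,
    "frontend.bandwidth": 5076170732740,
    "backend.cpu": 1227430698305,
    "backend.memory": 806806007152,
    "speculation.branch_mispredict": 1471736909095,
    "speculation.pipeline_restart": 186002816014,
}


def percent_of_slots(count, slots=SLOTS):
    return 100.0 * count / slots


def l2_miss_percent(misses, accesses):
    return 100.0 * misses / accesses


if __name__ == "__main__":
    level1 = {}
    for name, count in LEVEL2.items():
        print(f"{name:32s} {percent_of_slots(count):5.1f}%")
        parent = name.split(".")[0]
        level1[parent] = level1.get(parent, 0) + count
    total = sum(percent_of_slots(c) for c in level1.values())
    print(f"level-1 total {total:.1f}% of slots")
    print(f"L2 miss {l2_miss_percent(17781280151, 241331604127):.2f}%")
```

The level-2 counts add up exactly to the level-1 counts in the listing. Summed across all categories, though, they come to about 101.2% of slots rather than 100%, so the Intel percentages are best read as approximate.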

The process structure is straightforward:

452 processes
	  n command               user(s)   sys(s)
	102 quantlib-benchm       2058.50     6.04
	 68 clinfo                  16.59     5.98
	 38 vulkaninfo               0.57     1.68
	  6 glxinfo:gdrv0            0.14     0.07
	  6 php                      0.07     0.09
	  6 clang                    0.07     0.05
	  2 glxinfo                  0.07     0.03
	  2 glxinfo:cs0              0.07     0.03
	  2 glxinfo:disk$0           0.07     0.03
	  2 glxinfo:sh0              0.07     0.03
	  2 glxinfo:shlo0            0.07     0.03
	  4 vulkani:disk$0           0.06     0.17
	  2 llvmpipe-0               0.03     0.09
	  2 llvmpipe-1               0.03     0.09
	  2 llvmpipe-10              0.03     0.09
	  2 llvmpipe-11              0.03     0.09
	  2 llvmpipe-12              0.03     0.09
	  2 llvmpipe-13              0.03     0.09
	  2 llvmpipe-14              0.03     0.09
	  2 llvmpipe-15              0.03     0.09
	  2 llvmpipe-2               0.03     0.09
	  2 llvmpipe-3               0.03     0.09
	  2 llvmpipe-4               0.03     0.09
	  2 llvmpipe-5               0.03     0.09
	  2 llvmpipe-6               0.03     0.09
	  2 llvmpipe-7               0.03     0.09
	  2 llvmpipe-8               0.03     0.09
	  2 llvmpipe-9               0.03     0.09
	  3 rocminfo                 0.03     0.00
	  1 lspci                    0.00     0.02
	  1 ps                       0.00     0.01
	 84 sh                       0.00     0.00
	 13 gcc                      0.00     0.00
	 11 gsettings                0.00     0.00
	  8 stat                     0.00     0.00
	  8 systemd-detect-          0.00     0.00
	  6 llvm-link                0.00     0.00
	  6 quantlib                 0.00     0.00
	  5 phoronix-test-s          0.00     0.00
	  4 gmain                    0.00     0.00
	  2 cc                       0.00     0.00
	  2 lscpu                    0.00     0.00
	  2 uname                    0.00     0.00
	  2 which                    0.00     0.00
	  2 xset                     0.00     0.00
	  1 date                     0.00     0.00
	  1 dconf worker             0.00     0.00
	  1 dirname                  0.00     0.00
	  1 dmesg                    0.00     0.00
	  1 dmidecode                0.00     0.00
	  1 grep                     0.00     0.00
	  1 ifconfig                 0.00     0.00
	  1 ip                       0.00     0.00
	  1 lsmod                    0.00     0.00
	  1 mktemp                   0.00     0.00
	  1 qdbus                    0.00     0.00
	  1 readlink                 0.00     0.00
	  1 realpath                 0.00     0.00
	  1 sed                      0.00     0.00
	  1 sort                     0.00     0.00
	  1 stty                     0.00     0.00
	  1 systemctl                0.00     0.00
	  1 template.sh              0.00     0.00
	  1 wc                       0.00     0.00
	  1 xrandr                   0.00     0.00
0 processes running
47 maximum processes
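Below is a sketch for summarizing a per-command table like the one above. It assumes each row is a process count, a command name (which may contain a space, as in "dconf worker") and two CPU-time columns. Reading those columns as user and system seconds is an inference from quantlib-benchm dominating the first one. The sample rows are copied from the table; `parse_rows` and `user_share` are hypothetical names.

```python
# Summarize a per-command process table.
# Assumed row layout: <count> <command> <user seconds> <system seconds>;
# the meaning of the two time columns is an inference.

SAMPLE = """\
\t102 quantlib-benchm       2058.50     6.04
\t 68 clinfo                  16.59     5.98
\t 38 vulkaninfo               0.57     1.68
\t  1 dconf worker             0.00     0.00
"""


def parse_rows(text):
    """Return (count, command, user, system) tuples for each non-blank row."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        count, rest = line.split(None, 1)          # leading count
        command, user, system = rest.rsplit(None, 2)  # command may contain spaces
        rows.append((int(count), command, float(user), float(system)))
    return rows


def user_share(rows, command):
    """Percent of the first time column attributed to one command."""
    total = sum(user for _, _, user, _ in rows)
    mine = sum(user for _, name, user, _ in rows if name == command)
    return 100.0 * mine / total if total else 0.0


if __name__ == "__main__":
    rows = parse_rows(SAMPLE)
    print(f"{sum(r[0] for r in rows)} processes in sample")
    print(f"quantlib-benchm share: {user_share(rows, 'quantlib-benchm'):.1f}%")
```

Over the full table above, the same calculation gives quantlib-benchm about 99% of the first time column, and the counts add up to the 452 processes reported.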

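The core parallel code section: quantlib (12611) starts quantlib-benchm (12612), which forks 16 workers that each fork one more process, all running from about start=5.88 to finish=52.14.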

      12611) quantlib         cpu=9 start=5.87  finish=52.15
        12612) quantlib-benchm  cpu=12 start=5.87  finish=52.15
          12613) quantlib-benchm  cpu=4 start=5.88  finish=52.14
            12615) quantlib-benchm  cpu=0 start=5.88  finish=52.14
          12614) quantlib-benchm  cpu=0 start=5.88  finish=52.14
            12617) quantlib-benchm  cpu=0 start=5.88  finish=52.14
          12616) quantlib-benchm  cpu=10 start=5.88  finish=52.15
            12619) quantlib-benchm  cpu=9 start=5.88  finish=52.14
          12618) quantlib-benchm  cpu=2 start=5.88  finish=52.14
            12621) quantlib-benchm  cpu=2 start=5.88  finish=52.14
          12620) quantlib-benchm  cpu=5 start=5.88  finish=52.15
            12623) quantlib-benchm  cpu=5 start=5.88  finish=52.14
          12622) quantlib-benchm  cpu=14 start=5.88  finish=52.15
            12625) quantlib-benchm  cpu=15 start=5.88  finish=52.14
          12624) quantlib-benchm  cpu=4 start=5.88  finish=52.15
            12627) quantlib-benchm  cpu=11 start=5.88  finish=52.14
          12626) quantlib-benchm  cpu=13 start=5.88  finish=52.15
            12629) quantlib-benchm  cpu=3 start=5.88  finish=52.14
          12628) quantlib-benchm  cpu=6 start=5.88  finish=52.14
            12631) quantlib-benchm  cpu=6 start=5.88  finish=52.14
          12630) quantlib-benchm  cpu=4 start=5.88  finish=52.14
            12632) quantlib-benchm  cpu=13 start=5.88  finish=52.14
          12633) quantlib-benchm  cpu=7 start=5.88  finish=52.15
            12635) quantlib-benchm  cpu=7 start=5.88  finish=52.14
          12634) quantlib-benchm  cpu=8 start=5.88  finish=52.14
            12637) quantlib-benchm  cpu=4 start=5.88  finish=52.14
          12636) quantlib-benchm  cpu=10 start=5.88  finish=52.14
            12639) quantlib-benchm  cpu=10 start=5.88  finish=52.14
          12638) quantlib-benchm  cpu=8 start=5.88  finish=52.14
            12641) quantlib-benchm  cpu=8 start=5.88  finish=52.14
          12640) quantlib-benchm  cpu=4 start=5.88  finish=52.15
            12643) quantlib-benchm  cpu=14 start=5.89  finish=52.14
          12642) quantlib-benchm  cpu=10 start=5.88  finish=52.15
            12644) quantlib-benchm  cpu=1 start=5.89  finish=52.14
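A tree like the one above can be summarized with a regular expression over each line. The sketch below counts the quantlib-benchm processes, collects the CPUs recorded for them and measures the span from earliest start to latest finish. It assumes that `cpu=` is the single CPU recorded per process, most likely the last one it ran on, though that is unconfirmed. The sample lines are copied from the tree, and `summarize` is a hypothetical name.

```python
# Summarize process-tree lines of the form
#   "<pid>) <name>  cpu=<n> start=<s>  finish=<f>"
import re

LINE_RE = re.compile(
    r"(?P<pid>\d+)\)\s+(?P<name>\S+)\s+cpu=(?P<cpu>\d+)\s+"
    r"start=(?P<start>[\d.]+)\s+finish=(?P<finish>[\d.]+)"
)

SAMPLE = """\
      12611) quantlib         cpu=9 start=5.87  finish=52.15
        12612) quantlib-benchm  cpu=12 start=5.87  finish=52.15
          12613) quantlib-benchm  cpu=4 start=5.88  finish=52.14
            12615) quantlib-benchm  cpu=0 start=5.88  finish=52.14
          12614) quantlib-benchm  cpu=0 start=5.88  finish=52.14
"""


def summarize(text, name="quantlib-benchm"):
    """Return (process count, sorted CPUs seen, finish - start span)."""
    procs = [m.groupdict() for m in map(LINE_RE.search, text.splitlines()) if m]
    workers = [p for p in procs if p["name"] == name]
    start = min(float(p["start"]) for p in workers)
    finish = max(float(p["finish"]) for p in workers)
    cpus = sorted({int(p["cpu"]) for p in workers})
    return len(workers), cpus, finish - start


if __name__ == "__main__":
    count, cpus, span = summarize(SAMPLE)
    print(f"{count} processes on CPUs {cpus}, span {span:.2f}")
```

Applied to the full tree above, this would report 33 quantlib-benchm processes: 12612, its 16 children and their 16 children. Together they cover all 16 CPUs and run from start=5.87 to finish=52.15.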