Faiss has trouble installing on my AMD and Intel systems and fails with the following CMake error.

-- Found SWIG: /usr/bin/swig4.0 (found version "4.0.2") found components: python
CMake Error at /usr/local/cmake-3.24.2-linux-x86_64/share/cmake-3.24/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find Python (missing: Python_INCLUDE_DIRS Python_LIBRARIES
  Python_NumPy_INCLUDE_DIRS Development NumPy Development.Module
  Development.Embed)
Call Stack (most recent call first):
  /usr/local/cmake-3.24.2-linux-x86_64/share/cmake-3.24/Modules/FindPackageHandleStandardArgs.cmake:594 (_FPHSA_FAILURE_MESSAGE)
  /usr/local/cmake-3.24.2-linux-x86_64/share/cmake-3.24/Modules/FindPython.cmake:561 (find_package_handle_standard_args)
  faiss/python/CMakeLists.txt:122 (find_package)


-- Configuring incomplete, errors occurred!
See also "/home/mev/.phoronix-test-suite/installed-tests/pts/faiss-1.0.1/faiss-1.7.4/build/CMakeFiles/CMakeOutput.log".
make: Nothing to be done for 'faiss'.
make: *** No rule to make target 'demo_sift1M'.  Stop.
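
The missing items in the CMake message (Python_INCLUDE_DIRS, Python_NumPy_INCLUDE_DIRS, Development, NumPy) usually point at absent Python development headers or an absent NumPy install. A minimal sketch of a check, assuming the interpreter running the script is the same one CMake would pick up (which may not be true on a system with several Pythons); the function name `python_build_prereqs` is made up for illustration:

```python
import importlib.util
import os
import sysconfig


def python_build_prereqs():
    """Report whether the pieces CMake's FindPython asked for are visible."""
    include_dir = sysconfig.get_paths()["include"]
    header = os.path.join(include_dir, "Python.h")
    report = {
        "include_dir": include_dir,
        # Python.h normally comes from a separate development package.
        "python_h_found": os.path.isfile(header),
        "numpy_found": importlib.util.find_spec("numpy") is not None,
        "numpy_include_dir": None,
    }
    if report["numpy_found"]:
        import numpy

        # Directory holding the NumPy C headers.
        report["numpy_include_dir"] = numpy.get_include()
    return report


if __name__ == "__main__":
    for key, value in python_build_prereqs().items():
        print(f"{key}: {value}")
```

If `python_h_found` or `numpy_found` comes back False, that would be consistent with the configure failure above, though I have not confirmed this is the cause on the failing systems.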

It does run on my 7950X, so the numbers below are from that system. There are two separate test cases: sift1M appears to run in parallel, and polysemous_sift1m appears to run mostly sequentially.

The topdown profile suggests the first workload is mostly backend bound and the second workload has a high rate of branch misprediction. The retirement rate also increases toward the end of the second workload's runs.

The AMD metrics are a composite of the two workloads. The mix has a fair amount of both floating-point and branch instructions. L2 access is moderate, with a reasonable number of misses. The backend stalls come more from memory than from the CPU. The speculation summary is only 1%, which seems low compared to the chart above. Perhaps the on-cpu fraction also plays a part here? In any case, it would be useful to separate the two workloads out, since they are so different.

elapsed              869.631
on_cpu               0.363          # 11.63 / 32 cores
utime                7824.815
stime                2286.367
nvcsw                345115         # 0.36%
nivcsw               95899855       # 99.64%
inblock              32             # 0.04/sec
onblock              45312          # 52.10/sec
cpu-clock            10112254056407 # 10112.254 seconds
task-clock           10112344576263 # 10112.345 seconds
page faults          5412507        # 535.238/sec
context switches     96249164       # 9517.987/sec
cpu migrations       18452          # 1.825/sec
major page faults    91             # 0.009/sec
minor page faults    5412416        # 535.229/sec
alignment faults     0              # 0.000/sec
emulation faults     0              # 0.000/sec
branches             4484265783595  # 148.633 branches per 1000 inst
branch misses        376658179764   # 8.40% branch miss
conditional          3031234199466  # 100.471 conditional branches per 1000 inst
indirect             24800777321    # 0.822 indirect branches per 1000 inst
cpu-cycles           55108837758302 # 1.94 GHz
instructions         31810485809678 # 0.58 IPC low
slots                109890768993654 #
retiring             11452160474280 # 10.4% (11.8%) low
-- ucode             87545229664    #     0.1%
-- fastpath          11364615244616 #    10.3%
frontend             23175513915315 # 21.1% (23.9%)
-- latency           19277081880546 #    17.5%
-- bandwidth         3898432034769  #     3.5%
backend              61367037088759 # 55.8% (63.2%)
-- cpu               11100116100320 #    10.1%
-- memory            50266920988439 #    45.7%
speculation          1062297792803  #  1.0% ( 1.1%)
-- branch mispredict 1061377780753  #     1.0%
-- pipeline restart  920012050      #     0.0%
smt-contention       12833670979854 # 11.7% ( 0.0%)
cpu-cycles           55493401749010 # 1.95 GHz
instructions         31934426546886 # 0.58 IPC low
instructions         10611812357394 # 24.240 l2 access per 1000 inst
l2 hit from l1       140728666702   # 37.90% l2 miss
l2 miss from l1      7684990404     #
l2 hit from l2 pf    26683890472    #
l3 hit from l2 pf    36223074555    #
l3 miss from l2 pf   53590562992    #
instructions         10607128804634 # 148.187 float per 1000 inst
float 512            62             # 0.000 AVX-512 per 1000 inst
float 256            988            # 0.000 AVX-256 per 1000 inst
float 128            1571838885481  # 148.187 AVX-128 per 1000 inst
float MMX            0              # 0.000 MMX per 1000 inst
float scalar         4              # 0.000 scalar per 1000 inst
instructions         2600738        #
opcache              995459         # 382.760 opcache per 1000 inst
opcache miss         531806         # 53.4% opcache miss rate
l1 dTLB miss         5466           # 2.102 L1 dTLB per 1000 inst
l2 dTLB miss         1442           # 0.554 L2 dTLB per 1000 inst
instructions         2673428        #
icache               1336023        # 499.742 icache per 1000 inst
icache miss          112389         #  8.4% icache miss rate
l1 iTLB miss         10             # 0.004 L1 iTLB per 1000 inst
l2 iTLB miss         0              # 0.000 L2 iTLB per 1000 inst
tlb flush            19             # 0.007 TLB flush per 1000 inst
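
As a sanity check on the metrics table above, the topdown percentages can be recomputed from the raw slot counts. A minimal sketch, assuming the parenthesized column is the same share taken after removing the smt-contention slots from the total; that reading is my inference from the arithmetic, not something the tool output states. The helper names are made up for illustration:

```python
# Slot counts copied from the metrics table above.
SLOTS = 109890768993654
CATEGORIES = {
    "retiring": 11452160474280,
    "frontend": 23175513915315,
    "backend": 61367037088759,
    "speculation": 1062297792803,
    "smt-contention": 12833670979854,
}


def topdown_shares(slots, categories, exclude="smt-contention"):
    """Return {name: (percent of all slots, percent of slots minus `exclude`)}."""
    remaining = slots - categories[exclude]
    shares = {}
    for name, count in categories.items():
        raw = 100.0 * count / slots
        # The excluded category has no share of the reduced total.
        adjusted = 0.0 if name == exclude else 100.0 * count / remaining
        shares[name] = (raw, adjusted)
    return shares


def on_cpu_fraction(utime, stime, elapsed, cores):
    """Fraction of available core time spent running (user + system)."""
    return (utime + stime) / elapsed / cores


if __name__ == "__main__":
    for name, (raw, adj) in topdown_shares(SLOTS, CATEGORIES).items():
        print(f"{name:15s} {raw:5.1f}% ({adj:4.1f}%)")
    # utime, stime, elapsed and the 32-core count are taken from the table.
    print(f"on_cpu {on_cpu_fraction(7824.815, 2286.367, 869.631, 32):.3f}")
```

Under that assumption the recomputed values line up with both columns in the table, and the on_cpu figure matches (utime + stime) / elapsed / 32.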

In the process tree overview, demo_sift1M accounts for most of the user time (roughly 11 times that of the python3 processes in the totals below) and nearly all of the system time.

760 processes
	189 demo_sift1M          451914.12 154385.23
	238 python3              41545.00   489.77
	 70 vulkaninfo               4.13     2.79
	  6 vulkani:disk$0           0.36     0.24
	  6 clinfo                   0.20     0.14
	  2 llvmpipe-0               0.12     0.08
	  2 llvmpipe-1               0.12     0.08
	  2 llvmpipe-10              0.12     0.08
	  2 llvmpipe-11              0.12     0.08
	  2 llvmpipe-12              0.12     0.08
	  2 llvmpipe-13              0.12     0.08
	  2 llvmpipe-14              0.12     0.08
	  2 llvmpipe-15              0.12     0.08
	  2 llvmpipe-16              0.12     0.08
	  2 llvmpipe-17              0.12     0.08
	  2 llvmpipe-18              0.12     0.08
	  2 llvmpipe-19              0.12     0.08
	  2 llvmpipe-2               0.12     0.08
	  2 llvmpipe-20              0.12     0.08
	  2 llvmpipe-21              0.12     0.08
	  2 llvmpipe-22              0.12     0.08
	  2 llvmpipe-23              0.12     0.08
	  2 llvmpipe-24              0.12     0.08
	  2 llvmpipe-25              0.12     0.08
	  2 llvmpipe-26              0.12     0.08
	  2 llvmpipe-27              0.12     0.08
	  2 llvmpipe-28              0.12     0.08
	  2 llvmpipe-29              0.12     0.08
	  2 llvmpipe-3               0.12     0.08
	  2 llvmpipe-30              0.12     0.08
	  2 llvmpipe-31              0.12     0.08
	  2 llvmpipe-4               0.12     0.08
	  2 llvmpipe-5               0.12     0.08
	  2 llvmpipe-6               0.12     0.08
	  2 llvmpipe-7               0.12     0.08
	  2 llvmpipe-8               0.12     0.08
	  2 llvmpipe-9               0.12     0.08
	  6 php                      0.05     0.25
	  6 glxinfo:gdrv0            0.05     0.06
	  6 glxinfo:gl0              0.05     0.06
	  2 glxinfo                  0.03     0.02
	  2 glxinfo:cs0              0.03     0.02
	  2 glxinfo:disk$0           0.03     0.02
	  2 glxinfo:sh0              0.03     0.02
	  2 glxinfo:shlo0            0.03     0.02
	 72 sh                       0.00     0.00
	 13 gcc                      0.00     0.00
	  8 stat                     0.00     0.00
	  8 systemd-detect-          0.00     0.00
	  6 faiss                    0.00     0.00
	  6 gsettings                0.00     0.00
	  5 dconf worker             0.00     0.00
	  5 gmain                    0.00     0.00
	  5 phoronix-test-s          0.00     0.00
	  2 cc                       0.00     0.00
	  2 lscpu                    0.00     0.00
	  2 uname                    0.00     0.00
	  2 which                    0.00     0.00
	  2 xset                     0.00     0.00
	  1 date                     0.00     0.00
	  1 dirname                  0.00     0.00
	  1 dmesg                    0.00     0.00
	  1 dmidecode                0.00     0.00
	  1 grep                     0.00     0.00
	  1 ifconfig                 0.00     0.00
	  1 ip                       0.00     0.00
	  1 lsmod                    0.00     0.00
	  1 lspci                    0.00     0.00
	  1 mktemp                   0.00     0.00
	  1 ps                       0.00     0.00
	  1 python                   0.00     0.00
	  1 readlink                 0.00     0.00
	  1 realpath                 0.00     0.00
	  1 sed                      0.00     0.00
	  1 sort                     0.00     0.00
	  1 stty                     0.00     0.00
	  1 systemctl                0.00     0.00
	  1 template.sh              0.00     0.00
	  1 wc                       0.00     0.00
	  1 xrandr                   0.00     0.00
0 processes running
80 maximum processes

Computation block for the initial workload

      312167) faiss            cpu=31 start=4.86  finish=107.25
        312168) demo_sift1M      cpu=7 start=4.87  finish=107.23
          312169) demo_sift1M      cpu=25 start=4.87  finish=107.22
          312170) demo_sift1M      cpu=10 start=4.87  finish=107.23
          312171) demo_sift1M      cpu=29 start=4.87  finish=107.23
          312172) demo_sift1M      cpu=30 start=4.87  finish=107.23
          312173) demo_sift1M      cpu=15 start=4.87  finish=107.23
          312174) demo_sift1M      cpu=4 start=4.87  finish=107.23
          312175) demo_sift1M      cpu=9 start=4.87  finish=107.23
          312176) demo_sift1M      cpu=20 start=4.87  finish=107.23
          312177) demo_sift1M      cpu=5 start=4.87  finish=107.23
          312178) demo_sift1M      cpu=10 start=4.87  finish=107.23
          312179) demo_sift1M      cpu=16 start=4.87  finish=107.23
          312180) demo_sift1M      cpu=28 start=4.87  finish=107.22
          312181) demo_sift1M      cpu=2 start=4.87  finish=107.22
          312182) demo_sift1M      cpu=18 start=4.87  finish=107.22
          312183) demo_sift1M      cpu=27 start=4.87  finish=107.22
          312184) demo_sift1M      cpu=23 start=4.87  finish=107.22
          312185) demo_sift1M      cpu=12 start=4.87  finish=107.22
          312186) demo_sift1M      cpu=26 start=4.87  finish=107.22
          312187) demo_sift1M      cpu=31 start=4.87  finish=107.22
          312188) demo_sift1M      cpu=1 start=4.87  finish=107.22
          312189) demo_sift1M      cpu=31 start=4.87  finish=107.22
          312190) demo_sift1M      cpu=0 start=4.87  finish=107.22
          312191) demo_sift1M      cpu=14 start=4.87  finish=107.22
          312192) demo_sift1M      cpu=24 start=4.87  finish=107.22
          312193) demo_sift1M      cpu=13 start=4.87  finish=107.22
          312194) demo_sift1M      cpu=15 start=4.87  finish=107.22
          312195) demo_sift1M      cpu=29 start=4.87  finish=107.22
          312196) demo_sift1M      cpu=3 start=4.87  finish=107.22
          312197) demo_sift1M      cpu=11 start=4.87  finish=107.22
          312198) demo_sift1M      cpu=8 start=4.87  finish=107.22
          312199) demo_sift1M      cpu=24 start=4.87  finish=107.22
          312200) demo_sift1M      cpu=4 start=4.91  finish=107.23
          312201) demo_sift1M      cpu=15 start=4.91  finish=107.23
          312202) demo_sift1M      cpu=23 start=4.91  finish=107.23
          312203) demo_sift1M      cpu=17 start=4.91  finish=107.23
          312204) demo_sift1M      cpu=2 start=4.91  finish=107.23
          312205) demo_sift1M      cpu=19 start=4.91  finish=107.23
          312206) demo_sift1M      cpu=6 start=4.91  finish=107.23
          312207) demo_sift1M      cpu=21 start=4.91  finish=107.23
          312208) demo_sift1M      cpu=16 start=4.91  finish=107.23
          312209) demo_sift1M      cpu=22 start=4.91  finish=107.23
          312210) demo_sift1M      cpu=1 start=4.91  finish=107.23
          312211) demo_sift1M      cpu=3 start=4.91  finish=107.23
          312212) demo_sift1M      cpu=18 start=4.91  finish=107.23
          312213) demo_sift1M      cpu=31 start=4.91  finish=107.23
          312214) demo_sift1M      cpu=20 start=4.91  finish=107.23
          312215) demo_sift1M      cpu=5 start=4.91  finish=107.23
          312216) demo_sift1M      cpu=0 start=4.91  finish=107.23
          312217) demo_sift1M      cpu=17 start=4.91  finish=107.23
          312218) demo_sift1M      cpu=4 start=4.91  finish=107.23
          312219) demo_sift1M      cpu=7 start=4.91  finish=107.23
          312220) demo_sift1M      cpu=3 start=4.91  finish=107.23
          312221) demo_sift1M      cpu=5 start=4.91  finish=107.23
          312222) demo_sift1M      cpu=22 start=4.91  finish=107.23
          312223) demo_sift1M      cpu=23 start=4.91  finish=107.23
          312224) demo_sift1M      cpu=20 start=4.91  finish=107.23
          312225) demo_sift1M      cpu=15 start=4.91  finish=107.23
          312226) demo_sift1M      cpu=31 start=4.91  finish=107.23
          312227) demo_sift1M      cpu=21 start=4.91  finish=107.23
          312228) demo_sift1M      cpu=1 start=4.91  finish=107.23
          312229) demo_sift1M      cpu=18 start=4.91  finish=107.23
          312230) demo_sift1M      cpu=2 start=4.91  finish=107.23

Computation blocks for the second workload

      312381) faiss            cpu=13 start=338.65 finish=516.02
        312382) python3          cpu=29 start=338.66 finish=516.02
          312383) python3          cpu=21 start=338.67 finish=516.02
          312384) python3          cpu=31 start=338.67 finish=516.02
          312385) python3          cpu=4 start=338.67 finish=516.02
          312386) python3          cpu=24 start=338.67 finish=516.02
          312387) python3          cpu=0 start=338.67 finish=516.02
          312388) python3          cpu=9 start=338.67 finish=516.02
          312389) python3          cpu=2 start=338.67 finish=516.02
          312390) python3          cpu=26 start=338.67 finish=516.02
          312391) python3          cpu=19 start=338.67 finish=516.02
          312392) python3          cpu=27 start=338.67 finish=516.02
          312393) python3          cpu=6 start=338.67 finish=516.02
          312394) python3          cpu=12 start=338.67 finish=516.02
          312395) python3          cpu=7 start=338.67 finish=516.02
          312396) python3          cpu=14 start=338.67 finish=516.02
          312397) python3          cpu=1 start=338.67 finish=516.02
          312398) python3          cpu=13 start=338.67 finish=516.02
          312399) python3          cpu=18 start=338.67 finish=516.02
          312400) python3          cpu=15 start=338.67 finish=516.02
          312401) python3          cpu=22 start=338.67 finish=516.02
          312402) python3          cpu=8 start=338.67 finish=516.02
          312403) python3          cpu=23 start=338.67 finish=516.02
          312404) python3          cpu=25 start=338.67 finish=516.02
          312405) python3          cpu=3 start=338.67 finish=516.02
          312406) python3          cpu=10 start=338.67 finish=516.02
          312407) python3          cpu=5 start=338.67 finish=516.02
          312408) python3          cpu=11 start=338.67 finish=516.02
          312409) python3          cpu=20 start=338.67 finish=516.02
          312410) python3          cpu=28 start=338.67 finish=516.02
          312411) python3          cpu=16 start=338.67 finish=516.02
          312412) python3          cpu=30 start=338.67 finish=516.02
          312413) python3          cpu=17 start=338.67 finish=516.02
          312414) python3          cpu=23 start=338.88 finish=516.02
          312415) python3          cpu=14 start=338.88 finish=516.02
          312416) python3          cpu=16 start=338.88 finish=516.02
          312417) python3          cpu=15 start=338.88 finish=516.02
          312418) python3          cpu=18 start=338.88 finish=516.02
          312419) python3          cpu=24 start=338.88 finish=516.02
          312420) python3          cpu=3 start=338.88 finish=516.02
          312421) python3          cpu=9 start=338.88 finish=516.02
          312422) python3          cpu=5 start=338.88 finish=516.02
          312423) python3          cpu=10 start=338.88 finish=516.02
          312424) python3          cpu=6 start=338.88 finish=516.02
          312425) python3          cpu=27 start=338.88 finish=516.02
          312426) python3          cpu=17 start=338.88 finish=516.02
          312427) python3          cpu=28 start=338.88 finish=516.02
          312428) python3          cpu=4 start=338.88 finish=516.02
          312429) python3          cpu=29 start=338.88 finish=339.21
          312430) python3          cpu=2 start=338.88 finish=339.21
          312431) python3          cpu=30 start=338.88 finish=339.21
          312432) python3          cpu=16 start=338.88 finish=339.21
          312433) python3          cpu=31 start=338.88 finish=339.21
          312434) python3          cpu=4 start=338.88 finish=339.21
          312435) python3          cpu=8 start=338.88 finish=339.21
          312436) python3          cpu=22 start=338.88 finish=339.21
          312437) python3          cpu=9 start=338.88 finish=339.21
          312438) python3          cpu=1 start=338.88 finish=339.21
          312439) python3          cpu=10 start=338.88 finish=339.21
          312440) python3          cpu=3 start=338.88 finish=339.21
          312441) python3          cpu=11 start=338.88 finish=339.21
          312442) python3          cpu=4 start=338.88 finish=339.21
          312443) python3          cpu=12 start=338.88 finish=339.21
          312444) python3          cpu=21 start=338.88 finish=339.21
          312445) python3          cpu=13 start=340.83 finish=516.02
          312446) python3          cpu=19 start=340.83 finish=516.02
          312447) python3          cpu=30 start=340.83 finish=516.02
          312448) python3          cpu=0 start=340.83 finish=516.02
          312449) python3          cpu=31 start=340.83 finish=516.02
          312450) python3          cpu=7 start=340.83 finish=516.02
          312451) python3          cpu=8 start=340.83 finish=516.02
          312452) python3          cpu=2 start=340.83 finish=516.02
          312453) python3          cpu=25 start=340.83 finish=516.02
          312454) python3          cpu=21 start=340.83 finish=516.02
          312455) python3          cpu=26 start=340.83 finish=516.02
          312456) python3          cpu=22 start=340.83 finish=516.02
          312457) python3          cpu=11 start=340.83 finish=516.02
          312458) python3          cpu=1 start=340.83 finish=516.02
          312459) python3          cpu=12 start=340.83 finish=516.02
          312460) python3          cpu=20 start=340.83 finish=516.02
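
The second listing is easier to read when threads are grouped by lifetime: sixteen python3 threads (312429 to 312444) live only from 338.88 to 339.21, and another sixteen (312445 to 312460) start at 340.83. A minimal sketch of that grouping, assuming the line layout shown above; the regular expression and function name are mine, and the sample is four lines copied from the listing:

```python
import re
from collections import Counter

# Matches lines such as "312429) python3   cpu=29 start=338.88 finish=339.21".
LINE_RE = re.compile(
    r"(?P<pid>\d+)\)\s+(?P<name>\S+)\s+cpu=(?P<cpu>\d+)\s+"
    r"start=(?P<start>[\d.]+)\s+finish=(?P<finish>[\d.]+)"
)


def group_by_lifetime(text):
    """Count threads per (name, start, finish) triple in a listing."""
    groups = Counter()
    for match in LINE_RE.finditer(text):
        key = (match["name"], float(match["start"]), float(match["finish"]))
        groups[key] += 1
    return groups


SAMPLE = """\
          312428) python3          cpu=4 start=338.88 finish=516.02
          312429) python3          cpu=29 start=338.88 finish=339.21
          312430) python3          cpu=2 start=338.88 finish=339.21
          312445) python3          cpu=13 start=340.83 finish=516.02
"""

if __name__ == "__main__":
    for (name, start, finish), count in sorted(group_by_lifetime(SAMPLE).items()):
        print(f"{count:3d} x {name} start={start:.2f} finish={finish:.2f}")
```

Feeding the full listing through the same function would give one row per distinct start/finish pair, which makes the short-lived group stand out.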