Copyright (C) 2008 Georges Quénot, LIG-CNRS. Version 1.01 Last revision: December 22, 2008. This file is part of KNNLSB (K Nearest Neighbors Linear Scan Baseline). KNNLSB is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. KNNLSB is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with KNNLSB. If not, see . KNNLSB is an implementation of a linear scan search for the K Nearest Neighbors (KNN) optimized for multi-core architectures using threads. The search for nearest neigbors is done for one or more "query" vector(s) in a set of "document" vectors. Cache use is also optimized and the program is especially efficient when the k nearest neighbors are searched for for several query vectors simultaneously. The search is done by the knnp() function. The getnorm() function can help to speed up the search when the norm of the document vectors can be pre-computed but its use is not necessary. The nearest neighbors can be searched for either using the classical Euclidian distance or the "cosine distance", which correspond to the angle between the vectors, or the chi square distance d(x,y) = sqrt(sum((xi-yi)²/(xi+yi))). This software can be used: - for any application using the KNN search; - for measuring the performance of a multi-core machine, a C compiler or a combination of both; - as a baseline for other programs whose goal is to do better than linear scan, either for exact or approximate KNN search. SYNOPSIS #include "knnp.h" fdata *getnorm(fdata **m, size_t n, size_t nc, int dc); KEY **knnp(fdata **ma, fdata **mb, fdata *sa, size_t na, size_t nb, size_t nc, size_t nk, size_t np, size_t cs, int dc, int sta); DESCRIPTION knnp() computes the nk nearest neighbors of the mb vectors within the ma vectors. na : number of "document" vectors; nb : number of "query" vectors; nc : number of vector components; nk : number of searched nearest neighbors; ma : document vectors, two-dimensional (na*nk) array of fdata; mb : query vectors, two-dimensional (nb*nk) array of fdata; sa : pre-computed norm of document vectors using getnorm or NULL. np : number of threads to be run in parallel or 0 to let the program set it to the number of available cores if get_nb_cores() works. cs : specify the cache size on the system or 0 to let the program get it from the system if get_cache_size() works. dc : 2 : use chi square distance; 1 : use angle between vectors; 0 : use Euclidian distance. sta : print some statistcal information : knnp execution time, number of floating point operations per second, main memory bandwidth, cache bandwidth (per core). fdata is defined in knnp.h. It may be either float or double. FDATA_MAX should be defined accordingly as FLT_MAX or DBL_MAX. Default is float. getnorm() pre-computes the norms of a set of vectors. The square of the norm for the euclidian distance and the inverse of the norm for the "cosine" distance is actually computed and returned. RETURN VALUE knnp() returns a two-dimensional (nb*nk) arrays of KEY structures: typedef struct { int k; fdata d; } KEY; There is one vector of KEY structures for each query vector. k : index of the jth nearest neighbor; d : distance to the jth nearest neighbor; getnorm() returns a one-dimensional array of fdata. On failure for any reason, knnp() and getnorm() return NULL. NOTES New in version 1.01: chi2 "distance" implementation and speed improvement for large numbers of nearest neighbors. knnp() and getnorm() are the functions doing the main job (calling also the knn() function for the multi-threaded part). malloc1a(), malloc2a, free1a() and free2a() can be used to allocate and free one- and two-dimensional arrays of objects. They keep track of the current and maximum amount of used memory. These can be retrieved using UsedMemory() and MaxUsedMemory(). get_nb_cores() and get_cache_size() returns the number of available cores and the cache size respectively. knnc() is a function which is compatible with knnp(). It is provided for illustration and checking purposes. Its code should be much easier to understand but it is also much slower. The knncheck() function can be used to compare the results of the knnp() and knnc() functions. There may be slight differences since the computations are done in a different way and in a different order. The differences are normally only due to rounding errors in floating points operations. The computed distances should be very close anyway, even when computations are done in single precision. The order of the nearest neighbors could also be different for tied or quasi tied. knnp() is especially efficicent if na >> nk, nb >> 1 and nc >> 1. It is still efficient anyway even if some of these conditions are not satisfied. The asymptotic execution time is in O(na*nb*nc/np) (np is here the number of available cores). The value of nk has little influence if it remains small (up to a few tens). Using float or double for fdata has little influence on computation time in most cases. It also has very little effect on the result accuracy. It indeed has a 2:1 effect on the amount of memory necessary to store the data. A version supporting sparse vectors might be released soon. BUGS No known bug so far. get_nb_cores() and get_cache_size() may not be very portable. SAMPLE PROGRAM The sample program test.c is provided to illustrate the use of the provided functions. It generates random document and query vectors and executes the KNN search on them while printing performance statistics and do some checking. Its arguments are: -NA (int) : specifies a value for na, default is 100000; -NB (int) : specifies a value for nb, default is 100; -NC (int) : specifies a value for nc, default is 1000; -NK (int) : specifies a value for nk, default is 10; -NP (int) : specifies a maximum value for np, default is the number of available cores in the system; -COS : use cosine distance instead of euclidian; -PRE : pre-compute the norms of document vectors; -CHK : check knnp() against knnc(). test.c can be compiled with: gcc -O -o test test.c knnp.c -lm -lpthread A 64-bit compiled version using the Intel C Compiler is also provided for evaluation purposes. It should run on recent versions of 64-bit Linux. It was compiled using: icc -static -O -o test test.c knnp.c -lm -lpthread The following performance was obtained for this version on an 8-core (two Intel(R) Xeon(R) CPU E5430 @ 2.66GHz, 6 MBytes cache, FSB @ 1.33GHz) system (test program run with all default options): 1-thread: 6.778133 seconds, 2.95 Gflops, 0.012 GBytes/s, 11.803 GBytes/s 2-thread: 3.392843 seconds, 5.89 Gflops, 0.024 GBytes/s, 11.790 GBytes/s 3-thread: 2.267715 seconds, 8.82 Gflops, 0.035 GBytes/s, 11.759 GBytes/s 4-thread: 1.703759 seconds, 11.74 Gflops, 0.047 GBytes/s, 11.739 GBytes/s 5-thread: 1.403265 seconds, 14.25 Gflops, 0.057 GBytes/s, 11.402 GBytes/s 6-thread: 1.195420 seconds, 16.73 Gflops, 0.067 GBytes/s, 11.154 GBytes/s 7-thread: 1.053076 seconds, 18.99 Gflops, 0.076 GBytes/s, 10.853 GBytes/s 8-thread: 0.890687 seconds, 22.45 Gflops, 0.090 GBytes/s, 11.227 GBytes/s The floating point performance is computed on the scalar product computations only ("d += vb[ic]*va[ic];" equals 2 flops). The first bandwidth measure is the one between the main memory and all cores for the fetching of the document vectors (everything else is supposed to remain in the cache). The second bandwidth measure is the one between the L2 cache and a processing core for the fetching of the query vectors (this bandwidth is per core). CONTACT Feedback to: Georges.Quenot@imag.fr