GPU-Based Locality Sensitive Hashing for Approximate k-Nearest Neighbour Search

Over recent years, the amount of structured data has increased immensely, where new approaches are required for fast retrieval of similar data samples. Locality sensitive hashing (LSH) is a new method for finding approximate neighbours within high-dimensional data space. LSH works by projecting high-dimensional data into random low-dimensional planes, which are then also bucketed. Two high-dimensional data samples should be with high probability in the same or neighbouring buckets. Therefore, the accuracy speed trade-off can be tuned for specific datasets, as the accuracy depends on the number of planes considered. LSH is generally used for finding approximate k-nearest neighbours (k-nn). Although LSH itself has sublinear time for the retrieval of k-nn of a given data sample, it still takes considerable amount of time for processing of large data, especially when creating a k-nn graph. We have introduced a parallelization of the LSH method using general purpose computing on graphical processing unit (GPGPU), based on NVIDIA CUDA technology. The method was applied for the content-based image retrieval over large dataset of optical and SAR (synthetic aperture radar) satellite imagery, as shown in the above figure.