We’ve got a paper — Large-Scale Distributed Locality-Sensitive Hashing for General Metric Data — accepted at the upcoming International Conference on Similarity Search and Applications (SISAP’2014).
I’ll be presenting the paper next week : if you’re planning to be there, please, come to say hi ! My session will be on Wednesday afternoon (October 30th at 14h).
The paper is part of an ongoing cooperation with my colleague Prof. George Teodoro (University of Brasilia), with my former M.Sc. student Eliezer Silva, and Petrobras researcher Thiago Teixeira. Here’s the abstract :
Locality-Sensitive Hashing (LSH) is extremely competitive for similarity search, but works under the assumption of uniform access cost to the data, and for just a handful of dissimilarities for which locality-sensitive families are available. In this work we propose Parallel Voronoi LSH, an approach that addresses those two limitations of LSH: it makes LSH efficient for distributed- memory architectures, and it works for very general dissimilarities (in particular, it works for all metric dissimilarities). Each hash table of Voronoi LSH works by selecting a sample of the dataset to be used as seeds of a Voronoi diagram. The Voronoi cells are then used to hash the data. Because Voronoi diagrams depend only on the distance, the technique is very general. Implementing LSH in distributed-memory systems is very challenging because it lacks referential locality in its access to the data: if care is not taken, excessive message-passing ruins the index performance. Therefore, another important contribution of this work is the parallel design needed to allow the scalability of the index, which we evaluate in a dataset of a thousand million multimedia features.