My colleague George Teodoro and I, together with a handful of collaborators, had a paper accepted at the prestigious VLDB Journal, “Approximate Similarity Search for Online Multimedia Services on Distributed CPU–GPU Platforms”. The paper considerably extends our previous work on parallel similarity search engines, incorporating the use of accelerators (GPUs) and needed scheduler algorithms to make the use of those accelerators efficient for online applications.
Here’s the abstract:
Similarity search in high-dimensional spaces is a pivotal operation for several database applications, including online content-based multimedia services. With the increasing popularity of multimedia applications, these services are facing new challenges regarding (i) the very large and growing volumes of data to be indexed/searched and (ii) the necessity of reducing the response times as observed by end-users. In addition, the nature of the interactions between users and online services creates fluctuating query request rates throughout execution, which requires a similarity search engine to adapt to better use the computation platform and minimize response times. In this work, we address these challenges with Hypercurves, a flexible framework for answering approximate k-nearest neighbor (kNN) queries for very large multimedia databases. Hypercurves executes in hybrid CPU–GPU environments and isable to attain massive query processing rates through the cooperative use of these devices. Hypercurves also changes its CPU–GPU task partitioning dynamically according to the observed load, aiming for optimal response times. In our empirical evaluation, dynamic task partitioning reduced query response times by approximately 50% compared to the best static task partition. Due to a probabilistic proof of equivalence to the sequential kNN algorithm, the CPU–GPU execution of Hypercurves in distributed (multi-node) environments can be aggressively optimized, attaining superlinear scalability while still guaranteeing, with high probability, results at least as good as those from the sequential algorithm.
The paper is yet to appear, but the last preprint before publisher corrections is available at my publications page.
Update 31/07 : The publisher’s version is already online, with DOI 10.1007/s00778-013-0329-7.