Upcoming Talk: kNN Search and CBIR

I’ve been invited by Prof. Francisco Pelaez and Prof. Camila Barione of the Centre of Mathematics, Computing and Cognition of the Federal University of ABC to give a talk on kNN Search and CBIR (Content-Based Image Retrieval).

I will discuss my past work, presenting the three methods I proposed in my thesis for high-dimensional multimedia indexing on large databases. I will also discuss some of my new research pursuits, related to the use of very discriminant local descriptors, such as SIFT, in complex semantic queries, which require generalisation.

The talk, in Portuguese, will be on Tuesday, May 26, in Block B, room A801 of the Federal University of ABC, located at Rua Santa Adélia, 166, Santo André, SP, Brazil, CEP 09210-170. Their phone number is +55 11 4996-3166.


Upcoming Talk: Three New Methods for kNN Search

Prof. Ricardo Torres has invited me to the Institute of Computing of the State University of Campinas, where I am giving a talk on the work I’ve done in my thesis. I will explore the challenges of kNN search (also known as k nearest neighbours search, or simply similarity search) and discuss the three original methods I’ve proposed: the 3-way trees, based on the traditional KD-Tree with the addition of redundant, overlapping nodes; the projection KD-Forests, my first attempt at using an index composed of multiple moderate-dimensional sub-indexes; and finally the Multicurves, an index based on multiple moderate-dimensional space-filling curves, which has several nice properties, such as ease of implementation, dynamicity (tolerance to insertions and deletions without performance degradation) and avoidance of random accesses (which makes a secondary-memory implementation easier).
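For readers new to the subject, the baseline that all these methods compete against is the exact, sequential scan of the whole database. A minimal sketch in Python (the thesis prototypes were in Java; Python here is just for brevity):

```python
import heapq

def knn_brute_force(database, query, k):
    """Exact k nearest neighbours by scanning every descriptor.

    database: list of equal-length numeric tuples (the descriptors)
    query:    one descriptor
    k:        number of neighbours to return

    Returns the k entries closest to the query in squared Euclidean
    distance, nearest first.  This is the O(n*d) sequential scan that
    index structures try to beat.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return heapq.nsmallest(k, database, key=lambda p: sq_dist(p, query))

# Tiny usage example with 2-D points:
points = [(0, 0), (1, 1), (5, 5), (2, 2), (10, 10)]
print(knn_brute_force(points, (1, 0), 3))
```

With hundreds of local descriptors per query image and millions in the database, this scan becomes the bottleneck, which is what motivates the three index structures above.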

The talk will be in Portuguese.

Language Mishaps: Looking for Perfection

I moved through the thesis using a mixture of 90% Java, 7% Python, 2% BASH scripting and 1% C programming (not C++, but in my defence, I’ve always used the C99 standard, so I am not that old-fashioned).

When I first mentioned that I intended to do all the thesis prototyping in Java, reactions were overwhelmingly negative, with gloomy cautionary tales of insufferable slowness and unbearable memory footprint. I kept those criticisms in mind but, imbued with scientific scepticism, I prudently started performing some of the experiments in Java.

Python, from XKCD, by Randall Munroe

Soon I noticed that my prototypes were indeed slower, but not that much slower (certainly less than 10×, usually less than 2×), and that the memory footprint was indeed higher, but not so high that the experiments became unfeasible. Mainly, though, I noticed that my prototypes were much more reliable than those written by the C/C++ programmers (everyone else), and thus less prone to dumping core in front of the Ph.D. supervisor during meetings. (It’s not that I program better than my colleagues; it’s just the wonders of having a garbage collector.)

I was happy with my choice, and I even gently mocked my colleagues when they argued that real programmers do pointer arithmetic and manual memory deallocation (but they knew I was kidding when I said that C++ is a language whose only purpose is to print at random one of the messages “Hello, world!” and “Segmentation Fault: Core Dumped”).

* * *

But despite the success of that “scientific prototyping experience”, Java has never conquered my heart. All languages have their small annoyances, but my gripe with Java comes from quite an essential design choice: the separation between primitive and reference types. This is one of those essential character flaws, and it would only be forgivable if Java provided both autoboxing and a very clever optimising compiler. So far, only half of that solution is available: you can pretend at design time that your Integers are ints (and vice versa), but try doing it in a critical inner loop, and at runtime you’ll become the laughing stock of all the C++ programmers in the lab.

I’ve recently flirted with C# (which apparently handles this problem in a better way), but the lack of a general multiplatform implementation has made me shy away from adopting it. One of the wonderful things about Java is that I can develop on Windows, my desktop system, and then run on whatever system is available in the University infrastructure: Windows, Linux, Sun’s UNIX – the integration is absolutely seamless.

I have also considered taking the “XKCD approach” and doing everything in Python, but though I really like Python for scripting, I shudder at the idea of relying exclusively on dynamic typing for anything larger than 500 lines of code.

* * *

In a way, this quest for perfection is all the fault of Bertrand Meyer, and of having read his books in my young, naïve years. I am still waiting for The Definitive Programming Language. The one with a static type system that makes theoreticians drool, and yet is uncannily intuitive to use. The one with a huge standard library, yet with a gentle learning curve. The one whose spoonfuls of syntactic sugar help the medicinal exception-handling code go down in a most elegant way.

I suspect I shall be waiting for a good while.

Paper on CIKM ’08

I’ve presented the paper “High-Dimensional Descriptor Indexing for Large Multimedia Databases” in Session 5C: Information Retrieval — Multilingual and Multimedia of the 17th ACM Conference on Information and Knowledge Management — CIKM 2008. This paper presents the third (and most interesting, in my opinion) method for multidimensional indexing proposed in my Ph.D. thesis.

Here’s the abstract:

“In this paper we address the subject of large multimedia database indexing for content-based retrieval. We introduce multicurves, a new scheme for indexing high-dimensional descriptors. This technique, based on the simultaneous use of moderate-dimensional space-filling curves, has as main advantages the ability to handle high-dimensional data (100 dimensions and over), to allow the easy maintenance of the indexes (inclusion and deletion of data), and to adapt well to secondary storage, thus providing scalability to huge databases (millions, or even thousands of millions of descriptors). We use multicurves to perform the approximate k nearest neighbors search with a very good compromise between precision and speed. The evaluation of multicurves, carried out on large databases, demonstrates that the strategy compares well to other up-to-date k nearest neighbor search strategies. We also test multicurves on the real-world application of image identification for cultural institutions. In this application, which requires the fast search of a large amount of local descriptors, multicurves allows a dramatic speed-up in comparison to the brute-force strategy of sequential search, without any noticeable precision loss.”
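To give a flavour of the idea behind multicurves, here is a toy sketch, not the thesis implementation: it uses a Z-order (Morton) key as a stand-in for the Hilbert curves of the actual method, splits the dimensions into fixed groups, and probes a small window of each sorted list before re-ranking. All names and parameters are illustrative assumptions of mine.

```python
import bisect

def morton_key(coords, bits=8):
    """Interleave the bits of small non-negative integer coordinates
    into one scalar (a Z-order key).  The real method uses Hilbert
    curves, which have better locality; Morton order is used here
    only because it is much shorter to code."""
    key = 0
    for bit in range(bits):
        for dim, c in enumerate(coords):
            key |= ((c >> bit) & 1) << (bit * len(coords) + dim)
    return key

class MulticurvesSketch:
    """Toy illustration of the multicurves idea: the dimensions are
    split into groups, each group is ordered along its own
    space-filling curve, and a query probes a small window of each
    sorted list, re-ranking the merged candidates by full distance."""

    def __init__(self, vectors, group_size=2):
        self.vectors = vectors
        d = len(vectors[0])
        self.groups = [range(i, min(i + group_size, d))
                       for i in range(0, d, group_size)]
        # One sorted (key, vector_index) list per subspace.
        self.curves = []
        for dims in self.groups:
            keyed = sorted((morton_key([v[j] for j in dims]), i)
                           for i, v in enumerate(vectors))
            self.curves.append(keyed)

    def query(self, q, k=1, window=4):
        candidates = set()
        for dims, keyed in zip(self.groups, self.curves):
            qk = morton_key([q[j] for j in dims])
            pos = bisect.bisect_left(keyed, (qk, -1))
            lo, hi = max(0, pos - window), min(len(keyed), pos + window)
            candidates.update(i for _, i in keyed[lo:hi])
        dist = lambda v: sum((a - b) ** 2 for a, b in zip(v, q))
        return sorted(candidates, key=lambda i: dist(self.vectors[i]))[:k]
```

Note how the index only ever reads contiguous runs of each sorted list: that sequential access pattern is what makes the approach friendly to secondary storage, and appending or deleting a (key, index) pair touches one position per curve, which is the claimed dynamicity.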

The paper and some related material can be found in my publications page.

Paper on DocEng ’08

I’ve presented the paper “Fast Identification of Visual Documents Using Local Descriptors” at the 8th ACM Symposium on Document Engineering — DocEng 2008. This paper placed a stronger emphasis on the application (image identification, also known as copy detection or — my least favourite nomenclature — near-duplicate detection) than on the technicalities of multimedia indexing and descriptor matching (which were, at the end of the day, the staple of my Ph.D. work). But it was the opportunity to tell the world about the projection KD-forest, the second of the three original techniques for high-dimensional indexing.

Anyway, here’s the abstract: “In this paper we introduce a system for the identification of visual documents. Since it stems from content-based document indexing and retrieval, our system does not need to rely on textual annotations, watermarks or other metadata, which can be missing or incorrect. Our retrieval system is based on local descriptors, which have been shown to provide accurate and robust description. Because of the high computational costs associated to the matching of local descriptors, we propose Projection KD-Forest: an indexing technique which allows efficient approximate k nearest neighbors search. Experiments demonstrate that the Projection KD-Forest allows the system to provide prompt results with negligible loss on accuracy. The Projection KD-Forest also compares well when contrasted to other strategies of k nearest neighbors search.”
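The core idea, several KD-trees, each indexing a different moderate-dimensional projection of the data, with the candidates merged and re-ranked in the full space, can be sketched as follows. This is a toy illustration under assumptions of mine; the actual projections, tree construction and search parameters of the paper differ.

```python
import heapq

def build_kdtree(points, depth=0):
    """Minimal KD-tree over a list of (vector, id) pairs."""
    if not points:
        return None
    axis = depth % len(points[0][0])
    points = sorted(points, key=lambda p: p[0][axis])
    mid = len(points) // 2
    return {"point": points[mid], "axis": axis,
            "left": build_kdtree(points[:mid], depth + 1),
            "right": build_kdtree(points[mid + 1:], depth + 1)}

def kdtree_knn(node, q, k, heap=None):
    """Exact kNN in the (projected) subspace; returns [(-dist, id), ...]."""
    if heap is None:
        heap = []
    if node is None:
        return heap
    vec, pid = node["point"]
    d = sum((a - b) ** 2 for a, b in zip(vec, q))
    heapq.heappush(heap, (-d, pid))       # max-heap of the k best so far
    if len(heap) > k:
        heapq.heappop(heap)
    axis, diff = node["axis"], q[node["axis"]] - vec[node["axis"]]
    near, far = ((node["left"], node["right"]) if diff < 0
                 else (node["right"], node["left"]))
    kdtree_knn(near, q, k, heap)
    if len(heap) < k or diff * diff < -heap[0][0]:
        kdtree_knn(far, q, k, heap)
    return heap

class ProjectionKDForest:
    """Toy version of the projection KD-forest idea: each tree indexes
    a different low-dimensional projection of the vectors; each tree's
    candidates are merged and re-ranked by full-space distance."""

    def __init__(self, vectors, projections):
        self.vectors = vectors
        self.projections = projections        # e.g. [(0, 1), (2, 3)]
        self.trees = [
            build_kdtree([([v[j] for j in dims], i)
                          for i, v in enumerate(vectors)])
            for dims in projections]

    def query(self, q, k=1):
        candidates = set()
        for dims, tree in zip(self.projections, self.trees):
            sub_q = [q[j] for j in dims]
            candidates.update(pid for _, pid in kdtree_knn(tree, sub_q, k))
        full = lambda i: sum((a - b) ** 2 for a, b in zip(self.vectors[i], q))
        return sorted(candidates, key=full)[:k]
```

The point of the projections is that KD-trees degrade badly in high dimensions; each sub-tree works in a dimensionality where it is still effective, and the union of their candidate sets approximates the true neighbourhood.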

I guess the dream of all conference organisers is to amass hundreds of participants while keeping the acceptance rate as close to zero as possible. This edition of DocEng was nothing like that — I think we were 60 or 70 in total, counting authors and participants. Yet, in terms of exchange of scientific ideas, it was one of the best conferences I’ve ever participated in.

The paper, and related material, can be found in my publications page.

Doctor, at last !

I’ve passed the viva voce defence of my Ph.D. thesis “Local-Descriptor Matching for Image Identification Systems”, thus completing the program.

I won’t pretend that it is not a relief to have survived the process; it certainly wasn’t easy!

But I have been lucky enough to work on a subject about which I am passionate, and with people who are amazing. So, even though I’ve got the much sought-after title of “Dr.”, I expect to keep to the same fruitful research track (subject, team) for a while.

The abstract of the thesis: “Image identification (or copy detection) consists in retrieving the original from which a query image possibly derives, as well as any related metadata, such as titles, authors, copyright information, etc. The task is challenging because of the variety of transformations that the original image may have suffered. Image identification systems based on local descriptors have shown excellent efficacy, but often suffer from efficiency issues, since hundreds, even thousands of descriptors, have to be matched in order to find a single image. The objective of our work is to provide fast methods for descriptor matching, by creating efficient ways to perform the k-nearest neighbours search in high-dimensional spaces. In this way, we can gain the advantages from the use of local descriptors, while minimising the efficiency issues. We propose three new methods for the k-nearest neighbours search: the 3-way trees — an improvement over the KD-trees using redundant, overlapping nodes; the projection KD-forests — a technique which uses multiple moderate dimensional KD-trees; and the multicurves, which is based on multiple moderate dimensional Hilbert space-filling curves. Those techniques try to reduce the amount of random access to the data, in order to be well adapted to the implementation in secondary memory.”

The full text, and other goodies, are available in my publications page.

Paper on ICHIM ’07

I’ve presented the demonstration “Content Based Image Identification on Cultural Databases” at the conference ICHIM 2007 — Digital Culture and Heritage, in Toronto, Canada.

I had been to ICHIM before, in 2003 (my first year of Ph.D., in France!), when it was hosted at the École du Louvre. I remember clearly how excited I was to finally discover a community interested in exactly my main subject of interest at the time: the interface between Computer Science and Cultural Heritage.

ICHIM’07 was a great conference, with plenty of exchange.

The paper and the presentation can be found in my publications page.

Poster on ICDAR ’07

I’ve presented the paper “Matching Local Descriptors for Image Identification on Cultural Databases” in a poster session of the 9th International Conference on Document Analysis and Recognition — ICDAR 2007. This paper was about the problem of image identification (or copy detection) and the 3-way tree, the first original method I devised in my Ph.D. work for the indexing of high-dimensional multimedia descriptors.

Here’s the abstract: “In this paper we present a new method for high-dimensional descriptor matching, based on the KD-Tree, which is a classic method for nearest neighbours search. This new method, which we name 3-Way Tree, avoids the boundary effects that disrupt the KD-Tree in higher dimensionalities, by the addition of redundant, overlapping sub-trees. That way, more precision is obtained for the same querying times. We evaluate our method in the context of image identification for cultural collections, a task which can greatly benefit from the use of high-dimensional local descriptors computed around PoI (Points of Interest).”
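The “boundary effect” mentioned in the abstract is easy to visualise with a tiny numeric sketch (illustrative values only, not the thesis implementation): a defeatist KD-tree descent, which never backtracks across a splitting plane, misses a neighbour sitting just on the other side, while an overlapping middle subtree, the 3-way tree’s addition, recovers it.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(points, q):
    return min(points, key=lambda p: sq_dist(p, q))

# A single splitting plane at x = 5 partitions the points.
left = [(0.0, 0.0), (4.9, 0.0)]
right = [(9.0, 0.0), (7.0, 3.0)]
query = (5.1, 0.0)

# Defeatist search: descend only into the half containing the query.
greedy = nearest(right if query[0] >= 5 else left, query)

# Exact answer, scanning both halves.
exact = nearest(left + right, query)

# The true neighbour (4.9, 0.0) sits just across the plane, so the
# greedy descent settles for (7.0, 3.0) instead:
assert greedy == (7.0, 3.0) and exact == (4.9, 0.0)

# A 3-way tree keeps a third, overlapping "middle" subtree holding the
# points close to the plane; probing it as well recovers the answer
# without full backtracking:
middle = [p for p in left + right if abs(p[0] - 5) <= 1.0]
assert nearest(middle + right, query) == (4.9, 0.0)
```

In higher dimensionalities a query is almost always close to some splitting plane, which is why this effect becomes so damaging and why the redundant subtrees pay off.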

Poster sessions can be either a hit or a miss, depending on how well the conference organisation integrates them with the other activities going on. Fortunately, at ICDAR 2007 the posters were right next to the heart of the conference, the place where everything happens: the snacks table. Therefore the sessions were very interactive and interesting.

The fulltext of the paper and the poster can be found in my publications page.

Paper on ICIP ’07

My Ph.D. co-supervisor, Prof. Matthieu Cord, has presented our paper “3-way Trees: A Similarity Search Method for High-Dimensional Descriptor Matching” at the 14th IEEE International Conference on Image Processing — ICIP 2007. This paper was about the eponymous 3-way trees, the first original method proposed in my Ph.D. work for the indexing of high-dimensional multimedia descriptors.

Here is the abstract: “In this paper we look into the problem of high-dimensional local descriptor matching for image identification on cultural databases, presenting an important improvement over a classic method, the KD-Tree. Our method, the 3-Way Tree, uses redundant, overlapping sub-trees, in order to avoid the boundary effects that disrupt the KD-Tree in higher dimensionalities, achieving more precision for the same querying times.”

The fulltext can be found in my publications page.

Paper on GMAI ’06

I’ve presented the paper “Content-based Retrieval of Images for Cultural Institutions using Local Descriptors” at the conference Geometric Modelling and Imaging — New Trends — GMAI 2006.

This was the first paper published about my thesis results, at a moment when I was convinced that local descriptors were “the way to go” but still wasn’t sure how I would address their performance issues — by choosing smaller (low-dimensional) descriptors? By trying to reduce the dimensionality of the descriptors I had already chosen? By reducing the number of descriptors in the database? In the query? All those avenues seemed reasonable at the time.

Here is the abstract: “The task of identifying an image whose metadata are missing is often demanded from cultural image collection holders, such as museums and archives. The query image may present distortions (cropping, rescaling, rotations, colour changes, noise…) from the original, which poses an additional complication. The majority of proposed solutions are based on classic image signatures, such as the colour histogram. Our approach, however, follows computer vision methods, and is based on local descriptors. In this paper we describe our approach, explain the SIFT method on which it is based and compare it to the Multiscale-CCV, an established scheme employed in a large-scale practical system. We demonstrate experimentally the efficacy of our approach, which achieved a 99,2% success rate, against 61,0% for the Multiscale-CCV, in a database of photos, drawings and paintings.”

The fulltext of the paper and related material can be found in my publications page.