I had a paper accepted at ICIP, the IEEE International Conference on Image Processing, by my student Sandra de Avila (whose main supervisor is my former M.Sc. supervisor Prof. Arnaldo de Araújo). Sandra is currently in France at the prestigious LIP6 lab, under the supervision of my former Ph.D. supervisor Prof. Matthieu Cord and our colleague Prof. Nicolas Thome. The paper presents an interesting extension to the “bag of visual words” approach (which is based on quantizing local features using a codebook / “visual dictionary”), taking into account a histogram of the distances between the features actually found in the images and the features chosen to compose the codebook. Here’s the title and abstract:
Bossa: Extended BoW Formalism for Image Classification
In image classification, the most powerful statistical learning approaches are based on the Bag-of-Words paradigm. In this article, we propose an extension of this formalism. Considering the Bag-of-Features, dictionary coding and pooling steps, we propose to focus on the pooling step. Instead of using the classical sum or max pooling strategies, we introduce a density function-based pooling strategy. This flexible formalism allows us to better represent the links between dictionary codewords and local descriptors in the resulting image signature. We evaluate our approach in two very challenging tasks of video and image classification, involving very high level semantic categories with large and nuanced visual diversity.
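To give a rough idea of the distance-histogram pooling described above, here is a minimal sketch in Python. This is purely illustrative and not the authors' implementation: the function name, the hard assignment to the nearest codeword, the number of bins and the clamping radius are all my own simplifying assumptions.

```python
import math

def bossa_pooling(descriptors, codebook, n_bins=4, r_max=2.0):
    """Sketch of a distance-histogram pooling: for each codeword, build a
    histogram of the distances of the local descriptors assigned to it.
    All names and parameters here are illustrative, not from the paper."""
    # one distance histogram per codeword
    signature = [[0] * n_bins for _ in codebook]
    for d in descriptors:
        # hard-assign the descriptor to its nearest codeword (a simplification)
        dists = [math.dist(d, c) for c in codebook]
        k = min(range(len(codebook)), key=lambda i: dists[i])
        # bin the distance to that codeword; distances beyond r_max fall in the last bin
        b = min(int(dists[k] / r_max * n_bins), n_bins - 1)
        signature[k][b] += 1
    # flatten the per-codeword histograms into a single image signature
    return [v for hist in signature for v in hist]
```

The resulting signature has `len(codebook) * n_bins` dimensions instead of the `len(codebook)` dimensions of a plain bag-of-words histogram, which is where the richer codeword–descriptor information comes from.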
I’ve also had two papers accepted at SIBGRAPI, our national counterpart conference. The first is the work of the Ph.D. student Ana Lopes and her R.A. Elerson Santos (supervised by Prof. Arnaldo Araújo and co-supervised by Prof. Jussara Almeida; I give her some technical and nontechnical support every now and then). It concerns the use of transfer learning of concepts from (static) image datasets to video datasets in order to recognize human actions. We show that learning the concepts present in the Caltech256 dataset allows a classifier to obtain improved results on the challenging “in the wild” Hollywood2 human action dataset.
Transfer Learning for Human Action Recognition
To manually collect action samples from realistic videos is a time-consuming and error-prone task. This is a serious bottleneck to research related to video understanding, since the large intra-class variations of such videos demand training sets large enough to properly encompass those variations. Most authors dealing with this issue rely on (semi-) automated procedures to collect additional, generally noisy, examples. In this paper, we exploit a different approach, based on a Transfer Learning (TL) technique, to address the target task of action recognition. More specifically, we propose a framework that transfers the knowledge about concepts from a previously labeled still image database to the target action video database. It is assumed that, once identified in the target action database, these concepts provide some contextual clues to the action classifier. Our experiments with Caltech256 and Hollywood2 databases indicate: a) the feasibility of successfully using transfer learning techniques to detect concepts and, b) that it is indeed possible to enhance action recognition with the transferred knowledge of even a few concepts. In our case, only four concepts were enough to obtain statistically significant improvements for most actions.
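The overall idea of using transferred concepts as contextual clues can be sketched very simply: concept classifiers trained on a still-image dataset score each video clip, and those scores are appended to the clip's base feature vector before training the action classifier. The sketch below is my own illustration of that pipeline shape, not the authors' framework; the callables stand in for real trained classifiers.

```python
def augment_with_concepts(clip_features, concept_classifiers):
    """Append concept scores to each clip's base feature vector.

    concept_classifiers: callables mapping a feature vector to a score
    in [0, 1] -- illustrative stand-ins for concept classifiers trained
    on a still-image dataset such as Caltech256."""
    augmented = []
    for f in clip_features:
        # the concept scores act as contextual clues for the action classifier
        scores = [clf(f) for clf in concept_classifiers]
        augmented.append(list(f) + scores)
    return augmented
```

The action classifier is then trained on the augmented vectors; with only a handful of concepts the added dimensionality stays small.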
The second is the work of my Ph.D. student Marcelo Coelho and his R.A. Cássio dos Santos Jr. (again, his main supervisor is Prof. Arnaldo de Araújo). It concerns the clean-up of noisy SIFT features of street-view images (urban façades). We have found that subspace clustering, an unsupervised technique, is able to isolate clusters of useful and non-useful SIFT features for the task of retrieving a target image. The challenge is identifying a priori which cluster is the relevant one. This work compares and contrasts two subspace clustering techniques: FINDIT (based on dimension voting) and MSSC (based on a fuzzy mean-shift).
Subspace Clustering for Information Retrieval in Urban Scene Databases
We present a comprehensive study of two important subspace clustering algorithms and their contribution to enhance results from the difficult task of matching images taken of the same object using different devices under different conditions. Our experiments were done on two distinct databases containing urban scenes which were tested using state-of-the-art matching algorithms. After initial evaluation of both datasets by that procedure, clustering algorithms were applied to them. An exhaustive comparison was performed in every cluster found and a significant amelioration in the results was obtained.
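The workflow can be sketched as: cluster the (noisy) SIFT descriptors, then keep the cluster that best serves retrieval. The sketch below uses a plain k-means as a stand-in for the subspace clustering algorithms actually studied (FINDIT and MSSC), and a hypothetical scoring callable in place of a real retrieval evaluation; it only illustrates the pipeline shape, not the paper's method.

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal Lloyd's k-means -- a simple stand-in for the subspace
    clustering algorithms (FINDIT / MSSC) compared in the paper."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assign each point to its nearest center
            i = min(range(k), key=lambda j: math.dist(p, centers[j]))
            clusters[i].append(p)
        # recompute each center as the mean of its cluster (keep old if empty)
        centers = [
            [sum(c) / len(cl) for c in zip(*cl)] if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return clusters

def keep_best_cluster(descriptors, score, k=3):
    """Cluster the noisy SIFT descriptors and keep the cluster maximizing
    `score`, a hypothetical callable rating how well a descriptor subset
    retrieves the target image (identifying it a priori is the hard part)."""
    clusters = kmeans(descriptors, k)
    return max((cl for cl in clusters if cl), key=score)
```

In practice the open problem mentioned above is exactly that no such `score` oracle is available a priori, which is why the paper has to compare how the two clustering techniques separate useful from non-useful features.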
I’ll put up a link to the preprints as soon as they become available.