In Barcelona for NIPS’2016

I’m in Barcelona now for NIPS’2016 — or should I say, for NIPS’ Symposia and Workshops, since the main conference this year… sold out. That’s at once exciting and frightening: is machine learning the next

Anyway, science ! We’re participating, with posters, in two workshops: Adversarial Training (on Friday, 9th) and Bayesian Deep Learning (on Saturday, 10th). If you will be there, let’s talk!

The papers:

Adversarial Images for Variational Autoencoders

Pedro Tabacof, Julia Tavares, Eduardo Valle

We investigate adversarial attacks for autoencoders. We propose a procedure that distorts the input image to mislead the autoencoder into reconstructing a completely different target image. We attack the internal latent representations, attempting to make the adversarial input produce an internal representation as similar as possible to the target’s. We find that autoencoders are much more robust to the attack than classifiers: while some examples have tolerably small input distortion, and reasonable similarity to the target image, there is a quasi-linear trade-off between those aims. We report results on MNIST and SVHN datasets, and also test regular deterministic autoencoders, reaching similar conclusions in all cases. Finally, we show that the usual adversarial attack for classifiers, while being much easier, also presents a direct proportion between distortion on the input, and misdirection on the output. That proportionality however is hidden by the normalization of the output, which maps a linear layer into non-linear probabilities.
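To make the idea of attacking the latent representation concrete, here is a minimal toy sketch. It is not the procedure from the paper: the "encoder" is a hypothetical fixed random linear map, and the objective (squared latent distance plus a penalty C on the squared distortion) is an assumption chosen for simplicity. It only illustrates how the regularization constant trades input distortion against closeness to the target in latent space.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a trained encoder: a fixed random linear map
# from 64-dimensional "images" to 8-dimensional latent codes.
W = rng.normal(size=(8, 64)) / 8.0


def encode(x):
    return W @ x


def latent_attack(x, x_target, C=0.1, steps=500, lr=0.05):
    """Gradient descent on ||encode(x + d) - encode(x_target)||^2 + C * ||d||^2."""
    z_target = encode(x_target)
    d = np.zeros_like(x)
    for _ in range(steps):
        residual = encode(x + d) - z_target
        grad = 2.0 * W.T @ residual + 2.0 * C * d
        d -= lr * grad
    return d


x = rng.uniform(size=64)         # original input
x_target = rng.uniform(size=64)  # input whose latent code we want to imitate

for C in (0.01, 0.1, 1.0):
    d = latent_attack(x, x_target, C=C)
    distortion = np.linalg.norm(d)
    latent_gap = np.linalg.norm(encode(x + d) - encode(x_target))
    print(f"C={C:<5} distortion={distortion:.3f} latent gap={latent_gap:.3f}")
```

Sweeping C traces the trade-off: a larger penalty yields a smaller distortion but leaves the latent code farther from the target's.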

The fulltext here:

🔔 The organizers of the workshop have opened a reddit thread for the public to ask questions. We have a subthread there — ask us anything! 🔔

Known Unknowns: Uncertainty Quality in Bayesian Neural Networks

Ramon Oliveira, Pedro Tabacof, Eduardo Valle

We evaluate the uncertainty quality in neural networks using anomaly detection. We extract uncertainty measures (e.g. entropy) from the predictions of candidate models, use those measures as features for an anomaly detector, and gauge how well the detector differentiates known from unknown classes. We assign higher uncertainty quality to candidate models that lead to better detectors. We also propose a novel method for sampling a variational approximation of a Bayesian neural network, called One-Sample Bayesian Approximation (OSBA). We experiment on two datasets, MNIST and CIFAR10. We compare the following candidate neural network models: Maximum Likelihood, Bayesian Dropout, OSBA, and — for MNIST — the standard variational approximation. We show that Bayesian Dropout and OSBA provide better uncertainty information than Maximum Likelihood, and are essentially equivalent to the standard variational approximation, but much faster.
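As a rough illustration of the evaluation idea, the sketch below computes one uncertainty measure (predictive entropy) and checks how well it separates known from unknown classes. It simplifies the setup described in the abstract: the probabilities are made-up numbers, and instead of training an anomaly detector on several measures, it scores the entropy directly with a hand-rolled AUC. Both choices are assumptions for the sake of a short example.

```python
import numpy as np


def predictive_entropy(probs):
    """Entropy (in nats) of each row of class probabilities."""
    probs = np.clip(probs, 1e-12, 1.0)
    return -np.sum(probs * np.log(probs), axis=-1)


def auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))


# Hypothetical softmax outputs: the first two rows mimic confident predictions
# on known classes, the last two mimic hesitant predictions on unknown classes.
probs = np.array([
    [0.97, 0.01, 0.02],
    [0.02, 0.95, 0.03],
    [0.40, 0.35, 0.25],
    [0.34, 0.33, 0.33],
])
is_unknown = np.array([0, 0, 1, 1])

scores = predictive_entropy(probs)
print("entropies:", np.round(scores, 3))
print("AUC of entropy as an 'unknown class' score:", auc(scores, is_unknown))
```

A model with better uncertainty quality, in this sense, is one whose uncertainty scores lead to a higher AUC on held-out unknown classes.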

The fulltext here:

Deep networks can be seen as hierarchical generalized linear models.

My postgraduate offer for 2016/1 : Deep Learning From a Statistician’s Viewpoint

With few exceptions, my postgraduate offers follow a pattern. In the second semester, I offer my “101” Multimedia Information Retrieval course, which introduces multimedia representations, machine learning, computer vision, and… information retrieval. In the first semester, I offer a topics course, usually following a book : so far we have explored Bishop’s PRML, Hofstadter’s GEB, and Jaynes’ “Probability Theory”.

For 2016/1, I’m risking something different :

“Artificial Intelligence is trending again, and much of the buzz is due to Deep Neural Networks. For long considered untrainable, Deep Networks were boosted by a leap in computing power, and in data availability.

Deep Networks stunned the world by classifying images into thousands of categories with accuracy, by writing fake wikipedia articles with panache, and by playing difficult videogames with competence.

My aim here is a less “neural” path to deep models. Let us take the biological metaphors with a healthy dose of cynicism and seek explanations instead in statistics, in information theory, in probability theory. Remember linear regression ? Deep models are multi-layered generalized linear models whose parameters are learned by maximum likelihood. Let us start from there and then explore the most promising avenues leading to the current state of the art.

This course will be nothing like your typical classroom experience. There will be no lectures. We will meet once a week for an in-person session to discuss previous work, and plan our attack for the next week. I’ll expect you to continue working throughout the week. There will be no exams. I’ll grade your work based on participation during the sessions, progress between sessions, self-assessment, and peer assessment.

Active participation will be mandatory. This means (surprise !) talking in public. Everyone will be learning together, so all of us must accept the risk of being wrong. This course won’t work for those who always want to appear wise and knowledgeable. The course will be in English.

We’ll be a cozy small group : at most 12 students. I’ll select the candidates based on a letter of intent, and on previous experience. Write me a short e-mail. No need to be fancy : just state your reasons for participating, and any previous experience (academic, professional, and extra-curricular) with Machine Learning, Statistics, Probability, or Information Theory.

This course is not for beginners, nor for the faint of heart. We are jumping in head first at the deep (tee hee !) end. After all, we will delve into one of the most engaging intellectual frontiers of our time. I dare you to join us !”
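For the curious, the announcement’s claim that deep models are multi-layered generalized linear models fits in a few lines of NumPy. This is only a sketch under my own assumptions (logistic link at every layer, random untrained weights, a Bernoulli likelihood); it shows the structure, not a training procedure.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def glm_layer(x, W, b):
    """One GLM-like layer: linear predictor followed by an inverse link (here, logistic)."""
    return sigmoid(x @ W + b)


def deep_glm(x, params):
    """Stack GLM layers: the output of each one is the input of the next."""
    h = x
    for W, b in params:
        h = glm_layer(h, W, b)
    return h


def bernoulli_nll(p, y):
    """Negative log-likelihood that maximum-likelihood training would minimize."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))                  # 5 toy samples, 3 features
y = np.array([0, 1, 1, 0, 1], dtype=float)   # toy binary targets

# Two stacked layers with random (untrained) weights: 3 -> 4 -> 1.
params = [
    (rng.normal(size=(3, 4)), np.zeros(4)),
    (rng.normal(size=(4, 1)), np.zeros(1)),
]

p = deep_glm(X, params)[:, 0]
print("predicted probabilities:", np.round(p, 3))
print("negative log-likelihood:", bernoulli_nll(p, y))
```

With a single layer, `deep_glm` is plain logistic regression; each additional layer feeds the previous layer’s outputs into a new linear predictor.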

Very important ! If you want to enroll in this course without being enrolled in the program (what UNICAMP awfully calls “special students”), you have to do your pre-enrollment by 7/Dec/2015 (hard deadline !). Even if you are enrolled in the program (“regular student”), send me your application by 31/Dec/2015 at the latest, because I’ll select regular and special (urgh !) students at the same time.

EDIT 20/01 : I have sent the acceptance notices, and I’m looking forward to working with a swell group of very motivated students !

What : Post-graduate course for the Master or Doctorate in Electrical Engineering program of UNICAMP (4 credits)

When : 2016/1st semester, with mandatory in-person meetings Tuesdays from 19 to 21h ; support meetings same day from 16 to 18h

Image credit : composite from Ramón y Cajal’s first publication, showing a cerebellum cut, and scatterplots of Fisher’s iris dataset drawn by Indon~commonswiki, Wikimedia Commons.

Deep Neural Network

From instance launch to model accuracy: an AWS/Theano walkthrough

My team has recently participated in Kaggle’s Diabetic Retinopathy challenge, and we won… experience. It was our first Kaggle challenge and we found ourselves unprepared for the workload.

But it was fun — and it was the opportunity to learn new skills, and to sharpen old ones. As the deadline approached, I used Amazon Web Services a lot, and got more familiar with it. Although we have our GPU infrastructure at RECOD, the extra boost provided by AWS allowed exploring extra possibilities.

But it was on the weekend just before the challenge deadline that AWS proved invaluable. Our in-house cluster went AWOL. What are the chances of a power outage bringing down your servers and a pest-control operation blocking your physical access to them on the weekend before a major deadline ? Murphy knows. Well, AWS allowed us to go down fighting, instead of throwing in the towel.

In this post, I’m compiling Markus Beissinger’s how-to and tutorials into a single hyper-condensed walkthrough to get you as fast as possible from launching an AWS instance to running a simple convolutional deep neural net. If you are anything like me, I know that you are aching to see some code running, but after you scratch that itch, I strongly suggest you go back to those sources and study them at leisure.

I’ll assume that you already know :

  1. How to create an AWS account ;
  2. How to manage AWS users and permissions ;
  3. How to launch an AWS instance.

With those preparations out of the way, let’s get started !

Step 1: Launch an instance at AWS, picking :

  • AMI (Amazon Machine Image) : Ubuntu Server 14.04 LTS (HVM), SSD Volume Type – 64-bit
  • Instance type : GPU instances / g2.2xlarge

For the other settings, you can use the defaults, but be careful with the security group and access key to not lock yourself out of the instance.

Step 2 : Open a terminal window, and log into your instance. On my Mac, I type :

ssh -i private.pem ubuntu@<public-dns>

Where private.pem is the private key file of the key pair used when creating the instance, and <public-dns> stands for the public DNS of the instance (ubuntu is the default user on Ubuntu AMIs). You might get an angry message from SSH, complaining that your .pem file is too open. If that happens, change its permissions with :

chmod go-rxw private.pem
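If you script your setup in Python, the same permission fix can be sketched with the standard library. The helper name `restrict_to_owner` is mine, and the demo runs on a throwaway temporary file standing in for private.pem, so treat it as an illustration (on a POSIX system) rather than part of the walkthrough.

```python
import os
import stat
import tempfile


def restrict_to_owner(path):
    """Clear every group/other permission bit, keeping the owner's bits untouched."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode & ~(stat.S_IRWXG | stat.S_IRWXO))
    return stat.S_IMODE(os.stat(path).st_mode)


# Demo on a throwaway file standing in for private.pem.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    demo_key = handle.name

os.chmod(demo_key, 0o644)  # deliberately "too open", as SSH would complain
print("before:", oct(stat.S_IMODE(os.stat(demo_key).st_mode)))
print("after: ", oct(restrict_to_owner(demo_key)))
os.remove(demo_key)
```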

Step 3 : Install Theano.

Once you’re inside the machine, this is not complicated. Start by bringing the machine up to date :

sudo apt-get update
sudo apt-get -y dist-upgrade

Install Theano’s dependencies :

sudo apt-get install -y gcc g++ gfortran build-essential git wget linux-image-generic libopenblas-dev python-dev python-pip python-nose python-numpy python-scipy

Get the package for CUDA and install it :

sudo dpkg -i cuda-repo-ubuntu1404_7.0-28_amd64.deb
sudo apt-get update
sudo apt-get install -y cuda

This last command is the only one that takes some time — you might want to go brew a cuppa while you wait. Once it is over, put CUDA on the path and reboot the machine :

echo 'export PATH=/usr/local/cuda/bin:$PATH' >> .bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH' >> .bashrc
sudo reboot

Log into the instance again and query for the GPU :

nvidia-smi -q

This should spit out a lengthy list of details about the installed GPU.

Now you just have to install Theano. The one-liner below installs the latest version, and after the wait for the CUDA driver, runs anticlimactically fast :

sudo pip install --upgrade --no-deps git+git://

And that’s it ! You have Theano on your system.

Step 4 : Run an example.
Let’s take Theano for a run. The simplest sample that’s already interesting is the convolutional/MNIST digits tutorial. The sample depends on code written in the previous tutorials, MLP and Logistic Regression, so you have to download those too. You also have to download the data. The commands below do all that:

mkdir theano
mkdir theano/data
mkdir theano/lenet
cd theano/data
cd ../lenet

Finally, hit the magical command :


What ? All this work and the epochs will go by as fast as molasses on a winter’s day. What gives ?

You have to tell Theano to run on the GPU, otherwise it will crawl on the CPU. You can paste the lines below into your ~/.theanorc file :

[global]
floatX=float32
device=gpu

[lib]
cnmem=0.9

[nvcc]
fastmath=True

…or you can use the one-liner below to create it:

echo -e '[global]\nfloatX=float32\ndevice=gpu\n\n[lib]\ncnmem=0.9\n\n[nvcc]\nfastmath=True' > ~/.theanorc
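If you prefer to generate the file from a script, the same settings can be written with Python’s standard configparser module. This is a sketch under a couple of assumptions: it needs Python 3 (the instance set up above ships Python 2, so run it wherever you have Python 3), the function name `write_theanorc` is mine, and the demo writes to a temporary path instead of overwriting your real ~/.theanorc.

```python
import configparser
import os
import tempfile


def write_theanorc(path):
    """Write the same settings as the echo one-liner to the given path."""
    config = configparser.ConfigParser()
    config.optionxform = str  # keep the case of option names such as floatX
    config["global"] = {"floatX": "float32", "device": "gpu"}
    config["lib"] = {"cnmem": "0.9"}
    config["nvcc"] = {"fastmath": "True"}
    with open(path, "w") as handle:
        config.write(handle, space_around_delimiters=False)


# Demo on a temporary path; pass os.path.expanduser("~/.theanorc") for real use.
demo_path = os.path.join(tempfile.mkdtemp(), "theanorc")
write_theanorc(demo_path)
with open(demo_path) as handle:
    print(handle.read())
```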

Try running the example again.


With some luck, you’ll note two differences: first, Theano will announce the use of the GPU…

Using gpu device 0: GRID K520 (CNMeM is enabled)

…and second, the epochs will run much, much faster !

(Image credit :

Recently accepted papers at ICIP and SIBGRAPI

I had a paper accepted at ICIP, the IEEE International Conference on Image Processing, by my student Sandra de Avila (whose main supervisor is my former M.Sc. supervisor Prof. Arnaldo de Araújo). Sandra is currently in France at the prestigious LIP6 lab, under the supervision of my former Ph.D. supervisor Prof. Matthieu Cord and our colleague Prof. Nicolas Thome. The paper presents an interesting extension to the “bag of visual words” approach (which is based on quantized local features using a codebook / “visual dictionary”), taking into consideration a histogram of the distances between the features actually found in the images and the features chosen to compose the codebook. Here’s the title and abstract:

Bossa: Extended BoW Formalism for Image Classification
In image classification, the most powerful statistical learning approaches are based on the Bag-of-Words paradigm. In this article, we propose an extension of this formalism. Considering the Bag-of-features, dictionary coding and pooling steps, we propose to focus on the pooling step. Instead of using the classical sum or max pooling strategies, we introduced a density function-based pooling strategy. This flexible formalism allows us to better represent the links between dictionary codewords and local descriptors in the resulting image signature. We evaluate our approach in two very challenging tasks of video and image classification, involving very high level semantic categories with large and nuanced visual diversity.
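To give a feel for the difference between classical pooling and a pooling based on distances to the codewords, here is a small NumPy sketch. It is not the implementation from the paper: the Euclidean distance, the number of bins, the distance range, and the random toy data are all assumptions of mine.

```python
import numpy as np


def pairwise_distances(descriptors, codebook):
    """Euclidean distance between every local descriptor and every codeword."""
    diff = descriptors[:, None, :] - codebook[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def sum_pooling(descriptors, codebook):
    """Classical bag of words: count the descriptors assigned to each codeword."""
    nearest = pairwise_distances(descriptors, codebook).argmin(axis=1)
    return np.bincount(nearest, minlength=len(codebook)).astype(float)


def distance_histogram_pooling(descriptors, codebook, bins=4, max_dist=2.0):
    """For each codeword, a histogram of its distances to the local descriptors."""
    dist = pairwise_distances(descriptors, codebook)
    edges = np.linspace(0.0, max_dist, bins + 1)
    hists = [np.histogram(dist[:, k], bins=edges)[0] for k in range(len(codebook))]
    return np.concatenate(hists).astype(float)


rng = np.random.default_rng(0)
codebook = rng.normal(size=(3, 2))      # 3 toy codewords in 2-D
descriptors = rng.normal(size=(50, 2))  # 50 toy local descriptors of one image

print("sum pooling:               ", sum_pooling(descriptors, codebook))
print("distance-histogram pooling:", distance_histogram_pooling(descriptors, codebook))
```

Sum pooling keeps one number per codeword, while the histogram variant keeps a small distribution per codeword, which is the extra information the image signature gains.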

I’ve also had two papers accepted at our counterpart national conference, SIBGRAPI. The first is the work of the Ph.D. student Ana Lopes and her R.A. Elerson Santos (supervised by Prof. Arnaldo Araújo and co-supervised by Prof. Jussara Almeida; I give her some technical and nontechnical support every now and then). It concerns the use of transfer learning of concepts from (static) image datasets to video datasets in order to recognize human actions. We show that learning the concepts present in the Caltech256 dataset allows a classifier to obtain improved results on the challenging “in the wild” human action Hollywood2 dataset.

Transfer Learning for Human Action Recognition
To manually collect action samples from realistic videos is a time-consuming and error-prone task. This is a serious bottleneck to research related to video understanding, since the large intra-class variations of such videos demand training sets large enough to properly encompass those variations. Most authors dealing with this issue rely on (semi-) automated procedures to collect additional, generally noisy, examples. In this paper, we exploit a different approach, based on a Transfer Learning (TL) technique, to address the target task of action recognition. More specifically, we propose a framework that transfers the knowledge about concepts from a previously labeled still image database to the target action video database. It is assumed that, once identified in the target action database, these concepts provide some contextual clues to the action classifier. Our experiments with Caltech256 and Hollywood2 databases indicate: a) the feasibility of successfully using transfer learning techniques to detect concepts and, b) that it is indeed possible to enhance action recognition with the transferred knowledge of even a few concepts. In our case, only four concepts were enough to obtain statistically significant improvements for most actions.

The second is the work of my Ph.D. student Marcelo Coelho and his R.A. Cássio dos Santos Jr. (again his main supervisor is Prof. Arnaldo de Araújo). It concerns the clean-up of noisy SIFT features of street-view images (urban façades). We have found out that subspace clustering, a non-supervised technique, is able to isolate clusters of useful and non-useful SIFT features for the task of retrieving a target image. The challenge is identifying a priori which cluster is the relevant one. This work compares and contrasts two subspace clustering techniques: FINDIT (based on dimension voting) and MSSC (based on a fuzzy mean-shift).

Subspace Clustering for Information Retrieval in Urban Scene Databases
We present a comprehensive study of two important subspace clustering algorithms and their contribution to enhance results from the difficult task of matching images taken of the same object using different devices at different conditions. Our experiments were done on two distinct databases containing urban scenes which were tested using state-of-the-art matching algorithms. After initial evaluation of both datasets by that procedure, clustering algorithms were applied to them. An exhaustive comparison was performed in every cluster found and a significant amelioration in the results was obtained.

I’ll put a link to the preprints as soon as they become available.

Reasoning for Complex Data

Together with Prof. Anderson Rocha, Prof. Jacques Wainer, Prof. Ricardo Torres (my Post Doc advisor, by the way) and Prof. Siome Goldenstein, we have recently founded a new laboratory at the Computing Institute of the State University of Campinas (UNICAMP).

The new lab — which we named RECOD — aims to embrace the research subjects of machine learning, multimedia retrieval and classification, multimodality and digital forensics.

The foundation of this new lab both celebrates a history of fruitful collaboration between its participating members and inaugurates a new phase of tighter cooperation, in which the synergy of our complementary competencies will be fostered in an optimized environment.

I cannot help being proud that my colleagues have accepted both my name and logo suggestions for the new lab.

Long live RECOD !

RECOD Lab Logotype, with the lab motto "reasoning for complex data"

Paper Accepted at MIR 2010

Our paper, “Learning to Rank for Content-Based Image Retrieval”, was accepted at the upcoming ACM Multimedia Information Retrieval Conference (MIR 2010). The first author is the M.Sc. student Fábio Faria, and the paper was co-authored with my Post Doc supervisor Ricardo Torres and several of our partners from UFMG, including Marcos Gonçalves, with whom we have an ongoing cooperation.

Here is the abstract:

“In Content-based Image Retrieval (CBIR), accurately ranking the returned images is of paramount importance, since users consider mostly the topmost results. The typical ranking strategy used by many CBIR systems is to employ image content descriptors, so that returned images that are most similar to the query image are placed higher in the rank. While this strategy is well accepted and widely used, improved results may be obtained by combining multiple image descriptors. In this paper we explore this idea, and introduce algorithms that learn to combine information coming from different descriptors. The proposed learning to rank algorithms are based on three diverse learning techniques: Support Vector Machines (CBIR-SVM), Genetic Programming (CBIR-GP), and Association Rules (CBIR-AR). Eighteen image content descriptors (color, texture, and shape information) are used as input and provided as training to the learning algorithms. We performed a systematic evaluation involving two complex and heterogeneous image databases (Corel and Caltech) and two evaluation measures (Precision and MAP). The empirical results show that all learning algorithms provide significant gains when compared to the typical ranking strategy in which descriptors are used in isolation. We concluded that, in general, CBIR-AR and CBIR-GP outperform CBIR-SVM. A fine-grained analysis revealed the lack of correlation between the results provided by CBIR-AR and the results provided by the other two algorithms, which indicates the opportunity of an advantageous hybrid approach.”
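For readers unfamiliar with the two evaluation measures mentioned in the abstract, here is a minimal sketch of precision at a cutoff and of mean average precision (MAP). The toy relevance lists are made up, and the average precision is normalized by the number of relevant images that appear in the ranked list, which assumes every relevant image is retrieved somewhere in it.

```python
def precision_at_k(relevance, k):
    """Fraction of the top-k returned images that are relevant."""
    return sum(relevance[:k]) / k


def average_precision(relevance):
    """Mean of the precisions measured at the rank of each relevant image."""
    hits = 0
    total = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(rankings):
    """Average of the per-query average precisions."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


# Toy results for two queries: 1 marks a relevant image, 0 an irrelevant one,
# listed in the order in which the system returned them.
rankings = [
    [1, 0, 1, 0],
    [0, 1, 0, 0],
]

print("P@2 for query 1:", precision_at_k(rankings[0], 2))
print("MAP:", mean_average_precision(rankings))
```

Both measures reward placing relevant images near the top, which is why they suit a setting where users mostly look at the topmost results.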

I will be travelling to Philadelphia in late March to present the poster. I am very excited about this upcoming trip to the United States, where I am to meet several friends and colleagues, but at the same time, worried about the radicalization of air security rules and the exaggerated perception of threats. Have we become so scared of dying that we have decided not to live ?