In Barcelona for NIPS’2016

I’m in Barcelona now for NIPS’2016 — or should I say, for NIPS’ Symposia and Workshops, since the main conference this year… sold out. That’s at once exciting and frightening: is machine learning the next dot.com?

Anyways — science ! We’re participating, with posters, in two workshops: Adversarial Training (on Friday, 9th) and Bayesian Deep Learning (on Saturday, 10th). Will you be there, let’s talk!

The papers:

Adversarial Images for Variational Autoencoders

Pedro Tabacof, Julia Tavares, Eduardo Valle

We investigate adversarial attacks for autoencoders. We propose a procedure that distorts the input image to mislead the autoencoder in reconstructing a completely different target image. We attack the internal latent representations, attempting to make the adversarial input produce an internal representation as similar as possible as the target’s. We find that autoencoders are much more robust to the attack than classifiers: while some examples have tolerably small input distortion, and reasonable similarity to the target image, there is a quasi-linear trade-off between those aims. We report results on MNIST and SVHN datasets, and also test regular deterministic autoencoders, reaching similar conclusions in all cases. Finally, we show that the usual adversarial attack for classifiers, while being much easier, also presents a direct proportion between distortion on the input, and misdirection on the output. That proportionality however is hidden by the normalization of the output, which maps a linear layer into non-linear probabilities.

The fulltext here:  https://arxiv.org/abs/1612.00155

🔔 The organizers of the workshop have opened a reddit thread for the public to ask questions. We have a subthread there — ask us anything! 🔔

Known Unknowns: Uncertainty Quality in Bayesian Neural Networks

Ramon Oliveira, Pedro Tabacof, Eduardo Valle

We evaluate the uncertainty quality in neural networks using anomaly detection. We extract uncertainty measures (e.g. entropy) from the predictions of candidate models, use those measures as features for an anomaly detector, and gauge how well the detector differentiates known from unknown classes. We assign higher uncertainty quality to candidate models that lead to better detectors. We also propose a novel method for sampling a variational approximation of a Bayesian neural network, called One-Sample Bayesian Approximation (OSBA). We experiment on two datasets, MNIST and CIFAR10. We compare the following candidate neural network models: Maximum Likelihood, Bayesian Dropout, OSBA, and — for MNIST — the standard variational approximation. We show that Bayesian Dropout and OSBA provide better uncertainty information than Maximum Likelihood, and are essentially equivalent to the standard variational approximation, but much faster.

The fulltext here:  https://arxiv.org/abs/1612.01251

My postgraduate offer for 2016/1 : Deep Learning From a Statistician’s Viewpoint

With few exceptions, my postgraduate offers follow a pattern. On the second semester, I offer my “101” Multimedia Information Retrieval course, which introduces multimedia representations, machine learning, computer vision, and… information retrieval. On the first semester, I offer a topics course, usually following a book : so far we have explored Bishop’s PRML, Hofstadter’s GEB, and Jaynes’ “Probability Theory”.

For 2016/1, I’m risking something different :

“Artificial Intelligence is trending again, and much of the buzz is due to Deep Neural Networks. For long considered untrainable, Deep Networks were boosted by a leap in computing power, and in data availability.

Deep Networks stunned the world by classifying images into thousands of categories with accuracy, by writing fake wikipedia articles with panache, and by playing difficult videogames with competence.

My aim here is a less “neural” path to deep models. Let us take the biological metaphors with a healthy dose of cynicism and seek explanations instead in statistics, in information theory, in probability theory. Remember linear regression ? Deep models are multi-layered generalized linear models whose parameters are learned by maximum likelihood. Let us start from there and then explore the most promising avenues leading to the current state of the art.

This course will be nothing like your typical classroom experience. There will be no lectures. We will meet once a week for a presencial session to discuss previous work, and plan our attack for the next week. I’ll expect you to continue working throughout the week. There will be no exams. I’ll grade your work based on participation during the sessions, progress between sessions, self assessment, and peer assessment.

Active participation will be mandatory. This means (surprise !) talking in public. Everyone will be learning together, so all of us must accept the risk to be wrong. This course won’t work for those who always want to appear wise and knowledgeable. The course will be in English.

Deep networks can be seen as hierarchical generalized linear models.

Deep networks can be seen as hierarchical generalized linear models.

We’ll be a cozy small group : at most 12 students. I’ll select the candidates based on a letter of intentions, and on previous experience. Write a short e-mail to dovalle@dca.fee.unicamp.br. No need to be fancy : just state your reasons for participating, and any previous experience (academic, professional, and extra-curricular) with Machine Learning, Statistics, Probability, or Information Theory.

This course is not for beginners, nor for the faint of heart. We are jumping in head first at the deep (tee hee !) end. After all, we will delve into one of the most engaging intellectual frontier of our time. I dare you to join us !”

Very important ! If you want to enroll at this course without being enrolled at the program (what UNICAMP awfully calls “special students”), you have to do you pre-enrollment until 7/Dec/2015 (hard deadline !). Even if you are enrolled at the program (“regular student”) send me your application at most until 31/Dec/2015, because I’ll select regular and special (urgh !) students at the same time.

EDIT 20/01 : I have sent the acceptance notices — looking forward to work with a swell group of very motivated students !

What : Post-graduate course for the Master or Doctorate in Electrical Engineering program of UNICAMP (4 credits)

When : 2016/1st semester — mandatory presencial meetings Tuesdays from 19 to 21h ; support meetings same day from 16 to 18h

Image credit : composite from Ramón y Cajal 1st publication showing a cerebellum cut, and scatterplots from Fisher’s iris dataset drawn by Indon~commonswiki, wikimediacommons.

The devil in the details : feeding images to Theano

Another lesson from the Kaggle competition : pushing your code to ultimate speed is still labour intensive.

High-level libraries like Numpy and Theano really help, but when you are training complex deep-learning models, rushing through the epochs takes more than typing a couple of imports, enabling the GPU, and hoping for the best.

Seemingly trivial tasks — input, output, copying buffers, rearranging arrays — may easily become the critical bottleneck. Those treacherous operations always raise flags for practioners of high-performance computing. The rest of us, as we fixate on numerical computations, tend to overlook that moving data around easily dominates the costs.

Sometimes the bottleneck is a bolt out of the blue, a straightforward task that ends up being painfully slow. Case in point, the pipeline of importing an image from PIL to Theano through a Numpy array.

Opening an image using PIL is certainly very simple :

>>> from PIL import Image
>>> image = Image.open('color.png', 'r')
>>> image
<PIL.PngImagePlugin.PngImageFile image mode=RGB size=12x8 at 0x1D3E518>

PIL will be so charming as even to deduce by itself the necessary decoder from the file extension or headers.

From the returned object, you can fetch metadata :

>>> image.format
'PNG'
>>> image.getbbox()
(0, 0, 12, 8)
>>> image.getbands()
('R', 'G', 'B')

Or you can get an object that gives access to the actual pixels with :

>>> image.getdata()
<ImagingCore object at 0x7f21e5bb2350>

If you want to use the image as input for a model in Theano, you’ll need first to convert it to a Numpy array. This can very easily be done with :

>>> import numpy
>>> array1 = numpy.asarray(image)
>>> array1 
array([[[255,   0,   0],
        [255,   0,   0],
        [128,   0,   0],
        [128,   0,   0],
        [  0,   0,   0],
        [ 64,  64,  64],
        [128, 128, 128],
        [255, 255, 255],
        [  0,   0, 255],
        [  0,   0, 255],
        [  0,   0, 128],
        [  0,   0, 128]],

       [[255,   0,   0],
        [255,   0,   0],
        [128,   0,   0],
        [128,   0,   0],
        [  0,   0,   0],
        [ 64,  64,  64],
        [128, 128, 128],
        [255, 255, 255],
        [  0,   0, 255],
        [  0,   0, 255],
        [  0,   0, 128],
        [  0,   0, 128]],

       [[255,   0,   0],
        [255,   0,   0],
        [128,   0,   0],
        [128,   0,   0],
        [  0,   0,   0],
        [ 64,  64,  64],
        [128, 128, 128],
        [255, 255, 255],
        [  0,   0, 255],
        [  0,   0, 255],
        [  0,   0, 128],
        [  0,   0, 128]],

       [[255,   0,   0],
        [255,   0,   0],
        [128,   0,   0],
        [128,   0,   0],
        [  0,   0,   0],
        [ 64,  64,  64],
        [128, 128, 128],
        [255, 255, 255],
        [  0,   0, 255],
        [  0,   0, 255],
        [  0,   0, 128],
        [  0,   0, 128]],

       [[255, 255,   0],
        [255, 255,   0],
        [  0, 255, 255],
        [  0, 255, 255],
        [  0, 255,   0],
        [  0, 255,   0],
        [  0, 128,   0],
        [  0, 128,   0],
        [128, 128,   0],
        [128, 128,   0],
        [  0, 128, 128],
        [  0, 128, 128]],

       [[255, 255,   0],
        [255, 255,   0],
        [  0, 255, 255],
        [  0, 255, 255],
        [  0, 255,   0],
        [  0, 255,   0],
        [  0, 128,   0],
        [  0, 128,   0],
        [128, 128,   0],
        [128, 128,   0],
        [  0, 128, 128],
        [  0, 128, 128]],

       [[255,   0, 255],
        [255,   0, 255],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0, 255,   0],
        [  0, 255,   0],
        [  0, 128,   0],
        [  0, 128,   0],
        [128,   0, 128],
        [128,   0, 128],
        [255, 255, 255],
        [255, 255, 255]],

       [[255,   0, 255],
        [255,   0, 255],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0, 255,   0],
        [  0, 255,   0],
        [  0, 128,   0],
        [  0, 128,   0],
        [128,   0, 128],
        [128,   0, 128],
        [255, 255, 255],
        [255, 255, 255]]], dtype=uint8)

You may, at this point, specify a data type. For injecting images into Theano, we’ll often want to convert them to float32 to allow GPU processing :

array1 = numpy.asarray(image, dtype='float32')

Or for enhanced portability, you can use Theano’s default float datatype (which will be float32 if the code is intended to run in the current generation of GPUs) :

import theano
array1 = numpy.asarray(image, dtype=theano.config.floatX)

Alternatively, the array can be converted from the ImagingCore object :

>>> array2 = numpy.asarray(image.getdata())
>>> array2
array([[255,   0,   0],
       [255,   0,   0],
       [128,   0,   0],
       [128,   0,   0],
       [  0,   0,   0],
       [ 64,  64,  64],
       [128, 128, 128],
       [255, 255, 255],
       [  0,   0, 255],
       [  0,   0, 255],
       [  0,   0, 128],
       [  0,   0, 128],
       [255,   0,   0],
       [255,   0,   0],
       [128,   0,   0],
       [128,   0,   0],
       [  0,   0,   0],
       [ 64,  64,  64],
       [128, 128, 128],
       [255, 255, 255],
       [  0,   0, 255],
       [  0,   0, 255],
       [  0,   0, 128],
       [  0,   0, 128],
       [255,   0,   0],
       [255,   0,   0],
       [128,   0,   0],
       [128,   0,   0],
       [  0,   0,   0],
       [ 64,  64,  64],
       [128, 128, 128],
       [255, 255, 255],
       [  0,   0, 255],
       [  0,   0, 255],
       [  0,   0, 128],
       [  0,   0, 128],
       [255,   0,   0],
       [255,   0,   0],
       [128,   0,   0],
       [128,   0,   0],
       [  0,   0,   0],
       [ 64,  64,  64],
       [128, 128, 128],
       [255, 255, 255],
       [  0,   0, 255],
       [  0,   0, 255],
       [  0,   0, 128],
       [  0,   0, 128],
       [255, 255,   0],
       [255, 255,   0],
       [  0, 255, 255],
       [  0, 255, 255],
       [  0, 255,   0],
       [  0, 255,   0],
       [  0, 128,   0],
       [  0, 128,   0],
       [128, 128,   0],
       [128, 128,   0],
       [  0, 128, 128],
       [  0, 128, 128],
       [255, 255,   0],
       [255, 255,   0],
       [  0, 255, 255],
       [  0, 255, 255],
       [  0, 255,   0],
       [  0, 255,   0],
       [  0, 128,   0],
       [  0, 128,   0],
       [128, 128,   0],
       [128, 128,   0],
       [  0, 128, 128],
       [  0, 128, 128],
       [255,   0, 255],
       [255,   0, 255],
       [  0,   0,   0],
       [  0,   0,   0],
       [  0, 255,   0],
       [  0, 255,   0],
       [  0, 128,   0],
       [  0, 128,   0],
       [128,   0, 128],
       [128,   0, 128],
       [255, 255, 255],
       [255, 255, 255],
       [255,   0, 255],
       [255,   0, 255],
       [  0,   0,   0],
       [  0,   0,   0],
       [  0, 255,   0],
       [  0, 255,   0],
       [  0, 128,   0],
       [  0, 128,   0],
       [128,   0, 128],
       [128,   0, 128],
       [255, 255, 255],
       [255, 255, 255]])

Note that the techniques are not exactly interchangeable. Converting directly from the Image object preserves the shape, while converting from .getdata() creates a flatter array :

>>> array1.shape
(8, 12, 3)
>>> array2.shape
(96, 3)

In both cases the image channels are “interleaved”, i.e., for each row and column of the image, the values of the channels (in this example: red, green and blue) appear in sequence. This organization is helpful if you are doing local transformations in the pixels (say, changing colorspaces) as it keeps referential locality. For deep learning, however, that arrangement is a nuisance.

Even with 2D input images, we will typically want “3D” convolutional filters in order to reach through — and mix and match — the entire stack of input channels. We need the input channels separated by planes, i.e., all red data, then all green data, etc. That is a breeze with rollaxis and reshape :

>>> array1 = numpy.rollaxis(array1, 2, 0)
>>> array2 = array2.T.reshape(3,8,12)

Some neural nets go one step further and stack together all the images from a given batch into a single tensor. The LeNet/MNIST sample from deeplearning.net does exactly that. That strategy can increase GPU utilization by giving it bigger chunks to chew at once. If you decide to adopt it, it’s not difficult to assemble the tensor :

ImageSize = (512, 512)
NChannelsPerImage = 3
imagesData = [ Image.open(f, 'r').getdata() for f in batch ]
for i in imagesData :
    assert i.size == ImageSize
    assert i.bands == NChannelsPerImage

allImages = numpy.asarray(imagesData)
nImages = len(batch)
allImages = numpy.rollaxis(allImages, 2, 1).reshape(nImages, NChannelsPerImage, ImageSize[0], ImageSize[1])
print allImages.shape

The code above checks that all images conform to a given shape (essential to convolutional networks, which are very rigid about input sizes). And it works… but it will rather walk than run.

As you try to discover what is holding back the speed, it is easy to suspect the array reshaping operations, but the real culprit is the innocent-loking image conversion to array. For some reason importing images to Numpy either from Image objects or from ImagingCore objects — as we have been trying so far — takes an absurd amount of time.

The solution is not exactly elegant but it makes the conversion so much faster, you might want to consider it. You have to bridge the conversion with a pair of .tostring / .fromstring operations :

ImageSize = (512, 512)
NChannelsPerImage = 3
images = [ Image.open(f, 'r') for f in batch ]
for i in images :
    assert i.size == ImageSize
    assert len(i.getbands()) == NChannelsPerImage

ImageShape =  (1,) + ImageSize + (NChannelsPerImage,)
allImages = [ numpy.fromstring(i.tostring(), dtype='uint8', count=-1, sep='') for i in images ]
allImages = [ numpy.rollaxis(a.reshape(ImageShape), 3, 1) for a in allImages ]
allImages = numpy.concatenate(allImages)

The snippet above has exactly the same effect than the previous one, but it will run up to 20 times faster. In both cases, the array will be ready to be fed to the network.

* * *

TL;DR ?

If speed is important to you, do not convert an image from PIL to Numpy like this…

from PIL import Image
import numpy
image = Image.open('color.png', 'r')
array = numpy.asarray(image)

…nor like this…

from PIL import Image
import numpy
imageData = Image.open('color.png', 'r').getdata()
imageArray = numpy.asarray(imageData).reshape(imageData.shape + (imageData.bands,))

…because although both methods work, they will be very very slow. Do it like this :

from PIL import Image
import numpy
image = Image.open('color.png', 'r').getdata()
imageArray = numpy.fromstring(image.tostring(), dtype='uint8', count=-1, sep='').reshape(image.shape + (len(image.getbands()),))

It will be up to 20⨉ faster.

(Also, you’ll probably have to work on the shape of the array before you feed it to a convolutional network, but for that, I’m afraid, you’ll have to read the piece from the top.)

Deep Neural Network

From instance launch to model accuracy: an AWS/Theano walkthrough

My team has recently participated at Kaggle’s Diabetic Retinopathy challenge, and we won… experience. It was our first Kaggle challenge and we found ourselves unprepared for the workload.

But it was fun — and it was the opportunity to learn new skills, and to sharpen old ones. As the deadline approached, I used Amazon Web Services a lot, and got more familiar with it. Although we have our GPU infrastructure at RECOD, the extra boost provided by AWS allowed exploring extra possibilities.

But it was in the weekend just before the challenge deadline that AWS proved invaluable. Our in-house cluster went AWOL. What are the chances of having a power outage bringing down your servers and a pest control blocking your physical access to them in the weekend before a major deadline ? Murphy knows. Well, AWS allowed us to go down fighting, instead of throwing in the towel.

In this post, I’m compiling Markus Beissinger’s how-to and deeplearning.net tutorials into a single hyper-condensed walkthrough to get you as fast as possible from launching an AWS instance until running a simple convolutional deep neural net. If you are anything like me, I know that you are aching to see some code running — but after you scratch that itch, I strongly suggest you to go back to those sources and study them at leisure.

I’ll assume that you already know :

  1. How to create an AWS account ;
  2. How to manage AWS users and permissions ;
  3. How to launch an AWS instance.

Those preparations out of way, let’s get started ?

Step 1: Launch an instance at AWS, picking :

  • AMI (Amazon Machine Image) : Ubuntu Server 14.04 LTS (HVM), SSD Volume Type – 64-bit
  • Instance type : GPU instances / g2.2xlarge

For the other settings, you can use the defaults, but be careful with the security group and access key to not lock yourself out of the instance.

Step 2 : Open a terminal window, and log into your instance. In my Mac I type :

ssh -i private.pem ubuntu@xxxxxxxx.amazonaws.com

Where private.pem is the private key file of the key pair used when creating the instance, and xxxxxxxx.amazonaws.com is the public DNS of the instance. You might get an angry message from SSH, complaining that your .pem file is too open. If that happens, change its permissions with :

chmod go-rxw private.pem

Step 3 : Install Theano.

Once you’re inside the machine, this is not complicated. Start by making the machine up-to date :

sudo apt-get update
sudo apt-get -y dist-upgrade

Install Theano’s dependencies :

sudo apt-get install -y gcc g++ gfortran build-essential git wget linux-image-generic libopenblas-dev python-dev python-pip python-nose python-numpy python-scipy

Get the package for CUDA and install it :

wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1404/x86_64/cuda-repo-ubuntu1404_7.0-28_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1404_7.0-28_amd64.deb
sudo apt-get update
sudo apt-get install -y cuda

This last command is the only one that takes some time — you might want to go brew a cuppa while you wait. Once it is over, put CUDA on the path and reboot the machine :

echo 'export PATH=/usr/local/cuda/bin:$PATH' >> .bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH' >> .bashrc
sudo reboot

Log into the instance again and query for the GPU :

nvidia-smi -q

This should spit a lengthy list of details about the installed GPU.

Now you just have to install Theano. The one-liner below installs the latest version, and after the wait for the CUDA driver, runs anticlimactically fast :

sudo pip install --upgrade --no-deps git+git://github.com/Theano/Theano.git

And that’s it ! You have Theano on your system.

Step 4 : Run an example.
Let’s take Theano for a run. The simplest sample from deeplearning.net that’s already interesting is the convolutional/MNIST digits tutorial. The sample depends on code written in the previous tutorials, MLP and Logistic Regression, so you have to download those too. You also have to download the data. The commands below do all that:

mkdir theano
mkdir theano/data
mkdir theano/lenet
cd theano/data
wget http://www.iro.umontreal.ca/~lisa/deep/data/mnist/mnist.pkl.gz
cd ../lenet
wget http://deeplearning.net/tutorial/code/logistic_sgd.py
wget http://deeplearning.net/tutorial/code/mlp.py
wget http://deeplearning.net/tutorial/code/convolutional_mlp.py

Finally, hit the magical command :

python convolutional_mlp.py

What ? All this work and the epochs will go by as fast as molasses on a winter’s day. What gives ?

You have to tell Theano to run on the GPU, otherwise it will crawl on the CPU. You can paste the lines below into your ~/.theanorc file :

[global]
floatX=float32
device=gpu

[lib]
cnmem=0.9

[nvcc]
fastmath=True

…or you can use the one-liner below to create it:

echo -e '[global]\nfloatX=float32\ndevice=gpu\n\n[lib]\ncnmem=0.9\n\n[nvcc]\nfastmath=True' > ~/.theanorc

Try running the example again.

python convolutional_mlp.py

With some luck, you’ll note two differences: first, Theano will announce the use of the GPU…

Using gpu device 0: GRID K520 (CNMeM is enabled)

…and second, the epochs will run much, much faster !

(Image credit : networkworld.com).

Grant from Microsoft Azure

My team and I received a grant from Microsoft Azure for Research. Our proposal was named “Medical Image Classification for Computer Aided Diagnosis with Deep Learning and Jumbo Vectors”, with the following abstract :

Information retrieval and content-based image classification are progressing fast. A key application for those technologies is Computer-Aided Diagnosis (CAD), improving doctors’ ability to detect diseases and prevent their progression. Our aim is to advance the state of the art in disease screening based upon images, focusing on the early screening of melanoma. The techniques covered in this project are Deep Learning Architectures and Bag of Visual Words models, which show complementary advantages.

The grant provides us with Azure’s resources (storage, processing units, etc.) to run our techniques in that cloud-based environment.

We are much thankful to Microsoft Research for this opportunity.