Deep Neural Network

From instance launch to model accuracy: an AWS/Theano walkthrough

My team has recently participated at Kaggle’s Diabetic Retinopathy challenge, and we won… experience. It was our first Kaggle challenge and we found ourselves unprepared for the workload.

But it was fun — and it was the opportunity to learn new skills, and to sharpen old ones. As the deadline approached, I used Amazon Web Services a lot, and got more familiar with it. Although we have our GPU infrastructure at RECOD, the extra boost provided by AWS allowed exploring extra possibilities.

But it was in the weekend just before the challenge deadline that AWS proved invaluable. Our in-house cluster went AWOL. What are the chances of having a power outage bringing down your servers and a pest control blocking your physical access to them in the weekend before a major deadline ? Murphy knows. Well, AWS allowed us to go down fighting, instead of throwing in the towel.

In this post, I’m compiling Markus Beissinger’s how-to and tutorials into a single hyper-condensed walkthrough to get you as fast as possible from launching an AWS instance until running a simple convolutional deep neural net. If you are anything like me, I know that you are aching to see some code running — but after you scratch that itch, I strongly suggest you to go back to those sources and study them at leisure.

I’ll assume that you already know :

  1. How to create an AWS account ;
  2. How to manage AWS users and permissions ;
  3. How to launch an AWS instance.

Those preparations out of way, let’s get started ?

Step 1: Launch an instance at AWS, picking :

  • AMI (Amazon Machine Image) : Ubuntu Server 14.04 LTS (HVM), SSD Volume Type – 64-bit
  • Instance type : GPU instances / g2.2xlarge

For the other settings, you can use the defaults, but be careful with the security group and access key to not lock yourself out of the instance.

Step 2 : Open a terminal window, and log into your instance. In my Mac I type :

ssh -i private.pem

Where private.pem is the private key file of the key pair used when creating the instance, and is the public DNS of the instance. You might get an angry message from SSH, complaining that your .pem file is too open. If that happens, change its permissions with :

chmod go-rxw private.pem

Step 3 : Install Theano.

Once you’re inside the machine, this is not complicated. Start by making the machine up-to date :

sudo apt-get update
sudo apt-get -y dist-upgrade

Install Theano’s dependencies :

sudo apt-get install -y gcc g++ gfortran build-essential git wget linux-image-generic libopenblas-dev python-dev python-pip python-nose python-numpy python-scipy

Get the package for CUDA and install it :

sudo dpkg -i cuda-repo-ubuntu1404_7.0-28_amd64.deb
sudo apt-get update
sudo apt-get install -y cuda

This last command is the only one that takes some time — you might want to go brew a cuppa while you wait. Once it is over, put CUDA on the path and reboot the machine :

echo 'export PATH=/usr/local/cuda/bin:$PATH' >> .bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH' >> .bashrc
sudo reboot

Log into the instance again and query for the GPU :

nvidia-smi -q

This should spit a lengthy list of details about the installed GPU.

Now you just have to install Theano. The one-liner below installs the latest version, and after the wait for the CUDA driver, runs anticlimactically fast :

sudo pip install --upgrade --no-deps git+git://

And that’s it ! You have Theano on your system.

Step 4 : Run an example.
Let’s take Theano for a run. The simplest sample from that’s already interesting is the convolutional/MNIST digits tutorial. The sample depends on code written in the previous tutorials, MLP and Logistic Regression, so you have to download those too. You also have to download the data. The commands below do all that:

mkdir theano
mkdir theano/data
mkdir theano/lenet
cd theano/data
cd ../lenet

Finally, hit the magical command :


What ? All this work and the epochs will go by as fast as molasses on a winter’s day. What gives ?

You have to tell Theano to run on the GPU, otherwise it will crawl on the CPU. You can paste the lines below into your ~/.theanorc file :




…or you can use the one-liner below to create it:

echo -e '[global]\nfloatX=float32\ndevice=gpu\n\n[lib]\ncnmem=0.9\n\n[nvcc]\nfastmath=True' > ~/.theanorc

Try running the example again.


With some luck, you’ll note two differences: first, Theano will announce the use of the GPU…

Using gpu device 0: GRID K520 (CNMeM is enabled)

…and second, the epochs will run much, much faster !

(Image credit :

Talk at the I3S Lab, Université de Nice, Sophia-Antipolis

As part of my visit to the I3S Lab, I’m giving a talk on February 11th :

Title: Scalability Issues in Multimedia Information Retrieval
Where: I3S conference room (level 0)
When: Monday, February 11th, at 14h00

The Millennium marked a turning point for textual Information Retrieval, a moment when Search Engines and Social Networks changed our relationship to World Wide Web: gigantic corpora of knowledge suddenly felt friendly, accessible and manageable. Ten years later, the same phenomenon is happening for complex non-textual data, including multimedia. The challenge is how to provide intuitive, convenient, fast services for those data, in collections whose size and growing rate is so big, that our intuitions fail to grasp.

Two issues have dominated the scientific discourse when we aim at that goal: our ability to represent multimedia information in a way that allows answering the high-level queries posed by the users, and our ability to process those queries fast.

In this talk, I will focus on the latter issue, examining similarity search in high-dimensional spaces, a pivotal operation found a variety of database applications — including Multimedia Information Retrieval. Similarity search is conceptually very simple: find the objects in the dataset that are similar to the query, i.e., those that are close to the query according to some notion of distance. However, due to the infamous “curse of the dimensionality”, performing it fast is challenging from both the theoretical and the practical point-of-view.

I have selected for this talk Hypercurves, my latest research endeavor, which is a distributed technique aimed at hybrid CPU–GPU environments. Hypercurves’ goal is to employ throughput-oriented GPUs to keep answer times optimal, under several load regimens. The parallelization also poses interesting theoretical questions of how much can we optimize the parallelization of approximate k-nearest neighbors, if we relax the equivalence to the sequential algorithm from exact to probabilistic.

The talk will be in English. I thank my colleague and friend Prof. Frédéric Precioso, for this opportunity.