My team recently took part in Kaggle’s Diabetic Retinopathy challenge, and we won… experience. It was our first Kaggle challenge, and we found ourselves unprepared for the workload.
But it was fun, and it was an opportunity to learn new skills and to sharpen old ones. As the deadline approached, I used Amazon Web Services a lot and got more familiar with it. Although we have our own GPU infrastructure at RECOD, the extra boost provided by AWS allowed us to explore additional possibilities.
But it was on the weekend just before the challenge deadline that AWS proved invaluable. Our in-house cluster went AWOL. What are the chances of a power outage bringing down your servers, and a pest-control operation blocking your physical access to them, on the weekend before a major deadline ? Murphy knows. Well, AWS allowed us to go down fighting instead of throwing in the towel.
In this post, I’m compiling Markus Beissinger’s how-to and the deeplearning.net tutorials into a single hyper-condensed walkthrough that takes you as fast as possible from launching an AWS instance to running a simple convolutional deep neural net. If you are anything like me, you are aching to see some code running. After you scratch that itch, though, I strongly suggest that you go back to those sources and study them at leisure.
I’ll assume that you already know :
- How to create an AWS account ;
- How to manage AWS users and permissions ;
- How to launch an AWS instance.
With those preparations out of the way, let’s get started.
Step 1: Launch an instance at AWS, picking :
- AMI (Amazon Machine Image) : Ubuntu Server 14.04 LTS (HVM), SSD Volume Type – 64-bit
- Instance type : GPU instances / g2.2xlarge
For the other settings, you can use the defaults, but be careful with the security group and the key pair, so that you don’t lock yourself out of the instance.
Step 2 : Open a terminal window and log into your instance. On my Mac, I type :
ssh -i private.pem ubuntu@xxxxxxxx.amazonaws.com
Here, private.pem is the private key file of the key pair used when creating the instance, ubuntu is the default user on Ubuntu AMIs, and xxxxxxxx.amazonaws.com is the public DNS of the instance. You might get an angry message from SSH, complaining that your .pem file is too open. If that happens, change its permissions with :
chmod go-rxw private.pem
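If you want to see what that chmod does before touching your real key, the sketch below replays it on a throwaway file. Note that demo.pem and the temporary directory are stand-ins I made up for illustration; nothing here reads or changes private.pem.

```shell
# Replay the permission fix on a throwaway file (demo.pem is a hypothetical
# stand-in, not a real key).
workdir=$(mktemp -d)
key="$workdir/demo.pem"
touch "$key"
chmod 644 "$key"      # deliberately too open: group and others can read
chmod go-rwx "$key"   # the same fix as above: strip all group/other access
# Keep only the ten permission characters from the long listing.
mode=$(ls -l "$key" | cut -c1-10)
echo "$mode"          # prints -rw-------
rm -rf "$workdir"
```

A mode of -rw------- means that only the owner can read or write the file, which is what SSH wants to see on a private key.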
Step 3 : Install Theano.
Once you’re inside the machine, this is not complicated. Start by bringing the machine up to date :
sudo apt-get update
sudo apt-get -y dist-upgrade
Install Theano’s dependencies :
sudo apt-get install -y gcc g++ gfortran build-essential git wget linux-image-generic libopenblas-dev python-dev python-pip python-nose python-numpy python-scipy
Get the package for CUDA and install it :
wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1404/x86_64/cuda-repo-ubuntu1404_7.0-28_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1404_7.0-28_amd64.deb
sudo apt-get update
sudo apt-get install -y cuda
This last command is the only one that takes some time — you might want to go brew a cuppa while you wait. Once it is over, put CUDA on the path and reboot the machine :
echo 'export PATH=/usr/local/cuda/bin:$PATH' >> .bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH' >> .bashrc
sudo reboot
Log into the instance again and query for the GPU :
nvidia-smi -q
This should spit a lengthy list of details about the installed GPU.
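If the query fails, the usual suspect is the PATH. Here is a minimal sketch for checking that, assuming the CUDA install location used above (/usr/local/cuda/bin). The helper names path_has_dir and report_cmd are my own, not part of CUDA or Ubuntu.

```shell
# Succeed if directory $1 appears as an entry in PATH.
path_has_dir() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Report whether a command can be found, without running it.
report_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
  fi
}

if path_has_dir /usr/local/cuda/bin; then
  echo "CUDA bin directory is on PATH"
else
  echo "CUDA bin directory is NOT on PATH; check your .bashrc"
fi
report_cmd nvcc
report_cmd nvidia-smi
```

The script only inspects the environment, so it is safe to run at any point during the setup.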
Now you just have to install Theano. The one-liner below installs the latest version, and after the wait for the CUDA driver, runs anticlimactically fast :
sudo pip install --upgrade --no-deps git+https://github.com/Theano/Theano.git
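A quick sanity check at this point is to ask Python whether it can import the pieces installed so far. The sketch below is one way to do it; py_module_status is a hypothetical helper of mine, and it assumes that the python command is the interpreter you installed the packages for.

```shell
# Print "ok: <module>" if the module imports under `python`,
# otherwise "missing: <module>".
py_module_status() {
  if python -c "import $1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1"
  fi
}

for module in numpy scipy theano; do
  py_module_status "$module"
done
```

Be patient with the theano line : the first import can take noticeably longer than the others.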
And that’s it ! You have Theano on your system.
Step 4 : Run an example.
Let’s take Theano for a run. The simplest sample from deeplearning.net that’s already interesting is the convolutional/MNIST digits tutorial. The sample depends on code written in the previous tutorials, MLP and Logistic Regression, so you have to download those too. You also have to download the data. The commands below do all that:
mkdir theano
mkdir theano/data
mkdir theano/lenet
cd theano/data
wget http://www.iro.umontreal.ca/~lisa/deep/data/mnist/mnist.pkl.gz
cd ../lenet
wget http://deeplearning.net/tutorial/code/logistic_sgd.py
wget http://deeplearning.net/tutorial/code/mlp.py
wget http://deeplearning.net/tutorial/code/convolutional_mlp.py
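A failed download tends to show up later as a confusing Python error, so it can pay to check that every file arrived and is not empty. A minimal sketch follows, assuming the directory layout created above under your home directory; check_files is a helper name I made up.

```shell
# Succeed only if every named file exists and is non-empty inside directory $1.
check_files() {
  dir=$1
  shift
  missing=0
  for f in "$@"; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f"
      missing=1
    fi
  done
  return $missing
}

check_files "$HOME/theano/data" mnist.pkl.gz \
  || echo "the data set needs another download"
check_files "$HOME/theano/lenet" logistic_sgd.py mlp.py convolutional_mlp.py \
  || echo "some tutorial files need another download"
```

If nothing is printed, all four files are in place.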
Finally, hit the magical command :
python convolutional_mlp.py
What ? All this work and the epochs will go by as fast as molasses on a winter’s day. What gives ?
You have to tell Theano to run on the GPU, otherwise it will crawl on the CPU. You can paste the lines below into your ~/.theanorc file :
[global]
floatX=float32
device=gpu

[lib]
cnmem=0.9

[nvcc]
fastmath=True
…or you can use the one-liner below to create it:
echo -e '[global]\nfloatX=float32\ndevice=gpu\n\n[lib]\ncnmem=0.9\n\n[nvcc]\nfastmath=True' > ~/.theanorc
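For a one-off experiment, Theano can also take the same settings from the THEANO_FLAGS environment variable, which takes precedence over ~/.theanorc. The sketch below builds the equivalent string. It relies on my understanding that options living in a [section] of the config file are written as section.option in the variable, so check the Theano configuration docs if a flag seems to be ignored.

```shell
# The same settings as the .theanorc above, as a THEANO_FLAGS string.
# Options from a [section] of the file are written as section.option.
THEANO_FLAGS='floatX=float32,device=gpu,lib.cnmem=0.9,nvcc.fastmath=True'
export THEANO_FLAGS
echo "$THEANO_FLAGS"
# With the variable exported, a later `python convolutional_mlp.py` started
# from this shell would pick the settings up (not run here).
```

This is handy for comparing CPU and GPU runs : change device=gpu to device=cpu in the string and leave the config file alone.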
Try running the example again.
With some luck, you’ll note two differences: first, Theano will announce the use of the GPU…
Using gpu device 0: GRID K520 (CNMeM is enabled)
…and second, the epochs will run much, much faster !
(Image credit : networkworld.com).