The devil in the details: feeding images to Theano

Another lesson from the Kaggle competition: pushing your code to ultimate speed is still labour-intensive.

High-level libraries like Numpy and Theano really help, but when you are training complex deep-learning models, rushing through the epochs takes more than typing a couple of imports, enabling the GPU, and hoping for the best.

Seemingly trivial tasks — input, output, copying buffers, rearranging arrays — may easily become the critical bottleneck. Those treacherous operations always raise flags for practitioners of high-performance computing. The rest of us, fixated on the numerical computations, tend to overlook that moving data around can easily dominate the costs.

Sometimes the bottleneck is a bolt from the blue: a straightforward task that ends up being painfully slow. Case in point: the pipeline that imports an image from PIL into Theano through a Numpy array.

Opening an image using PIL is certainly very simple:

>>> from PIL import Image
>>> image = Image.open('color.png', 'r')
>>> image
<PIL.PngImagePlugin.PngImageFile image mode=RGB size=12x8 at 0x1D3E518>

PIL is even charming enough to deduce the necessary decoder by itself, from the file extension or headers.

From the returned object, you can fetch metadata:

>>> image.format
'PNG'
>>> image.getbbox()
(0, 0, 12, 8)
>>> image.getbands()
('R', 'G', 'B')

Or you can get an object that gives access to the actual pixels with:

>>> image.getdata()
<ImagingCore object at 0x7f21e5bb2350>

If you want to use the image as input for a model in Theano, you’ll first need to convert it to a Numpy array. This can be done very easily with:

>>> import numpy
>>> array1 = numpy.asarray(image)
>>> array1 
array([[[255,   0,   0],
        [255,   0,   0],
        [128,   0,   0],
        [128,   0,   0],
        [  0,   0,   0],
        [ 64,  64,  64],
        [128, 128, 128],
        [255, 255, 255],
        [  0,   0, 255],
        [  0,   0, 255],
        [  0,   0, 128],
        [  0,   0, 128]],

       [[255,   0,   0],
        [255,   0,   0],
        [128,   0,   0],
        [128,   0,   0],
        [  0,   0,   0],
        [ 64,  64,  64],
        [128, 128, 128],
        [255, 255, 255],
        [  0,   0, 255],
        [  0,   0, 255],
        [  0,   0, 128],
        [  0,   0, 128]],

       [[255,   0,   0],
        [255,   0,   0],
        [128,   0,   0],
        [128,   0,   0],
        [  0,   0,   0],
        [ 64,  64,  64],
        [128, 128, 128],
        [255, 255, 255],
        [  0,   0, 255],
        [  0,   0, 255],
        [  0,   0, 128],
        [  0,   0, 128]],

       [[255,   0,   0],
        [255,   0,   0],
        [128,   0,   0],
        [128,   0,   0],
        [  0,   0,   0],
        [ 64,  64,  64],
        [128, 128, 128],
        [255, 255, 255],
        [  0,   0, 255],
        [  0,   0, 255],
        [  0,   0, 128],
        [  0,   0, 128]],

       [[255, 255,   0],
        [255, 255,   0],
        [  0, 255, 255],
        [  0, 255, 255],
        [  0, 255,   0],
        [  0, 255,   0],
        [  0, 128,   0],
        [  0, 128,   0],
        [128, 128,   0],
        [128, 128,   0],
        [  0, 128, 128],
        [  0, 128, 128]],

       [[255, 255,   0],
        [255, 255,   0],
        [  0, 255, 255],
        [  0, 255, 255],
        [  0, 255,   0],
        [  0, 255,   0],
        [  0, 128,   0],
        [  0, 128,   0],
        [128, 128,   0],
        [128, 128,   0],
        [  0, 128, 128],
        [  0, 128, 128]],

       [[255,   0, 255],
        [255,   0, 255],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0, 255,   0],
        [  0, 255,   0],
        [  0, 128,   0],
        [  0, 128,   0],
        [128,   0, 128],
        [128,   0, 128],
        [255, 255, 255],
        [255, 255, 255]],

       [[255,   0, 255],
        [255,   0, 255],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0, 255,   0],
        [  0, 255,   0],
        [  0, 128,   0],
        [  0, 128,   0],
        [128,   0, 128],
        [128,   0, 128],
        [255, 255, 255],
        [255, 255, 255]]], dtype=uint8)

You may, at this point, specify a data type. For feeding images into Theano, we’ll often want to convert them to float32 to allow GPU processing:

array1 = numpy.asarray(image, dtype='float32')

Or, for better portability, you can use Theano’s default float datatype (which will be float32 if the code is intended to run on the current generation of GPUs):

import theano
array1 = numpy.asarray(image, dtype=theano.config.floatX)

Alternatively, the array can be converted from the ImagingCore object:

>>> array2 = numpy.asarray(image.getdata())
>>> array2
array([[255,   0,   0],
       [255,   0,   0],
       [128,   0,   0],
       [128,   0,   0],
       [  0,   0,   0],
       [ 64,  64,  64],
       [128, 128, 128],
       [255, 255, 255],
       [  0,   0, 255],
       [  0,   0, 255],
       [  0,   0, 128],
       [  0,   0, 128],
       [255,   0,   0],
       [255,   0,   0],
       [128,   0,   0],
       [128,   0,   0],
       [  0,   0,   0],
       [ 64,  64,  64],
       [128, 128, 128],
       [255, 255, 255],
       [  0,   0, 255],
       [  0,   0, 255],
       [  0,   0, 128],
       [  0,   0, 128],
       [255,   0,   0],
       [255,   0,   0],
       [128,   0,   0],
       [128,   0,   0],
       [  0,   0,   0],
       [ 64,  64,  64],
       [128, 128, 128],
       [255, 255, 255],
       [  0,   0, 255],
       [  0,   0, 255],
       [  0,   0, 128],
       [  0,   0, 128],
       [255,   0,   0],
       [255,   0,   0],
       [128,   0,   0],
       [128,   0,   0],
       [  0,   0,   0],
       [ 64,  64,  64],
       [128, 128, 128],
       [255, 255, 255],
       [  0,   0, 255],
       [  0,   0, 255],
       [  0,   0, 128],
       [  0,   0, 128],
       [255, 255,   0],
       [255, 255,   0],
       [  0, 255, 255],
       [  0, 255, 255],
       [  0, 255,   0],
       [  0, 255,   0],
       [  0, 128,   0],
       [  0, 128,   0],
       [128, 128,   0],
       [128, 128,   0],
       [  0, 128, 128],
       [  0, 128, 128],
       [255, 255,   0],
       [255, 255,   0],
       [  0, 255, 255],
       [  0, 255, 255],
       [  0, 255,   0],
       [  0, 255,   0],
       [  0, 128,   0],
       [  0, 128,   0],
       [128, 128,   0],
       [128, 128,   0],
       [  0, 128, 128],
       [  0, 128, 128],
       [255,   0, 255],
       [255,   0, 255],
       [  0,   0,   0],
       [  0,   0,   0],
       [  0, 255,   0],
       [  0, 255,   0],
       [  0, 128,   0],
       [  0, 128,   0],
       [128,   0, 128],
       [128,   0, 128],
       [255, 255, 255],
       [255, 255, 255],
       [255,   0, 255],
       [255,   0, 255],
       [  0,   0,   0],
       [  0,   0,   0],
       [  0, 255,   0],
       [  0, 255,   0],
       [  0, 128,   0],
       [  0, 128,   0],
       [128,   0, 128],
       [128,   0, 128],
       [255, 255, 255],
       [255, 255, 255]])

Note that the two techniques are not exactly interchangeable. Converting directly from the Image object preserves the shape, while converting from .getdata() creates a flatter array:

>>> array1.shape
(8, 12, 3)
>>> array2.shape
(96, 3)

In both cases the image channels are “interleaved”, i.e., for each row and column of the image, the values of the channels (here red, green and blue) appear in sequence. This organization is helpful if you are doing local transformations on the pixels (say, changing colorspaces), as it preserves locality of reference. For deep learning, however, that arrangement is a nuisance.

Even with 2D input images, we will typically want “3D” convolutional filters, in order to reach through — and mix and match — the entire stack of input channels. We need the input channels separated into planes, i.e., all the red data, then all the green data, etc. That is a breeze with rollaxis and reshape:

>>> array1 = numpy.rollaxis(array1, 2, 0)
>>> array2 = array2.T.reshape(3,8,12)
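The two routes really do land on the same planar layout. Here is a quick sanity check with numpy alone, using a made-up 2×3 RGB array in place of the PNG:

```python
import numpy

# Tiny stand-in for an interleaved (height, width, channels) image.
h, w, c = 2, 3, 3
interleaved = numpy.arange(h * w * c, dtype='uint8').reshape(h, w, c)

# Route 1: move the channel axis to the front, as done with array1.
planar1 = numpy.rollaxis(interleaved, 2, 0)

# Route 2: flatten to (pixels, channels) first, as .getdata() yields,
# then transpose and reshape, as done with array2.
flat = interleaved.reshape(h * w, c)
planar2 = flat.T.reshape(c, h, w)

assert planar1.shape == (c, h, w)
assert numpy.array_equal(planar1, planar2)
```

Both arrays come out as (channels, height, width); the difference is that rollaxis can return a view, while the transpose-and-reshape route forces a copy.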

Some neural nets go one step further and stack all the images from a given batch into a single tensor. The LeNet/MNIST sample from deeplearning.net does exactly that. That strategy can increase GPU utilization by giving it bigger chunks to chew on at once. If you decide to adopt it, assembling the tensor is not difficult:

ImageSize = (512, 512)
NChannelsPerImage = 3
imagesData = [ Image.open(f, 'r').getdata() for f in batch ]
for i in imagesData:
    assert i.size == ImageSize
    assert i.bands == NChannelsPerImage

allImages = numpy.asarray(imagesData)
nImages = len(batch)
# PIL sizes are (width, height), so the rows of the array run over ImageSize[1]
allImages = numpy.rollaxis(allImages, 2, 1).reshape(nImages, NChannelsPerImage, ImageSize[1], ImageSize[0])
print(allImages.shape)

The code above checks that all images conform to a given shape (essential for convolutional networks, which are very rigid about input sizes). And it works… but it crawls rather than runs.

As you try to discover what is holding the speed back, it is easy to suspect the array-reshaping operations, but the real culprit is the innocent-looking image-to-array conversion. For some reason, importing images into Numpy, either from Image objects or from ImagingCore objects, as we have been doing so far, takes an absurd amount of time.

The solution is not exactly elegant, but it makes the conversion so much faster that you might want to consider it: bridge the conversion with a pair of .tostring / .fromstring operations:

ImageSize = (512, 512)
NChannelsPerImage = 3
images = [ Image.open(f, 'r') for f in batch ]
for i in images:
    assert i.size == ImageSize
    assert len(i.getbands()) == NChannelsPerImage

# Raw PIL buffers are row-major, i.e. (height, width, channels)
ImageShape = (1, ImageSize[1], ImageSize[0], NChannelsPerImage)
allImages = [ numpy.fromstring(i.tostring(), dtype='uint8', count=-1, sep='') for i in images ]
allImages = [ numpy.rollaxis(a.reshape(ImageShape), 3, 1) for a in allImages ]
allImages = numpy.concatenate(allImages)

The snippet above has exactly the same effect as the previous one, but it runs up to 20 times faster. In both cases, the array ends up ready to be fed to the network.
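In modern Pillow the method has been renamed to .tobytes() (and numpy.fromstring is deprecated in favour of numpy.frombuffer), but the bridge works the same way. A minimal sketch, using a small synthetic in-memory image instead of a file on disk:

```python
import numpy
from PIL import Image

# A small deterministic RGB image, standing in for one loaded from disk.
h, w = 8, 12
pixels = (numpy.arange(h * w * 3) % 256).astype('uint8').reshape(h, w, 3)
image = Image.fromarray(pixels, mode='RGB')

# The fast bridge: raw bytes out of PIL, straight into a numpy array.
# Note the (height, width, channels) order of the raw buffer.
fast = numpy.frombuffer(image.tobytes(), dtype='uint8').reshape(h, w, 3)

# Same contents as the slow numpy.asarray(image) route.
assert numpy.array_equal(fast, numpy.asarray(image))
```

From here, the same rollaxis trick moves the channels to the front before stacking a batch.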

* * *

TL;DR?

If speed is important to you, do not convert an image from PIL to Numpy like this…

from PIL import Image
import numpy
image = Image.open('color.png', 'r')
array = numpy.asarray(image)

…nor like this…

from PIL import Image
import numpy
imageData = Image.open('color.png', 'r').getdata()
imageArray = numpy.asarray(imageData).reshape((imageData.size[1], imageData.size[0], imageData.bands))

…because, although both methods work, they are very, very slow. Do it like this:

from PIL import Image
import numpy
image = Image.open('color.png', 'r')
imageArray = numpy.fromstring(image.tostring(), dtype='uint8', count=-1, sep='').reshape((image.size[1], image.size[0], len(image.getbands())))

It will be up to 20× faster.

(Also, you’ll probably have to work on the shape of the array before you feed it to a convolutional network, but for that, I’m afraid, you’ll have to read the piece from the top.)

Wow! Much Homebrew. Very Numpy. So Scipy. Such OpenCV

The first time I tried to install NumPy+SciPy on my Mac, it turned into a Kafkaesque nightmare, out of which I only surfaced thanks to luck and grit. (Only to have, a few weeks later, a system update break my MacPorts and send everything back to hell.)

The second time around, I traded freedom for comfort and went with the Enthought Python Distribution (now Enthought Canopy). EPD came with an impressive list of available packages and, more importantly, it just worked. It was also generously available at no fee for academic use, an offer from which I’ve profited.

Recently, though, I became a latecomer to Homebrew, enticed by its taglines (‘The missing package manager for OS X’, ‘MacPorts driving you to drink? Try Homebrew!’) and by its one-liner installation procedure (look for ‘Install Homebrew’ on their homepage).

So far, I am incredibly impressed: I’ve done fresh installations of Python, Nose, NumPy, OpenCV, GCC (!), SciPy, Bottleneck, wxPython and PIL. All went smoothly, installing and testing without smoke. My command-line history reveals just how easy it was:

ruby -e "$(curl -fsSL https://raw.github.com/Homebrew/homebrew/go/install)"

brew install python

/usr/local/bin/pip-2.7 install nose

/usr/local/bin/pip-2.7 install numpy

brew install opencv

brew install gcc49

brew install scipy

/usr/local/bin/pip-2.7 install bottleneck

brew install wxwidgets

/usr/local/bin/pip install pil

The only pitfall (if it may even be called that) is that some Python packages prefer the Homebrew installer and some prefer pip; quick trial and error works just fine to find out which.

Often the Homebrew installer will discreetly guide you through the process, as when I asked for ‘brew install wxpython’ and it told me there was no such package, but that ‘wxwidgets’ already came with the wxPython bindings. That kind of gentle bending of the Unix philosophy, on behalf of preserving the user’s sanity, never fails to win my respect.

Now: maybe Homebrew is running so smoothly only because I already have EPD installed on this machine, all esoteric dependencies having been previously solved. I also had Xcode fully installed and operational, a requirement for most interesting tools to work on OS X at all. Note also that I am still running Mountain Lion.

Homebrew’s stated requirements seem quite modest, however: the Command Line Tools for Xcode, and a bash- or zsh-compatible shell (the default terminal is fine). Additionally, it lives in a directory tree independent from EPD, so it probably can’t count on the latter’s dependencies. In a few weeks, I intend to do a fresh installation of Mavericks on this machine, and then we will know for sure.

I can haz SciPy!!1!1!

"The Dependency Hell", rightmost panel of Hieronymus Bosch's "The Garden of Earthly Delights".
Well, at least it looks that way (the self-test does not smoke anymore).

I will try to reconstruct a walkthrough, but bear in mind that I have installed, uninstalled and reinstalled so much stuff on my Mac that the reproducibility of this recipe should be taken with a grain of salt. If you try it and it works, I would be thankful if you left a note describing exactly what you did.

Without further ado, how to install the trio NumPy, SciPy and Matplotlib in Mac OS X Snow Leopard, using MacPorts:

  1. Download Apple’s Xcode for Mac OS X, without which nothing else is possible. You’ll have to register with Apple, but registration is free if you are not planning to sell anything;
  2. Download MacPorts. This is the package manager that will work all the magic — it does for your Mac what yum or apt-get does for a Linux box, so if it doesn’t work properly nothing else will;
  3. Install Xcode and then MacPorts. If you want to be extra sure, follow the instructions on the MacPorts page — basically instructing you to install X11 support on the Mac, and then MacPorts. I didn’t do this (because I only became aware of those instructions very late in the whole process), but that doesn’t mean you shouldn’t. Alternatively, if you already have MacPorts installed, ensure it is up-to-date by typing:

    sudo port selfupdate

  4. You can, in theory, compile your dependencies with any compiler. I found out that SciPy smokes unless compiled with GCC 4.4, but as things stand I have some dependencies compiled with the default compiler and some (NumPy, SciPy) for which I forced compilation with GCC 4.4. If you want to try your luck, start by installing GCC 4.4 and making it the default:

    sudo port install gcc44
    sudo port install gcc_select
    sudo gcc_select mp-gcc44

    If, instead, you want to reproduce exactly my crazy (but successful) sequence, install GCC 4.4 but do not make it the default (omit the last two lines), and also install GCC 4.5 (I know, how many versions of GCC does one need?!):

    sudo port install gcc45

  5. Install the “non-dependencies” of PIL. Those “non-dependencies” are libraries needed by other libraries in order to provide optional (but often important) functionality, and which won’t be installed by the package manager otherwise (PIL itself is a “non-dependency” of SciPy):

    sudo port install jpeg libpng tiff lcms freefont-ttf

  6. Install PIL. This will install Python as a dependency. (I chose to install 2.7, so I’ve consistently chosen py27-* packages throughout the process):

    sudo port install py27-pil

  7. Install the binary dependencies of NumPy and SciPy:

    sudo port install arpack
    sudo port install SuiteSparse

  8. Finally, install the goodies:

    sudo port install py27-numpy
    sudo port install py27-scipy

    Again, if you want to do exactly what I did, the process is more convoluted. After typing exactly those commands above, I then uninstalled them (but not their dependencies):

    sudo port uninstall py27-numpy py27-scipy

    And reinstalled everything (but not the dependencies) with the compiler directive:

    sudo port install py27-numpy configure.compiler=macports-gcc-4.5
    sudo port install py27-scipy configure.compiler=macports-gcc-4.4

    I don’t remember why I chose GCC 4.5 for NumPy and 4.4 for SciPy — but my history file is a more faithful testimony than my memory (probably I just mistyped and meant 4.4 for both; give me some slack, it was almost 4 a.m. by then!).

  9. Last, but not least, install Matplotlib:

    sudo port install py27-matplotlib

I don’t know if all this mixing and matching of compilers is a good thing — all I know is that SciPy smokes if not compiled with GCC 4.4. Maybe compiling everything with GCC 4.4 is the way to go.

To test the installation, start python and self-test the packages one by one:

python
import numpy
numpy.test('1', '10')

import scipy
scipy.test('1', '10')

import matplotlib
matplotlib.test()

I got a pristine regression test for NumPy and SciPy, including the problematic C++ code weaving. My Matplotlib regression smoked, with several “ImageComparisonFailure: images not close” errors whose RMS values were slightly above the acceptable threshold. I compared the images by hand, however, and could not spot any difference — so I dismissed the problem as a matter of different font-smoothing engines.

By the way, if you want to make the new Python the default, there is a python_select for that job:

sudo port install python_select
sudo python_select python27

Both gcc_select and python_select let you choose between a list of options that includes both the MacPorts and the Apple versions of GCC and Python. To see the options available, type:

gcc_select -l
python_select -l

And a little something I found out at my own expense: when using bash, sometimes updating the PATH is not enough. Recently used commands are cached in a hash table, which has to be refreshed, or bash will keep fetching the command from the old path. To inspect and clear this hash table you can use, respectively, the commands:

hash
hash -r

Well, it is not the Seventh Heaven, but it is halfway through Purgatory. Not bad for this Dante, who has only Google and StackOverflow for a Virgil!

Then you just have to add another dependency…

Okay, I am well into the third day of trying to install SciPy on Mac OS X, and things do not seem to be improving:

  1. I started by signing up for the Apple Developer Program, then downloading and installing Xcode (without which nothing else is possible);
  2. I’ve downloaded and installed MacPorts (without which nothing interesting is humanly feasible);
  3. As required, I’ve installed a fresh copy of Python 2.6 from the .dmg, available at python.org;
  4. I’ve installed gfortran from the binary, available at AT&T Research;
  5. Using MacPorts, I’ve installed the SuiteSparse;
  6. I’ve installed numpy from the .dmg available at SourceForge;
  7. I’ve installed setuptools for Python, downloading the egg and running it as a shell script, as indicated on its website;
  8. Using setuptools, I’ve installed nose, the testing framework, and run the smoke test for numpy (import numpy \n numpy.test('1', '10')). It passed perfectly;
  9. I’ve installed scipy from the .dmg available at SourceForge;
  10. I’ve run the smoke test for scipy (import scipy \n scipy.test('1', '10')). It smoked badly at all attempts of code weaving with C(++).

This short version of the story does not include all the comings and goings: installing Python 2.7 and discovering, very late in the process, its incompatibility with SciPy for Mac; finding out that the concept of automatic uninstallation simply does not exist in Mac OS X… it has been lots of fun! It feels just like UNIX, only worse — I have fallen down the rabbit hole and I am waiting for the floor to arrive.

I swear that I am this close to just installing Windows on this damn thing.

EDIT 4/feb: I have just found this very useful script, the SciPy Superpack for Mac, courtesy of Chris Fonnesbeck of the now-defunct macinscience.org. My scipy.test() still smokes badly, but now on fresh new issues. Apparently, I am not alone: getting SciPy to work on a Mac seems to be nothing short of heroic.