The devil in the details: feeding images to Theano

Another lesson from the Kaggle competition: pushing your code to top speed is still labour-intensive.

High-level libraries like Numpy and Theano really help, but when you are training complex deep-learning models, rushing through the epochs takes more than typing a couple of imports, enabling the GPU, and hoping for the best.

Seemingly trivial tasks — input, output, copying buffers, rearranging arrays — may easily become the critical bottleneck. Those treacherous operations always raise flags for practitioners of high-performance computing. The rest of us, fixated on the numerical computations, tend to overlook that moving data around can easily dominate the costs.

Sometimes the bottleneck comes as a bolt from the blue: a straightforward task that ends up being painfully slow. Case in point: the pipeline that imports an image from PIL into Theano through a Numpy array.

Opening an image using PIL is certainly very simple:

>>> from PIL import Image
>>> image = Image.open('color.png', 'r')
>>> image
<PIL.PngImagePlugin.PngImageFile image mode=RGB size=12x8 at 0x1D3E518>

PIL is even charming enough to deduce the necessary decoder by itself, from the file extension or headers.

From the returned object, you can fetch metadata:

>>> image.format
'PNG'
>>> image.getbbox()
(0, 0, 12, 8)
>>> image.getbands()
('R', 'G', 'B')

Or you can get an object that gives access to the actual pixels:

>>> image.getdata()
<ImagingCore object at 0x7f21e5bb2350>

If you want to use the image as input for a model in Theano, you’ll first need to convert it to a Numpy array. This is easily done with:

>>> import numpy
>>> array1 = numpy.asarray(image)
>>> array1 
array([[[255,   0,   0],
        [255,   0,   0],
        [128,   0,   0],
        [128,   0,   0],
        [  0,   0,   0],
        [ 64,  64,  64],
        [128, 128, 128],
        [255, 255, 255],
        [  0,   0, 255],
        [  0,   0, 255],
        [  0,   0, 128],
        [  0,   0, 128]],

       [[255,   0,   0],
        [255,   0,   0],
        [128,   0,   0],
        [128,   0,   0],
        [  0,   0,   0],
        [ 64,  64,  64],
        [128, 128, 128],
        [255, 255, 255],
        [  0,   0, 255],
        [  0,   0, 255],
        [  0,   0, 128],
        [  0,   0, 128]],

       [[255,   0,   0],
        [255,   0,   0],
        [128,   0,   0],
        [128,   0,   0],
        [  0,   0,   0],
        [ 64,  64,  64],
        [128, 128, 128],
        [255, 255, 255],
        [  0,   0, 255],
        [  0,   0, 255],
        [  0,   0, 128],
        [  0,   0, 128]],

       [[255,   0,   0],
        [255,   0,   0],
        [128,   0,   0],
        [128,   0,   0],
        [  0,   0,   0],
        [ 64,  64,  64],
        [128, 128, 128],
        [255, 255, 255],
        [  0,   0, 255],
        [  0,   0, 255],
        [  0,   0, 128],
        [  0,   0, 128]],

       [[255, 255,   0],
        [255, 255,   0],
        [  0, 255, 255],
        [  0, 255, 255],
        [  0, 255,   0],
        [  0, 255,   0],
        [  0, 128,   0],
        [  0, 128,   0],
        [128, 128,   0],
        [128, 128,   0],
        [  0, 128, 128],
        [  0, 128, 128]],

       [[255, 255,   0],
        [255, 255,   0],
        [  0, 255, 255],
        [  0, 255, 255],
        [  0, 255,   0],
        [  0, 255,   0],
        [  0, 128,   0],
        [  0, 128,   0],
        [128, 128,   0],
        [128, 128,   0],
        [  0, 128, 128],
        [  0, 128, 128]],

       [[255,   0, 255],
        [255,   0, 255],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0, 255,   0],
        [  0, 255,   0],
        [  0, 128,   0],
        [  0, 128,   0],
        [128,   0, 128],
        [128,   0, 128],
        [255, 255, 255],
        [255, 255, 255]],

       [[255,   0, 255],
        [255,   0, 255],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0, 255,   0],
        [  0, 255,   0],
        [  0, 128,   0],
        [  0, 128,   0],
        [128,   0, 128],
        [128,   0, 128],
        [255, 255, 255],
        [255, 255, 255]]], dtype=uint8)

You may, at this point, specify a data type. For injecting images into Theano, we’ll often want to convert them to float32 to enable GPU processing:

array1 = numpy.asarray(image, dtype='float32')

Or, for better portability, you can use Theano’s default float data type (which will be float32 if the code is intended to run on the current generation of GPUs):

import theano
array1 = numpy.asarray(image, dtype=theano.config.floatX)
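
If you also want the pixel values scaled to [0, 1] — a common preprocessing step for neural-net inputs, though not something the snippets above do — a minimal sketch:

import theano
# uint8 values in 0..255 become floats in [0, 1]; the division preserves floatX
array1 = numpy.asarray(image, dtype=theano.config.floatX) / 255.0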

Alternatively, the array can be converted from the ImagingCore object:

>>> array2 = numpy.asarray(image.getdata())
>>> array2
array([[255,   0,   0],
       [255,   0,   0],
       [128,   0,   0],
       [128,   0,   0],
       [  0,   0,   0],
       [ 64,  64,  64],
       [128, 128, 128],
       [255, 255, 255],
       [  0,   0, 255],
       [  0,   0, 255],
       [  0,   0, 128],
       [  0,   0, 128],
       [255,   0,   0],
       [255,   0,   0],
       [128,   0,   0],
       [128,   0,   0],
       [  0,   0,   0],
       [ 64,  64,  64],
       [128, 128, 128],
       [255, 255, 255],
       [  0,   0, 255],
       [  0,   0, 255],
       [  0,   0, 128],
       [  0,   0, 128],
       [255,   0,   0],
       [255,   0,   0],
       [128,   0,   0],
       [128,   0,   0],
       [  0,   0,   0],
       [ 64,  64,  64],
       [128, 128, 128],
       [255, 255, 255],
       [  0,   0, 255],
       [  0,   0, 255],
       [  0,   0, 128],
       [  0,   0, 128],
       [255,   0,   0],
       [255,   0,   0],
       [128,   0,   0],
       [128,   0,   0],
       [  0,   0,   0],
       [ 64,  64,  64],
       [128, 128, 128],
       [255, 255, 255],
       [  0,   0, 255],
       [  0,   0, 255],
       [  0,   0, 128],
       [  0,   0, 128],
       [255, 255,   0],
       [255, 255,   0],
       [  0, 255, 255],
       [  0, 255, 255],
       [  0, 255,   0],
       [  0, 255,   0],
       [  0, 128,   0],
       [  0, 128,   0],
       [128, 128,   0],
       [128, 128,   0],
       [  0, 128, 128],
       [  0, 128, 128],
       [255, 255,   0],
       [255, 255,   0],
       [  0, 255, 255],
       [  0, 255, 255],
       [  0, 255,   0],
       [  0, 255,   0],
       [  0, 128,   0],
       [  0, 128,   0],
       [128, 128,   0],
       [128, 128,   0],
       [  0, 128, 128],
       [  0, 128, 128],
       [255,   0, 255],
       [255,   0, 255],
       [  0,   0,   0],
       [  0,   0,   0],
       [  0, 255,   0],
       [  0, 255,   0],
       [  0, 128,   0],
       [  0, 128,   0],
       [128,   0, 128],
       [128,   0, 128],
       [255, 255, 255],
       [255, 255, 255],
       [255,   0, 255],
       [255,   0, 255],
       [  0,   0,   0],
       [  0,   0,   0],
       [  0, 255,   0],
       [  0, 255,   0],
       [  0, 128,   0],
       [  0, 128,   0],
       [128,   0, 128],
       [128,   0, 128],
       [255, 255, 255],
       [255, 255, 255]])

Note that the techniques are not exactly interchangeable. Converting directly from the Image object preserves the shape, while converting from .getdata() produces a flattened array:

>>> array1.shape
(8, 12, 3)
>>> array2.shape
(96, 3)

In both cases the image channels are “interleaved”, i.e., for each row and column of the image, the values of the channels (in this example: red, green, and blue) appear in sequence. This organization is helpful if you are doing local transformations on the pixels (say, changing color spaces), as it preserves locality of reference. For deep learning, however, that arrangement is a nuisance.

Even with 2D input images, we will typically want “3D” convolutional filters in order to reach through — and mix and match — the entire stack of input channels. We need the input channels separated into planes, i.e., all the red data, then all the green data, etc. That is a breeze with rollaxis and reshape:

>>> array1 = numpy.rollaxis(array1, 2, 0)  # (8, 12, 3) -> (3, 8, 12)
>>> array2 = array2.T.reshape(3, 8, 12)    # (96, 3) -> (3, 96) -> (3, 8, 12)
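
As a quick sanity check — on the toy 12×8 image above — both routes now agree on shape and content:

>>> array1.shape
(3, 8, 12)
>>> numpy.array_equal(array1, array2)
True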

Some neural nets go one step further and stack all the images of a given batch into a single tensor. The LeNet/MNIST sample from deeplearning.net does exactly that. That strategy can increase GPU utilization by giving the device bigger chunks to chew on at once. If you decide to adopt it, it’s not difficult to assemble the tensor:

ImageSize = (512, 512)
NChannelsPerImage = 3
imagesData = [ Image.open(f, 'r').getdata() for f in batch ]  # batch is a list of file names
for i in imagesData:
    assert i.size == ImageSize
    assert i.bands == NChannelsPerImage

allImages = numpy.asarray(imagesData)   # shape: (nImages, 512*512, 3)
nImages = len(batch)
allImages = numpy.rollaxis(allImages, 2, 1).reshape(nImages, NChannelsPerImage, ImageSize[0], ImageSize[1])
print allImages.shape                   # (nImages, 3, 512, 512)

The code above checks that all images conform to a given shape (essential for convolutional networks, which are very rigid about input sizes). And it works… but it will rather walk than run.

When you try to find out what is holding the speed back, it is easy to suspect the array-reshaping operations, but the real culprit is the innocent-looking conversion from image to array. For some reason, importing images into Numpy either from Image objects or from ImagingCore objects — as we have been doing so far — takes an absurd amount of time.

The solution is not exactly elegant, but it makes the conversion so much faster that you will want to consider it: bridge the conversion with a pair of .tostring / .fromstring operations:

ImageSize = (512, 512)
NChannelsPerImage = 3
images = [ Image.open(f, 'r') for f in batch ]
for i in images:
    assert i.size == ImageSize
    assert len(i.getbands()) == NChannelsPerImage

ImageShape = (1,) + ImageSize + (NChannelsPerImage,)   # (1, 512, 512, 3)
allImages = [ numpy.fromstring(i.tostring(), dtype='uint8', count=-1, sep='') for i in images ]
allImages = [ numpy.rollaxis(a.reshape(ImageShape), 3, 1) for a in allImages ]  # each: (1, 3, 512, 512)
allImages = numpy.concatenate(allImages)               # (nImages, 3, 512, 512)

The snippet above has exactly the same effect as the previous one, but it runs up to 20 times faster. In both cases, the array will be ready to be fed to the network.
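
If you want to measure the speed-up on your own images, here is a minimal timing sketch (the file name is a placeholder; the actual ratio will vary with image size and PIL version):

import timeit
setup = "from PIL import Image; import numpy; im = Image.open('color.png', 'r')"
# 100 repetitions of each conversion path
slow = timeit.timeit("numpy.asarray(im)", setup=setup, number=100)
fast = timeit.timeit("numpy.fromstring(im.tostring(), dtype='uint8')", setup=setup, number=100)
print 'speed-up: %.1fx' % (slow / fast)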

* * *

TL;DR?

If speed is important to you, do not convert an image from PIL to Numpy like this…

from PIL import Image
import numpy
image = Image.open('color.png', 'r')
array = numpy.asarray(image)

…nor like this…

from PIL import Image
import numpy
imageData = Image.open('color.png', 'r').getdata()
imageArray = numpy.asarray(imageData).reshape(imageData.size[::-1] + (imageData.bands,))

…because although both methods work, they will be very, very slow. Do it like this:

from PIL import Image
import numpy
image = Image.open('color.png', 'r')
imageArray = numpy.fromstring(image.tostring(), dtype='uint8', count=-1, sep='').reshape(image.size[::-1] + (len(image.getbands()),))

It will be up to 20⨉ faster.

(Also, you’ll probably have to work on the shape of the array before you feed it to a convolutional network, but for that, I’m afraid, you’ll have to read the piece from the top.)


From instance launch to model accuracy: an AWS/Theano walkthrough

My team recently participated in Kaggle’s Diabetic Retinopathy challenge, and we won… experience. It was our first Kaggle challenge, and we found ourselves unprepared for the workload.

But it was fun — and it was an opportunity to learn new skills, and to sharpen old ones. As the deadline approached, I used Amazon Web Services a lot and got more familiar with it. Although we have our own GPU infrastructure at RECOD, the extra boost provided by AWS allowed us to explore additional possibilities.

But it was on the weekend just before the challenge deadline that AWS proved invaluable. Our in-house cluster went AWOL. What are the chances of a power outage bringing down your servers and a pest-control operation blocking physical access to them on the weekend before a major deadline? Murphy knows. Well, AWS allowed us to go down fighting, instead of throwing in the towel.

In this post, I’m compiling Markus Beissinger’s how-to and the deeplearning.net tutorials into a single hyper-condensed walkthrough that takes you as fast as possible from launching an AWS instance to running a simple convolutional deep neural net. If you are anything like me, I know you are aching to see some code running — but after you scratch that itch, I strongly suggest you go back to those sources and study them at leisure.

I’ll assume that you already know:

  1. How to create an AWS account;
  2. How to manage AWS users and permissions;
  3. How to launch an AWS instance.

Those preparations out of the way, let’s get started.

Step 1: Launch an instance at AWS, picking:

  • AMI (Amazon Machine Image): Ubuntu Server 14.04 LTS (HVM), SSD Volume Type – 64-bit
  • Instance type: GPU instances / g2.2xlarge

For the other settings you can use the defaults, but be careful with the security group and access key, so as not to lock yourself out of the instance.

Step 2: Open a terminal window and log into your instance. On my Mac, I type:

ssh -i private.pem ubuntu@xxxxxxxx.amazonaws.com

where private.pem is the private-key file of the key pair used when creating the instance, and xxxxxxxx.amazonaws.com is the public DNS of the instance. You might get an angry message from SSH, complaining that your .pem file is too open. If that happens, change its permissions with:

chmod go-rxw private.pem

Step 3: Install Theano.

Once you’re inside the machine, this is not complicated. Start by bringing the machine up to date:

sudo apt-get update
sudo apt-get -y dist-upgrade

Install Theano’s dependencies:

sudo apt-get install -y gcc g++ gfortran build-essential git wget linux-image-generic libopenblas-dev python-dev python-pip python-nose python-numpy python-scipy

Get the package for CUDA and install it:

wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1404/x86_64/cuda-repo-ubuntu1404_7.0-28_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1404_7.0-28_amd64.deb
sudo apt-get update
sudo apt-get install -y cuda

This last command is the only one that takes some time — you might want to go brew a cuppa while you wait. Once it is over, put CUDA on the path and reboot the machine:

echo 'export PATH=/usr/local/cuda/bin:$PATH' >> .bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH' >> .bashrc
sudo reboot

Log into the instance again and query for the GPU:

nvidia-smi -q

This should spit out a lengthy list of details about the installed GPU.

Now you just have to install Theano. The one-liner below installs the latest version and, after the wait for the CUDA driver, runs anticlimactically fast:

sudo pip install --upgrade --no-deps git+git://github.com/Theano/Theano.git

And that’s it! You have Theano on your system.

Step 4: Run an example.
Let’s take Theano for a spin. The simplest sample from deeplearning.net that’s already interesting is the convolutional/MNIST digits tutorial. The sample depends on code written in the previous tutorials, MLP and Logistic Regression, so you have to download those too. You also have to download the data. The commands below do all that:

mkdir theano
mkdir theano/data
mkdir theano/lenet
cd theano/data
wget http://www.iro.umontreal.ca/~lisa/deep/data/mnist/mnist.pkl.gz
cd ../lenet
wget http://deeplearning.net/tutorial/code/logistic_sgd.py
wget http://deeplearning.net/tutorial/code/mlp.py
wget http://deeplearning.net/tutorial/code/convolutional_mlp.py

Finally, hit the magical command:

python convolutional_mlp.py

What? All this work, and the epochs will go by as fast as molasses on a winter’s day. What gives?

You have to tell Theano to run on the GPU; otherwise, it will crawl along on the CPU. You can paste the lines below into your ~/.theanorc file:

[global]
floatX=float32
device=gpu

[lib]
cnmem=0.9

[nvcc]
fastmath=True

…or you can use the one-liner below to create it:

echo -e '[global]\nfloatX=float32\ndevice=gpu\n\n[lib]\ncnmem=0.9\n\n[nvcc]\nfastmath=True' > ~/.theanorc

Try running the example again.

python convolutional_mlp.py

With some luck, you’ll note two differences: first, Theano will announce the use of the GPU…

Using gpu device 0: GRID K520 (CNMeM is enabled)

…and second, the epochs will run much, much faster!
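
If you want to double-check that the computation is really landing on the GPU, here is a minimal sketch, in the spirit of the GPU test in Theano’s documentation (the matrix size is arbitrary):

import numpy
import theano
import theano.tensor as T

# A shared variable is kept on the GPU when device=gpu is active
x = theano.shared(numpy.random.rand(2048, 2048).astype(theano.config.floatX))
f = theano.function([], T.exp(x).sum())
print f()
print f.maker.fgraph.toposort()  # Gpu ops (e.g., GpuElemwise) confirm the GPU is in use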

(Image credit: networkworld.com.)

Keeping your iPhone alive in France — Part III

I’m back in France, this time for a leisure trip (my first real vacation in Europe since I finished my Ph.D. here). Yes, it’s winter, but promenading through the streets of Paris without having to worry about the next meeting / deadline / academic obligation is a nice change of pace, cold and rain notwithstanding.

I’m skeptical about the concept of unconnected vacations — in my leisure time I still want access to the hive mind, lest my IQ drop a full 30 points. But getting a decent data plan in France without breaking the bank is not exactly obvious, as I’ve been finding out for some time.

Orange Telecom, the main cell-phone company of France, has finally woken up to the reality that 80M tourists come to France every year — 20% more people than the country’s own population of 66M. They now offer the Mobicarte Holiday, a pre-paid SIM card loaded with 2h of calls, 1000 SMS, 1GB of cell data, and unlimited access to Orange Wifi hotspots — at a price of 40€. If you already have a working Mobicarte, you can buy a “Holiday recharge” for 30€. After you use up your credit (or after it expires), you can reload the Mobicarte with either the “Holiday” or a normal recharge.

That package is not exactly the cheapest, but it is the most convenient I’ve experienced so far: an offer completely adapted to the needs of the traveller.

Well, almost…

The first limitation is that you probably won’t be able to buy the Mobicarte Holiday from the Orange online store — unless you have a French credit card. This prevents having it delivered directly to your hotel. To compensate for that inconvenience, several Orange physical stores are open Monday to Saturday, until 19h30. I had no difficulty buying one at the physical store on boulevard Haussmann on a Saturday afternoon.

The second limitation is much more irritating: you won’t be able to connect to the Orange Wifi hotspots — unless you have a French credit card! To get access to the Wifi hotspots, Orange forces you to install an iOS app — “Mon Réseau” (My Network) — but that app is only available in the French App Store! Here the synergy between Orange’s nearsightedness and Apple’s greediness creates the perfect storm, as you won’t be able to create an Apple ID for the French App Store unless you enter a credit card valid in France. (My love–hate relationship with Apple has such a healthy dose of hate because of things like this.)

Finally, the kick in the shins: the Holiday credit is valid for a meager 14 days, so for longer trips you’ll have to keep buying recharges.

Are there any silver linings? Well, the SIM card itself remains valid for 6 months after the last recharge. The price of 20€ a week is still 3 times cheaper than the data-roaming offer of my Brazilian operator (Vivo Telecom). You can get a Mobicarte in any of the mini-, micro-, and nano-SIM formats, so you won’t have to deal with SIM clippers (or worse: a sharp kitchen knife and a steady hand). In addition, the 3G Internet offer takes effect immediately (some previous Orange Internet options took up to 3 days to kick in).

The Mobicarte Holiday is far from perfect, but it’s still the most traveller-friendly offer from Orange France I’ve experienced so far.

Printing Multiple Copies of a Single Page on a Sheet in OS X

This is something that was driving me crazy: getting multiple copies of a page onto a single sheet in OS X — think of small fliers or business cards. The problem was particularly annoying in Adobe Creative Suite (Illustrator, Photoshop, InDesign), where I hoped (in vain) to find an option to do it easily. The straightforward solution (asking the system Print dialog for multiple pages per sheet, and then asking for multiple copies) doesn’t really work.

The answer is as simple as Columbus’ egg: convert the document to PDF, duplicate the pages manually, and then ask for multiple pages per sheet. It works like a charm!

Need more guidance? You’re in luck, for I made a video tutorial (my very first — be forgiving) showing the process step by step:

I demonstrate the solution in Illustrator CS5, but it works for any page that can be rendered to a PDF — so not only Illustrator or Photoshop, but also Microsoft Word and PowerPoint, or Apple Pages and Keynote.

Edit 8/Dec: there’s a simpler procedure than the one explained above. Once you open the PDF in Preview, don’t duplicate the pages; instead, choose the number of Copies per page on the Preview tab (not the Layout tab) of the system Print dialog. I would have completely overlooked this if it weren’t for a helpful YouTube commenter, who also suggests that you can avoid the intermediate PDF step in Microsoft Word by, for example, putting “1, 1, 1, 1” in Page Range (Copies & Pages tab) and then selecting 4 Pages per Sheet (Layout tab).

Android Smart Girls — Finishing Line!

Yesterday, the Android Smart Girls project crossed the finishing line, with an amazing prize-award ceremony. As you might remember, this was the pilot for an extra-curricular computer-programming activity for high-school girls. In its first phase, the girls had classes on the MIT App Inventor; in the second phase, they proposed and implemented their own apps with the help of mentors.

The project was an initiative of Prof. Juliana Borin (Institute of Computing / UNICAMP), the girls from IEEE Women in Engineering South Brazil, and IEEE WIE founder (now at SAMSUNG Research Brazil) Dr. Vanessa Testoni, in cooperation with Hilton Federici State High School, in Campinas, and many, many, many wonderful, generous volunteers. The project was supported by SAMSUNG and by a grant from CNPq.

The project leaders, the project contributors, and I are working to document the initiative as open courseware that will make it possible to reproduce it in other schools throughout Brazil. The project leaders and I also want to ensure that all the many contributors to the project — from Hilton Federici High, from IEEE WIE, from UNICAMP, from SAMSUNG — get their work acknowledged.

Stay tuned!


Dr. Sandra Avila, mentor of the winning team, is my former Ph.D. student and current postdoc. Ms. Nadja Ramos, the other mentor, is doing her capstone undergraduate project under my supervision (her capstone project, incidentally, is directly related to the Smart Girls initiative). Needless to say, I was proud as a peacock.

"Upgrade to Yosemite," they said.
"It will be fun,", they said.

Back in the developer’s saddle in Yosemite; Installing Maven on OS X

When Mavericks launched, I scheduled a clean reinstall onto a blank, reformatted HD. (Due to the degradation of configurations, permissions, and other metadata, a system may suffer something akin to a long-term aging effect. A reinstallation from scratch is a way to freshen it up.) The task, however, was marked “low priority” on my to-do list. The result: last week I was forced to upgrade to Yosemite, and still no reformatting.

As I explained in that post, I’ve noticed a trend of CS/IT professionals being the most reluctant users when it comes to updating to the latest hardware or software. Yosemite justified that reluctance by breaking my Homebrew installation. The reason: Homebrew explicitly links to Ruby 1.8, which Yosemite obliterates in favor of Ruby 2.0. (Hey, Apple, word of advice: it’s no use having a sophisticated system of coexisting framework versions if you decide on a whim to delete the older ones.)

I had experienced some minor inconveniences before I encountered this problem. In the text that follows, I assume that you have already dealt with the following:

  1. Updating Xcode in the App Store (menu Apple … App Store…; tab Updates);
  2. Re-accepting the terms and conditions of Xcode: neither Xcode nor its command-line tools will run before you sell your soul to Apple again. And even if you have administrator permissions, you have to sudo a command-line tool to be able to do it. You’ll see an ugly message like: “Agreeing to the Xcode/iOS license requires admin privileges, please re-run as root via sudo.” Either re-execute the command with sudo (e.g., sudo make), or accept the agreement in the Xcode graphical app;
  3. (Possibly?) Reinstalling the Java VM from Oracle. This might just be an issue for web browsing; maybe the VM works on the command line out of the box — I didn’t check. But if you type java in Terminal and nothing happens, chances are you’ll need to get it before being able to do anything interesting.

The bad news: the only way I could get Homebrew back to work was to reinstall Ruby 1.8.

The good news (if you have a Time Machine backup): doing so is a breeze. Just restore the folder /System/Library/Frameworks/Ruby.framework/Versions/1.8/ to its rightful place.

If you don’t have a Time Machine backup (how do you even survive on OS X without one?!), maybe you have an old MacBook stored in a cupboard? Or an upgrade-averse friend who has not yet moved to Yosemite? (Hint: do you know anyone who works in CS/IT?) Get a copy of that folder and put it back where it belongs.

If you can’t get your hands on that folder anywhere, you’re probably out of luck. You might be able to fish the framework out of the installer packages of an older OS X version, but just thinking about it makes me want to cry. Maybe you can wait for Homebrew to issue a patch?

With Ruby 1.8 back in place, things become very straightforward. Just to be sure, run the following commands:

brew update

brew doctor

And check whether there are any remaining issues to solve. (By the by, you don’t have to try to solve every minor problem: in computing as in medicine, minimal intervention is often wise.)


This whole marathon started when I needed to install Maven on my system.

With Homebrew working, this takes a one-liner:

brew install maven

The installation worked without issues, but for some reason Maven kept complaining that the JAVA_HOME environment variable was broken:

Error: JAVA_HOME is not defined correctly.
  We cannot execute /usr/libexec/java_home/bin/java

Naïvely setting JAVA_HOME to /usr let Maven run, but with an irritating warning:

Unable to find a $JAVA_HOME at '/usr', continuing with system-provided Java...

What solved the problem completely was adding this line to ~/.bash_profile:

export JAVA_HOME=`/usr/libexec/java_home`
"Upgrade to Yosemite," they said. "It will be fun,", they said.

“Upgrade to Yosemite,” they said.
“It will be fun,”, they said.

But if you experience the same problem, you’ll first need to check where the java_home utility (it prints the path of the Java VM on stdout) actually lives on your system. If /usr/libexec/java_home runs, the solution above will probably work.

Upgrade cascade: iPhone, Yosemite, iPhoto, iMovie

I’ve noticed a consistent trend among my colleagues and me, Computer Science / Engineering faculty, of being way less eager than the general public to update to the latest hardware or software. There is, maybe, a component of the shoemaker’s son going barefoot, but most importantly — I suspect — it’s the knowledge of sausage-making impairing our appetites. When you know the reality of system design intimately, you become very reluctant to disturb whatever metastability you might have reached.

But all systems have a service life, and eventually even the most reluctant user is forced to upgrade. After skipping two generations, I thought it was time to abandon my iPhone 4S for a new iPhone 6.

(Which was an adventure in itself: amazingly, after almost two months, there are still queues to buy an iPhone in the States. So far, OK — supply and demand, etc. — but for some unfathomable reason, Apple has instructed their clerks to outright lie about the non-contract T-Mobile iPhone, saying that it is not unlocked. After some googling and whatsapping with friends, the truth emerged: it is unlocked. Still, at the first Apple Store I tried, the clerks were very uncooperative, and one of them positively adversarial, as if he’d rather not sell anything to me. I am really not the type of person to buy into this “privilege to be a customer” attitude, so I just went to another store. Long story short: two days and 830 bucks later, I had an iPhone 6 in my pocket. It is indeed unlocked: I had it working with my Vivo Telecom nano-SIM immediately, still inside the store.)

But as often happens, one upgrade leads to another in a cascade effect: the iPhone rejected my old iTunes, forcing me to upgrade old faithful Mountain Lion to Yosemite.

As if to confirm that upgrading is a messy business, Yosemite gave me a great welcoming surprise: it disabled my old iPhoto (“incompatible with new OS version, must be updated”) and made it impossible for me to update it (“Update Unavailable with This Apple ID”). For some strange reason, the App Store utility insisted on that message no matter which Apple ID I used (I only have two).

Apparently this is not a rare situation, and the causes and solutions are exasperatingly diverse. What solved the problem in my case was closing the App Store, deleting iPhoto altogether (dragging the disabled application to the trash), opening the App Store again, and doing a fresh install. The procedure itself is not very painful, I concede: the annoyance is having to find out exactly what to do.

For upgrading iMovie, the solution was not so simple. It is not a mandatory upgrade (the Mountain Lion version still works on Yosemite), but since I had gone so far, I now wanted to go all the way. Deleting iMovie made a fresh install available on the App Store… for 15 bucks. No good. I tried, as suggested by some users, reinstalling the original (from the Snow Leopard CDs, in my case), but to no avail. In the end, I just moved the old Mountain Lion iMovie from the trash back to the Applications folder.

Curiously, Xcode, which is normally a trouble-maker, updated without further ado.

Edit 19/11: upgrading to Yosemite 10.10.1 solved the iMovie Apple ID issue. I’m guessing it would have solved the iPhoto issue as well. This is another golden rule of upgrading — never move to the version with a round number; always wait for the next minor patch.

Paper at SISAP’2014 on large-scale LSH for general metric data

We’ve got a paper — Large-Scale Distributed Locality-Sensitive Hashing for General Metric Data — accepted at the upcoming International Conference on Similarity Search and Applications (SISAP’2014).

I’ll be presenting the paper next week: if you’re planning to be there, please come say hi! My session will be on Wednesday afternoon (October 30th at 14h).

The paper is part of an ongoing cooperation with my colleague Prof. George Teodoro (University of Brasilia), with my former M.Sc. student Eliezer Silva, and with Petrobras researcher Thiago Teixeira. Here’s the abstract:

Locality-Sensitive Hashing (LSH) is extremely competitive for similarity search, but works under the assumption of uniform access cost to the data, and for just a handful of dissimilarities for which locality-sensitive families are available. In this work we propose Parallel Voronoi LSH, an approach that addresses those two limitations of LSH: it makes LSH efficient for distributed-memory architectures, and it works for very general dissimilarities (in particular, it works for all metric dissimilarities). Each hash table of Voronoi LSH works by selecting a sample of the dataset to be used as seeds of a Voronoi diagram. The Voronoi cells are then used to hash the data. Because Voronoi diagrams depend only on the distance, the technique is very general. Implementing LSH in distributed-memory systems is very challenging because it lacks referential locality in its access to the data: if care is not taken, excessive message-passing ruins the index performance. Therefore, another important contribution of this work is the parallel design needed to allow the scalability of the index, which we evaluate in a dataset of a thousand million multimedia features.
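
To make the hashing idea concrete, here is a minimal illustrative sketch — not the paper’s implementation — of how a single Voronoi LSH table hashes items to the cell of their nearest seed (the data and distance function are placeholders; any metric works):

import numpy

def voronoi_hash(x, seeds, dist):
    # The hash of x is the index of its nearest seed,
    # i.e., the Voronoi cell that contains x.
    return min(range(len(seeds)), key=lambda i: dist(x, seeds[i]))

# Toy usage with Euclidean distance on random data.
rng = numpy.random.RandomState(42)
data = rng.rand(1000, 8)
seeds = data[rng.choice(len(data), 16, replace=False)]  # sample of the dataset as seeds
euclid = lambda a, b: numpy.linalg.norm(a - b)

table = {}
for idx, x in enumerate(data):
    table.setdefault(voronoi_hash(x, seeds, euclid), []).append(idx)
# At query time, only the items in the query's cell are inspected;
# several such tables, with independent seeds, reduce the chance of misses.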

The full paper is available in the conference proceedings (LNCS 8821) and will be on open access from October 20 to November 21, 2014. The latest preprint is also available on my publications page.


We can’t tell you just yet…

(This entry is cross-posted from my lab’s blog.)

Anyone who’s ever worked on the frontier between Science and Innovation has faced the dilemma of secrecy versus disclosure: the scientific spirit demands full publication of every implementation detail — a result that cannot be reproduced is not a result — but when you are seeking intellectual-property rights, you are often forced to withhold some details until you’ve got that patent.

We faced that quandary during our participation in MediaEval’s Violence Detection task: the scientist in us wanted to just tell everything. But the research project behind our participation in that competition is not just a scientific project; it is also about innovation, in partnership with Samsung Research Institute Brazil. As such, some details had to remain concealed, much to the frustration of everyone’s curiosity.

Fortunately, the task organizers took it in stride:

[Image: RECOD’s “distinctive mention” from the MediaEval task organizers]

…that good-natured ribbing got everyone laughing at the task’s closing meeting!

We are sorry for the teasing, guys. We promise we will tell you everything soon… just not yet.

(Kudos to Mats and Martha for their good humor!)

Associate director of undergraduate studies

For the next few months, I’ll occupy the position of associate director of undergraduate studies for the Computer Engineering program, left by Prof. Ivan Ricarte, who took a full professorship at another academic unit of UNICAMP. Currently, the director is Prof. Helio Pedrini, of the Institute of Computing. Prof. Akebo Yamakami has kindly accepted to be my “vice-associate”, an informal position that exists because the directorship is shared between two academic units. This is good news, because I’m a rookie in matters of academic administration, while Prof. Yamakami has been involved in directing undergraduate studies since… forever. His experience will be invaluable.

I was appointed by the steering committee of the Electrical and Computer Engineering School in an indirect election, for a provisional mandate. Next June, the entire electoral college (faculty, staff, and students) will vote for the next director here at FEEC, and for the next associate director at the Institute of Computing, since the positions switch between the two units at the end of each mandate. (I know, I know — it’s complicated — but you get used to the idiosyncrasies of Brazilian public administration after a while…)

I thank my colleagues on the steering committee for their trust.