Campinas, the new Silicon Valley ?

I have been bombarded with so much radiant news about Brazil that my stance on the country has slowly changed from cynically pessimistic to cautiously optimistic.

The latest attack on my scepticism comes from The Washington Post, and talks about the town of Campinas, where my University is located. The piece is not exactly new, but it only came to my attention today :

Brazil: Home of the next Mark Zuckerberg?

(N.B.: those bombastic titles don’t exactly help me to take the argument seriously.)

Published: March 21

Silicon Valley has led the world in innovation and entrepreneurship because of its culture of information sharing and mentoring. No other region in the world is like it. But things are changing. (…) One of the most impressive examples of this is in Campinas, Brazil—a small university town on the outskirts of Sao Paulo.

In June 2010, ten startups at the Softex incubator at the Universidade Estadual de Campinas decided to break free from the university incubator they were housed in and form an entrepreneurial co-op of sorts, called the Associação Campinas Startups. Instead of relying on local business executives and professors to guide them, the entrepreneurs decided to learn from each other. (Read the rest)

I am, of course, delighted by the news, and I am sure the people both at Softex and at the new Associação Campinas Startups are doing a wonderful job. Last year, I got acquainted with professionals and students involved in Campinas-area entrepreneurship projects, and I was very impressed by their enthusiasm and commitment. The administrative environment, however, has to help, or at the very least, it has to stay out of the way. I would be more willing to believe in our chances in the global innovation competition if it weren’t for our Kafkaesque bureaucracy and titanic tax burdens.

Wadhwa ends his piece with the optimistic prediction that “by the end of this decade, we will see some Mark Zuckerbergs emerging from the slums of Sao Paulo or New Delhi, India or Valparaiso, Chile”. My guess is that he has yet to realize the depth of the inequality in Brazilian society. I don’t think it’s impossible that the next Zuckerberg will come from the favelas, but the chic neighborhood of Jardins in São Paulo has much better odds. The potential favela Zuckerberg may count herself lucky if she learns enough math to properly understand a cross-multiplication of proportions. By the end of high school (in the already unlikely scenario that she completes it), her chances of actually coping with high-school-level math and science are as good as those of winning the lottery.

Posted in technology

Mac OS X, Word and the quest for the unbloated PDF

Hard-science scholars are strange people, who insist on using TeX because it “typesets beautifully”, but then forget to check badness warnings, letting the lines spill beyond the right margin. I have resisted TeX as much as I could, until I finally caved to peer pressure. Still, I only use it for cooperative work : when all by myself, I want something, let’s say frankly, less Jurassic.

Still, I am forced to envy my frozen-in-1975 colleagues when I find out that saving to PDF, an operation that the industry should have gotten right by now, turns my 1 MB Microsoft Word file into an 80 MB PDF-zilla.

I spent a good part of my morning solving that problem, considering both the official solution and more independent initiatives. The official solution flunked when I found out that Adobe had no trial of Acrobat for Mac (am I really willing to spend US$ 500 just to find out whether or not it’ll do what I want?). I tried a PDF compression solution, PDF Shrink, which reduced my PDF… from 89 MB to 87 MB, while horribly mangling all the images: not exactly worth the US$ 35. I also tried recreating the PDF from scratch, but PDF Studio, at US$ 125, just refused to open the Word file, with a cryptic ‘error reading’ message. I was glad both were trial versions.

In despair, I continued searching the Web. Lots of users crying “Large PDF !”, “Word PDF too big !”, “Huge PDFs on Mac !”, but very few answers. Industry, why u no listen ?

They say that we should never attribute to malice that which can be explained by incompetence. But Hanlon’s razor notwithstanding, I couldn’t avoid drifting into conspiracy theories. What if that horrible implementation of PDF conversion was not completely accidental ?

Conspiracy theories are unfalsifiable, of course, but I’ll tell you what finally solved the problem and you’ll tell me if it doesn’t make you itsy bitsy suspicious :

  1. In Word, instead of saving to PDF, save to PostScript (use File… Print…, then, in the print dialog, click the PDF button in the lower left corner; Save to PostScript is one of the options);
  2. Open the PostScript file (double-click its icon) and let Preview make the automatic conversion;
  3. In Preview, save the file as PDF (using the menu File… Save as…).

And that’s all. Now let’s check the sizes :

  • Original PDF file (using Save as PDF or Print to PDF from Word) : 89 MB
  • PostScript file (Using Print to PostScript) : 94 MB
  • Final PDF file (using the steps above) : 5 MB

That is, using only tools already present in OS X, and three small steps, I’ve got an almost 18x smaller file. Risking joining the ranks of the ‘moon hoax’ lunatics, I smell something rotten in the current state of PDF conversion implemented by Word–OS X.
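For the record, the arithmetic behind that “almost 18x” can be checked in a few lines. This is only a sketch: the sizes are the rounded figures from the list above, and the function and variable names are made up for the illustration.

```python
# Rounded file sizes (in MB) taken from the list above.
sizes_mb = {"word_pdf": 89, "postscript": 94, "final_pdf": 5}


def shrink_factor(original_mb: float, final_mb: float) -> float:
    """How many times smaller the final file is than the original."""
    return original_mb / final_mb


ratio = shrink_factor(sizes_mb["word_pdf"], sizes_mb["final_pdf"])
# Share of the original size that the PostScript detour saved.
saved_pct = 100 * (1 - sizes_mb["final_pdf"] / sizes_mb["word_pdf"])

print(f"{ratio:.1f}x smaller, about {saved_pct:.0f}% of the original size saved")
```

Note that the intermediate PostScript file is actually bigger than the bloated PDF; the shrinking only happens when Preview converts it back.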

* * *

Incidentally, I found something I also needed : how to password-protect PDFs. I was ready to buy a solution, but that turned out to be unnecessary.

When creating a new PDF, on OS X, you can click on the “Security Options” menu of the “Save as PDF” dialog (how come I’ve never noticed that one ?).

If the PDF already exists, you can open it with Preview, go to File… Save as, and check the box “Encrypt”. Two textboxes below let you enter a password. Save the file and it will only be visible after the password is entered.

EDIT 28/02 : I am finding out that the above method is by no means foolproof, i.e., it doesn’t work for every kind of PDF. In particular, I tested it on PDFs generated by PowerPoint, and it backfired (the PostScript conversion generated a much bigger file). For image-laden PDFs from PowerPoint, unlike the mainly textual ones from Word, I’m finding that the usual tip of using the Quartz Filter (open the PDF file with Preview, then Save As…, then select “Reduce File Size” in the Quartz Filter field of the dialog) works quite well.

Posted in technology

No Pardon for Turing

Despite the 20,000+ collected signatures, the British House of Lords has dismissed the motion to pardon Alan Turing:

“A posthumous pardon was not considered appropriate as Alan Turing was properly convicted of what at the time was a criminal offence,” said Justice Minister Lord McNally.

As often, the most lucid analysis comes from… Dan Savage:

It was a crime in Switzerland during the Second World War for Swiss citizens to help German Jews who were fleeing the Nazis—indeed, “the law at the time required a prosecution” of any Swiss citizen who helped a Jewish refugee escape from Germany: (…) In January of 2004 the Swiss government pardoned Jakob Spirig and all other Swiss citizens who had been prosecuted for helping Jews escape Nazi Germany: (…) Question for the House of Lords: Did the Swiss government err when it pardoned Jakob Spirig? Or did you err by not pardoning Alan Turing?

2012 marks the centenary of Turing’s birth. Maybe we should try again in 2040 ?

Posted in science

After SOPA / PIPA, RWA

From Information Today, the appalling Research Works Act :

H.R. 3699, the Research Works Act, was introduced Dec. 23, 2011, by Rep. Darrell Issa (R-Calif.), chairman of the Committee on Oversight and Government Reform, and committee member Rep. Carolyn Maloney (D-NY). According to the Association of American Publishers (AAP) website, “The legislation is aimed at preventing regulatory interference with private-sector research publishers in the production, peer review and publication of scientific, medical, technical, humanities, legal and scholarly journal articles.” Put another way, it is designed to thwart activities such as the National Institutes of Health (NIH) Public Access Policy, which requires scientists to submit final peer-reviewed journal manuscripts that arise from NIH funds to the digital archive PubMed Central upon acceptance for publication.

In the harsh words of The Guardian, “This is the moment academic publishers gave up all pretence of being on the side of scientists.” I, for one, never believed in their good intentions.

If you are a resident of the USA, please take the time to sign the petition against the Research Works Act.

Posted in science

Buy me ! Upgrade me ! Register me !

It had happened just once before, but now that Parallels (the virtual environment of choice for Mac users) has launched its version 7, trying to use it is a constant source of irritation. Every other time I open the application, I am greeted with a huge colorful popup ad prompting me to upgrade. The “do not show me this again” checkbox is basically useless. I asked it politely to refrain from soliciting, to no avail.

I remember when shareware was a novel concept, and I used to download it by the dozens : nagging screens were part of the deal, something to be expected until you paid for a registered version. Since when have those “features” become cricket for bought wares ?

After wondering if I should open a support ticket, I’ve just reported the pestering behavior as a bug. Let’s see if the development team will agree.

Posted in technology

Am I forgetting anything ?

I have just realized : the most important event in my professional life since the Ph.D. viva-voce defense went unannounced on this blog. I have recently been accepted as a faculty member of the Department of Computer Engineering and Industrial Automation (DCA) of the School of Electrical and Computer Engineering (FEEC) of the State University of Campinas (UNICAMP). I am now officially an absent-minded professor.

Balancing a faculty career, with its research, teaching and administrative obligations, is more challenging than people outside academia usually realize. For the last 5 years, I was exclusively focused on research, so I am rediscovering the thrill of being in a classroom. I am also discovering the painstaking work needed to sustain academic institutions, for, if their horizontal, democratic nature grants their members many freedoms, they require in return much debate, discussion and politics.

Nevertheless, I am loving every minute of my new duties. I know that the passing years take their toll, but for the moment, at least, I am in my element.

Posted in blogging

In a Flash

What is happening to the Firefox Flash plugin, these days ? The critter has taken on a life of its own. Nowadays, it’s always the same story: I am minding my own business, and all of a sudden the CPU fans start to spin, the computer gets warm, the battery starts to drain noticeably faster. I don’t bother anymore to check the process list: the culprit is always the same.

This has been going on for the last few months, so it’s not just a rotten update: it’s a whole series of rotten updates. It became so serious that, before I found the problem, I considered abandoning Firefox altogether. Nowadays I just keep Flash disabled, except when absolutely needed.

Interestingly, living without Flash has been much less nightmarish than I anticipated. Half of the time, even embedded YouTube videos will play (once you sign up for the HTML5 beta). Often, the only things lost are the more aggressive forms of advertisement. (To which I say good riddance, of course.)

Posted in technology

Good ad, creepy ad

When does effective ad sense fall right into the uncanny valley and become ultra creepy ad sense ?

Like, when I visit a (not that popular) website, type another URL a few minutes later and get an ad from the previously visited website ? Coincidence ?

Apparently the detrimental effect this has on consumers is old news. I can attest that I was not positively impressed.

Posted in technology

Recently accepted papers at ICIP and SIBGRAPI

I had a paper accepted at ICIP, the IEEE International Conference on Image Processing, by my student Sandra de Avila (whose main supervisor is my former M.Sc. supervisor Prof. Arnaldo de Araújo). Sandra is currently in France at the prestigious LIP6 lab, under the supervision of my former Ph.D. supervisor Prof. Matthieu Cord and our colleague Prof. Nicolas Thome. The paper presents an interesting extension to the “bag of visual words” approach (which is based on quantized local features using a codebook / “visual dictionary”), taking into consideration a histogram of the distances between the features effectively found on the images and the features chosen to compose the codebook. Here’s the title and abstract:

Bossa: Extended BoW Formalism for Image Classification
In image classification, the most powerful statistical learning approaches are based on the Bag-of-Words paradigm. In this article, we propose an extension of this formalism. Considering the Bag-of-features, dictionary coding and pooling steps, we propose to focus on the pooling step. Instead of using the classical sum or max pooling strategies, we introduced a density function-based pooling strategy. This flexible formalism allows us to better represent the links between dictionary codewords and local descriptors in the resulting image signature. We evaluate our approach in two very challenging tasks of video and image classification, involving very high level semantic categories with large and nuanced visual diversity.
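To make the idea a bit more concrete, here is a toy sketch in plain Python. It is emphatically not the implementation from the paper: the 2-D “descriptors”, the two-word codebook, the function names and the `bins` / `max_dist` parameters are all invented for the illustration. It only contrasts the classical count-based pooling with a pooling that keeps, for each codeword, a histogram of the distances to the local descriptors.

```python
import math

# Toy data: a 2-word "visual dictionary" and four 2-D local descriptors.
# Real descriptors (e.g. SIFT) have far more dimensions; these are made up.
codebook = [(0.0, 0.0), (1.0, 1.0)]
descriptors = [(0.1, 0.0), (0.9, 1.0), (1.0, 0.8), (0.0, 0.3)]


def bow_histogram(descriptors, codebook):
    """Classical bag of words: count descriptors assigned to each codeword."""
    counts = [0] * len(codebook)
    for d in descriptors:
        distances = [math.dist(d, c) for c in codebook]
        counts[distances.index(min(distances))] += 1
    return counts


def distance_histogram_pooling(descriptors, codebook, bins=4, max_dist=2.0):
    """For each codeword, histogram the distances to every descriptor.

    Descriptors farther than max_dist from a codeword are ignored for it.
    """
    bin_width = max_dist / bins
    signature = []
    for c in codebook:
        hist = [0] * bins
        for d in descriptors:
            dist = math.dist(d, c)
            if dist < max_dist:
                hist[int(dist / bin_width)] += 1
        signature.append(hist)
    return signature


print(bow_histogram(descriptors, codebook))
print(distance_histogram_pooling(descriptors, codebook))
```

The count-based version reduces each codeword to a single number, while the second one keeps a coarse picture of how the descriptors are spread around each codeword, which is the kind of extra information the density-based pooling is after.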

I’ve also had two papers accepted at our counterpart national conference, SIBGRAPI. The first is the work of the Ph.D. student Ana Lopes and her R.A. Elerson Santos (supervised by Prof. Arnaldo Araújo and co-supervised by Prof. Jussara Almeida; I give her some technical and nontechnical support every now and then). It concerns the use of transfer learning of concepts from (static) image datasets to video datasets in order to recognize human actions. We show that learning the concepts present in the Caltech256 dataset allows a classifier to obtain improved results on the challenging “in the wild” human action Hollywood2 dataset.

Transfer Learning for Human Action Recognition
To manually collect action samples from realistic videos is a time-consuming and error-prone task. This is a serious bottleneck to research related to video understanding, since the large intra-class variations of such videos demand training sets large enough to properly encompass those variations. Most authors dealing with this issue rely on (semi-) automated procedures to collect additional, generally noisy, examples. In this paper, we exploit a different approach, based on a Transfer Learning (TL) technique, to address the target task of action recognition. More specifically, we propose a framework that transfers the knowledge about concepts from a previously labeled still image database to the target action video database. It is assumed that, once identified in the target action database, these concepts provide some contextual clues to the action classifier. Our experiments with Caltech256 and Hollywood2 databases indicate: a) the feasibility of successfully using transfer learning techniques to detect concepts and, b) that it is indeed possible to enhance action recognition with the transferred knowledge of even a few concepts. In our case, only four concepts were enough to obtain statistically significant improvements for most actions.

The second is the work of my Ph.D. student Marcelo Coelho and his R.A. Cássio dos Santos Jr. (again, his main supervisor is Prof. Arnaldo de Araújo). It concerns the clean-up of noisy SIFT features of street-view images (urban façades). We have found that subspace clustering, an unsupervised technique, is able to isolate clusters of useful and non-useful SIFT features for the task of retrieving a target image. The challenge is identifying a priori which cluster is the relevant one. This work compares and contrasts two subspace clustering techniques: FINDIT (based on dimension voting) and MSSC (based on a fuzzy mean-shift).

Subspace Clustering for Information Retrieval in Urban Scene Databases
We present a comprehensive study of two important subspace clustering algorithms and their contribution to enhance results from the difficult task of matching images taken of the same object using different devices at different conditions. Our experiments were done on two distinct databases containing urban scenes which were tested using state-of-the-art matching algorithms. After initial evaluation of both datasets by that procedure, clustering algorithms were applied to them. An exhaustive comparison was performed in every cluster found and a significant amelioration in the results was obtained.

I’ll put a link to the preprints as soon as they become available.

Posted in publications, science

Scientific sense and hurt sensibilities

A few of my students are working on pornography detection for video sharing social networks (an early draft of our work is available on arXiv). Pornography is a contentious issue, littered with polemic, fallacies and rhetorical traps. We have tried, as much as possible, to keep away from those. We refrain, thus, from value judgements, which are the realm of Philosophy and Social Sciences, way outside our jurisdiction.

An interesting difficulty I have faced in reporting on this work is showing representative images without hurting the sensibilities of reviewers and readers. So far, my (admittedly cowardly) choice has been to take the tamest images that are still representative of the phenomena I want to illustrate. For example: to illustrate that the dataset is ethnically diverse, I would choose frames where only the faces of the actors are shown; to illustrate that the dataset contains gay porn as well as straight porn, I would show a frame with the actors kissing instead of having sex; etc.

But recently, I had a tough choice to make. A student was to submit his Master’s dissertation to the viva-voce committee, and, as usually happens in Brazil, he sent me a draft for corrections and suggestions. His “Results” chapter contained, among cold graphs and tables, several very explicit images, illustrating in detail the cases of success and failure of our algorithm. The only thing is: all images contained censor bars.

I returned the draft with several corrections, among which, a note begging him to remove the bars:

“Don’t censor the images. It’s extremely distasteful: this is a scientific work for an adult audience. Either remove the images entirely (if they are not needed), or keep them uncensored (don’t mess with the data !). In the worst case, put them in an Annex or in a separate supplement.”

In the end, he decided to keep the images uncensored, which I feel was the right scientific decision.

Nevertheless, every time I open his “Experimental Results” chapter I cringe a little bit. Again, admittedly cowardly, I am looking forward to the defense, when I’ll be able to share the responsibility for the final decision (keeping the images in the definitive version or taking them away) with the rest of the committee.

* * *

Taking a (superficial) look at the literature, I noticed that many authors (including myself) practice a form of “partial self-censorship”: choosing “tame” images, making them tiny on the page, or using washed out grayscale reproductions. A compromise between scientific truth and respect for the taboo ? Or just plain cowardice ? Most authors simply don’t include images, and a few choose to employ the censor bars. The full-fledged honesty of my student is rare.

The censor bars, IMHO, are the worst choice: at once hypocritical and unscientific. Hypocritical, because the reader can perfectly imagine what is behind them, so any of the “dirtiness” from which they would be supposedly “protecting” the reader is still being created in his or her mind. The effect is exactly the same as when using euphemisms like “f-word”: the correct word is still created in the listener’s mind. Unscientific, because they count on the reader’s imagination (with its distortions, imprecisions, and, often, amplifications) instead of depicting precisely the phenomena under study.

Interestingly, in one paper, the authors censor the faces of the actors (by pixelization). This is an interesting choice and raises a question I have not considered: since we collect our dataset from pornography sharing social networks, we cannot assume that everyone in the video is a professional actor. I hope that none of our examples have Computer Vision scientists unaware that their amateur videos have escaped to the net !

* * *

At the end of the day, this is 2011: 63 years since the first Kinsey report ! Shouldn’t science have got some guts by now ?

Posted in science