We had a paper accepted at the ACM WebScience Conference 2014 (WebSci 2014): a study of how well visual attributes can predict the popularity of images on Pinterest (measured as the number of repins). Not very surprisingly, we found that social attributes are more predictive of popularity than automatically extracted visual attributes. However, for heavily followed users, visual attributes account for a considerable fraction of the deviation from the expected behavior after we factor out the most predictive social attribute (number of followers). This is shown in the featured image of this blog entry.
Here’s the full abstract of the paper:
Little is known on how visual content affects the popularity on social networks, despite images being now ubiquitous on the Web, and currently accounting for a considerable fraction of all content shared. Existing art on image sharing focuses mainly on non-visual attributes. In this work we take a complementary approach, and investigate resharing from a mainly visual perspective. Two sets of visual features are proposed, encoding both aesthetical properties (brightness, contrast, sharpness, etc.), and semantical content (concepts represented by the images). We collected data from a large image-sharing service (Pinterest) and evaluated the predictive power of different features on popularity (number of reshares). We found that visual properties have low predictive power compared to that of social cues. However, after factoring-out social influence, visual features show considerable predictive power, especially for images with higher exposure, with over 3:1 accuracy odds when classifying highly exposed images between very popular and unpopular.
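To make "factoring out social influence" a bit more concrete, here is a minimal sketch of one way to do it. It is not the code from the paper: the log-log linear fit, the function name `popularity_residuals`, and the toy numbers are all assumptions made for illustration. The idea is to predict the expected number of repins from the number of followers alone, and then look at how far each image deviates from that expectation.

```python
import numpy as np

def popularity_residuals(followers, repins):
    """Deviation of log-repins from the level expected given log-followers.

    Illustrative assumption: expected popularity is modeled as a straight
    line in log-log space, fitted by ordinary least squares.
    """
    x = np.log1p(np.asarray(followers, dtype=float))
    y = np.log1p(np.asarray(repins, dtype=float))
    slope, intercept = np.polyfit(x, y, deg=1)  # least-squares line
    expected = slope * x + intercept
    return y - expected  # positive: more popular than followers alone suggest

# Toy, made-up numbers (not taken from the Pinterest dataset).
followers = [10, 100, 1000, 10000, 100000]
repins = [0, 3, 5, 80, 200]

residuals = popularity_residuals(followers, repins)
# Hypothetical binary target: did the image beat its expected popularity?
labels = residuals > 0
```

Under these assumptions, a classifier trained on visual features would try to predict `labels` (or the residuals themselves), so that whatever it learns cannot simply be a proxy for the number of followers.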
The paper was a collaboration between my post-doc Dr. Sandra Avila and me, from the RECOD Lab here at the State University of Campinas, and students Luam Totti and Felipe Costa and Profs. Wagner Meira Jr. and Virgílio Almeida, from InWeb (the National Institute of Science and Technology for the Web) at the Federal University of Minas Gerais.
In accordance with our policy of improving the reproducibility of our published results, both the data and the code of the paper are available. Due to the restrictions of FigShare, the dataset is provided as a zipped SQL dump split into fragments. I thank Luam Totti very much for agreeing to put in the effort to make that possible.