5 November 2015 at 11:23 #2308
Guillaume Van Aelst (Participant)
‘Predicting the Popularity of Online Content’ is an attempt to study whether the early behaviour of online content can be an indicator of its popularity over time. The paper presents its method for popularity prediction on two prominent content-sharing websites: YouTube (a video-sharing platform) and Digg (a news aggregator). The authors also analyse the impact that a social network may have on the popularity of content.
Analysing the past popularity of online content is relatively straightforward and widely done; for example, Google’s search algorithm presents the most popular content first. The authors’ objective is to predict the long-term popularity of content shortly after it has been published. This would, for example, enable a content provider to estimate its ad revenue or adapt its bandwidth requirements over time.
The areas investigated are:
– What measures popularity;
– Differences in media consumption across the platforms;
– Whether a user’s social network affects the popularity of the content;
– Working with data that is as unbiased as possible, and accounting for randomness.
The article was published in 2010, but the study ran from 2007 through 2008. The two platforms studied were created within three months of each other (end of 2004 and beginning of 2005), and YouTube was bought by Google at the end of 2006 for $1.65 billion. Both authors are HP employees working in Social Computing.
Web 2.0 was booming at the time, enabling more and more people to share media through different channels to reach an audience.
The title of the article is somewhat misleading, as the study considers only Digg.com and YouTube.com as content-sharing platforms. While the measure of popularity on the first platform, a “digg” (vote), is clear, a “view” on the second is less appropriate. Consider a three-minute video: ten views of 10 seconds each should not count as more “popular” than five views of the full video. The average time viewed (in seconds, for example) would be a more appropriate measure of popularity.
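To make the arithmetic behind this objection concrete, here is a minimal sketch in Python. The function names and the idea of storing each view as a duration in seconds are assumptions made for illustration; they do not come from the paper or from any YouTube API.

```python
# Hypothetical illustration of the example above: view count versus watch time.
# All names and numbers are illustrative assumptions, not data from the paper.

VIDEO_LENGTH_S = 180  # a three-minute video


def total_watch_time(view_durations):
    """Total number of seconds watched across all views."""
    return sum(view_durations)


def mean_watch_time(view_durations):
    """Average number of seconds watched per view."""
    return sum(view_durations) / len(view_durations)


short_views = [10] * 10            # ten views of 10 seconds each
full_views = [VIDEO_LENGTH_S] * 5  # five views of the whole video

for label, views in (("short views", short_views), ("full views", full_views)):
    print(label, len(views), total_watch_time(views), mean_watch_time(views))
```

Ranked by view count, the ten short views win (10 against 5); ranked by total or average watch time, the five full views win (900 seconds against 100, or 180 seconds per view against 10).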
Also, although it is acknowledged that most YouTube views are not made on the website itself (the source from which the researchers took their data), too many views go “under the radar” because they happen through other platforms, such as Facebook. YouTube should not have been used as a main subject of the research.
The analysis is based on a very extensive amount of data collected over the study period, and the authors go to great lengths to describe minutely every step taken to “normalise” the data from both platforms, also considering many aspects that could skew it (peak hours, different time zones, obsolescence of articles…). This rigour makes the paper highly technical, with terms such as “logarithmic transformation”, “cosine distance”, “Bayesian network” and “variances”, to name just a few. The graphics are over-complicated too, although some of them, hard as they are, can be understood after several readings of the long explanation that accompanies each one.
The fact that the researchers could forecast the popularity of a post 30 days ahead (with a 10% margin of error), based on measurements from the first 2 hours of a Digg post and the first 10 days of a YouTube post, is interesting but not of much practical use. It does show that the two kinds of content are consumed differently: one is news that quickly becomes outdated, the other is content that can be searched for and remain relevant in the future.
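For readers who want a feel for how an early measurement might be turned into a 30-day forecast, here is a rough sketch in Python. It assumes that, after a logarithmic transformation, the final count is roughly the early count plus a constant offset; this is a simplification chosen for illustration and should not be read as the authors’ exact model. The function names and the training numbers are made up.

```python
import math

# Sketch only: assumes ln(final) = ln(early) + offset, with the offset
# learned from past posts. The data below is synthetic, not from the paper.


def fit_log_offset(early_counts, final_counts):
    """Average the log ratios final/early over a set of past posts."""
    log_ratios = [
        math.log(final) - math.log(early)
        for early, final in zip(early_counts, final_counts)
    ]
    return sum(log_ratios) / len(log_ratios)


def predict_final(early_count, offset):
    """Extrapolate a post's final count from its early count."""
    return early_count * math.exp(offset)


# Synthetic history: early counts and the counts the same posts reached later.
early = [10, 50, 200, 1000]
final = [40, 200, 800, 4000]

offset = fit_log_offset(early, final)
prediction = predict_final(120, offset)
print(round(prediction))
```

In this toy history every post ends up with four times its early count, so a new post with 120 early votes is extrapolated to about 480. Real data would be much noisier, which is where the quoted margin of error comes in.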
Predicting the popularity of a post from its content (semantics), as opposed to its early behaviour, would be more useful. The authors acknowledge this in their conclusion.
Why these two platforms specifically were chosen remains unclear (even if YouTube accounts for “60% of all online videos”). Many other popular platforms existed at the time, such as Flickr, Reddit, Slashdot, Myspace, Twitter or even Facebook.
Although the two authors are HP employees, no particular reason could be found to justify the choice of YouTube and Digg.
The fact that HP (and not Google or some other advertiser or social media provider) carried out such research could suggest either that the company was considering entering the advertising business or that it was testing the benefits of its corporate social network.
The conclusion that social networks are “not effective promoting downloads on a large scale” is quite surprising and may be obsolete. Social networks are now an integral part of everyday life. With so much data and content created daily, social networks may help users, more than in the past, to narrow down to content that interests them, and so affect its popularity.
As outlined in the conclusion, the researchers decided “not to consider the semantics of popularity and why some submissions become more popular than others”, but the outcome (if any) of such research would have been more enriching than the actual one.
The idea that media becomes popular through a “rich-get-richer” effect makes sense. The question that this article does not properly answer is why online content gets “rich” in the first place.