Wednesday, February 6, 2013

Examining the Popularity of Wikipedia Articles: Catalysts, Trends, and Applications

On February 12, 2012, news of Whitney Houston's death brought 425 hits per second to her Wikipedia article, the highest peak traffic on any article since at least January 2010.

It is broadly known that Wikipedia is the sixth most popular website on the Internet, and the English Wikipedia now has over 4 million articles and 29 million total pages. Much less attention, however, has been given to traffic patterns and trends in the content viewed. The Wikimedia Foundation makes aggregate raw article view data available for all of its projects.

This article attempts to convey some of the fascinating phenomena that underlie extremely popular articles, and perhaps more importantly to editors, discusses how this information can be used to improve the project moving forward. While some dismiss view spikes as the manifestation of shallow pop culture interests (e.g., Justin Bieber is the 6th most popular article over the past 3 years, see Tab. 2), these are valuable opportunities to study reader behavior and to shape the public perception of our projects. (...)

The origins of heightened popularity

Articles which are "extremely popular" on Wikipedia fall into one of two categories: (1) occasional or isolated popularity, or (2) consistent popularity.

Tab. 1. The most viewed pages on Wikipedia in a one-hour period since January 1, 2010 (excluding duplicate entries and DOS attacks)

Article                  Date          Views in hour   Views/second   Cause
Whitney Houston          12 Feb 2012   1532302         425.6          Death of subject
Amy Winehouse            23 Jul 2011   1359091         377.5          Death of subject
Steve Jobs               6 Oct 2011    1063665         295.5          Death of subject
Madonna (entertainer)    6 Feb 2012    993062          275.9          Super Bowl halftime
Osama bin Laden          2 May 2011    862169          239.5          Death of subject
The Who                  7 Feb 2010    567905          157.8          Super Bowl halftime
Ryan Dunn                20 Jun 2011   522301          145.1          Death of subject
Jodie Foster             14 Jan 2013   451270          125.4          Golden Globes speech

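The per-second figures in Tab. 1 appear to be the hourly view counts divided by the 3,600 seconds in an hour. A minimal sketch of that arithmetic follows; the function name views_per_second is made up for illustration and is not part of any Wikimedia tool.

```python
SECONDS_PER_HOUR = 3600


def views_per_second(views_in_hour: int) -> float:
    """Average views per second over a one-hour window."""
    return views_in_hour / SECONDS_PER_HOUR


# Hourly peaks copied from Tab. 1.
peak_hours = {
    "Whitney Houston": 1532302,
    "Amy Winehouse": 1359091,
    "Jodie Foster": 451270,
}

for article, views in peak_hours.items():
    print(f"{article}: {views_per_second(views):.1f} views/second")
# Whitney Houston: 425.6 views/second
# Amy Winehouse: 377.5 views/second
# Jodie Foster: 125.4 views/second
```

Note that this is an average over the hour, so the true instantaneous peak within that hour may well have been higher.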
The prime sources of occasional or isolated popularity include:

Cultural events and deaths: The best way to reach the highest levels of Wikipedia popularity is to be a celebrity who (a) dies, or (b) plays the Super Bowl halftime show (see Tab. 1). This year's Super Bowl entertainment, Beyoncé Knowles, just missed the chart with 100-110 views/second. Generally, prominent deaths dominate the top-100 traffic events and beyond. However, less morbid events are occasionally on the same scale, such as Jodie Foster following her recent coming out at the 2013 Golden Globes, Bubba Watson upon winning the 2012 Masters Tournament, and Ice hockey at the 2010 Winter Olympics during the final match between the U.S. and Canada (all drew over 250,000 views in a single hour).

Google Doodles: Google often replaces its logo to commemorate anniversaries and other events, and clicking on the logo will usually produce the search results for that topic. With Wikipedia appearing first for many search engine queries, this can be a tremendous source of traffic. When the 110th birthday of Dennis Gabor was celebrated in this fashion on June 5, 2010, his article peaked at over 55 views per second (this for an article that currently sees only about 140 views per day). There are many other examples, including Winsor McCay on October 15, 2012, Gideon Sundback on April 24, 2012, and the London Underground last month.

Non-human views and DOS attacks: Page access data cannot distinguish between human readers and automated requests. The most dramatic example occurred on March 9, 2010, when the Jyllands-Posten Muhammad cartoons controversy article saw 5.3 million views in a single hour (likely the densest view-hour at any point in Wikipedia's history). Given the religious controversy and sensitivity surrounding the topic, this is believed to have been an attack designed to prevent others from viewing the page and its associated imagery. Ironically, the Denial of Service article also appears to be a frequent target. It is often hard to distinguish between malicious attacks, accidental misconfiguration (e.g. bot testing), and undiscovered catalysts of human traffic. In compiling the WP:5000/Top25Report, some discretion is applied in an attempt to remove odd anomalies. For example, Cat anatomy has been a popular article in raw page views for a few months (and not only on Caturdays), after previously being much less popular.
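One simple way to surface candidate anomalies for manual review is to compare each hour against the article's typical hour. The sketch below is an illustrative heuristic only, not the method used for the WP:5000/Top25Report; the function name, the median baseline, and the factor of 50 are all assumptions, and the baseline hours in the usage example are invented (only the 5.3 million figure comes from the text above).

```python
from statistics import median


def flag_suspicious_hours(hourly_views, factor=50.0):
    """Return indices of hours whose views exceed `factor` times the median hour.

    Hypothetical heuristic: it only surfaces candidates for a human to inspect.
    It cannot tell an attack from a legitimate spike such as a celebrity death.
    """
    if not hourly_views:
        return []
    # Guard against a zero baseline on articles that are rarely viewed.
    baseline = max(median(hourly_views), 1.0)
    threshold = factor * baseline
    return [i for i, views in enumerate(hourly_views) if views > threshold]


# Invented baseline hours surrounding one 5.3 million view hour.
hours = [120, 95, 130, 110, 5_300_000, 105]
print(flag_suspicious_hours(hours))  # [4]
```

The median is used rather than the mean so that a single enormous hour does not inflate its own baseline. Any threshold of this kind still needs the human discretion described above.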

Second screen effect: Though not nearly on the scale of the above spikes, we find that television programs and their content are reflected in page view data. This can be as broad as spikes on The Big Bang Theory article when the program airs on popular networks, but is even seen in small traffic bumps when a quiz show like Jeopardy! or Who Wants to Be a Millionaire? asks about a particular topic. This phenomenon has recently been more thoroughly investigated on the German Wikipedia.[1]

Slashdot effect: When extremely popular aggregation sites like Slashdot or Reddit prominently link to Wikipedia, traffic follows. Internally, Wikipedia's Main page can have much the same effect.

Temporal patterns: The Christmas article is popular in December, Easter peaks around that holiday, and Christianity-related articles tend to see unusual amounts of Sunday traffic. This is just the start of the patterns that recur diurnally, annually, and at other predetermined intervals.
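A weekly pattern such as elevated Sunday traffic could be checked by averaging an article's daily views per weekday. The sketch below assumes daily totals have already been aggregated into a dict keyed by date; the function name and the sample numbers are hypothetical.

```python
from collections import defaultdict
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def mean_views_by_weekday(daily_views):
    """Average daily views per weekday, given a {date: views} mapping."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for day, views in daily_views.items():
        name = WEEKDAYS[day.weekday()]  # date.weekday(): Monday is 0
        totals[name] += views
        counts[name] += 1
    # Only report weekdays that actually occur in the input.
    return {name: totals[name] / counts[name]
            for name in WEEKDAYS if name in counts}


# Invented daily totals for illustration; 6 and 13 January 2013 are Sundays.
sample = {
    date(2013, 1, 6): 300,
    date(2013, 1, 7): 100,
    date(2013, 1, 13): 500,
}
print(mean_views_by_weekday(sample))  # {'Monday': 100.0, 'Sunday': 400.0}
```

The same grouping idea extends to hour of day or month of year for the diurnal and annual patterns mentioned above.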

by Andrew West and Milowent, Wikipedia