Tuesday, July 31, 2012

Can an Algorithm be Wrong?

[ed. For example, from today's NY Times, see: Social Media Are Giving a Voice to Taste Buds.]

Throughout the Occupy Wall Street protests, participants and supporters used Twitter (among other tools) to coordinate, debate, and publicize their efforts. But amidst the enthusiasm, a concern surfaced: even as the protests were gaining strength and media coverage, and talk of the movement on Twitter was surging, the term was not “Trending.” A simple list of ten terms provided by Twitter on its homepage, Twitter Trends digests the 250 million tweets sent every day and indexes the most vigorously discussed terms at that moment, either globally or for a user’s chosen country or city. Yet even in the cities where protests were happening, including New York, when tweets using the term #occupywallstreet seemed to spike, the term did not Trend. Some suggested that Twitter was deliberately dropping the term from its list, and in doing so, preventing it from reaching a wider audience.

The charge of censorship is a revealing one. It suggests, first, that many are deeply invested in the Twitter network as a political tool, and that some worry that Twitter’s interests might be aligned with the financial and political status quo they hope to challenge. But it reveals something else about the importance and the opacity of the algorithm that drives the identification of Trends. To suggest that the best or only explanation of #occupywallstreet’s absence is that Twitter “censored” it implies that Trends is otherwise an accurate barometer of the public discussion. For some, this glitch could only mean deliberate human intervention into what should be a smoothly running machine.

The workings of these algorithms are political: they are an important terrain upon which battles about visibility are being fought (Grimmelmann 2009). Much as protesters took over the privately owned Zuccotti Park in Manhattan in order to stage a public protest, more and more of our online public discourse is taking place on private communication platforms like Twitter. These providers offer complex algorithms to manage, curate, and organize these massive networks. But there is a tension between what we understand these algorithms to be, what we need them to be, and what they in fact are. We do not have a sufficient vocabulary for assessing the intervention of these algorithms. We’re not adept at appreciating what it takes to design a tool like Trends – one that appears to effortlessly identify what’s going on, yet also makes distinct and motivated choices. We don’t have a language for the unexpected associations algorithms make, beyond the intention (or even comprehension) of their designers (Ananny 2011). Most importantly, we have not fully recognized how these algorithms attempt to produce representations of the wants or concerns of the public, and as such, run into the classic problem of political representation: who claims to know the mind of the public, and how do they claim to know it? (...)

Twitter explains that Trends is designed to identify topics that are enjoying a surge, not just rising above the normal chatter, but doing so in a particular way. Part of the evaluation includes: Is the use of the term spiking, i.e. accelerating rapidly, or is its growth more gradual? Are the users densely interconnected into a single cluster, or does the term span multiple clusters? Are the tweets unique content, or mostly retweets of the same post? Is this the first time the term has Trended? (If not, the threshold to Trend again is higher.) So this list, though automatically calculated in real time, is also the result of the careful implementation of Twitter’s judgments as to what should count as a “trend.” (...)
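Twitter has not published the Trends algorithm, but the criteria above suggest its general shape. The following is a minimal sketch in Python, under the assumption that the inputs (hourly counts, a cluster count, a retweet ratio, a repeat flag) and every weight are invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class TermStats:
    """Hypothetical per-term inputs; Twitter's real features are not public."""
    counts_per_hour: list      # recent hourly tweet volumes for the term
    cluster_count: int         # distinct user clusters using the term
    unique_ratio: float        # fraction of tweets that are original, not retweets
    has_trended_before: bool   # repeat Trends face a higher bar

def trend_score(stats: TermStats) -> float:
    """Toy score combining the stated criteria; all weights are invented."""
    if len(stats.counts_per_hour) < 3:
        return 0.0
    a, b, c = stats.counts_per_hour[-3:]
    # Acceleration: reward a spike (growth of the growth), not a steady climb.
    acceleration = max(0, (c - b) - (b - a))
    # Spread: a term confined to one tight cluster counts for less.
    spread = min(stats.cluster_count, 10) / 10
    score = acceleration * spread * stats.unique_ratio
    if stats.has_trended_before:
        score *= 0.5  # higher threshold to Trend again
    return score

# A steady climb (100, 200, 300) scores zero; a spike (100, 150, 400) scores high.
steady = TermStats([100, 200, 300], cluster_count=8, unique_ratio=0.9, has_trended_before=False)
spike = TermStats([100, 150, 400], cluster_count=8, unique_ratio=0.9, has_trended_before=False)
print(trend_score(steady), trend_score(spike))  # 0.0 vs 144.0
```

Even in this toy version, the design choices are visible: steady growth scores zero while a sudden spike scores high, which is exactly how a slowly building topic could fail to Trend without anyone suppressing it.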

Twitter Trends is only one such tool. Search engines, while promising to provide a logical set of results in response to a query, are in fact algorithms designed to take a range of criteria into account so as to serve up results that satisfy not just the user, but the aims of the provider, their understanding of relevance or newsworthiness or public import, and the particular demands of their business model (Granka 2010). When users of Apple’s Siri iPhone tool begin to speculate that its cool, measured voice is withholding information about abortion clinics, or worse, sending users towards alternatives preferred by conservatives, they are in fact questioning the algorithmic product of the various search mechanisms that Siri consults. [link]
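As a concrete illustration of that multi-criteria weighing, here is a minimal sketch, with invented result fields and weights (relevance, partner_boost, and provider_weight are all hypothetical and describe no real search engine), of how a provider’s priorities can be folded into an otherwise “relevant” ranking:

```python
# Two hypothetical results: B is less relevant to the user but carries
# a "partner_boost" standing in for the provider's business interests.
results = [
    {"title": "Result A", "relevance": 0.9, "partner_boost": 0.0},
    {"title": "Result B", "relevance": 0.7, "partner_boost": 0.3},
]

def rank(results, provider_weight=0.5):
    """Order results by user relevance plus a provider-chosen boost.
    The user sees one ordered list; the weighting stays invisible."""
    return sorted(
        results,
        key=lambda r: r["relevance"] + provider_weight * r["partner_boost"],
        reverse=True,
    )

print([r["title"] for r in rank(results)])                       # ['Result A', 'Result B']
print([r["title"] for r in rank(results, provider_weight=1.5)])  # ['Result B', 'Result A']
```

Nothing in the output signals that the second weighting was ever applied.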

Beyond search, we are surrounded by algorithmic tools that offer to help us navigate online platforms and social networks, based not on what we want, but on what all of their users do. When Facebook, YouTube, or Digg offer to report, mathematically and in real time, what is “most popular” or “liked” or “most viewed” or “best selling” or “most commented” or “highest rated,” they are curating a list whose legitimacy is built on the promise that it has not been curated, that it is the product of aggregate user activity itself. When Amazon recommends a book based on matching your purchases to those of its other customers, or Demand Media commissions news based on aggregate search queries (Anderson 2011), their accuracy and relevance depend on the promise of an algorithmic calculation paired with the massive, even exhaustive, corpus of the traces we all leave.
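The mechanics behind a “customers also bought” list can be remarkably simple; its authority comes from the aggregate data rather than from the code. A toy co-purchase counter, with invented purchase histories (Amazon’s actual recommender is proprietary and far more elaborate), might look like this:

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase histories; the point is that the recommendation
# emerges from aggregate co-purchase counts, not editorial judgment.
purchases = {
    "alice": {"book_a", "book_b"},
    "bob":   {"book_a", "book_b", "book_c"},
    "carol": {"book_b", "book_c"},
}

# Count how often each pair of items appears in the same customer's history.
co_counts = Counter()
for items in purchases.values():
    for pair in combinations(sorted(items), 2):
        co_counts[pair] += 1

def recommend(item, k=3):
    """Items most often co-purchased with `item` (toy illustration)."""
    scores = Counter()
    for (a, b), n in co_counts.items():
        if a == item:
            scores[b] += n
        elif b == item:
            scores[a] += n
    return [other for other, _ in scores.most_common(k)]

print(recommend("book_a"))  # ['book_b', 'book_c']
```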

We might, then, pursue the question of the algorithm’s politics further. The Trends algorithm does have criteria built in: criteria that help produce the particular Trends results we see, criteria that are more complex and opaque than some users take them to be, criteria that could have produced the absence of the term #occupywallstreet that critics noted. But further, the criteria that animate the Trends algorithm also presume a shape and character to the public they intend to measure, and in doing so, help to construct publics in that image.

by Tarleton Gillespie, LIMN | Read more: