Tuesday, March 11, 2014

Holding Algorithms Accountable


[ed. Dispatches from the 2014 NICAR (National Institute for Computer-Assisted Reporting) conference]

Chase kicks off the panel with an observation he’s made over years of working in data science: “When you’re talking about algorithms used for decision making and predictive modeling, models you build, by definition, have to be imperfect.” He says these models aren’t trying to catch every edge case, but to capture the “general gist of the data you’re working with.” Focusing too heavily on edge cases leads to overfitting, which can reduce the value of the model as a whole.
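(A minimal sketch of the overfitting idea Chase is describing, on made-up data; none of this code comes from the panel. A high-degree polynomial that chases every training point scores better on the data it was fit to, but tends to do worse on new data than a simpler model.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy points around a simple linear trend (invented data for illustration)
x_train = np.linspace(0, 1, 10)
y_train = 2 * x_train + rng.normal(0, 0.2, size=10)
x_test = np.linspace(0, 1, 100)
y_test = 2 * x_test + rng.normal(0, 0.2, size=100)

for degree in (1, 9):
    # degree 9 can bend to hit nearly every training point (the "edge cases")
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")
```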

“When you consider the models that people are using are imperfect and they’re becoming increasingly important,” Chase says, “journalists have an important role in exposing those models and holding them accountable.”

Now, we meet the panelists. First up is Jeremy, who worked on a Wall Street Journal project about the online pricing algorithms used by Staples and other online vendors. Next is Nicholas, who has been researching how algorithms might be discriminatory, what kinds of mistakes they can make, and so on: essentially, “algorithms as sources of power.” Frank, who teaches at the University of Maryland School of Law, is interested in helping journalists understand the legal aspects of data and “how the law can be changed to allow more of this work [data journalism] to be done.”

Chase says maybe algorithms aren’t completely to blame. From a reporting perspective, there’s a split responsibility between the algorithm and the institution that chooses to trust the algorithm. He asks the panel: should we focus more on exposing the algorithmic layer or the institutional layer?

Frank brings up the example of S&P failing to update its data set promptly; most of the reporting focused on the failure of the algorithm rather than the institution. Nicholas throws out a number of questions reporters should ask about algorithms: “How are they making mistakes? Who do those mistakes affect? Who are the stakeholders? How might algorithms censor or discriminate?” This covers a wide swath of reporting, from the most abstract features of algorithms down to the nitty-gritty.

“If you’re going to talk about responsibility — and it’s tricky — it’s all about the human level,” says Jeremy.

Nicholas adds a corollary to the “correlation doesn’t equal causation” argument: “Correlation doesn’t equal intent. Just because there’s a correlation, doesn’t mean a designer sat down and intended for that to happen.” He notes that the predictive models used by the Chicago Police could show a correlation with race, but that doesn’t necessarily reflect the intent of the analysts. He says journalists have to be careful about claiming there’s a specific intent behind an observed algorithm. “Really understanding the design process behind algorithms can shed some light into their intents.” (...)

Chase draws an analogy between penetrating the black box of an algorithm and the black box of an institution. How do we navigate these things? Nicholas brings up the trade secret exemptions to FOIA law, which ensure the opacity of any algorithms the government might use through third parties. If these technologies are patented, there’s a chance we can start to understand them, but “at its core, it’s very opaque.” Reporters can start to connect the dots, but “correlation is a weak form of connection.”

Jeremy says that the inability to explain an algorithm may be a story in itself. Frank agrees, bringing up the example of corporate personality tests: there’s very little direct relationship between your answers and your productivity as an employee; the tests essentially match your answers against large data sets of what the best past employees have said.
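(A hypothetical sketch of the kind of matching Frank describes: score a candidate’s test answers by similarity to answers given by past top performers. All data and names here are invented, and no real vendor’s method is implied; the point is that the score measures resemblance to past answers, not productivity itself.)

```python
import numpy as np

# Rows: past "top performers"; columns: numeric answers to each test question (invented)
top_performer_answers = np.array([
    [5, 2, 4, 1],
    [4, 1, 5, 2],
    [5, 1, 4, 1],
], dtype=float)

def match_score(candidate_answers):
    """Mean cosine similarity between a candidate's answers and past top performers'."""
    c = np.asarray(candidate_answers, dtype=float)
    sims = [
        np.dot(row, c) / (np.linalg.norm(row) * np.linalg.norm(c))
        for row in top_performer_answers
    ]
    return float(np.mean(sims))

# Higher score just means the answers look more like what past "best" employees said
print(match_score([5, 1, 4, 2]))
```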

by Stephen Suen, MIT Center for Civic Media