The Algorithms Beat: Angles and Methods for Investigation
The Machine Bias series from ProPublica began in May 2016 as an effort to investigate algorithms in society.1 Perhaps most striking in the series was an investigation and analysis exposing the racial bias of recidivism risk assessment algorithms used in criminal justice decisions.2 These algorithms score individuals according to whether they pose a low or high risk of reoffending. States and municipalities variously use the scores for managing pre-trial detention, probation, parole, and sometimes even sentencing. Reporters at ProPublica filed a public records request for the scores from Broward County in Florida and then matched those scores to actual criminal histories to see whether an individual had actually recidivated (i.e., reoffended) within two years. Analysis of the data showed that black defendants tended to be assigned higher risk scores than white defendants, and were more likely to be incorrectly labeled as high risk when in fact after two years they hadn’t actually been rearrested.3
Scoring in the criminal justice system is of course just one domain where algorithms are being deployed in society. The Machine Bias series has since covered everything from Facebook’s ad targeting system, to geographically discriminatory auto insurance rates, and unfair pricing practices on Amazon.com. Algorithmic decision making is increasingly pervasive throughout both the public and private sectors. We see it in domains like credit and insurance risk scoring, employment systems, welfare management, educational and teacher rankings, and online media curation, among many others.4 Operating at scale and often impacting large swaths of people, algorithms can make consequential and sometimes contestable calculation, ranking, classification, association, and filtering decisions. Algorithms, animated by piles of data, are a potent new way of wielding power in society.
As ProPublica’s Machine Bias series attests, a new strand of computational and data journalism is emerging to investigate and hold accountable how power is exerted through algorithms. I call this algorithmic accountability reporting, a re-orientation of the traditional watchdog function of journalism towards the power wielded through algorithms.5 Despite their ostensible objectivity, algorithms can and do make mistakes and embed biases that warrant closer scrutiny. Slowly, a beat on algorithms is coalescing as journalistic skills come together with technical skills to provide the scrutiny that algorithms deserve.
There are, of course, a variety of forms of algorithmic accountability that may take place in diverse forums beyond journalism, such as in political, legal, academic, activist, or artistic contexts.6 But my focus in this chapter is squarely on algorithmic accountability reporting as an independent journalistic endeavor that contributes to accountability by mobilizing public pressure. This can be seen as complementary to other avenues that may ultimately also contribute to accountability, such as by developing regulations and legal standards, creating audit institutions in civil society, elaborating effective transparency policies, exhibiting reflexive art shows, and publishing academic critiques.
In deciding what constitutes the beat in journalism, it’s first helpful to define what’s newsworthy about algorithms. Technically speaking, an algorithm is a sequence of steps followed in order to solve a particular problem or to accomplish a defined outcome. In terms of information processes the outcomes of algorithms are typically decisions. The crux of algorithmic power often boils down to computers’ ability to make such decisions very quickly and at scale, potentially affecting large numbers of people. In practice, algorithmic accountability isn’t just about the technical side of algorithms though—algorithms should be understood as composites of technology woven together with people such as designers, operators, owners, and maintainers in complex sociotechnical systems.7 Algorithmic accountability is about understanding how those people exercise power within and through the system, and are ultimately responsible for the system’s decisions. Oftentimes what makes an algorithm newsworthy is when it somehow makes a “bad” decision. This might involve an algorithm doing something it wasn’t supposed to do, or perhaps not doing something it was supposed to do. For journalism, the public significance and consequences of a bad decision are key factors. What’s the potential harm for an individual, or for society? Bad decisions might impact individuals directly, or in aggregate may reinforce issues like structural bias. Bad decisions can also be costly. Let’s look at how various bad decisions can lead to news stories.
Angles on Algorithms
In observing the algorithms beat develop over the last several years in journalism, as well as through my own investigations of algorithms, I’ve identified at least four driving forces that appear to underlie many algorithmic accountability stories: (1) discrimination and unfairness, (2) errors or mistakes in predictions or classifications, (3) legal or social norm violations, and (4) misuse of algorithms by people either intentionally or inadvertently. I provide illustrative examples of each of these in the following subsections.
Discrimination and Unfairness
Uncovering discrimination and unfairness is a common theme in algorithmic accountability reporting. The story from ProPublica that led this chapter is a striking example of how an algorithm can lead to systematic disparities in the treatment of different groups of people. Northpointe, the company that designed the risk assessment scores (since renamed to Equivant), argued the scores were equally accurate across races and were therefore fair. But their definition of fairness failed to take into account the disproportionate volume of mistakes that affected black people. Stories of discrimination and unfairness hinge on the definition of fairness applied, which may reflect different political suppositions.8
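The tension between these two definitions of fairness can be made concrete with a small sketch. The counts below are invented for illustration and are not drawn from ProPublica's data or from the vendor's own analysis; the class and variable names are likewise hypothetical. The sketch shows how two groups can be scored with identical overall accuracy while one group bears a much higher false positive rate, meaning more people wrongly labeled high risk.

```python
from dataclasses import dataclass


@dataclass
class Outcomes:
    """Confusion-matrix counts for one group (all numbers are invented)."""
    true_pos: int   # labeled high risk, did reoffend
    false_pos: int  # labeled high risk, did not reoffend
    true_neg: int   # labeled low risk, did not reoffend
    false_neg: int  # labeled low risk, did reoffend

    @property
    def total(self) -> int:
        return self.true_pos + self.false_pos + self.true_neg + self.false_neg

    def accuracy(self) -> float:
        # Share of all labels that turned out to be correct.
        return (self.true_pos + self.true_neg) / self.total

    def false_positive_rate(self) -> float:
        # Share of people who did NOT reoffend but were labeled high risk.
        return self.false_pos / (self.false_pos + self.true_neg)

    def false_negative_rate(self) -> float:
        # Share of people who DID reoffend but were labeled low risk.
        return self.false_neg / (self.false_neg + self.true_pos)


# Two hypothetical groups: same accuracy, different mix of mistakes.
group_a = Outcomes(true_pos=40, false_pos=20, true_neg=30, false_neg=10)
group_b = Outcomes(true_pos=20, false_pos=10, true_neg=50, false_neg=20)

for name, group in (("A", group_a), ("B", group_b)):
    print(
        f"group {name}: accuracy={group.accuracy():.2f} "
        f"FPR={group.false_positive_rate():.2f} "
        f"FNR={group.false_negative_rate():.2f}"
    )
```

A claim of "equal accuracy" and a finding of "unequal false positive rates" can therefore both be true of the same scores, which is why the choice of fairness definition matters to the story.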
I have also worked on stories that uncover unfairness due to algorithmic systems—in particular looking at how Uber pricing dynamics may differentially affect neighborhoods in Washington, DC.9 Based on initial observations of different waiting times and how those waiting times shifted based on Uber’s surge pricing algorithm we hypothesized that different neighborhoods would have different levels of service quality (i.e. waiting time). By systematically sampling the waiting times in different census tracts over time we showed that census tracts with more people of color tend to have longer wait times for a car, even when controlling for other factors like income, poverty rate, and population density in the neighborhood. It’s difficult to pin the unfair outcome directly to Uber’s technical algorithm because other human factors also drive the system, such as the behavior and potential biases of Uber drivers. But the results do suggest that when considered as a whole, the system exhibits disparity associated with demographics.
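To illustrate what "controlling for other factors" means in an analysis like this, here is a minimal sketch using stratification, a simpler stand-in for the statistical modeling a published analysis would typically use. The tract samples, the income bands, and the function name are all invented for illustration and are not the data or method behind the Uber story. The idea is to compare wait times only between tracts that fall in the same income band, so that income cannot explain the remaining gap.

```python
from collections import defaultdict
from statistics import mean

# Invented samples: (income_band, majority_people_of_color, wait_minutes)
samples = [
    ("low", True, 7.0), ("low", True, 8.0),
    ("low", False, 5.0), ("low", False, 6.0),
    ("high", True, 5.0), ("high", True, 6.0),
    ("high", False, 3.0), ("high", False, 4.0),
]


def wait_gap_within_strata(rows):
    """Per income band: mean wait in majority-POC tracts minus mean wait elsewhere."""
    waits = defaultdict(lambda: {True: [], False: []})
    for band, majority_poc, wait in rows:
        waits[band][majority_poc].append(wait)
    # Only report bands where both kinds of tract were sampled.
    return {
        band: mean(groups[True]) - mean(groups[False])
        for band, groups in waits.items()
        if groups[True] and groups[False]
    }


gaps = wait_gap_within_strata(samples)
for band, gap in gaps.items():
    print(f"{band}-income tracts: wait gap of {gap:.1f} minutes")
```

If a gap persists inside every band, as it does in this toy data, income alone does not account for the disparity. A real analysis would control for several factors at once, which is where regression becomes more practical than stratification.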
Errors and Mistakes
Algorithms can also be newsworthy when they make specific errors or mistakes in their classification, prediction, or filtering decisions. Consider the case of platforms like Facebook and Google which use algorithmic filters to reduce exposure to harmful content like hate speech, violence, and pornography. This can be important for the protection of specific vulnerable populations, like children, especially in products like Google’s YouTube Kids which are explicitly marketed as safe for children. Errors in the filtering algorithm for the app are newsworthy because they mean that sometimes children encounter inappropriate or violent content.10 Classically, algorithms make two types of mistakes: false positives and false negatives. In the YouTube Kids scenario, a false positive would be a video mistakenly classified as inappropriate when actually it’s totally fine for kids. A false negative is a video classified as appropriate when it’s really not something you want kids watching.
Classification decisions impact individuals when they either increase or decrease the positive or negative treatment an individual receives. When an algorithm mistakenly selects an individual to receive free ice cream (increased positive treatment), you won’t hear that individual complain (although when others find out, they might say it’s unfair). Errors are generally newsworthy when they lead to increased negative treatment for a person, such as by exposing a child to an inappropriate video. Errors are also newsworthy when they lead to a decrease in positive treatment for an individual, such as when a person misses an opportunity. Just imagine a qualified buyer who never gets a special offer because an algorithm mistakenly excludes them. Finally, errors can be newsworthy when they cause a decrease in warranted negative treatment. Consider a criminal risk assessment algorithm mistakenly labeling a high-risk individual as low-risk, which is a false negative. While that’s great for the individual, it creates a greater risk to public safety by freeing an individual who goes on to commit a crime again.
Legal and Social Norm Violations
Predictive algorithms can sometimes test the boundaries of established legal or social norms, leading to other opportunities and angles for coverage. Consider for a moment the possibility of algorithmic defamation.11 Defamation is defined as “a false statement of fact that exposes a person to hatred, ridicule or contempt, lowers him in the esteem of his peers, causes him to be shunned, or injures him in his business or trade.”12 Over the last several years there have been numerous stories, and legal battles, over individuals who feel they’ve been defamed by Google’s autocomplete algorithm. An autocompletion can link an individual’s or company’s name to everything from crime and fraud to bankruptcy or sexual conduct, which can then have consequences for reputation. Algorithms can also be newsworthy when they encroach on social norms like privacy. For instance, Gizmodo has extensively covered the “People You May Know” (PYMK) algorithm on Facebook, which suggests potential “friends” on the platform that are sometimes inappropriate or undesired.13 In one story, reporters identified a case where PYMK outed the real identity of a sex worker to her clients.14 This is problematic not only because of the potential stigma attached to sex work, but also out of fear of clients who could become stalkers.
Defamation and privacy violations are only two possible story angles here. Journalists should be on the lookout for a range of other legal or social norm violations that algorithms may create in various social contexts. Since algorithms necessarily rely on a quantified version of reality that only incorporates what is measurable as data, they can miss a lot of the social and legal context that would otherwise be essential in rendering an accurate decision. By understanding what a particular algorithm actually quantifies about the world (how it “sees” things), journalists can inform their critique by illuminating the missing bits that would support a decision in the richness of its full context.
Human Misuse
Algorithmic decisions are often embedded in larger decision-making processes that involve a constellation of people and algorithms woven together in a sociotechnical system. Despite the inaccessibility of some of their sensitive technical components, the sociotechnical nature of algorithms opens up new opportunities for investigating the relationships that users, designers, owners, and other stakeholders may have to the overall system.15 If algorithms are misused by the people in the sociotechnical ensemble this may also be newsworthy. The designers of algorithms can sometimes anticipate and articulate guidelines for a reasonable set of use contexts for a system, and so if people ignore these in practice it can lead to a story of negligence or misuse. The risk assessment story from ProPublica provides a salient example. Northpointe had in fact created two versions and calibrations of the tool, one for men and one for women. Statistical models need to be trained on data reflective of the population where they will be used and gender is an important factor in recidivism prediction. But Broward County was misusing the risk score designed and calibrated for men by using it for women as well.16
How to Investigate an Algorithm
There are various routes to the investigation of algorithmic power: no single approach will always be appropriate. But there is a growing stable of methods to choose from, including everything from highly technical reverse engineering and code inspection techniques, to auditing using automated or crowdsourced data collection, or even low-tech approaches to prod and critique based on algorithmic reactions.17 Each story may require a different approach depending on the angle and the specific context, including what degree of access to the algorithm, its data, and code is available. For instance, an exposé on systematic discrimination may lean heavily on an audit method using data collected online, whereas a code review may be necessary to verify the correct implementation of an intended policy.18 Traditional journalistic methods, such as talking to company insiders like designers, developers, and data scientists, filing public records requests, and finding impacted individuals, are as important as ever. I can’t go into depth on all of these methods in this short chapter, but here I want to at least elaborate a bit more on how journalists can investigate algorithms using auditing.
Auditing techniques have been used for decades to study social bias in systems like housing markets, and have recently been adapted for studying algorithms.19 The basic idea is that if the inputs to algorithms are varied in enough different ways, and the outputs are monitored, then inputs and outputs can be correlated to build a theory for how the algorithm may be functioning.20 If we have some expected outcome that the algorithm violates for a given input, this can help tabulate errors and see if errors are biased in systematic ways. When algorithms can be accessed via APIs or online webpages, output data can be collected automatically.21 For personalized algorithms, auditing techniques have also been married to crowdsourcing in order to gather data from a range of people who may each have a unique “view” of the algorithm. AlgorithmWatch in Germany has used this technique effectively to study the personalization of Google Search results, collecting almost 6 million search results from more than 4,000 users who shared data via a browser plugin (as discussed further by Christina Elmer in her chapter in this book).22 Gizmodo has used a variant of this technique to help investigate Facebook’s PYMK. Users download a piece of software that periodically tracks PYMK results and stores them locally on the user’s computer, maintaining their privacy. Reporters can then solicit tips from users who think their results are worrisome or surprising.23
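The basic audit loop described above can be sketched in a few lines. Everything here is an assumption made for illustration: `black_box_price` is a toy stand-in for the system under audit (in practice it would be a request to a webpage or API), and the input factors and values are hypothetical. The sketch varies the inputs systematically, logs each output, and then averages the output by each input factor to see which factors appear to matter.

```python
import itertools
from collections import defaultdict
from statistics import mean


def black_box_price(zip_code: str, device: str) -> float:
    """Hypothetical stand-in for the algorithm being audited."""
    base = 100.0
    # Toy rule the auditor does not know about: some zip codes pay more.
    surcharge = 15.0 if zip_code.startswith("9") else 0.0
    return base + surcharge


def run_audit(system, input_grid: dict) -> list:
    """Query the system with every combination of inputs and log the output."""
    names = list(input_grid)
    records = []
    for values in itertools.product(*(input_grid[name] for name in names)):
        inputs = dict(zip(names, values))
        records.append({**inputs, "output": system(**inputs)})
    return records


def mean_output_by(records: list, factor: str) -> dict:
    """Average the logged output for each value of one input factor."""
    buckets = defaultdict(list)
    for record in records:
        buckets[record[factor]].append(record["output"])
    return {value: mean(outputs) for value, outputs in buckets.items()}


grid = {
    "zip_code": ["10001", "60601", "94103"],
    "device": ["desktop", "mobile"],
}
log = run_audit(black_box_price, grid)
print(mean_output_by(log, "zip_code"))
print(mean_output_by(log, "device"))
```

In this toy run the averages differ by zip code but not by device, which would point the reporter toward location as the factor worth investigating further. A real audit would also need to account for noise, rate limits, and the terms of service of the system being queried.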
Auditing algorithms is not for the faint of heart. Information deficits sometimes limit an auditor’s ability even to know where to start, what to ask for, how to interpret results, and how to explain the patterns they’re seeing in an algorithm’s behavior. There is also the challenge of knowing and defining what’s expected of an algorithm, and how those expectations may vary across contexts and according to different global moral, social, cultural, and legal standards and norms. For instance, different expectations for fairness may come into play for a criminal risk assessment algorithm in comparison to an algorithm that charges people different prices for an airline seat. In order to identify a newsworthy mistake or bias you must first define what normal or unbiased should look like. Sometimes that definition comes from a data-driven baseline, such as in our audits of news sources in Google search results during the 2016 U.S. elections.24 The issue of legal access to information about algorithms also crops up and is of course heavily contingent on the jurisdiction.25 In the U.S., Freedom of Information (FOI) laws govern the public’s access to documents in government, but the response from different agencies for documents relating to algorithms is uneven at best.26 Legal reforms may be in order so that public access to information about algorithms is more easily facilitated. And if information deficits, difficult-to-articulate expectations, and uncertain legal access aren’t challenging enough, just remember that algorithms can also be quite capricious. Today’s version of the algorithm may already be different from yesterday’s: as one example, Google typically changes its search algorithm 500-600 times a year. Depending on the stakes of the potential changes, algorithms may need to be monitored over time in order to understand how they are changing and evolving.
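Monitoring an algorithm over time can start with something as simple as comparing dated snapshots of its output. The sketch below is one possible approach, not an established standard: it measures the overlap between consecutive snapshots with the Jaccard index (shared items divided by all items seen) and flags dates where the overlap drops below a threshold. The snapshot data, the domain names, and the 0.5 threshold are all invented for illustration.

```python
def jaccard(a, b) -> float:
    """Overlap between two result lists, ignoring order: 1.0 means identical sets."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def flag_changes(snapshots: dict, threshold: float = 0.5) -> list:
    """Return the dates whose results overlap the previous snapshot less than threshold."""
    dates = sorted(snapshots)  # ISO dates sort chronologically as strings
    flagged = []
    for previous, current in zip(dates, dates[1:]):
        if jaccard(snapshots[previous], snapshots[current]) < threshold:
            flagged.append(current)
    return flagged


# Invented daily snapshots of the top results for one fixed query.
snapshots = {
    "2018-01-01": ["a.example", "b.example", "c.example", "d.example"],
    "2018-01-02": ["a.example", "b.example", "c.example", "e.example"],
    "2018-01-03": ["x.example", "y.example", "a.example", "z.example"],
}

print(flag_changes(snapshots))
```

A flagged date is only a prompt for closer reporting. Results can shift because the news changed rather than because the algorithm did, so the threshold and the comparison measure need to be chosen with the specific system in mind.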
Recommendations Moving Forward
To get started and make the most of algorithmic accountability reporting I would recommend three things. Firstly, we’ve developed a resource called Algorithm Tips, which curates relevant methods, examples, and educational resources, and hosts a database of algorithms for potential investigation (first covering algorithms in the U.S. Federal government and then expanded to cover more jurisdictions globally).27 If you’re looking for resources to learn more and help get a project off the ground, that could be one starting point.28 Secondly, focus on the outcomes and impacts of algorithms rather than trying to explain the exact mechanism for their decision making. Identifying algorithmic discrimination (i.e., an output) oftentimes has more value to society as an initial step than explaining exactly how that discrimination came about. By focusing on outcomes, journalists can provide a first-order diagnostic and signal an alarm which other stakeholders can then dig into in other accountability forums. Finally, much of the published algorithmic accountability reporting I’ve cited here is done in teams, and with good reason. Effective algorithmic accountability reporting demands all of the traditional skills journalists need in reporting and interviewing, domain knowledge of a beat, public records requests and analysis of the returned documents, and writing results clearly and compellingly, while often also relying on a host of new capabilities like scraping and cleaning data, designing audit studies, and using advanced statistical techniques. Expertise in these different areas can be distributed among a team, or with external collaborators, as long as there is clear communication, awareness, and leadership. In this way, methods specialists can partner with different domain experts to understand algorithmic power across a larger variety of social domains.
Julia Angwin, Jeff Larson, Surya Mattu and Lauren Kirchner, ‘Machine Bias’, ProPublica, May
Jeff Larson, Surya Mattu, Lauren Kirchner and Julia Angwin, ‘How We Analyzed the COMPAS Recidivism Algorithm’, ProPublica, May 2016.
Cathy O’Neil, ‘Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy’, Broadway Books, 2016.
Frank Pasquale, ‘The Black Box Society: The Secret Algorithms That Control Money and Information’, Harvard University Press, 2015.
Virginia Eubanks, ‘Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor’, St. Martin’s Press, 2018.
Nicholas Diakopoulos, ‘Algorithmic Accountability: Journalistic Investigation of Computational Power Structures’, Digital Journalism 3 (3), (2015).
Nicholas Diakopoulos, ‘Sex, Violence, and Autocomplete Algorithms’, Slate, August 2013.
Nicholas Diakopoulos, ‘Rage Against the Algorithms’, The Atlantic, October 2013.
Tega Brain and Surya Mattu, ‘Algorithmic Disobedience’, n.d.
Taina Bucher, ‘If… Then: Algorithmic Power and Politics’, Oxford University Press, 2018.
Nick Seaver, ‘Algorithms as culture: Some tactics for the ethnography of algorithmic systems’, Big Data & Society, 4(2), (2017).
Mike Ananny, ‘Toward an Ethics of Algorithms’, Science, Technology & Human Values 41 (1), (2015).
Bruno Lepri, et al., ‘Fair, Transparent, and Accountable Algorithmic Decision-making Processes’, Philosophy & Technology, 84(3), (2017).
Jennifer Stark and Nicholas Diakopoulos, ‘Uber seems to offer better service in areas with more white people. That raises some tough questions’, Washington Post, March 2016.
Sapna Maheshwari, ‘On YouTube Kids, Startling Videos Slip Past Filters’, New York Times, November 2017.
Nicholas Diakopoulos, ‘Algorithmic Defamation: The Case of the Shameless Autocomplete’, Tow Center, August 2013.
Seth C. Lewis, Kristin Sanders, Casey Carmody, ‘Libel by Algorithm? Automated Journalism and the Threat of Legal Liability’, Journalism & Mass Communication Quarterly 80(1), (2018).
Kashmir Hill, ‘How Facebook Figures Out Everyone You've Ever Met’, Gizmodo, November 2017.
Kashmir Hill, ‘How Facebook Outs Sex Workers’, Gizmodo, October 2017.
Daniel Trielli and Nicholas Diakopoulos, ‘How To Report on Algorithms Even If You’re Not a Data Whiz’, Columbia Journalism Review, May 2017.
Jeff Larson, ‘Machine Bias with Jeff Larson’, Data Stories Podcast, October 2016.
Nicholas Diakopoulos, ‘Automating the News: How Algorithms are Rewriting the Media’, Harvard University Press, (2019).
Nicholas Diakopoulos, ‘Enabling Accountability of Algorithmic Media: Transparency as a Constructive and Critical Lens’, in Towards glass-box data mining for Big and Small Data, ed. by Tania Cerquitelli, Daniele Quercia and Frank Pasquale, (Springer, June 2017), pp 25-44.
Colin Lecher, ‘What Happens When An Algorithm Cuts Your Health Care’, The Verge,
Steven Michael Gaddis, ‘An Introduction to Audit Studies in the Social Sciences’, in Audit Studies Behind the Scenes with Theory, Method, and Nuance, ed. by Michael Gaddis, (Springer, 2017), pp 3-44.
Christian Sandvig, et al, ‘Auditing algorithms: Research methods for detecting discrimination on Internet platforms’, presented at International Communication Association preconference on Data and Discrimination Converting Critical Concerns into Productive Inquiry, (2014).
Nicholas Diakopoulos, ‘Algorithmic Accountability: Journalistic Investigation of Computational Power Structures’, Digital Journalism 3 (3), (2015).
Jennifer Valentino-DeVries, Jeremy Singer-Vine and Ashkan Soltani, ‘Websites Vary Prices, Deals Based on Users’ Information’, Wall Street Journal, 24 December 2012.
Kashmir Hill and Surya Mattu, ‘Keep Track Of Who Facebook Thinks You Know With This Nifty Tool’, Gizmodo, January 2018.
Nicholas Diakopoulos, Daniel Trielli, Jennifer Stark and Sean Mussenden, ‘I Vote For – How Search Informs Our Choice of Candidate’, in Digital Dominance: The Power of Google, Amazon, Facebook, and Apple, ed. by M. Moore and D. Tambini, June 2018.
Esha Bhandari and Rachel Goodman, ‘Data Journalism and the Computer Fraud and Abuse Act: Tips for Moving Forward in an Uncertain Landscape’, Computation + Journalism Symposium, (2017).
Nicholas Diakopoulos, ‘We need to know the algorithms the government uses to make important decisions about us’, The Conversation, May 2016.
Katherine Fink, ‘Opening the government’s black boxes: freedom of information and algorithmic accountability’, Digital Journalism 17(1), (2017).
Robert Brauneis and Ellen Goodman, ‘Algorithmic Transparency for the Smart City’, 20 Yale Journal of Law & Technology 103, 2018.
Daniel Trielli, Jennifer Stark and Nicholas Diakopoulos, ‘Algorithm Tips: A Resource for Algorithmic Accountability in Government’, Computation + Journalism Symposium, October 2017.