How to watchdog algorithms

Q&A with Nick Diakopoulos


Algorithms are powerful. They can decide who goes to jail, who gets offered employment opportunities, what videos are suitable for children to watch, and more.

But, with such power, what happens when things go wrong?

This is a question that Nick Diakopoulos wants journalists to explore. In his Data Journalism Handbook 2 chapter, The Algorithms Beat, he shows how to refocus the traditional watchdog role of journalists and hold powerful algorithms to account.

It's a technical and emerging field of reporting, so we gave subscribers to our Conversations with Data newsletter the opportunity to ask Nick questions on the subject. A roundup of their questions and Nick's answers is below.

Watchdog journalists can hold powerful people to account by catching them out doing something wrong. How is reporting on algorithmic power different?

Watchdogging algorithmic power is conceptually quite similar to watchdogging other powerful actors in society. With algorithms you’re looking out for wrongdoings such as discriminatory decisions, individually or societally impactful errors, violations of laws or social norms, or, in some cases, misuse by the people operating the algorithmic system. What’s different is that algorithms can be highly technical, involving machine learning methods that can themselves be difficult to explain. They can also be difficult to access, because they sit behind organisational boundaries and often can’t be compelled through public records requests. And they’re capricious: they can change on a dime.

How can journalists investigate algorithms when the methodology used to create them is kept secret?

A growing number of approaches are being developed to investigate algorithms, such as reverse engineering, auditing, and crowdsourcing. But the basic idea is that every algorithm has an input and an output. If you have access to the algorithm, you can poke the inputs in a million different ways and then look at the outputs to see how the system reacted. You can test for things like bias this way, or look for mistakes in the outputs when you know what the output should be. You can also do lower-stakes critiques of algorithms this way.
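To make the input/output idea concrete, here is a minimal Python sketch of one such probe, a paired (counterfactual) audit. It queries a system with profiles that are identical except for one attribute and compares the outputs. Everything in it is hypothetical. `black_box_score` stands in for a system you can only query, which in practice might be an API, web form or app. The profile fields are invented, and the hidden postcode penalty is planted so the audit has something to find.

```python
import random
from statistics import mean

# HYPOTHETICAL stand-in for an opaque system we can only query.
# In a real audit this would be a request to a service we don't control.
def black_box_score(applicant: dict) -> float:
    score = 0.5 * applicant["years_experience"] + 0.1 * applicant["test_score"]
    if applicant["postcode"] == "B":  # planted hidden behaviour for the audit to surface
        score -= 2.0
    return score

def random_profile(rng: random.Random) -> dict:
    # Invented profile fields, used only to generate varied test inputs.
    return {
        "years_experience": rng.randint(0, 20),
        "test_score": rng.randint(40, 100),
        "postcode": rng.choice(["A", "B"]),
    }

def paired_audit(make_profile, attribute, value_a, value_b, n=500, seed=0):
    """Query the system with pairs of profiles that differ only in `attribute`
    and return the mean output difference (value_a minus value_b)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n):
        base = make_profile(rng)
        a = {**base, attribute: value_a}  # copy of base with the attribute set to value_a
        b = {**base, attribute: value_b}  # identical copy except for the attribute
        diffs.append(black_box_score(a) - black_box_score(b))
    return mean(diffs)

gap = paired_audit(random_profile, "postcode", "A", "B")
print(f"Average score gap (postcode A minus B): {gap:.2f}")
```

Each pair differs only in the probed attribute, so a consistent gap in output points to that attribute. Real audits also have to cope with rate limits, personalisation and a system that may change between queries, and this sketch ignores all three.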

One of my favourites in this genre is a story from Katie Notopoulos at BuzzFeed. Katie illustrates a recent change to the Facebook News Feed algorithm, not by reverse engineering it, but simply by poking it and telling a story about how it reacted and the problems that reaction exposed.

Facebook uses controversial algorithms to surface content via its News Feed.

It seems like all the content on Google Alerts these days is automated. Beyond reporting on algorithms, how can journalists investigate content aggregator algorithms so that more human-written stories are picked up?

Investigating content aggregation algorithms is something we are very interested in at the Computational Journalism Lab at Northwestern University. We have published a chapter looking at biases in the sources surfaced by Google searches, in both ‘top stories’ and image results, particularly in relation to candidates during the 2016 US elections. In Google’s ‘top stories’ we found a high concentration of impressions: 44% went to just two outlets, CNN and the New York Times. In general, there’s a lot more auditing that could be done around content aggregation sites to understand their diversity and the impact of personalisation, including Google Alerts (as the question alludes to) and other algorithmically driven news curators.
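A concentration figure like the 44% above comes down to simple arithmetic over a log of collected results. You count how often each outlet appears and divide the top outlets' combined count by the total. The Python sketch below assumes a hypothetical log format: a flat list with one outlet name per observed impression. It uses made-up data and is not the lab's actual pipeline or data.

```python
from collections import Counter

def top_k_share(impressions, k=2):
    """Return the k most frequent outlets and the fraction of all impressions
    they received. `impressions` holds one outlet name per time that outlet
    appeared in a collected set of search results (assumed format)."""
    counts = Counter(impressions)
    total = sum(counts.values())
    top = counts.most_common(k)  # [(outlet, count), ...] sorted by count, descending
    share = sum(n for _, n in top) / total
    return [name for name, _ in top], share

# Toy, made-up log for illustration only (not data from the study).
observed = ["Outlet A", "Outlet B", "Outlet A", "Outlet C",
            "Outlet D", "Outlet B", "Outlet A", "Outlet E"]
outlets, share = top_k_share(observed, k=2)
print(outlets, f"{share:.1%}")
```

Repeating the collection across many queries, dates or simulated users lets the same calculation show whether concentration or diversity shifts with personalisation.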

Dis- and misinformation are widespread phenomena and a means of influence in today's internet environment. How should journalists confront dis- and misinformation, and what is the role of artificial intelligence (AI) in these efforts?

Combatting dis- and misinformation online needs an all-hands-on-deck approach. That means strategically using both algorithms and experts, like journalists, to sort disinformation from misinformation, and everything in between. Journalists need more computational tools to detect synthesised (that is, fake or fabricated) videos and other media. But they also need more training, so they can wise up and avoid being duped by astroturfing and bot-driven manipulation of ideas online. AI is not going to magically solve the problem, but it can inform well-trained journalists and help them work at a larger scale, doing more verification and fact-checking of online media.

To say what is ‘true’ or ‘not true’ is usually easier than detecting ‘manipulated’ information, or ‘unbalanced’ news, which could be a matter of opinion. Could AI-related algorithms eventually be more effective than journalists in the fight against fake news? Or, do they also have potential to be biased?

I do not think AI or algorithms are going to be more effective than journalists in the fight for truth.

Algorithms are always limited by what is quantified about the world. And, they are not yet smart enough to ask questions about what they don’t know.

Oftentimes, not-yet-quantified contextual information is essential to understanding what is true and what is not. So algorithms are always going to be limited in how far they can go to debunk information online. The key here is knowing the limits of the algorithms. We have to feather that knowledge into the expertise of computational journalists so that the hybrid system is more than the sum of its parts.

Given that data journalists often use algorithms in their reporting, do fact-checkers need to investigate these algorithms too?

Investigative journalists are some of the most methodologically careful people I know. They often make an effort to open-source their methods so that others can inspect them. This kind of algorithmic transparency is important: if we are to trust evidence thoroughly, we should be able to see how that evidence was produced. I believe fact-checkers should always get to the bottom of how a fact, or any piece of knowledge, is known. If a fact was produced by code running a machine learning model, they should check that too. Open-sourcing models makes them easier to verify.

The Data Journalism Handbook 2 has a dedicated section on investigating data, platforms, and algorithms. Expand your skill set with Nick's landmark chapter The Algorithms Beat: Angles and Methods for Investigation and Christina Elmer's chapter Algorithms in the Spotlight: Collaborative Investigations at Spiegel Online.

What level of technical ability do you need to effectively investigate algorithms? Can non-data journalists also investigate them?

Non-data journalists can absolutely participate in algorithmic accountability reporting! Daniel Trielli and I wrote a piece in 2017 outlining some useful techniques, like looking at who owns an algorithm and how it’s procured, and reporting on the scale and magnitude of its potential impact. That said, you can draw out a lot more from algorithms if you approach them with the eye of a computational or data journalist. That lets you really start reverse engineering them, or auditing them for bias and discrimination. Those types of studies often benefit from advanced statistical techniques, and from knowing what algorithms could do, which allows you to orient an investigation towards what might go wrong.
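As one illustration of the statistics involved, the sketch below runs a permutation test on outcome data. It asks whether favourable decisions are more common for one group than another by more than chance would explain. It assumes you already hold 0/1 decision outcomes for two groups, for example obtained through records requests or crowdsourcing. The function names and data are hypothetical. A permutation test is one common choice here, not a method prescribed in the interview.

```python
import random

def rate(outcomes):
    # Fraction of favourable (1) outcomes in a list of 0/1 values.
    return sum(outcomes) / len(outcomes)

def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in favourable-outcome rates.
    Returns (observed rate difference A minus B, approximate p-value)."""
    rng = random.Random(seed)
    observed = rate(group_a) - rate(group_b)
    pooled = group_a + group_b  # new list, so the inputs are not mutated
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign outcomes to the two groups
        diff = rate(pooled[:n_a]) - rate(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return observed, (extreme + 1) / (n_permutations + 1)

# Made-up outcomes for illustration only: 60/100 vs 35/100 favourable decisions.
group_a = [1] * 60 + [0] * 40
group_b = [1] * 35 + [0] * 65
diff, p_value = permutation_test(group_a, group_b)
print(f"Rate difference: {diff:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the gap is unlikely to arise from random assignment alone. It does not, on its own, show why the gap exists, so confounding factors still need to be reported out.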

Like what you read? Subscribe to our Conversations with Data newsletter for more tips like these and the exclusive opportunity to ask other experts questions about data journalism.

subscribe figure