Ethical dilemmas in data journalism

Conversations with Data: #26



To publish or not to publish? It’s a question every data journalist will undoubtedly face at some point in their career.

In this edition of Conversations with Data, we’ve collated your anecdotes and advice into seven tips to help you work through common ethical dilemmas. Whether it’s concerns over data privacy, identifying bias, or providing appropriate context, we’ve got you sorted.

What you said

1. Watch out for noise in your data; it can bias conclusions -- Stijn Debrouwere, course instructor for Bulletproof Data Journalism and Going Viral Using Social Media Analytics

“For me, the big dilemma is always whether we’ve been able to collect enough information, conduct enough analyses and listen to enough viewpoints to warrant publication. It always seems as if there is one more check you could do, one more thing you could investigate.

I once did a story where I tried to gauge how too much tourism can spoil a city for its inhabitants, and which European cities suffer most. But guess what: Every city has a different definition of a city centre, for some cities the most recent tourism numbers are from five years ago and for others there is no official estimate of the population density in that area.

What to do? Personally, I don’t mind that much if data is a bit vague, imprecise, and perhaps not so expertly collected. That’s noise, and different sources of noise tend to cancel each other out. I do get worried when I notice that the noise in the data starts to bias conclusions one way or the other. That’s usually when I decide that I would rather publish nothing than risk misleading the audience.”


Stijn’s course Bulletproof Data Journalism helps reporters protect themselves against all kinds of errors that can arise when working with data.
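Stijn's distinction between noise and bias can be illustrated with a small simulation. This is a minimal sketch only: the true value of 100, the noise spread of 20, the systematic offset of 15, and the function name `simulate_mean` are all invented for illustration and are not taken from his tourism story.

```python
import random
import statistics


def simulate_mean(true_value, n, noise_sd, bias=0.0, seed=0):
    """Average n measurements of true_value.

    Each measurement carries random, zero-centred noise (noise_sd) and,
    optionally, a systematic offset (bias) that pushes every measurement
    in the same direction.
    """
    rng = random.Random(seed)
    samples = [true_value + bias + rng.gauss(0, noise_sd) for _ in range(n)]
    return statistics.mean(samples)


true_value = 100.0  # hypothetical "real" figure we are trying to estimate

# Noisy but unbiased: individual errors point in both directions.
noisy = simulate_mean(true_value, n=10_000, noise_sd=20.0)

# Same noise, plus a systematic offset that never averages away.
biased = simulate_mean(true_value, n=10_000, noise_sd=20.0, bias=15.0)

print("error with noise only:", round(noisy - true_value, 2))
print("error with noise and bias:", round(biased - true_value, 2))
```

With enough measurements the noise-only error shrinks towards zero, while the biased estimate stays roughly 15 away from the true value no matter how much data is collected.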

2. Be just as skeptical with data as you would be with human sources -- Subramaniam Vincent, Director of Journalism and Media Ethics at the Santa Clara University Markkula Center for Applied Ethics

“Journalists are forever on deadline. One of the ethical challenges in working with data about people is the temptation to take a large dataset at face value and make rich visual representations quickly. But gaps in surveying methods and labeling vocabulary can ignore crucial context from the lives of the people the data is supposed to ‘represent’.

Data journalists could get baseline training in data ethics and exploratory data analysis (EDA) to develop comfort with data-skepticism the same way they do with people as sources. This may help them interpret a dataset's usefulness carefully, target conventional methods (e.g. interviews) to corroborate the story the data may offer prima-facie, or even explore a different story altogether.”

3. When scraping online, think about whether data subjects would’ve consented -- Amy Mowle, PhD Candidate at Victoria University

“I had figured out how to use a web-scraping tool called TWINT to bypass Twitter's API and scrape historical Twitter data, exported into an Excel spreadsheet and ready for analysis.

I took this breakthrough to my ethics officer, to make sure there was nothing untoward about using this method for data collection. While I was aware that webscraping was forbidden by the Twitter Terms of Service (TOS), I had assumed the TOS didn't directly apply to me, since I didn't have a Twitter account and thus wasn't bound by their TOS.

My ethics officer pointed out that those individuals whose data I was scraping did agree to the Twitter TOS, and as a result, were (theoretically) under the impression that their data wouldn't be scraped. By going against the Twitter TOS, I was, in a way, breaching the trust of those whose data I was collecting. While this method of data collection had to ultimately be abandoned, the experience really opened my eyes to the different ways we need to think about consent and participation.”

4. Accept responsibility when things go wrong -- Chiara Sighele, EDJnet Project Manager based at OBC Transeuropa

“Europe One Degree Warmer was the first large data-driven investigation we carried out as the European Data Journalism Network. The investigation provided a tangible example of global warming in Europe and was successful. However, particular challenges came with producing such a piece of work based on data: How to deal with mistakes when data journalism goes wrong? How to handle transparency and corrections while building trust with our audience? How to put in place more checks to ensure the integrity of the data, whilst respecting the independence of each partner?

After the first round of publications was out, we noticed that the data for some cities was erroneous. We acknowledged the mistake (also on Medium) and fixed it, but as a network, we had to critically reflect on our practices of doing data journalism in a heavily collaborative, transnational, and decentralised setting. We have faced up to the importance of reinventing research and fact-checking procedures when working in a network, and have opened up our database to allow for external scrutiny and reuse.”


This graph shows the difference between the correct and erroneous data for the Swedish town of Kiruna and its surroundings.

5. Get trained in data security and privacy -- Colin Porlezza, Senior Lecturer at City, University of London and author of a chapter on data journalism and the ethics of open source in Good Data

“Journalism -- and data journalism in particular -- is becoming increasingly networked: Collaborative news work often goes beyond the boundaries of the news organisations and includes actors from outside the institutionalised field of journalism, such as computer scientists, hackers, visualisation experts, activists, or academics. However, such partnerships can entail specific ethical challenges related to the use of (personal) data: Who is in control of the data? With whom can the data be shared? How should the data be protected and secured?

News organisations should not only make sure that sensitive data is securely (electronically) stored, but also that journalists get appropriate training on how to avoid security risks and potential misuse of data due to increased sharing and partnerships. A good place to start could be the recently published guide on data protection and journalism by the UK’s Information Commissioner’s Office – which is, by the way, currently calling for views on a specific data protection and journalism code of practice.”

6. Question the ways that data has been classified -- Brant Houston, Professor and Knight Chair in Investigative Reporting at the University of Illinois

“A number of years ago one of my graduate students at the University of Missouri was working on a story on guns and acquired a county database on those licensed to carry small firearms. The database contained both those that had been approved or denied but had no specific column for that status. Instead, there was a column with a license number in it if the person had been approved.

If the person had been denied, the short reason for denial was put in place of a license number. In some cases, the phrase listed was ‘mental case’. We could have used that information in connection with the names of the persons, but the judgment on the person's state of mind was being made by an unqualified clerk. We decided to note in the story that ‘mental case’ was one of the reasons for denial but did not connect that reason with a specific person. And we also questioned the use of that phrase.”
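A column like the one Brant describes, holding either a license number or a free-text denial reason, could be untangled along these lines. This is a hedged sketch, not the method his team used: it assumes license numbers are purely numeric and that any other non-empty value is a denial reason. The sample rows, the field names, and the function `classify_license_field` are hypothetical.

```python
from collections import Counter


def classify_license_field(value):
    """Split one raw field into (status, license_number, denial_reason).

    Assumption: an all-digit value is a license number; any other
    non-empty text is a clerk's denial reason.
    """
    text = (value or "").strip()
    if not text:
        return ("unknown", None, None)
    if text.isdigit():
        return ("approved", text, None)
    return ("denied", None, text)


# Hypothetical rows standing in for the county database.
rows = [
    {"name": "Person A", "license": "104522"},
    {"name": "Person B", "license": "prior conviction"},
    {"name": "Person C", "license": ""},
]

statuses = [classify_license_field(row["license"]) for row in rows]

# Count denial reasons in aggregate, deliberately without names attached,
# so a clerk's label is never tied to an identifiable person.
denial_reasons = Counter(
    reason for status, _, reason in statuses if status == "denied"
)

print(denial_reasons)
```

Keeping the aggregate counts separate from the names mirrors the editorial choice in the story: the reasons for denial can be reported and questioned without attributing any of them to an individual.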

7. Beware of the bell curve -- Rebekah McBride, author of Giving data soul: best practices for ethical data journalism

“When reporting on data that is representative of a population, context is key. Interviews often provide that context while strengthening data journalism stories, but interviews can also lead to the misrepresentation of communities and perpetuation of stereotypes if the sources represent only the tail end of the data in the curve. In other words, an interview with only one or two individuals from a community may result in a misrepresentation of the median or average population, and instead highlight only the extremes. Interviewing several members of a community who represent a range of different age groups, socioeconomic backgrounds, and so on, can help reporters to avoid this ethical dilemma. Additionally, becoming embedded in a community and building relationships will help reporters to diffuse an understanding of what it means to be a member of that community.”

Caught in an ethically sticky situation? We’ll be keeping our forums open for you to source extra advice from the community. The Ethical Journalism Network is also working on a course to help data journalists critically reflect on ethical challenges at each stage of the reporting process. Watch this space for more.

Our next conversation

Journalists working on crime beats are no strangers to ethical dilemmas. It seems fitting then that our next edition will give these journos a chance to share their experiences with criminal justice storytelling. Let us know your top sources of crime data, unexpected challenges on this beat, and any advice for crime newbies, by starting a discussion in our forums.

As always, don’t forget to let us know what you’d like us to feature in our future editions.

Until next time,

Madolyn from the EJC Data team
