Giving data soul: best practices for ethical data journalism

Preventing data from perpetuating stereotypes and biases

Author and scholar Brené Brown once said, “Stories are just data with a soul”. For centuries, journalists have affirmed this statement by combining data and interviews to create content that informs and inspires change. It is no surprise, then, that data journalism -- which transforms numbers into impactful graphics and enlightening narratives -- has become a popular medium for journalists.

Growing alongside the burgeoning computational technology industry, data journalism has shown its unique ability to provide a wealth of information on topics that affect society. In fact, when published alongside the proper context, data journalism has the power to expose societal disparities in education or aid. In the process, it can bring more voices to the forefront -- an essential step in understanding the current state of affairs and ways to improve it. By holding a magnifying glass to the systemic flaws that permeate our society, data journalism can encourage positive change.

My research suggests, however, that when published without context or consideration for ethics, data journalism can cause harm by perpetuating stereotypes and biases.

To comprehend the role of data journalism in combating stereotypes and other systemic issues, I began to study the current impact of data journalism, as well as the ethical guidelines and educational models used within the field. My interviews with data journalists, sociologists, and scholars made it clear that some data journalism is playing a role in the perpetuation of systemic issues. Improved education and communication in this area are necessary for data journalists to gain the interpretive and critical thinking skills they need to avoid ethical pitfalls.

Based on interviews and the review of more than 20 publications on the topic, I developed a list of best practices and ethical guidelines for data journalists and journalism educators. My findings and suggestions in this regard are just one step towards a solution. I hope they will not only bring awareness to the issues, but also serve as a catalyst for a larger industry discussion of data journalism ethics and education.

Best practices

1. Review the data to uncover inconsistencies and missing pieces

According to the thesaurus, the word ‘data’ and the word ‘fact’ are synonymous. This seemingly harmless association can cause a great deal of trouble for data journalists who equate data with facts. While peer-reviewed scientific studies and ethically sourced data are great starting points for trustworthy information, the journalist is ultimately responsible for reviewing even the most reliable data to uncover inconsistencies and potential missing pieces, and to determine which data are factual.

As data journalist Eric Litke pointed out in an interview:

Data cannot be taken at face value, as doing so requires a litany of assumptions.

He continued, “When such data inconsistencies surface, the journalist at a minimum has to reveal this prominently to the reader, but more likely the data should be limited to a group that is internally consistent or new data should be gathered”.

Identifying inconsistencies and missing data is a crucial step towards responsible and ethical data journalism. It minimises the potential for inaccuracy and ensures that the data is representative of all populations. However, doing so can be a challenge, as it requires journalists to know and understand the methodology used to gather the data, including the risks and benefits of the survey methodology.

“When we talk about surveys there’s a lot of concern about the methods for sampling,” says Kathleen Culver, director of the Center for Journalism Ethics at the University of Wisconsin-Madison. “For example, surveys are often set up for landline phones but we’ve discovered that ethnic minorities are more likely to have cell phones, which leaves them out of these surveys”.

Michelle Robinson, a data analyst for Race to Equity in Wisconsin and a dissertator in the Department of Sociology at the University of Wisconsin-Madison, agrees with Kathleen. However, Michelle points out that there does not seem to be any single methodology that truly captures the impact behind the numbers.

“There seems to be a mismatch between racial inequality and the types of methodologies I’ve been trained in,” says Michelle. “I’m very skeptical of the by-design reductionist methods that tend to flatten out an interactive environment”.

Given these inherent issues with data collection, journalists should always evaluate the methodology used. Specifically, it is important to reflect on who may have been excluded from the data and what impact, systemic or otherwise, this has on the resulting data.

An illustration of this challenge occurred in 2014, when the Canadian government began gathering data on unemployment rates in an effort to understand the impact of temporary foreign workers on the jobless rates among citizens. The results showed that the jobless rates in Manitoba, Alberta, and Saskatchewan were below the six percent rate considered high unemployment, suggesting that the presence of temporary foreign workers would not hinder the hiring of citizens in those areas. It turns out, however, that the data used to calculate the jobless rates in these areas did not include information from Aboriginal groups. This is a serious gap in the data, considering that Saskatchewan alone hosts more than 30 First Nation reserves. The misstep is clear -- omitting a large number of people from the data collection produced an inaccurate and possibly damaging account of the employment rate in Canada. This had the potential to disproportionately affect a marginalised population.


Credit: The Globe and Mail.

Fortunately, journalists from The Globe and Mail utilised their data interviewing skills to discover that something was missing from the data. This example showcases the power of data to shape policy and the equally powerful role of journalists to dispel stereotypes and ensure the accuracy of reported data.
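One way to make the question "who may have been excluded?" concrete is to compare each group's share of a dataset with its share of the population the data claims to describe. The short Python sketch below illustrates the idea only. The function name, the group labels, the 0.5 tolerance, and every figure in it are hypothetical assumptions for illustration; none of them come from the Canadian data or from The Globe and Mail's reporting.

```python
def coverage_gaps(sample_counts, population_shares, tolerance=0.5):
    """Flag groups whose share of the sample falls below
    tolerance * their share of the reference population.

    Returns {group: (sample_share, population_share)} for flagged groups.
    """
    total = sum(sample_counts.values())
    gaps = {}
    for group, pop_share in population_shares.items():
        # A group missing from the sample entirely counts as zero respondents.
        count = sample_counts.get(group, 0)
        sample_share = count / total if total else 0.0
        if sample_share < tolerance * pop_share:
            gaps[group] = (sample_share, pop_share)
    return gaps


# Hypothetical figures, for illustration only.
sample_counts = {"group_a": 900, "group_b": 100, "group_c": 0}
population_shares = {"group_a": 0.75, "group_b": 0.15, "group_c": 0.10}

gaps = coverage_gaps(sample_counts, population_shares)
for group, (in_sample, in_population) in gaps.items():
    print(f"{group}: {in_sample:.0%} of sample vs {in_population:.0%} of population")
```

A flag from a check like this is a prompt for questions to whoever gathered the data, not a finding in itself. The reference shares also come from a dataset with its own methodology, which deserves the same scrutiny.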

2. Unpack the concepts

While validating data is a crucial first step, proper contextualisation through systemic reporting is equally essential for ensuring accountability and inclusion. This contextualisation requires journalists to unpack the concepts -- in other words, to go beyond a basic analysis of the figures. Do this by incorporating interviews with the individuals behind the numbers and information on how systemic issues have affected the statistical results. This systemic reporting is particularly important when it comes to publicly available data on the education gap, housing, unemployment, incarceration rates, or other data that can disproportionately affect disadvantaged subgroups and feed into stereotypes.

“I was recently looking at the achievement disparities in education and there were a lot of stories about test scores, particularly in the wake of No Child Left Behind, when many districts realised they had giant gaps between white kids and pretty much everyone else,” says Sue Robinson, Ph.D, a professor at the University of Wisconsin-Madison and author of a book on race, media and education. “So, what you see is an article that discusses the fact that there was a gap, but what we need to see is more understanding of the system. For example, what are the schools doing to eliminate the systemic issues and what are the factors that might be contributing to the gap? We need to have a broader conversation about these issues”.

Though structural constraints on time and resources make this type of reporting difficult, the fact remains: The proper contextualisation of data through systemic reporting is essential to avoid ethical missteps and the oversimplification of complex systemic issues.

This best practice certainly puts a lot of pressure on journalists to fight the perpetuation of stereotypes, but the reality is that data, and the way journalists report that data, have an impact on societal views, policy decisions, rules and regulations.

In fact, a 2014 study at Stanford University revealed “that exposing people to extreme racial disparities in the prison population heightened their fear of crime and increased acceptance of the very policies that lead to those disparities”.

As Michelle says, “people often come to data with their own preconceived notions, so we need to think ethically about how to approach data”.

While it is difficult to predict the preconceived notions or prejudices of every reader, it is important to remember that they exist. Proper context remains essential in dispelling stereotypes.


Putting data back into context

Data are never neutral ‘givens’, but always situated in a particular context, collected for a particular reason. Learn more in this Long Read by Catherine D'Ignazio, where she provides practical tips on understanding who and what have been excluded from data, and how to identify other contextual influences.

3. Beware the bell curve

Providing context to the data often requires interviews with the people behind the numbers -- both those who gathered the numbers, and those represented by them.

While interviews with individuals can strengthen a data journalism story, they can also lead to the misrepresentation of communities and perpetuation of stereotypes, especially if sources represent only the tail end of the data in the curve. In other words, an interview with only one or two individuals from a community may result in a misrepresentation of the median or average population, and instead highlight only the extremes.

To represent a larger population and avoid the outliers, Michelle suggests interviewing several members of a community who represent a range of different areas, age groups, and financial backgrounds. In other words, showcase what it means to be one of the people behind that data. She also suggests that reporters become “embedded in a community, build relationships with folks and really diffuse an understanding of what it means to be a member of that community”.
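Where the dataset includes a numeric measure for each person, a reporter can roughly check whether a set of interviewees sits only in the tails of the curve. The Python sketch below is one possible approach under stated assumptions: the helper names, the 10th and 90th percentile cut-offs, and the sample values are all hypothetical, and a numeric check like this is no substitute for the community relationships Michelle describes.

```python
from bisect import bisect_right


def percentile_rank(values, x):
    """Percentage of values in the dataset that are less than or equal to x."""
    ordered = sorted(values)
    return 100.0 * bisect_right(ordered, x) / len(ordered)


def tails_only(values, interviewees, lower=10.0, upper=90.0):
    """True if every interviewee falls at or below the lower percentile
    or at or above the upper percentile of the dataset."""
    ranks = [percentile_rank(values, x) for x in interviewees]
    if not ranks:
        return False  # no interviewees, nothing to assess
    return all(r <= lower or r >= upper for r in ranks)


# Hypothetical measure for 100 community members, for illustration only.
community = list(range(1, 101))

print(tails_only(community, [3, 97]))      # both sources sit at the extremes
print(tails_only(community, [3, 50, 97]))  # one source sits near the median
```

If the first kind of result comes back, the story may be describing the extremes of a community rather than its typical members, which is a cue to widen the pool of sources.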

In doing so, experts hope data journalism can avoid past issues, such as those faced during the onset of the mass incarceration movement.

“Journalism was really caught flat-footed with the mass incarceration movement,” says Kathleen. “I guarantee that movement was felt in those communities. It was felt economically, it was felt in schools and in relationships, but because largely white newsrooms were not connected to those communities, they weren’t hearing what was happening”.

It is because of cases like these that Kathleen and many other professionals are calling for increased diversity in the newsroom and for journalists to break out of their bubble and engage with individuals and communities outside of their own. Experts agree that these improved community partnerships will help journalists to avoid the oversimplification and misrepresentation of community issues represented in data.

4. Follow the golden rules of journalism

Although data journalism has its own unique set of hurdles, the traditional rules and ethics of reporting still apply. The Society of Professional Journalists Code of Ethics, which encourages journalists to seek truth and minimise harm to those in the news, is just as relevant to the reporting of data as it is to more traditional forms of journalism. Despite the challenges presented by changing newsrooms and ever-expanding datasets, an accountable and ethical press is still the best defence against misinformation.

5. Become educated

Of course, the ethical execution of data journalism requires time, resources, and journalists trained to interview and report on data. Unfortunately, despite the fantastic efforts of many scholars and data journalism organisations, studies indicate that education in this area is still lacking. According to a 2016 report, Teaching Data and Computational Journalism by Charles Berret and Cheryl Phillips, only about half of schools offer education in this area: just 59 of the 113 surveyed institutions (roughly 52 percent) hosted one or more data journalism courses. This suggests that scholarship has not yet caught up with the expanding data journalism field.

In addition to increased higher education initiatives, it is also important to consider education for journalists who have already graduated or who have chosen not to pursue a formal education. In that case, short courses, reading material, and other forms of education will be key to promoting an educated and ethical press.


82% of the AEJMC-accredited classes surveyed by Columbia Journalism School in 2016 were taught at an introductory or foundational level.

As Kathleen says, “technology is across the board giving us new ways to tell stories and data journalism is an important development, but if it’s not done ethically it’s a huge problem. We need to be teaching common statistical reporting to all journalism students so that they are not dismissing this as an ‘I can’t do this scenario’”.

“We need to look at, not just how we train journalists, but also how the deadlines are structured and how the job is set up. Right now, it doesn’t encourage systemic reporting”.


Despite the many challenges and ethical considerations, data journalism is the key to an accountable and aware society. By exposing data and providing context, data journalists not only keep organisations and government accountable, but also help to expose a wider, more inclusive view of the world’s population. Through additional educational initiatives and a broader discussion of the issues, data journalists will play a vital role in this new age of investigative reporting. Ultimately, they will create ‘data with a soul’ that informs and inspires change.

Read Rebekah McBride’s full report here.
