
Debunking with data: insights from fact-checkers around the globe

Name: DataJournalism.com

"Data is the soul of fact-checking"

29 August 2018

By Anim van Wyk, Samar Halarnkar, Matt Martino, Dinda Purnamasari, Tania Roettger

This article was migrated from datadrivenjournalism.net. It has been edited for DataJournalism.com, but some information may still be outdated.

Ever wondered if a politician’s claims really add up? Or perhaps you read a news story that seemed a little fishy? Armed with data, fact-checking organisations across the globe work tirelessly to separate fact from fiction, and everything in between.

To find out more about debunking with data, we gave subscribers to our Conversations with Data newsletter access to a global group of fact-checkers for an exclusive ‘ask me anything’ session.

Can you share some good examples or best cases where data has been successfully used for fact-checking?

Anim van Wyk, Africa Check: Good data aids good fact-checking, which needs to point out exactly what the data can and can’t tell you. The more limitations the data has, the less certain the answer becomes.

For example, it’s easy to use data from the World Health Organization’s Global Ambient Air Quality database to rank cities by their pollution levels. But the fine print shows that these entries aren’t comparable. This is because of differences in the methods and quality of measurements, and because some cities suspected to be the most polluted don’t report data to the WHO.

Samar Halarnkar, Factchecker.in: Data are (we never use the singular!) the foundation of fact-checking.

One example: The Indian telecommunications minister announced that within a year of taking charge, his administration had ensured that the government-run telecoms behemoth, BSNL, turned an operating profit after seven years of losses, and added subscribers. After a meticulous examination of the data, including responses to right-to-information requests, we found that an operating profit did not mean the company had become profitable; indeed, net losses had increased, and the minister had, conveniently, not mentioned that more subscribers left than were added.

After a new right-wing government took over in 2014, there were many reports of lynchings, especially of minorities, linked to violence over cows, which many Hindus consider holy. The ruling party and its adherents insisted these were isolated incidents, had never been reported before and were unrelated to the extreme version of Hinduism that they promoted. A debate raged nationwide, poisoning politics and society, made worse by the absence of data: national crime records did not register crimes related to bovines. At Factchecker.in, we created a database of every such crime from 2010 onwards, so that crime patterns before 2014 could be compared with those after the new government took office. Our database, now widely quoted in India and abroad, clearly shows that the overwhelming majority of the victims of such lynchings are minorities, in particular Muslims, and that most of the violence has occurred in states run by India’s ruling party.

Cow Crime — Factchecker.in’s interactive database of cow-related violence in India.

Matt Martino, RMIT ABC Fact Check: Politicians in Australia often like to talk about records, both when attacking opponents and when spruiking their own achievements. A famous example in our unit was when the Foreign Minister in the ruling Coalition government said that when the Opposition Labor Party was last in government, it bequeathed the ‘worst set of financial accounts’ in Australia’s history to the incoming government. This particular fact-check took several months of work sourcing historical data on debt and deficit. We were able to find data on federal government surpluses and deficits, plus gross debt, stretching back to 1901, and on net debt handed over to incoming governments back to the 1970s. It’s a great example of a claimant using a raw dollar figure where a percentage was needed: experts told us that these figures must be expressed as a percentage of GDP to enable historical comparisons. Ultimately, we found that the Foreign Minister’s claim was wrong, as there were far larger inherited deficits (as a percentage of GDP) recorded during WWII, far larger gross debt inherited in the same period, and far larger net debt bequeathed to a government during the 1990s.
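To illustrate why experts insist on scaling by GDP, here is a minimal Python sketch. The figures are hypothetical placeholders, not the numbers RMIT ABC Fact Check used; they only show how a ranking by raw deficit can reverse once each deficit is expressed as a share of the economy that produced it.

```python
from dataclasses import dataclass


@dataclass
class InheritedDeficit:
    """A deficit handed to an incoming government (hypothetical figures)."""
    year: int
    deficit: float  # nominal, billions of dollars (hypothetical)
    gdp: float      # nominal GDP in the same year, billions (hypothetical)

    @property
    def pct_of_gdp(self) -> float:
        # Normalise by the size of the economy so different eras are comparable.
        return 100 * self.deficit / self.gdp


# Hypothetical placeholder values, NOT historical Australian data.
records = [
    InheritedDeficit(year=1945, deficit=0.6, gdp=2.5),
    InheritedDeficit(year=2013, deficit=48.5, gdp=1550.0),
]

largest_raw = max(records, key=lambda r: r.deficit)
largest_relative = max(records, key=lambda r: r.pct_of_gdp)

for r in records:
    print(f"{r.year}: {r.deficit} bn nominal, {r.pct_of_gdp:.1f}% of GDP")
print("Largest in raw terms:", largest_raw.year)
print("Largest relative to GDP:", largest_relative.year)
```

Raw dollar amounts from different decades also reflect inflation and decades of economic growth; dividing by GDP strips out most of that drift, which is why it is the usual yardstick for this kind of historical claim.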

Dinda Purnamasari, Tirto.id: Data is the soul of fact-checking. But not just data, more importantly, the context of data itself is what makes our fact-check more reliable.

First, on 2 May 2017, the economist Jake Van Der Kamp wrote an opinion piece entitled ‘Sorry President Widodo, GDP rankings are economists’ equivalent of fake news’. In it, Van Der Kamp quoted a statement from President Joko Widodo (Jokowi) that Indonesia’s economic growth was the third highest in the world, after India and China. After this opinion piece became an issue in Indonesia, Tirto.id decided to verify the data Jokowi had used. We looked at data from the International Monetary Fund (IMF) and, based on that, concluded that Indonesia was not in third position by general criteria; instead, it ranked third only when compared with the BRICS countries and other highly populated countries.

Indonesia Econ — A graph from tirto.id’s fact-check, showing Indonesia ranked third when compared with the BRICS countries.

Second, in early August 2018, Jakarta’s Vice Governor claimed that the administration’s odd-even traffic restriction policy had reduced air pollution in the city. His statement became an issue, and some media outlets even quoted his data. We verified the data using measurements from the Indonesian Agency for Meteorology, Climatology and Geophysics (Badan Meteorologi, Klimatologi, dan Geofisika, or BMKG) and the US Embassy. Based on those, his statement was incorrect: average air pollution in Jakarta was still high and did not appear to be decreasing.
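A claim like this comes down to comparing average readings before and after a policy starts. The Python sketch below shows that comparison on hypothetical daily PM2.5 readings; they are NOT BMKG or US Embassy measurements, and the 25 µg/m³ reference value is an assumption taken from the WHO’s 2005 24-hour guideline for PM2.5.

```python
from statistics import mean

# Hypothetical daily PM2.5 readings in µg/m³ (NOT real BMKG or US Embassy data).
before_policy = [48, 52, 45, 60, 55, 50, 47]
after_policy = [51, 49, 58, 53, 46, 57, 52]

GUIDELINE = 25  # assumption: WHO 2005 24-hour guideline value for PM2.5

avg_before = mean(before_policy)
avg_after = mean(after_policy)
change_pct = 100 * (avg_after - avg_before) / avg_before

print(f"Average before: {avg_before:.1f} µg/m³")
print(f"Average after:  {avg_after:.1f} µg/m³ ({change_pct:+.1f}%)")
print("Still above guideline:", avg_after > GUIDELINE)
```

A one-week comparison like this is only a starting point: weather, season and the location of monitoring stations all affect readings, so a careful check would use longer periods and more than one source, as Tirto.id did.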

Tania Roettger, Correctiv/EchtJetzt: Fact-checking only works for statements of fact, not opinions. Ideally, there is data available to verify a claim. We regularly use statistics on topics like crime, HIV rates or jobs. If there are statistics on a topic, we consult them. Of course, statistics differ in quality depending on the topic and on who gathers the data.

Earlier this year, we debunked the claim that refugees sent 4.2 billion euros to their home countries in 2016. Data from the German federal bank showed that the 4.2 billion euros in remittances actually came from all migrants who had been working in Germany for more than a year, not specifically from refugees. Most of the money, 3.4 billion euros, went to European countries, followed by Asia (491 million euros) and Africa (177 million euros).
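The breakdown above can be checked with a few lines of arithmetic. This Python sketch uses only the figures quoted in the answer; the remainder is simply the part of the 4.2 billion euros not covered by the three regions listed, which the answer does not break down further.

```python
# Figures quoted above, in billions of euros (2016, all migrants working in
# Germany for more than a year, per the German federal bank).
TOTAL = 4.2
by_region = {"Europe": 3.4, "Asia": 0.491, "Africa": 0.177}

listed = sum(by_region.values())
remainder = TOTAL - listed  # not broken down in the figures quoted above

for region, amount in by_region.items():
    share = 100 * amount / TOTAL
    print(f"{region}: {amount:.3f} bn ({share:.1f}% of the total)")
print(f"Other/unspecified: {remainder:.3f} bn")
```

Europe alone accounts for about 81% of the total, which is the kind of context a headline figure of 4.2 billion euros leaves out.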


Have you seen examples where the same data has been manipulated to support both sides of an argument? If so, how do you ensure that your way of looking at the data isn’t biased?

Anim van Wyk, Africa Check: At Africa Check, we’re fond of the quip that some people use statistics ‘as a drunken man uses lamp posts: for support rather than illumination’. Depending on what you want to prove, you can cherry-pick data that supports your argument.

An example is different stances on racial transformation in South Africa, or the lack thereof. A member of a leftist political party said in 2015 that “whites are only 10% of the economically active population but occupy more than 60% of the top management positions”. The head of the Free Market Foundation, a liberal think-tank, then wrote: “Blacks in top management… doubled”.

Both were right, but each presented only the slice of the same data source that supported their argument.
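To see how both statements can be true at once, consider this Python sketch on a hypothetical table of top managers by population group. The counts are invented for illustration and are NOT the South African data Africa Check examined; the point is that a share at one moment and a growth rate over time are different slices of the same table.

```python
# Hypothetical counts of top managers by population group (NOT real data).
top_managers = {
    2005: {"black": 100, "white": 700, "other": 200},
    2015: {"black": 200, "white": 650, "other": 150},
}


def share_pct(year: int, group: str) -> float:
    """Share of all top managers in a given year held by one group."""
    counts = top_managers[year]
    return 100 * counts[group] / sum(counts.values())


def growth_factor(group: str, start: int, end: int) -> float:
    """How many times larger a group's count is at `end` than at `start`."""
    return top_managers[end][group] / top_managers[start][group]


# Slice 1: a snapshot of shares supports "whites hold over 60%".
print(f"White share in 2015: {share_pct(2015, 'white'):.0f}%")
# Slice 2: change over time supports "black top managers doubled".
print(f"Black top managers, 2005 to 2015: x{growth_factor('black', 2005, 2015):.1f}")
```

Reporting both the level and the trend, alongside the benchmark the first claim used (each group's share of the economically active population), gives readers the whole picture rather than one side's slice.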

Again, you need to find out what the data cannot tell you and try to triangulate by using different data sources.

Africa Check Screenshot — Fact-check ratings

Africa Check's ‘mostly correct’ verdict means that a claim contains elements of truth but is either not entirely accurate, according to the best evidence publicly available at the time, or needs clarification.

Matt Martino, RMIT ABC Fact Check: A great example of this was the debate over ‘cuts’ and ‘savings’ to health and education during the early days of the Abbott Coalition government in Australia. The government argued that it was making a ‘saving’ on health and education by spending less than the previous Labor government had budgeted. Labor, now in opposition, argued that this was in fact a cut. We investigated the figures and found that the Coalition’s spending was still growing faster than inflation, so it couldn’t be called a cut; but the Coalition’s projected savings stretched over such a long period that it was difficult to say whether they would come to pass. In the end, we called the debate ‘hot air’.
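The disagreement turns on two different baselines. The Python sketch below, using hypothetical numbers rather than the actual Australian budget figures, measures the same spending against last year’s spending adjusted for inflation (is it a cut?) and against the previous government’s plan (is it a ‘saving’?).

```python
def real_change_pct(previous: float, current: float, inflation_pct: float) -> float:
    """Percentage change in spending after removing inflation."""
    current_in_previous_prices = current / (1 + inflation_pct / 100)
    return 100 * (current_in_previous_prices - previous) / previous


# Hypothetical figures in billions of dollars (NOT actual budget data).
previous_spending = 100.0  # last year's actual spending
planned_spending = 108.0   # what the previous government had budgeted
actual_spending = 104.0    # what the new government spends
inflation_pct = 2.5        # assumed annual inflation

real_growth = real_change_pct(previous_spending, actual_spending, inflation_pct)
below_plan = planned_spending - actual_spending

print(f"Real change vs last year: {real_growth:+.2f}%")   # positive here: not a cut
print(f"Below the previous plan by: {below_plan:.1f} bn")  # the claimed 'saving'
```

With these numbers both framings are arithmetically accurate at once, which is how each side could point to the same budget and describe it differently.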

How do we make sure we’re looking at the data the right way? We always rely on several experts in the field to guide our analysis and tell us the right way to interpret the data. We’re not experts in any of the topics we explore, whilst academics can spend their entire careers researching a single subject, so their advice is invaluable.

Dinda Purnamasari, Tirto.id: In our experience, many use the right data, but in the wrong context. Then the data loses its credibility. For example, there were reports that PT Telkom (a state-owned telecommunications company in Indonesia) had provided Corporate Social Responsibility funds of around IDR 100 million to a mosque and, in comparison, IDR 3.5 billion to a church.

We found that the numbers (IDR 100 million and IDR 3.5 billion) were right, but the reported purpose of the funding was wrong. The IDR 100 million was granted by PT Telkom in 2016 to pay off debt from a mosque renovation. The IDR 3.5 billion, on the other hand, was granted to renovate an old church in Nusa Tenggara Barat, which also became a cultural heritage site in 2017.

In this case, again, the context of the data is crucial to fact-checking. We must understand the methodology and how the data was gathered or estimated, even double-checking on the ground if needed.

Tania Roettger, Correctiv/EchtJetzt: Crime data is a good example. In 2017, crime rates in Germany went down. But this statistic only shows the crimes that have been reported to the police. This has led some politicians to claim that crime has not actually gone down and that the statistics are ‘fake news’.

When the meaning of data is debated, we consult independent experts to collect arguments about how the data can or should be interpreted. Or we look at alternative sources: for example, surveys that some German states conduct asking people about crimes they experienced but did not report (although the validity of these surveys is also disputed).

Samar Halarnkar, Factchecker.in: In this era of ‘fake news’, data are often used to reinforce biases.

For instance, there was much self-congratulation when the government claimed that India’s forests grew by 6,779 sq km in the two years to 2017. We found that this was not incorrect: it was indeed what the satellite imagery revealed. What the imagery did not reveal was that these new ‘forests’ included forests converted to commercial plantations, as well as degraded and fragmented forests, and that the health of these forests was being gauged by satellite imagery with inadequate resolution. In fact, numerous studies had recorded a steady degradation of forests over nearly a century.

Indian remote-sensing satellites produce images with a resolution of 23.5 metres per pixel, which is too coarse to unequivocally identify small-scale deforestation or to distinguish between old-growth forests and plantations. To make that distinction, India needs imagery with a resolution of 5.8 metres per pixel.
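The gap between the two resolutions is larger than the figures suggest, because what matters for spotting small clearings is the ground area each pixel covers, and that grows with the square of the resolution. A short Python sketch of the arithmetic, assuming square pixels:

```python
COARSE_M = 23.5  # metres per pixel, current Indian imagery (per the text)
FINE_M = 5.8     # metres per pixel, resolution cited as needed


def pixel_area_m2(resolution_m: float) -> float:
    """Ground area covered by one square pixel."""
    return resolution_m ** 2


def pixels_per_hectare(resolution_m: float) -> float:
    """How many pixels fit in one hectare (10,000 m²)."""
    return 10_000 / pixel_area_m2(resolution_m)


ratio = pixel_area_m2(COARSE_M) / pixel_area_m2(FINE_M)
print(f"One coarse pixel covers {pixel_area_m2(COARSE_M):.2f} m²")
print(f"One fine pixel covers {pixel_area_m2(FINE_M):.2f} m²")
print(f"Each coarse pixel spans about {ratio:.1f} fine pixels")
print(f"Pixels per hectare: {pixels_per_hectare(COARSE_M):.0f} vs {pixels_per_hectare(FINE_M):.0f}")
```

So a one-hectare plot is only about 18 pixels at the coarser resolution, against roughly 300 at the finer one.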

So data are not always what they appear to be. They need to be verified and cross-checked against studies, other databases or ground reporting.

India Forest — Factchecker.in found that this map of forest coverage was not what it seemed. Credit: India’s state of forest report (ISFR) 2017.

How do you fact-check stories or statements when data on an issue isn’t available?

Anim van Wyk, Africa Check: It’s really unsatisfactory to use our ‘unproven’ verdict, but sometimes the evidence publicly available at the time ‘neither proves nor disproves a statement’, as we define this rating. Still, the absence of data doesn’t mean that anything goes in making statements of fact about a topic. We then point out what is known and what isn’t.

Samar Halarnkar, Factchecker.in: If data are not available, or independently verified data are not available, there is only one substitute: verification through old-fashioned, shoe-leather reporting.

For instance, India’s Prime Minister once claimed that his government had built 425,000 school toilets within a year. With no independent verification, this claim was hard to dispute; it was obviously impossible to verify that 425,000 new toilets had indeed been built across all of India’s schools. But after we sent reporters to conduct random checks in eight Indian states, it quickly became apparent that the Prime Minister’s claim was, to put it plainly, a lie.

Matt Martino, RMIT ABC Fact Check: RMIT ABC Fact Check tests the veracity of claims made by politicians and public figures in Australia. If someone is making a claim to influence policy, our position is that they should have good evidence to back it up. Lack of evidence is no excuse, so we try to persevere regardless.

Sure, this often leads to less exciting verdicts, such as ‘unverifiable’ or ‘too soon to know’, but the verdict is not the be-all and end-all of a fact-check. In these situations, we explore what data is out there, consult experts in the field for their opinion, and present it to the audience as best we can so they can see how we’ve come to our decision.

How RMIT ABC Fact Check finds and checks claims

Dinda Purnamasari, Tirto.id: If the data isn’t available, we rate the claim as unproven, though this rating is unsatisfactory. But before we do, we still explain the verification steps we undertook. This is because we want citizens to understand that when tirto.id rates a claim as unproven, it means we could not find a credible source for the information.

As an example, one of our politicians stated that the LRT (light rail transit) development cost USD 8 billion per kilometre. We checked reliable and credible sources and could not find this information, so we rated the claim as unproven.

Tania Roettger, Correctiv/EchtJetzt: Rising knife crime is a recent story, but the federal crime statistics do not list crimes committed with knives as a separate category. Some German states do, but they differ in what they count as knife crime. That definitely does not make our work easier.

In cases like this, we source as much information about a claim as is available. If it turns out that the material is not sufficient to verify or debunk the claim, we list what is known and clearly state what is missing. If there is no convincing tendency, we give the rating ‘unproven’. But it is important to keep in mind that those making a claim also carry a burden of proof: if you make a statement of fact, it needs to be based on evidence. This is one of the things we’re trying to show with our work.

Are there any established guidelines for determining the reliability of a data source? How does your organisation determine which data is appropriate to use?

Samar Halarnkar, Factchecker.in: We do not have established guidelines. In general, we consider whether the data source is reliable. Sometimes a source, such as a government source, might not be entirely reliable, in which case we use the data but cross-check it with experts, independent studies and/or our own checks. Some public databases are largely reliable: for instance, government-run databases on health, farming and education. We do not use data that have previously proven to be compromised or doubtful.

Matt Martino, RMIT ABC Fact Check: We don’t have any hard rules around it, but generally the source should be a non-partisan organisation. In Australia, we rely heavily on data from the Australian Bureau of Statistics, which is a government organisation with a reputation for providing objective data on a range of issues. This is an example of a good source.

When considering a source, it’s always pertinent to ask: ‘what is their agenda?’ If their motivations for providing data might influence the data in a partisan way, it’s best to leave it alone. As always, it’s a good idea to consult experts in the field on what is the best source to use in verifying a claim.

Dinda Purnamasari, Tirto.id: Since every dataset has its own nature, such as its context and methodology, we have established a standard for the secondary data we use. Our first tier of sources comprises the Government Statistics Bureau, ministries and local governments, company financial reports and the stock exchange. As a second tier, we use international organisations, verified and credible journals, consultants and research companies, as well as national or highly reputable news agencies. Although we have this standard, we also cross-check information by consulting experts in the field, so that we use the best sources.

Tania Roettger, Correctiv/EchtJetzt: When we’re investigating a claim, one task is to understand what exactly a given piece of data is able to tell us. We establish how and why it was collected, what it contains and what it excludes. Usually we note the shortcomings of a statistic in the article. Whenever we are uncertain about the evidence we have gathered, we discuss the issue among our team.

Anim van Wyk, Africa Check: There’s no way around studying the methodology by which the data is collected. This must then be discussed with experts to get their input. And all data sources, even those considered reliable, have limitations, which have to be highlighted.

What do you think about the potential of automated fact-checking?

Samar Halarnkar, Factchecker.in: I am sure it has immense potential, but this requires coding expertise that we do not currently have.

Tania Roettger, Correctiv/EchtJetzt: There are several ways in which automation could help the fact-checking process: extracting fact-checkable claims from speeches or sourcing relevant statistics and documents from a data-pool, for example. But so far we have not experienced or heard of a tool that would do our work for us.

Automated Fact — An overview of how automation could aid fact-checking from Understanding the promise and limits of automated fact-checking, by Lucas Graves.

Matt Martino, RMIT ABC Fact Check: It’s an interesting area, but one which is currently undercooked. Parsing language is a big part of what we do at Fact Check, and machines are not yet capable of interpreting a great deal of the nuance in language. That being said, anything that allows greater access to the facts in a debate for audiences would be a good thing.

One area where there is already enormous potential is in searching for and identifying potential claims to check, and key data, on government websites such as Hansard and budget papers.

I think that, like a lot of AI, there’s a long way to go, and we’ll be watching this space intently.

Anim van Wyk, Africa Check: The tools I’ve seen are helpful in monitoring important sources for claims to fact-check, such as transcripts from parliament. But I’m quite hesitant about fact-checks without any human intervention, as nuance plays such a big role. The potential of getting it completely wrong when you are the one claiming to be correcting claims is not worth the potential credibility loss, in my opinion.

What are some of your go-to data tools?

Anim van Wyk, Africa Check: You can’t beat a good old spreadsheet. For illustration purposes, we keep it simple by using Datawrapper.

Samar Halarnkar, Factchecker.in: We use Tabula for extracting tables from PDFs. For analysis, we depend on Excel/Google Sheets and Tableau depending on the size and type of the dataset. For visualisation, we work primarily with Google Sheets, Datawrapper, Infogram and Tableau. We also use Google My Maps and CartoDB for some maps.

Matt Martino, RMIT ABC Fact Check: We use Excel or Google spreadsheets for simple analyses. For more complex ones I use RStudio, which is more powerful and can handle much larger datasets. It requires coding knowledge, but the training is well worth it.

In terms of visualisation, we’ve tried many different platforms throughout the years, but Tableau Public has emerged as our go-to. Its abilities in formatting, design, calculation and visualisation are pretty much unrivalled, in my opinion, and we’ve been able to create really interesting and rich visualisations using the platform, like those seen here and here.

Dinda Purnamasari, Tirto.id: For analysis, we use Excel, SPSS, and other statistical tools. It really depends on the purpose, size and type of our data and analysis. For visualisation, we use Adobe Illustrator and Datawrapper.
