Verification for data

Conversations with Data: #46



When using data as a source for your stories, verification is vital. Today's journalists face the challenge of sifting through enormous amounts of information from social media and open data portals. But how do you decipher what is true and accurate in the digital age?

Joining us for our first-ever Conversations with Data podcast is Craig Silverman, BuzzFeed's media editor and one of the world's leading experts on online misinformation and content verification. He talks to us about verifying the numbers along with his experience of working with BuzzFeed's data team. Listen to the full 30-minute interview with Craig, or read the edited version of our Q&A with him below.

What we asked

How did you get started in verification?

The journey began in 2004. At that time, I was a freelance journalist living in Montreal, and I started a media blog called Regret The Error. Initially, it focussed on finding the best of the worst mistakes and corrections made by journalists and media organisations. I was collecting hundreds, thousands and then tens of thousands of these corrections. I started to connect the dots and research the discipline of verification.

As journalists, it's our job to get things right. Verification is at the core of what we do. But how do you actually do it and is it ever taught? At that point, and even to a certain extent still today, there aren't usually verification courses in journalism schools. The blog evolved from corrections and transparency to accuracy. Later it focussed on the discipline of verification and how we can be accountable as journalists to the public, to our audiences, and to our sources.

The terms 'misinformation' and 'disinformation' seem confusing. What's the difference?

This is a by-product of fake news becoming a term that is widely used but is now kind of meaningless. When people say fake news, they can mean very different things. I try to avoid using it unless it is in the context of specifically talking about that term and what's happened to it.

I credit Claire Wardle and the folks at First Draft who have done some work on trying to clarify the terms. Disinformation is best defined as false information that is created to deceive. Misinformation is about the accidental spread of false or misleading information. Misinformation is a big piece of the puzzle because you can have people who are absolutely well-meaning, who have no intention of spreading something false or misleading, but who pass it along because maybe they think it's important information and it's true. Or maybe [they spread it] because it came from a friend and they trust their friends and so they believe it's true.

Tell us about your latest verification handbook.

We published the first verification handbook in 2014, and it was framed in the context of emergency and breaking news. Given the information environment has changed a lot since 2014, this latest handbook focusses on investigating disinformation and media manipulation.

I found people who are some of the best practitioners in this area to contribute to the handbook. We have folks from NBC News, Rappler in the Philippines and BBC Africa as well as research organisations like Citizen Lab. We also have examples coming from different parts of the world like Brazil, the Philippines, West Papua and other countries to show the global extent of this. The handbook comes out in April 2020 in Perugia at the International Journalism Festival. Verification Handbook 2: disinformation and media manipulation will be free and available online.


What one chapter is a must-read for data journalists?

We have a chapter and case studies looking at inauthentic accounts and activity around bots. I think that's an area data journalists might be really interested in because you're making a determination based on data, typically about whether an account is authentic or not.

We show some of the tools, approaches and patterns you want to look for in that data and in that activity. It's an opportunity for somebody who knows how to gather data from an API endpoint. Then you can think about how you want to sort and analyse that data.

The chapter also gives data journalists some non-technical approaches for how you think about inauthentic activity in our social media environment. I suspect that those case studies on Twitter and bots will be really interesting for the data journalism community.



You've broken countless stories on media manipulation. Do you ever use data skills for these investigations?

I know some basic HTML, and I took an R programming course at IRE a couple of years ago. However, I'm not proficient enough to be applying that in my work. For me, gathering and analysing data is really important. I'm often using pre-existing tools to help me do that. For example, I use CrowdTangle, which is a great tool and platform where you can query for Facebook and Instagram posts going back in time. You can pull historical data and get that as a downloadable CSV. I do a lot of my work in Excel or Google Sheets in terms of gathering, cleaning, sorting and filtering the data to get the insights that I'm looking for.

I'm fortunate that at BuzzFeed News we have a data journalism team led by Jeremy Singer-Vine, who is fantastic and who I've done a whole bunch of stories with. When there is a data component to it or the manual approach is going to be insanely time-intensive, and there's clearly a way to produce a script to gather it, I team up with Jeremy. We've done a lot of stories together, and it's a really great partnership. We're very fortunate to have folks in our newsroom with those skills who really love teaming up on stories. They are 100 percent full collaborators, and they just bring a whole other skill set and mindset to it.
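Editor's note: the spreadsheet steps Craig describes (cleaning, sorting and filtering a downloaded CSV) can also be scripted. The sketch below is only an illustration using Python's standard library. The column names, the sample rows and the helper functions load_rows and filter_and_sort are all invented for this example and do not reflect the layout of any real export or the BuzzFeed News team's own scripts.

```python
import csv
import io

# Hypothetical sample standing in for a downloaded CSV export.
# Column names and values are invented for illustration only.
SAMPLE_CSV = """page_name,message,total_interactions
Page A,Shocking claim about a celebrity,"1,200"
Page B,  Local weather update ,35
Page A,Shocking claim about a celebrity,"1,200"
Page C,Another shocking claim,540
"""


def load_rows(text):
    """Parse the CSV text, clean each field and drop exact duplicate posts."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    seen = set()
    for row in reader:
        cleaned = {
            "page_name": row["page_name"].strip(),
            "message": row["message"].strip(),
            # Remove thousands separators so the count can be sorted numerically.
            "total_interactions": int(row["total_interactions"].replace(",", "")),
        }
        key = (cleaned["page_name"], cleaned["message"])
        if key in seen:
            continue  # skip a post we have already recorded
        seen.add(key)
        rows.append(cleaned)
    return rows


def filter_and_sort(rows, keyword, min_interactions=0):
    """Keep posts that mention the keyword, most-engaged first."""
    matches = [
        r for r in rows
        if keyword.lower() in r["message"].lower()
        and r["total_interactions"] >= min_interactions
    ]
    return sorted(matches, key=lambda r: r["total_interactions"], reverse=True)


rows = load_rows(SAMPLE_CSV)
top = filter_and_sort(rows, "shocking", min_interactions=100)
for r in top:
    print(r["page_name"], r["total_interactions"], r["message"])
```

A script like this mainly pays off when the same cleaning steps have to be repeated across many exports. For a one-off look at a single file, a spreadsheet remains a perfectly reasonable choice.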

Do you think the media industry needs a redesign?

Realistically, I don't think there's going to be a massive slowing down. I do think newsrooms have to be conscious of the decisions they're making and fight this drive and desire to be first or to jump on something right away. That is a permanent problem because it is a permanent tension we have in our newsrooms.

If you develop good verification skills for the digital environment as well as good traditional reporting and verification skills, over time you will be able to be faster because you're practising and you know what to look for. That's why these verification handbooks are so essential. They're free and written by experts in a way that gives people actionable, useful hands-on advice.

I think we shouldn't hesitate to rethink our media environment and how we do our jobs better. If more journalists had better verification skills, we would avoid some of the pitfalls that you see in those urgent, early moments where mistakes often get made.

Other happenings on

Data journalism isn't just for in-depth projects or investigative reporting. MaryJo Webster explains how to tell quick and powerful stories with data when the clock is ticking.


Our next conversation

We've all heard about the gender pay gap, but what about the gender data gap? To mark International Women's Day on 8 March, we'll be talking with Data2X's executive director Emily Courey Pryor about how missing or incomplete data on girls and women stop journalists from telling the full story. We'll also discuss what steps data journalists can take to ensure the data they use is inclusive.

As always, don’t forget to let us know what you’d like us to feature in our future editions. You can also read all of our past editions here.


Tara from the EJC Data team,

bringing you, supported by Google News Initiative.

P.S. Are you interested in supporting this newsletter as well? Get in touch to discuss sponsorship opportunities.
