OSINT for environmental investigations
https://datajournalism.com/read/newsletters/osint-for-environmental-investigations
Wed, 18 Oct 2023 09:45:00 +0200
Tara Kelly

Welcome back to our latest Conversations with Data newsletter!

Investigating the hidden corners of the climate crisis is no small feat. Exposing environmental crimes and wrongdoing requires data skills and open source intelligence (OSINT) to produce evidence-based investigative stories about climate change. So, what does it take to uncover the invisible with a forensic approach? And what are the best sources for tracking environmental wrongdoing and corruption?

To answer these questions, we spoke with Sam Leon, co-founder of Data Desk, and Ben Heubl, an investigative reporter with Süddeutsche Zeitung. Listen to the podcast on Spotify, SoundCloud, Apple Podcasts or Google Podcasts.

Alternatively, read the edited Q&A with Sam Leon and Ben Heubl below.

What we asked

Tell us about you and your work.

Sam:  I'm Sam Leon and I started Data Desk with Louis Goddard. Data Desk is a research consultancy where we provide insights and analysis on the commodities at the heart of the climate crisis, primarily oil, gas and coal, but also carbon, forest risk, and agricultural commodities like soy and palm. We work extensively with journalists and climate think tanks, providing insights on supply chains. We're both programmers and I conduct open source investigations. Our research tends to combine a degree of automation with the aim of illuminating the hidden corners of our energy system. Prior to this, I set up a digital investigations team at Global Witness, where I worked for seven years.

Ben: My name is Ben Heubl and I work for Süddeutsche Zeitung's investigative team. I currently do open source intelligence, but my background is in computer-assisted reporting or data journalism. I started my career in journalism in 2014 and I got into investigative journalism by conducting complex data investigations.

Sam, you worked on an investigation with Le Monde and Global Witness into TotalEnergies' connection to military jet fuel supply chains in Russia. Tell us more about this investigation.

Sam: The story was published in Le Monde about a year ago. I worked on it while at Global Witness, leading a team investigating Russian fossil fuels. Essentially, we found that a gas project owned by TotalEnergies was linked to a supply chain providing feedstock to a Gazprom-owned refinery that produced jet fuel for the Russian military, including for airbases from which Russian jets were used to bomb Ukraine.

This story was made up of three components. There was rail freight data that, at the time, was available via commercial data platforms. There was satellite imagery to verify the presence of specific kinds of jets at airbases, and there were datasets of Russian military airbases. It was a question of connecting where we knew Western joint ventures were producing this feedstock of gas condensate with how it was then moving, primarily by rail, to the refineries producing the jet fuel. We made those links and established which European companies were connected to those supply chains; TotalEnergies was one of them. All of this was happening in the wake of the full-scale invasion of Ukraine, which made the findings particularly significant and potentially very damaging for the company.

What was the impact of this investigation?

Sam: Not long after the story was published, TotalEnergies did divest from the gas field in question. They sold it, although they denied that any gas condensate produced from the project was going into military jet fuel. But what's been particularly pleasing about this investigation is that it enabled other European journalists to scrutinise the Russian fuel supply chains of other European oil and gas companies.

For instance, in Germany, our friends at Paper Trail Media and DER SPIEGEL replicated our approach to find links between Wintershall Dea operations in Russia and the military fuel supply chain. We were transparent and worked with our colleagues at the Anti-Corruption Data Collective to share this Russian rail freight data more widely. This meant other groups could take forward the methodology and find other interesting stories.

Ben, you conducted an investigation examining how lithium firms in Chile are depleting water supplies with a cheap mining technique that is compromising the country's environment. Tell us about the investigation.

Ben: The topic of rare earth metals comes back to the question of how fast we can grow and adopt the clean technology industry. That includes everything from electric vehicles to wind turbines and anything else that powers the green economy. We need to ask whether this is harming other people. We saw a similar phenomenon with the Industrial Revolution, which was not good for everyone, although it was perceived as such by the few who got rich from it. Lithium has to come from somewhere before it goes into battery production for renewable technology. The same is true for cobalt and other rare earth materials. If you have to mine so much of it, somebody is probably suffering from it.

Using leaked documents and satellite imagery, our analysis at E&T revealed that the lithium mining technique of open brine pools was wasting a lot of water. This unsustainable extraction method is especially problematic given that Chile is one of the driest places in the world. We worked with SpaceKnow to analyse the satellite imagery, which showed how drastically the mining grew over a few years. Water is being taken from local communities and people who need it more. This is an investigation that could be replicated in other parts of the world where the same method is used to extract lithium.

What one data source do you rely on regularly for your environmental investigations? 

Sam: I like to use Sentinel Hub's EO Browser. It allows you to visualise satellite data from numerous satellites and data collections.

Ben: Regarding geospatial data in my investigative work, I often begin with Google Earth Pro.

How do you develop an OSINT mindset for environmental investigations?

Sam: When examining environmental wrongdoing, the aim is to put a magnifying glass on a particular set of trades. For instance, you might focus on fraud, leaks or spills that may have gone unreported. Or maybe a company is making claims about a new carbon capture project it has built where the rate of capturing carbon doesn't add up. These are the types of scenarios where OSINT comes into play. There's no oven-ready dataset that allows you to illustrate this, so you need to get creative. OSINT is constantly changing. You're using alternative data sources that go beyond what companies disclose.

What other organisations do you look up to or follow in this space?

Sam: I'm a fan of Lighthouse Reports and its work on pesticides and food systems. I also follow The Centre for Research on Energy and Clean Air, particularly for its work on the international oil trade. Bloomberg Green does some excellent reporting in this space as well.

Finally, what kind of stories would you like to see more of when covering climate change?

Sam: I think we're at a pretty dangerous time. It's imperative to act right now to accelerate the transition as much as possible and avert the worst effects of climate change. A huge amount of money is going into data collection, and I think it's incumbent upon us to make use of that data and find the stories in it. Otherwise, it's just sitting on a shelf.

Ben: I'd really like to see more environmental stories, full stop. We also have to be better at giving more space to explaining how we got to the results in our investigations. I'd also like to see more journalists doing environmental investigations from the countries that are most affected by climate change, corruption or illegal mining. I don't want to read such stories only from Western journalists at The Guardian or The Washington Post; I want to see these excellent investigations done by local news outlets. More local journalism using open source intelligence, in partnership with Western journalists, could be a way forward.

Latest from the European Journalism Centre

Congrats to all the winners of the Climate Journalism Award. The ceremony took place alongside the News Impact Summit hosted by the European Journalism Centre and sponsored by Google News Initiative. The winners included Unbias The News, Datadista, BBC News, Tages-Anzeiger, The Guardian, DIE ZEIT, and Source Material. The winners were selected by an independent judging panel. Read the blog announcing the award winners from all five categories.

Latest from DataJournalism.com

Visualising the climate crisis matters. But for too long, journalists, researchers and policymakers have relied on metrics that no longer serve our needs. As record-breaking heatwaves, floods, and fires grow, the world needs a new way to understand and communicate what is happening to mitigate this problem. Duncan Geere takes us behind the scenes and explains how UCL's Climate Action Unit and Data4Change developed a new dashboard. Read the full article here.

Scientific evidence clearly shows that women bear the brunt of climate change, yet until recently, scant attention has been paid to their plight. In many instances, the data are there, waiting to be discovered. Sherry Ricchiardi explains how to cover this critical issue with solutions reporting underpinned by data. Read the full long read.

Want to be on the podcast or have an idea for an episode? We want to hear from you. Don't forget to join our Discord data journalism server for the latest in the field. You can also read all of our past newsletter editions or subscribe here.

Onwards!

Tara from the EJC data team,

bringing you DataJournalism.com supported by Google News Initiative and powered by the European Journalism Centre.

PS. Are you interested in supporting this newsletter or podcast? Get in touch to discuss sponsorship opportunities.

Navigating environmental justice with data
https://datajournalism.com/read/newsletters/navigating-environmental-justice-with-data
Thu, 14 Sep 2023 13:16:00 +0200
Tara Kelly

Welcome back to our latest Conversations with Data newsletter!

For decades, communities of colour have fought for environmental justice in the United States. While some progress has been made, Black and Latino communities still disproportionately bear the brunt of environmental degradation and climate change. Heavily polluting industries are more likely to be based in these communities, resulting in more health and respiratory problems among locals. The data backs this up, with higher rates of asthma and COVID-19 deaths among Black and Latino residents than among their White counterparts. Even more frustrating, these communities suffer higher displacement rates due to natural disasters and have less generational wealth because of past discriminatory policies like redlining.

So, the question remains: How can today's journalists reflect the impact of the climate crisis through the lens of such marginalised communities? In this episode, we spoke with two award-winning journalists from Los Angeles to the Carolinas to find out: Dana Amihere, executive director and founder of AfroLA News, and Melba Newsome, a freelance journalist reporting on environmental and racial justice. The pair provide insight into how they cover this important issue using data and solutions to inform their reporting.

Listen to the podcast on Spotify, SoundCloud, Apple Podcasts or Google Podcasts.

Alternatively, read the edited Q&A with Dana Amihere and Melba Newsome below.

What we asked

Tell us about you and your work on environmental justice.

Melba: I am a veteran independent journalist with over 20 years of experience in health and science reporting. Since the pandemic, my focus has shifted to environmental journalism, environmental justice issues, environmental health, and environmental racism.

Dana: I'm the executive director and founder of AfroLA News, a local non-profit newsroom in Los Angeles. Our goal is to provide solutions reporting and news for Los Angeles through the lens of the Black community. We also focus on other historically marginalised communities, showing how disproportionality and inequity affect them and us all.

Melba Newsome is a veteran journalist with 20 years of reporting experience.

Can you give us an overview of how the environmental justice movement first began?

Melba: I live in what's considered the birthplace of the environmental justice movement: North Carolina. That's attributed to an event that happened 40 years ago: the Warren County protests. These began when the state government was going to dump PCB chemicals in this mostly Black community. Residents protested against that, and they received training from people who had been in the civil rights movement. The protests lasted for weeks, and about 500 people were arrested. It also got international attention. Ben Chavis, who was leading the protest when he was arrested, famously said, "This is environmental racism". Dumping was always happening in Black communities because they had no political power to stand up against it. The Warren County protests were the beginning of the environmental justice movement, and we're really proud of that here in North Carolina. But 40 years later, I'm not sure how much progress we've made, given certain communities are still experiencing this.

How do you explain the intersectionality of racial justice and environmental justice to naysayers?

Dana: From the very beginning, AfroLA News was built on the premise of intersectionality. I truly believe that nothing can be boiled down to just a monolith. No person, no ethnicity, no community is just one thing. It's more complicated than that because we are complicated creatures, and this is a very complicated world, including environmental and racial justice. They're connected. It's a very complex set of circumstances in which multiple things are set into motion, and they have varying outcomes and impacts on people and communities.

AfroLA is solutions-focused, data-driven and community-centred journalism for Los Angeles, told through the lens of the Black community.

Tell us about AfroLA News' 2035 climate solutions series. How did it come about?

Dana: The 2035 series has been in progress since the inception of AfroLA News. It came about when we conducted an audience needs assessment survey to better understand which editorial priorities we should focus on. We asked our audience what they wanted from us as their local news provider. Over a third of participants said they wanted climate change coverage. We noticed that the year 2035 kept coming up as a target for federal and local governments under environmental policies. As a result, we used 2035 as a framing for the series to explore some of these issues from a data-driven, solutions-reporting perspective. Our first story focuses on the pollution legacy of segregation in LA. Some say, "Well, we don't have until 2035". My response to that is we absolutely agree with you. 2035 is not our deadline. That may be the deadline for the government, but we're saying we've got to do something now.

Dana Amihere is a designer, developer and data journalist. In April 2021 she started Code Black Media, a digital media consultancy that lives at the intersection of data, design and equity. She is the founder and executive director of AfroLA News.

What data challenges do you encounter when covering the climate crisis through the lens of Black communities?

Dana: I think the problem extends not just to covering climate change for Black communities but to many historically marginalised communities. In terms of data collection, sometimes the data isn't there. We don't have it because nobody has bothered to collect it. In other cases, the data has been collected and a government agency holds it, but they don't want us to have it. Another scenario is that the data exists, but the format we get it in is deplorable. You need the skills and experience to convert those files into something more accessible to use and analyse. It can be a very long, drawn-out cycle.

One workaround that has helped me is collaborating with other developers and groups in the non-profit news space who have built APIs to tap into these government agencies. The programs they've built constantly pull in new reports or data from these systems. One such project is the Data Liberation Project led by Jeremy Singer-Vine. The whole initiative aims to identify, obtain, reformat, clean, document, publish, and disseminate government datasets of public interest.

How do you use data to inform your reporting of local and marginalised communities impacted by climate change in North Carolina?

Melba: So much of what I write about is people and places: the people living in those places and the impacts they face. There is a good deal of data out there to show where certain industries are located and which communities are most impacted. Most of the environmental groups here have geo-mapped them. While we have many issues in North Carolina, industrial animal farms are a huge one for us. I use the data to paint a picture and show where all of these poultry farms and hog farms are located. This is very telling and helps me explain to people what is happening. By looking at income and race, you can see it clearly. For almost any story linked to environmental justice, I can find the numbers to back it up. Another example is the wood pellet industry, which is booming in North Carolina. The first places where these plants were situated were all in Black communities, particularly economically depressed Black communities. The companies manage to get there by promising jobs. However, those jobs never materialise. These industries have deliberately situated themselves in these neighbourhoods.

What one thing do you wish to see changed?

Melba: The one thing I'd like to see is some actual justice. But how that would come about, I don't know. It starts with awareness. Even though we've made strides within the impacted communities over the years, I think that it needs to become as prominent as the civil rights movement became. That is how we can engage people across the strata. And it starts with realising how much is at stake.

Dana: I would love to see better data collection and transparency. When we ask some agencies for data, we sometimes encounter a defensive attitude. That has to go away. In terms of action, I think the onus is more on local government, federal agencies and policymakers. Our legislators are the decision-makers. They've got to step up to the plate to make this happen.

Finally, what books do you both recommend journalists read to learn more about environmental racial justice issues?

Melba: Dumping In Dixie: Race, Class, And Environmental Quality is a seminal book on environmental racism by Robert Bullard. He is considered the father of the environmental justice movement, and he's an adviser to President Biden on this issue. Another book I recommend is WASTELANDS: The True Story of Farm Country on Trial by Corban Addison. It's solely about the hog farm industry in North Carolina, telling the whole story of how it started and where it's going, and it follows the lawsuit at the book's centre.

Dana: I recommend The Intersectional Environmentalist: How to Dismantle Systems of Oppression to Protect People + Planet. The book makes the connection between environmentalism, racism and privilege. The bottom line is that we're not going to save the planet if we are not listening to the people who are marginalised and most affected by this. They have to be part of that conversation as well.

Latest from DataJournalism.com

Scientific evidence clearly shows that women bear the brunt of climate change, yet until recently, scant attention has been paid to their plight. In many instances, the data are there, waiting to be discovered. Writer Sherry Ricchiardi explains how to cover this critical issue with solutions reporting underpinned by data. Read the full long read.

Collaboration is a big part of data journalism. But sometimes, it can be difficult to figure out who is behind the team and what news organisations data journalists work for. That's why DataJournalism.com is joining forces with Lighthouse Reports to build a data journalism roster across Europe. Share your details and topics of interest for collaboration via our Google form. Please note this information will be shared publicly within the data journalism community.

Latest from European Journalism Centre

Are you attending the News Impact Summit in Lisbon? There's just one more month to go and registrations are still open! Come and explore the intersection between journalism and climate science, addressing the opportunities and challenges faced by journalists in an era of rapid technological advancements and growing climate urgency. Register now!

Want to be on the podcast or have an idea for an episode? We want to hear from you. Don't forget to join our Discord data journalism server for the latest in the field. You can also read all of our past newsletter editions or subscribe here.

Onwards!

Tara from the EJC data team,

bringing you DataJournalism.com supported by Google News Initiative and powered by the European Journalism Centre.

PS. Are you interested in supporting this newsletter or podcast? Get in touch to discuss sponsorship opportunities.

Connecting local and climate journalism
https://datajournalism.com/read/newsletters/connecting-local-and-climate-journalism
Wed, 26 Jul 2023 11:15:00 +0200
Tara Kelly

Welcome back to our latest Conversations with Data newsletter!

For decades, journalists have examined the global ramifications of climate change on the international stage. From highlighting scientific research findings to covering climate negotiations at UN conferences, their reporting has made audiences well-versed in the global challenges of our time. But as local communities are increasingly affected by extreme weather in a variety of ways, there is a strong need for journalists to deliver climate change reporting through a local and solutions lens.

In this podcast episode, we spoke with Alex Harris, the Miami Herald's lead climate change reporter, and Tahmid Zami, a Dhaka-based climate journalist at the Thomson Reuters Foundation. We hear helpful advice from the United States to Bangladesh on approaching localised climate stories with data.

Listen to the podcast on Spotify, SoundCloud, Apple Podcasts or Google Podcasts.

Alternatively, read the edited Q&A with Alex Harris and Tahmid Zami.

What we asked

Tell us about your role and how you cover climate change for local communities.

Alex: I cover climate change for the Miami Herald. We now have a three-person team, and I have been writing about how climate change affects communities at the local level for eight years now. I spend a lot of time trying to make the people of Miami understand the issues that we're facing.

Tahmid: I am from Dhaka, Bangladesh, and I work as a just transition reporter for the Thomson Reuters Foundation. Just transition is about showing the linkage between the climate transition and its impact on communities. I've been reporting for more than a year and a half on how the climate transition in Bangladesh is impacting workers, ordinary communities, minorities and women.

Miami is known for its harrowing hurricanes. How have you become a specialist in covering them for your local audience?

Alex: Hurricanes are a huge issue for us. They are a common unifier for folks who may not feel very strongly about the impact of climate change on their lives. But you can guarantee that anyone who lives in Miami understands the risks they face with hurricanes and wants the most up-to-date information they can get. We spend a lot of time doing day-to-day updates when a storm is coming, and when one isn't imminent, we do step-back pieces. For instance, we talk about how effective the latest forecast models are, how effective the protections the county or the cities are looking into are, and how the warming world is shaping that risk. We do a lot of deep dives into the science of how climate change affects hurricanes or makes certain aspects more likely, like stronger or wetter hurricanes. It changes them in ways you wouldn't expect. For instance, we may see fewer hurricanes as the world gets warmer.

How do you use data to understand those hurricane models you've written about for the Miami Herald?

Alex: I use a lot of data when focusing on risk. For instance, we look at flood mapping, and we look at the age and distribution of buildings to understand where the old and new housing stock is located. Newer housing is built to a slightly higher standard and, therefore, might be better able to withstand a storm. In our stories, we try to highlight the risk and who is facing most of it. For instance, when Hurricane Ian was coming, we did a lot of mapping and data journalism on the impact of storm surge, which reached record highs for us on the west coast of Florida. When that hurricane slammed in last year with up to 20 feet of surge, we were able to look at the maps and see where more would be expected and how that would affect some of the buildings there. Data is a really important part of the conversation.

Tahmid, share with us how locals in Dhaka, Bangladesh, are experiencing climate change right now.

Tahmid: Bangladesh is one of the most climate-vulnerable countries and a developing country with limited resources. It has a limited capacity to address the climate challenges it is facing. If we look at the main climate challenges, the most recent and visible are the heat waves that have affected the greater part of the world. These have also become worse in Bangladesh. Dhaka is a big metropolis with 15 to 20 million people. Most of the population lives in informal settlements, which do not have proper cooling systems. Many of these people also work outside in the hot sun during the daytime. Given Dhaka's built environment, they face severe issues related to heat-trapping and the heat island effect. Several structural problems make this worse, for example, the lack of clean drinking water and limited access to healthcare services. Climate hazards such as heat waves compound these problems. Bangladesh also struggles with longer-term issues such as seasonal cyclones, salinity in the coastal areas and floods in the northern region.

How do you use data to report on the challenges in Dhaka?

Tahmid: When it comes to using data for reporting on issues such as heatwaves, you can use satellite maps to see which areas are more affected by them. Certain built environments might make them worse. We can explore where the main green spaces are within Dhaka and ask why these are distributed where they are. In our reporting, we have shown how most of the green spaces and bodies of water are often concentrated in more upscale neighbourhoods. Meanwhile, in the slum regions, there is less green space. Data can help us explore unseen dimensions at a macro level; we use it substantively in our reporting.

How do you embed climate solutions journalism into your reporting?

Tahmid: Solutions journalism is something that we are increasingly concentrating on. Without solutions, journalism becomes merely a series of doom-and-gloom stories that induce despair and negativity in people. When it comes to reporting on issues like salinity or heat waves, our main approach is to identify and point out emerging solutions. We highlight small steps and innovations happening in the community that can lead to change. These may not impact a large number of people. But if you point out the changes, then the government, development organisations or financial organisations can pick up on them and potentially dedicate resources to those solutions and scale them up. An example might be creating rooftop gardens to spur more green spaces for the community.

Alex, what data sources do you regularly monitor locally and nationally?

Alex: We have NOAA (the National Oceanic and Atmospheric Administration), which tracks tidal gauges all up and down the coast. NOAA allows you to examine its predictions of when we might see flooding, as well as historical data showing how bad it's been in the past. It also shows king tides, the highest tides of the year, which can lead to flooding. I also look at heat data from the National Weather Service, both forecasts going forward and historical records looking back. And I like to use the Environmental Protection Agency's FLIGHT tool, which allows us to look at historical and projected greenhouse gas emissions data for all sorts of facilities. For instance, we look at power plants, cement factories, and landfills, and you can drill down and see the biggest polluter in our area.

Internationally, I examine the different measurements of global heat. We've broken quite a few global heat records in the last couple of weeks. ERA5 is a global reanalysis dataset tracking temperatures back to 1940. We're able to look at that and see we've broken heat records: July 3rd was the hottest day recorded on the entire globe, followed by July 4th and July 5th. These giant global measuring systems and data repositories are really helpful when you're trying to put local effects into context.

How do you engage audiences who avoid news on this topic?

Alex: Getting people to read stories about big, scary things is hard. People only have a tolerance for so much bad news in their lives. I think that's why today's conversation has focused on solutions journalism. But not every story can be solutions journalism. Sometimes you have to tell the reality that things are bad and nobody's doing anything to fix them. I've been experimenting for years, trying to find what kinds of topics people will engage with. Yes, solutions stories are great, and people will usually engage with those. But we find that topics like healthcare, hurricanes and how things affect your wallet are what really draw people in. For instance, we discuss climate change as a health threat and show how it affects you today. We talk about the way climate change is making life more expensive: your city taxes are going up, your property insurance rates are exploding, and your home value could be declining because of flood risk. If we focus on wallet and pocketbook issues, people do engage with those.

The last couple of weeks have been filled with a lot of very bad news for Florida. We've got home insurers dropping out of the market, global heat records being broken, and outdoor workers dying in the heat. I expect that in the next couple of weeks we'll see a pullback from readers who are overwhelmed. Our strategy after a series of scary headlines is to try to tell a hopeful story for our readers. For instance, is there an activist doing something wonderful that we can zoom in on, or a building solution underway in our community? We ask how these technological or scientific solutions are developing.

Finally, what advice do you have for local journalists new to covering climate change?

Alex: If you are going to start covering climate change in a local area, you don't have to be an expert on climate change. Find what makes your community unique. Find something that holds local significance and then ask how climate change affects it and what solutions are in play. That gives you a recipe for a story that people care about. You can learn a lot about your community and how climate change uniquely affects it by focusing on the things people care deeply about. That's the best way to hit the ground running when you start covering this topic.

Tahmid:  My advice is to build a network of climate communicators. These people can be found in different kinds of groups and institutions. For example, policymakers, the development community, academics, media, etc. In Bangladesh, we are trying to develop such networks of climate communicators. This means that if a specific issue needs urgent attention from the media, it can make the rounds very quickly with insights from experts. There is an opportunity for people in other developing countries to build a similar network in their communities. 

Latest from DataJournalism.com

Collaboration is a big part of data journalism. But sometimes, it can be difficult to figure out who is behind the team and what news organisations data journalists work for. That's why DataJournalism.com is joining forces with Lighthouse Reports to build a data journalism roster across Europe. Share your details and topics of interest for collaboration via our Google form. Please note this information will be shared publicly within the data journalism community.

Latest from European Journalism Centre

We are proud to present the official programme and speaker lineup for the News Impact Summit on Elevating Climate Journalism in Lisbon on 11 October 2023. Join us for a day of insightful talks and practical workshops from experts in the field. Register and attend the Climate Award Ceremony happening alongside the summit, where we plan to honour outstanding reporting and impactful storytelling shaping a sustainable future. Check out the entire programme and speaker lineup here.

Want to be on the podcast or have an idea for an episode? We want to hear from you. Don't forget to join our Discord data journalism server for the latest in the field. You can also read all of our past newsletter editions or subscribe here.

Onwards!

Tara from the EJC data team,

bringing you DataJournalism.com supported by Google News Initiative and powered by the European Journalism Centre.

PS. Are you interested in supporting this newsletter or podcast? Get in touch to discuss sponsorship opportunities.

Powering your climate solutions reporting with data visualisation
https://datajournalism.com/read/newsletters/powering-your-climate-solutions-reporting-with-data-visualisation
Wed, 07 Jun 2023 12:48:00 +0200
Tara Kelly

Welcome back to our 100th edition of the Conversations with Data newsletter.

Climate change is one of the most pressing issues of our time. Yet communicating about this can present some challenges, particularly with audience apathy. So how can data journalists help engage readers and empower them to take action? Embedding data visualisations in your climate journalism is one way to boost audience engagement.

In this podcast episode, we spoke with information designer and data storyteller Duncan Geere, data journalist Pei Ying Loh from The Kontinentalist, and visual journalist Rodolfo Almeida from Nucleo. Drawing on their combined experience in information design and climate solutions journalism, we hear helpful advice on approaching data visualisation design for impactful climate coverage. 

Listen to the podcast on Spotify, SoundCloud, Apple Podcasts or Google Podcasts. Alternatively, read the edited Q&A with Duncan Geere, Pei Ying Loh and Rodolfo Almeida.

What most appeals to audiences when designing visualisations for climate journalism?

Duncan: In my experience, reporting on stories about people is most engaging for general audiences. When working with climate data, it's easy to start writing stories about temperature anomalies, sea level rise, atmospheric forcings, or other technical terms like these. Those are the metrics your data comes in, but they are a bit too abstract for a lot of people. They don't necessarily care about model outputs and the like. What they care about is what is happening and what's going to happen to human beings. And most of all, they care about what happens to human beings like them.

What other challenges do journalists face when engaging audiences on climate change?  

Duncan: Many audiences are super fatigued by stories about disasters happening on the other side of the world. If you're looking for big engagement on a climate story, you need to centre your readers in it and make it relevant to the people you're talking to. Find the people in your audience who are being affected by climate change and tell their stories. Finally, I think people have become numb to the constant stream of disasters. All our TV shows, movies and books are about dystopias and disasters. In the work that I do at Possible, the climate charity, we very much try to focus our storytelling around solutions.

At The Kontinentalist, what approach do you take for engaging with audiences on climate change issues?

Pei Ying: I recently attended the International Journalism Festival in Perugia, and they had an entire session about how you should be reporting on the climate in a way that translates into real impact. One of the things that they mentioned is that guilt doesn't work. Alarmist headlines like "the earth is on fire" don't work anymore. People tune out and are not interested in that narrative. That's why at The Kontinentalist, we approach our stories with empathy and put ourselves in our audience's shoes. We talk about things that will have a direct impact or interest. We use language that avoids doomsday messaging or making them feel guilty. We focus a lot on empowerment, not hope. For instance, we give them suggested actions to change the situation at the end of a story. Some of those could be talking to their elected representatives, being more conscious about their own consumption habits or even educating people around them.

Could you cite an example of what formats have worked best at The Kontinentalist for climate journalism?

Pei Ying: Some of our most effective climate stories have been what we call micro-stories. These are told in squares on Instagram with text and visuals. We only post them on social media, not on our website. This is the opposite of long-form visual, interactive pieces. Because younger audiences, especially Gen Z, have shorter attention spans, this format is a lot better at connecting with them. It is easier to reach them where they are instead of forcing them to leave a social platform to read something somewhere else. There's also a certain virality factor for these types of stories: people can easily share them on their own Instagram accounts.

Rodolfo, you gave an inspiring talk on visualising the invisible at The Outlier conference this year. Could you tell us more about it?

Rodolfo: That talk was primarily derived from research I've been conducting at the Universidade Federal do Rio de Janeiro, where I'm investigating how the climate crisis is portrayed in data visualisations. I'm trying to find a theoretical framework to prompt better visualisations of the climate. This is mainly motivated by the discomfort I've felt when designing charts that depict really brutal climate change data. After you've made the 10th or 20th map of deforestation in the Amazon, you start to wonder if there's a more effective way of delivering this message to a general audience, one that gets them to feel involved with the problem and digest the scale of it.

How does data visualisation lend itself to communicating about climate change?

Rodolfo: Data visualisation, in a sense, is a fertile ground to work with climate issues, not only because climate change is a phenomenon that is very much attested through data but also because I think data visualisation is really helpful in getting a reader to come up with a mental model of the world to help them grasp a very complex issue in a simple and manageable representation. The general idea that I was calling attention to is that climate change is this hugely complex problem, more complex than we can even grasp. Because there are many interconnected and ecologically interacting data points, it can be easy to lose that sense of scale and report on specific issues without considering how they interact with larger climate issues or with the bigger picture. As data journalists, we need to actively take a step back and form a bigger picture. 

Another speaker at that conference, John Burn-Murdoch from the Financial Times, showed us the results of some research on how charts are very effective in changing minds regarding climate change. That is mostly because charts do seem to have an authority and credibility factor to them, or at least they're understood this way by the reader. We should be making responsible use of this to deliver the messages that need to be delivered.

What are some challenges faced by journalists covering the Amazon in Brazil?

Rodolfo: Many challenges exist in Brazil. This was especially the case over the last four years, during Jair Bolsonaro's term. Most reporters I know who are working with climate data are denouncing corruption in the cattle industry, land grabbing or deforestation in the Amazon. Most of these issues are deeply related to the preservation of the Amazon and, therefore, to the larger climate situation. But they are also related to very powerful institutions and people in Brazil. Journalists who cover this receive death threats and harassment from people who are economically interested in keeping things as they are. Publications more invested in covering climate justice can get labelled activist publications, and their journalistic credibility can be questioned because of their interest in this issue. The media ecosystem still has this idea that you have to be neutral when reporting on a certain subject, which can damage the credibility of those doing important work and calling attention to situations that need to be denounced.

Duncan, you've worked in journalism and for non-profits focusing on climate change issues. How does your data design process differ across these two worlds?

Duncan: The goals differ. As a journalist, my job is to help people understand the world better. For instance, it is about drawing their attention to interesting or useful things they might not have seen. This is a somewhat passive role where I'm just showing the information. My work for non-profits like the climate charity Possible is much more about getting people to change their behaviour -- both the general public and politicians. My role is a lot more active and involves thinking about how to make it easier for people to do something different to change their behaviour. It's a lot harder, to be honest, but the rewards are so much bigger. For me, it's worth trying to take on that challenge.  

What one thing would you all like to see in the future from journalists when it comes to tackling climate change?

Duncan: I would like to see more journalists use data and visualisation to engage politicians, businesses and other people with power about the sustainability promises that they are making. Are these promises good enough, and how are we going to meet them? I want to see journalists set out the situation that we are in, clearly and truthfully, and show what can be done to fix it. We also need to ask why we are not rolling these solutions out fast enough. We have all the solutions that we need to fix our climate. We just need to see those solutions used. And journalism can play a huge role in making that happen. 

Pei Ying: Within Singapore and the Asian region in general, we are far behind in data journalism and much, much further behind when it comes to climate journalism. Climate journalism is still pegged to the latest IPCC report or COP event. But every news topic has a link to climate change and action. Encouraging more conversation around solutions rather than just talking about how bad things are would be helpful. We need to empower people to feel that they have the ability to change things. We also need journalists to show how communities are directly impacted by climate change in their reporting.

Rodolfo: One thing I would like to see is stronger data visualisation. We need more impactful and direct data visualisation that's not scared of packing a punch and delivering an emotionally compelling experience. If we are not allowed to feel emotions when reading data about our own extinction, there's something wrong with how we communicate this data. There's no solving climate change. It's more of a situation to mitigate and adapt to as much and as quickly as possible. And that sense of urgency involves making some tough decisions relating to our daily lives and habits as well as how our economies are structured. 

Latest from European Journalism Centre

Building on the European Journalism Centre’s partnership with the Google News Initiative, we are excited to announce that this summer, we will launch a European Climate Journalism Award, with the winners to be announced in the autumn at our 2023 News Impact Summit in Lisbon. Want to know more? Next week we will announce further details about the award's prizes, categories and eligibility for entering. Sign up for EJC's newsletter for the full announcement. 

Latest from DataJournalism.com

Collaboration is a big part of data journalism. But sometimes, it can be difficult to figure out who is behind the team and what news organisations data journalists work for. That's why DataJournalism.com is joining forces with Lighthouse Reports to build a data journalism roster across Europe. Share your details and topics of interest for collaboration via our Google form. Please note this information will be shared publicly within the data journalism community.

Want to be on the podcast or have an idea for an episode? We want to hear from you. Don't forget to join our Discord data journalism server for the latest in the field. You can also read all of our past newsletter editions or subscribe here.

Onwards!

Tara from the EJC data team,

bringing you DataJournalism.com supported by Google News Initiative and powered by the European Journalism Centre.

PS. Are you interested in supporting this newsletter or podcast? Get in touch to discuss sponsorship opportunities.

Mastering the art of data collaboration
https://datajournalism.com/read/newsletters/mastering-the-art-of-data-collaboration
Wed, 14 Dec 2022 15:06:00 +0100
Tara Kelly

Welcome to the latest Conversations with Data newsletter.

Before we dive into this issue, we wanted to remind you to take The State of Data Journalism 2022 Survey! Don't miss out on some cool prizes! Available in four languages, the survey will close at the end of December.

Now on to the podcast.

Mastering the art of data collaboration is no small feat. And this is even more true when conducting document-heavy investigations. But the reward for finding untold stories can make it worth your time. That's what we learned from MuckRock's data journalists Betsy Ladyzhets and Dillon Bergin in a live Discord chat a couple of weeks ago.

The pair talk about their latest long read on how data and collaboration can power public health investigative stories. We also hear how MuckRock's tools can help journalists bring about transparency and hold the government to account through document-based reporting.

Listen to the podcast on Spotify, SoundCloud, Apple Podcasts or Google Podcasts. Alternatively, read the edited Q&A with Betsy Ladyzhets and Dillon Bergin.

Tell us about MuckRock's tools and its mission.

Dillon: MuckRock is a non-profit, collaborative news site that helps give you the tools to hold the government accountable. Our tools aren't just for journalists but also for citizens, researchers or activists. First and foremost, we are FOIA nerds. The rest of what we do largely stems from that. The MuckRock website helps people file FOIA requests, while DocumentCloud helps people analyse and share government documents they get from those requests. And more recently, the editorial team now helps newsrooms use public documents and data in their reporting. We are based in the United States, and Betsy and I have collaborated with reporters in newsrooms from Utah, Missouri, Michigan, Mississippi and Louisiana.

What other resources does MuckRock provide to help people understand document-based reporting?

Betsy: The value that MuckRock provides is showing people what can be done with FOIA requests and document-based reporting. We often will publish reporting recipes taking people behind the scenes of our projects. We also publish explainers that show how to use a certain type of document or how to think about requesting a certain type of information. I believe these resources can be useful no matter where you are.

How does MuckRock's editorial team collaborate with other newsrooms?

Betsy: MuckRock's editorial team is relatively new and came out of the Documenting COVID-19 project, which Dillon and I both worked on. We're still working out the best processes for starting collaborations and ensuring that editorial expectations are clear. These collaborative stories tend to come from datasets, or specific topics that we think can be better uncovered or explained through data or records. We will typically approach local partners that seem like a good fit for a particular region or topic.

One example is the Uncounted Project about excess deaths. Another long-running collaboration I've worked on has examined the pandemic response in the state of Missouri with the Missouri Independent, a non-profit newsroom that covers statewide politics and policy. Dillon has been very involved with a project in Chicago looking at data from a new network of air quality sensors set up in the city. We've also worked with other non-profit newsrooms, like the Idaho Capital Sun and the Salt Lake Tribune.

Tell us more about the Uncounted Project.

Betsy: This project started with an article that the Documenting COVID-19 Project published in the summer of 2021 with the Kansas City Star, a Missouri news outlet. It was about a coroner based there who told the project that he went against CDC guidance: if a family didn't want COVID-19 on the death certificate, he would write down a cause of death that did not include COVID-19, even if the person probably had died from it. This story is indicative of larger issues in the death system in the United States, where, as the pandemic highlighted, we have a very decentralised healthcare system.

The United States also has a decentralised system for investigating and recording how people die. Every state has its own process, and even within states, different counties can have different procedures and levels of resources. Sometimes the people doing this work for a particular county are elected and face political pressure. Or they might have no training at all, or very limited training, in how to do autopsies or determine how somebody passed away. All of this came to light in that story, which went viral and caught the interest of Andrew Stokes, a demography professor at Boston University's School of Public Health.

How did your collaboration with the team of demographers at Boston University happen?

Betsy: Andrew Stokes reached out to us saying he had also been looking at this problem and was trying to figure out where COVID-19 deaths might have been undercounted across the country. He found that individual story about this one coroner in Missouri to be an example of this bigger problem. Our team at MuckRock started collaborating with Andrew Stokes, and we have been working with him and his team over the last year and a half. Dillon was the lead reporter on stories that explored this undercounting problem and highlighted a couple of specific places: one county in Missouri, one in Louisiana and one in Mississippi.

This past year, I've also been working on a follow-up story that looks more at demographic patterns with excess deaths. This involves trying to understand if there are certain groups of people -- looking at race and ethnicity -- that are more likely to be undercounted. That story will come out in the next month or two.

How did this mix of collaboration with experts and local reporters help bring the story home?

Dillon: I think one of the really important parts of the story was the different systems and contexts of death investigation across the country. And that's also what made this project so important for us in terms of collaboration because while we were working with demographers at Boston University to see the high-level view of the metric of excess deaths, we were also wondering what was causing this in these different places across the country.

To be able to do that, we needed to work with local reporters in each of these places and understand how death investigation works in that part of the country. Is it a coroner system? Is it a medical examiner system? Is it something else? What are the trends in mortality in this area and nationwide? Does that trickle down into these different places across the country? We wanted to find out how that was happening and why.

Dillon, you note that your reporting often begins and ends with two questions: Who does the data serve? What does the data conceal? Tell us more about this approach.

Dillon: In many instances, investigative reporting is exactly that. It's about the information that some people, usually in positions of authority or power, want to remain undisclosed. So as a reporter, I often ask myself: who would it serve to have this information? Who should know about this data that doesn't? This approach to data reporting became clear to me while working on a story about evictions in New Mexico.

Finally, what coding languages are you most comfortable working with?

Dillon: I started coding with Python, but the programming language I now use regularly is R. The R tidyverse packages are handy for data analysis. I find it easy to do basic operations, such as slicing and dicing, pivoting, and aggregating data. Increasingly, I'm also using R for mapping and geospatial analysis. Many great packages are maintained by academics who study the climate, environment and geography.
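
To illustrate the kind of tidyverse workflow Dillon describes, here is a minimal, hypothetical sketch in R. The data frame, column names and values are invented for illustration and are not taken from MuckRock's reporting; it simply shows aggregating a small table with dplyr and reshaping it with tidyr.

  # Minimal, hypothetical example of aggregating and pivoting data with the tidyverse.
  library(dplyr)
  library(tidyr)

  # Invented example data: yearly counts by county
  counts <- tibble(
    county = c("A", "A", "B", "B"),
    year   = c(2020, 2021, 2020, 2021),
    deaths = c(120, 150, 80, 95)
  )

  # Aggregate: total deaths per county
  counts %>%
    group_by(county) %>%
    summarise(total = sum(deaths))

  # Pivot: reshape so each year becomes its own column
  counts %>%
    pivot_wider(names_from = year, values_from = deaths)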

Betsy: Dillon is more of a coder than I am. I'm still a novice at R and Python, although I have some familiarity with both. I mostly use Excel and Google Sheets to do basic data analysis. And then I'll work with folks like Dillon, who have more of a coding background. I'm a big fan of Flourish, Datawrapper and Tableau for data visualisation tools.

Latest from DataJournalism.com

The State of Data Journalism Survey 2022 is back for the second year in a row! Help us understand how the field is evolving by sharing your insights. This edition also includes a special section on the Russia-Ukraine conflict. Available in Arabic, English, Italian and Spanish, respondents can also win some prizes. The survey will close on 31 December 2022, and we will share the results in early 2023. Take the survey today and spread the word!

Data has never been more critical for journalists since the COVID-19 pandemic. But collaboration is also crucial for data teams to find untold stories when covering public health. In our latest long read article, MuckRock's editorial team explains how working with a mix of experts and local reporters helped them investigate excess deaths in the United States.

Newsrooms worldwide are stepping up their climate coverage by investing in resources to grow and support their reporters. In this long read article, journalist Sherry Ricchiardi examines some leading examples of data-led climate reporting.

Want to be on the podcast or have an idea for an episode? We want to hear from you. Don't forget to join our Discord data journalism server for the latest in the field. You can also read all of our past newsletter editions or subscribe here.

Onwards!

Tara from the EJC data team,

bringing you DataJournalism.com supported by Google News Initiative and powered by the European Journalism Centre.

PS. Are you interested in supporting this newsletter or podcast? Get in touch to discuss sponsorship opportunities.

How to save data journalism https://datajournalism.com/read/newsletters/saving-data-journalism Mon, 21 Nov 2022 13:36:00 +0100 Tara Kelly https://datajournalism.com/read/newsletters/saving-data-journalism Welcome to the latest Conversations with Data newsletter brought to you by our sponsor FlokiNET.is, a web hosting company that was established in Iceland to provide safe harbour for freedom of speech, free press and whistle-blower projects. You can get 15% off FlokiNET's products and servers by using the promotion code DATAJOURNALISM.

Don't forget that all registered DataJournalism.com members have FREE WEB SPACE AND DOMAIN with FlokiNET.is. Just log in to your DataJournalism.com profile and visit the goodies page.

The State of Data Journalism 2022 Survey is back! Help us understand how the field is evolving by sharing your insights. Take the survey and win some cool prizes!

Now on to the podcast.

Print and broadcasting news outlets have long archived stories. But for the past decade, data visualisations and interactive content have not been preserved. So what can be done to save data journalism?

We spoke with Bahareh Heravi from The University of Surrey and Simon Rogers from Google to explore some of the challenges and solutions to this problem. The pair discuss the importance of archiving and the necessary next steps to be taken by the news industry.

Listen to the podcast on Spotify, SoundCloud, Apple Podcasts or Google Podcasts.

Alternatively, read the edited Q&A with Bahareh Heravi and Simon Rogers.

What we asked

Bahareh, you co-wrote a research paper on preserving data journalism. What is stopping news organisations from archiving data visualisations and other dynamic content?

Bahareh: The main issue is how complex interactive data visualisations are. News organisations don't have the mechanisms for preserving this content. They are very experienced in preserving and archiving text-based, image-based or video-based content. First, archiving was analogue, and then it moved to digital. Interactive data visualisations, by contrast, are complex objects with many dependencies and interdependencies. As we move forward, the programming languages behind them change. For example, interactives used to run on Flash, but now they run on JavaScript. Those interdependencies mean the software, the host, and the connections between the host and the news organisation must all keep working for preservation to happen.

All the bits and pieces of the article need to be properly connected, and if one fails, that link falls apart, and the data visualisation doesn't load properly. Preserving data visualisations is similar to preserving software or digital games. It is not easy. Nevertheless, it should still happen. What is worrying is that a lot of content has already been lost in the past ten years or so.

Simon, as data editor at The Guardian, you worked on many interactive projects. What's your perspective on archiving data journalism?

Simon: It's funny, isn't it? Almost everything I've worked on probably doesn't work now at The Guardian. I can't think of many projects that do in terms of the bigger interactive projects. And the weird thing is, I have a poster at home of the very first issue of The Guardian. I can read the stories from the very first page of the very first Guardian, but I can't see an interactive that was the biggest thing on the site for a few days seven or eight years ago. This is partly because journalists do what we want them to do, which is "live in the now". There's a scrappiness to journalism that enables us to be innovative and get stuff done on a deadline. A different approach is applied to interactive and data content than written content.

How does the newsroom cycle lend itself to archiving content?

Simon: When I was at The Guardian, one of the rooms I enjoyed was the archive room. You could get all these old issues of the paper or digitised versions of them. News organisations have a well-worn system for generating stories from the archive: you create a follow-up and then many other follow-up stories after that. There is almost a circle of life in a news story which can carry on for years and years. Journalists think of interactives differently. They see it as something they've got to get out tomorrow, and that's it. Libraries change, code changes. It starts to be expensive to keep things going. Sometimes a programming language, tool or key code library that a project relies on is discontinued, which means rebuilding the whole thing from scratch if you want to keep it, for minimal return.

For instance, one of the projects we worked on at The Guardian was the MPs' expenses story. It was the first big crowdsourced news organisation project. It was switched off just before I left because otherwise, we would have had to rebuild the whole thing. It costs money to keep these things going. It's expensive, so it dies. It's not that we need a project to necessarily live on, but we want to know that it was there. It should have left an imprint, and that's important for institutional memory. There's even an inspirational thing about that for later generations.

How does archiving affect the way you teach data journalism?

Bahareh: For me, showing those early versions of data journalism is important. It has historical importance. When you want to start teaching a topic to students, you want to be able to give them a bit of history and tell them how it all started. Those data visualisation pieces are part of the history of data journalism. This can be a bit tricky because when you refer to them, they don't exist. A 404 page appears. It is useful for my session on archiving, but there's always this worry that if I open any piece in the class, it may not work.

What recommendations do you have for preserving data journalism that you mention in your research paper?

Bahareh: In the research paper, we came up with two sets of recommendations. One of them involved long-term infrastructural recommendations that are not very easy to do. They are expensive and need time and resources. News organisations could work on these in collaboration with tech organisations to reach a point where they ensure data is not lost. The other set of recommendations is easy to implement and could be done by journalists individually. These are not too far off from what Simon mentioned about not having the interactive work exactly as it does but capturing it somehow for future readers.

In our recommendations, we draw on a concept called significant properties. The idea is that the journalists could develop a set of significant properties for their story by capturing some of the properties or characteristics of a visual or interactive piece. In the case of our research, it is about the work's behaviour, its interactions with the user, and the intent and purpose of that specific piece. What did it want to convey to the audience?

Let's take the MPs' expenses story Simon worked on for The Guardian. The journalists could develop a set of significant properties from this specific visualisation by asking what is important for the future reader in 15 years. It could be the overall shape of the project and what it looked like, but also the types of content that were there and what the interactions were. By identifying these kinds of significant properties, they can start capturing what is important.

What practical steps would the journalists take to capture these significant properties?

Bahareh: One of the recommendations is that journalists should try to capture several screengrabs of the interactive, which is very easy to do. Based on the significant properties you identify, it could be done by one or even five screengrabs. If screengrabs aren't enough, you could create a GIF animation which encapsulates more screengrabs. Some news organisations are already creating GIF animations to promote their content on social media. However, it would be even better if these could be preserved in the same way other image pieces are preserved in their existing archives.

If a GIF animation is not enough, record a video to give the reader an idea of what your visualisation is. What is most important is to make sure that whatever content you have captured from the interactive visualisation is also linked to the piece. It shouldn't be an afterthought. As you publish your article, make sure you have recorded it, created it, and put it on whatever platform you use to automatically or semi-automatically archive your work. It is also important to make sure there is a link.
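
As a rough illustration of how those screengrabs could be captured automatically at publication time, here is a minimal Python sketch using the Playwright browser library; the URL and output paths are placeholders, and this is only one possible approach, not a workflow the researchers prescribe.

    import os
    from playwright.sync_api import sync_playwright

    # Placeholder URL of the interactive to capture.
    URL = "https://example.com/interactive/mps-expenses"

    os.makedirs("archive", exist_ok=True)

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(URL, wait_until="networkidle")

        # Capture the piece at a couple of viewport sizes so the archive
        # reflects how it looked on desktop and on mobile.
        for name, width, height in [("desktop", 1280, 800), ("mobile", 375, 667)]:
            page.set_viewport_size({"width": width, "height": height})
            page.screenshot(path=f"archive/interactive-{name}.png", full_page=True)

        browser.close()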

What steps would you like to see for dynamic content to be archived in the medium term?

Bahareh: Our research suggests that in the near future, news organisations could implement some simple "if-then" code in their CMS. For instance, the code could say, "If this visualisation did not load, load this JPEG, GIF or video instead in this piece of content". If that's too complicated to maintain, there could be a caption underneath any interactive data visualisation saying, "If this did not load, access this version as a JPEG, GIF or video through this link". This would allow you to show your reader what was here. It is not perfect. You can't see everything, but you have something that could give an idea of what was there.
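
To make the simpler of those two ideas concrete, here is a minimal sketch of the kind of template helper a CMS could use so that every interactive ships with a static fallback and a caption link; the function, markup and field names are hypothetical and not drawn from any particular CMS.

    def embed_with_fallback(interactive_url: str, fallback_image_url: str, title: str) -> str:
        """Return embed HTML with a static image for clients that cannot run the
        interactive, plus a visible caption linking to the archived version."""
        return f"""
    <figure class="interactive">
      <iframe src="{interactive_url}" title="{title}" loading="lazy"></iframe>
      <noscript><img src="{fallback_image_url}" alt="Static snapshot of {title}"></noscript>
      <figcaption>
        If the interactive above did not load, see a
        <a href="{fallback_image_url}">static version of {title}</a>.
      </figcaption>
    </figure>
    """

    print(embed_with_fallback(
        "https://example.com/interactive/mps-expenses",
        "https://example.com/archive/mps-expenses.png",
        "the MPs' expenses explorer",
    ))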

How can we build on what is already archived?

Simon: One option would be to have a YouTube channel or something simple that will still be around. Whatever we build, we want to build out redundancy as much as possible. We already have The Sigma Data Journalism Awards, which has three years of data journalism entries. That will build up over time. Before that, we had the Data Journalism Awards, where screenshots were submitted with each entry. The content is there; we need to bring it together.

Bahareh: If this is done at the time of publication, it would only take 30 minutes. But this could ensure the content remains for 10 to 50 years.

How do we convince journalists preservation of their work is important?

Simon: It is a serious issue. People don't realise it's a serious issue, which is part of the problem. The answer is not necessarily that something has to work. The answer has to be that we know that it was there, what it did, and how it worked.

One thing that journalists are all interested in is immortality. That's why people love seeing their byline. You do need some incentive for this to exist. It has to be low enough work that it is manageable for people. It would be best to have people responsible for it to get things started. Librarians exist for a reason. Their job is to do that. Almost every news organisation has a set of archivists or librarians who think about keeping stuff. But right now this is either too difficult for them or falls outside their remit.

What else could be done to encourage the industry to put data journalism archiving front and centre?

Simon: Data journalism needs something similar to a Rock and Roll Hall of Fame. We need to create something with enough prestige for people to want to be inducted into it. Congratulations! Your piece has been chosen to be archived in the hall of fame.

Bahareh: We could also have an exhibit where people could walk through and experience the dead or lost content. It might be hard to compile the old content so it can be put on display.

Finally, what are your parting thoughts on archiving and data journalism?

Simon: This conversation has made me think we need to discuss this again this year. I'm definitely going to apply myself to this. Archiving is only becoming more and more important. Legacy is important, and data journalism has this legacy of really talented people. And to see this stuff vanish -- some of the most innovative and exciting work in journalism right now -- it can't happen. We have to make sure it doesn't.

Bahareh: We need to act quickly and collaborate. There are a lot of people who are knowledgeable and interested in this topic. If we could bring these stakeholders together as soon as possible, we could do something about this. We could involve data journalists, news organisations, tech companies, third-party data visualisation companies, digital preservation organisations, and the academic and scientific community.

Latest from DataJournalism.com

The State of Data Journalism Survey 2022 is back for the second year in a row! Help us understand how the field is evolving by sharing your insights. This edition also includes a special section on the Russia-Ukraine conflict. Available in Arabic, English, Italian and Spanish, respondents can also win some prizes. The survey will close on 31 December 2022, and we will share the results in early 2023. Take the survey today and spread the word!

How do you inject data storytelling into radio packages? Radio broadcast journalists can find this challenging given they only have a few minutes to tell the story. In our latest long read, Robert Benincasa, a computer-assisted reporting producer in NPR's Investigations Unit, provides detailed examples and treatments for merging audio and data journalism. Read the long read here.

Want to be on the podcast or have an idea for an episode? We want to hear from you. Don't forget to join our Discord data journalism server for the latest in the field. You can also read all of our past newsletter editions or subscribe here.

Onwards!

Tara from the EJC data team,

bringing you DataJournalism.com supported by Google News Initiative and powered by the European Journalism Centre.

PS. Are you interested in supporting this newsletter or podcast? Get in touch to discuss sponsorship opportunities.

Inside the Uber Files https://datajournalism.com/read/newsletters/inside-the-uber-files Wed, 26 Oct 2022 14:44:00 +0200 Tara Kelly https://datajournalism.com/read/newsletters/inside-the-uber-files Welcome to the latest Conversations with Data newsletter brought to you by our sponsor FlokiNET.is, a web hosting company that was established in Iceland to provide safe harbour for freedom of speech, free press and whistle-blower projects. You can get 15% off FlokiNET's products and servers by using the promotion code DATAJOURNALISM.

Don't forget that all registered Datajournalism.com users have free web space and domains through FlokiNET.is.

Now on to the newsletter.

This episode focuses on the Uber Files, an investigation published in July 2022 that revealed how Uber flouted laws, duped police, exploited violence against drivers, and secretly lobbied governments during its global expansion.

First leaked to The Guardian, the Uber Files include a database of more than 124,000 files from 2013 to 2017 shared with the International Consortium of Investigative Journalists (ICIJ) and 42 other media outlets.

To understand how ICIJ navigated this large database, we spoke with data journalist Karrie Kehoe, and data & research editor Emilia Díaz-Struck.

Listen to the podcast on Spotify, SoundCloud, Apple Podcasts or Google Podcasts. Alternatively, read the edited Q&A with Karrie Kehoe and Emilia Díaz-Struck below.

What we asked

Tell us about the Uber Files.

Emilia: The Uber Files is an interesting leak covering more than 124,000 records between 2013 and 2017 at a time when Uber was expanding around the world. The communications include 83,000 emails showing the tactics Uber used to gain access to the markets around the world, to influence regulations, and to influence people. The files also reveal how Uber would deal with some cases tied to law enforcement in a couple of countries. Regionally speaking, most of the investigation was connected to Europe along with some connections with Asia, Latin America, and other parts of the world. Instead of examining the offshore financial system, ICIJ looked at how Uber influenced and lobbied governments to gain access to new markets.

How did the investigation begin?

Emilia: Troves of files were first leaked to The Guardian. After an initial exploration, they discovered the files were not only tied to the UK but also to other countries. The Guardian then reached out to ICIJ. We thought this could be an interesting project for our network. We started the investigation by reviewing which countries were part of this set of files and started involving partners in a lot of countries. This is when the magic happens. The power of collaboration allowed us to mine these files, and do the reporting.

There are so many angles for this story. How did you go about sifting through the information?

Karrie: First of all, I absolutely love this story because so many of ICIJ's investigations are told from the offshore service provider aspect, where we see what services are being provided for companies. But this is the first time we got to the heart of the company and saw it behaving badly from the inside. The communications revealed how they spoke to each other and the culture around them.

Our technology team created this wonderful product called Datashare. The 124,000 files were all loaded into Datashare and then OCRed. Some were in English, and others were in French -- they built a translation tool to help us. The technology team created a file pack for the data so we can see what's in different folders and navigate it -- like moving through rooms and seeing the contents of each one. When we first get leaks like this, we sit down and read just like everybody else. While reading, we start looking for patterns and names to figure out what's going on. For instance, I remember putting in Macron's name, and we got two thousand hits.

Who were the stakeholders, and how did you navigate those?

Emilia: In this investigation, we figured out who contacted who and the purpose of that communication. Our programmer extracted the email addresses, the names of the people behind those email addresses, and the domains. This helped us see where these stakeholders were from -- government, academia, politicians, think tanks, and citizen groups. Combining that information, we found over 1,850 stakeholders in about 29 countries. Different parts of our team were looking at different angles. Karrie focused on lobbying, and another colleague looked at academics. The next step was to figure out how to organise this data and connect it to public records.
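
For readers curious what that extraction step can look like in practice, here is a minimal Python sketch that pulls email addresses and domains out of raw text and tallies them; it is a simplification for illustration, not ICIJ's actual code, and the example emails are invented.

    import re
    from collections import Counter

    # Invented snippets standing in for the text of leaked documents.
    documents = [
        "Meeting confirmed. Contact jane.doe@ec.europa.example and cc press@uber.example.com.",
        "Forwarded from j.smith@parliament.example.uk regarding Thursday.",
    ]

    EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

    emails = [match for text in documents for match in EMAIL_RE.findall(text)]
    domains = Counter(email.split("@")[1].lower() for email in emails)

    print(emails)
    print(domains.most_common())  # which institutions appear most often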

Tell us about how you examined and investigated the lobbying aspect of Uber.

Karrie: Lobbying is an opaque thing and can be such a black box. While lobbying can have a massive impact around the world, it's really difficult to see from the outside how it's actually done. The most interesting thing about this investigation was getting to see the heart of this huge lobbying machine. There were three different types of calendars in the files. After we extracted them all, we created a huge spreadsheet. Part of this relied on programming, and part involved just reading through the PDFs to make sure that we hadn't missed anything. We created this long list of meetings between Uber executives and public officials. If a meeting was scheduled with anyone from a mayor up to a European commissioner or a vice president, we pulled it out.

Because meetings don't always happen, we spent a lot of time verifying this and looking for evidence. We went through all the correspondence, the text messages, and the emails, and looked for proof that Uber executives were in a room with that person. So that was the first part of it -- to make sure the meetings actually happened. The second part was to see if they were declared. The European Commission has a transparency register where you have to declare meetings. This is all publicly available data. After we verified all the meetings that happened, we then went through the public register and checked whether they had been declared or not. Overall, we found over a hundred meetings had happened that we could definitely confirm between 2013 and 2017.
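
A stripped-down version of that cross-check might look something like the sketch below, which compares a hand-verified list of meetings against a download of the public register; the data, names and columns here are invented placeholders, not the team's actual files.

    import pandas as pd

    # Hypothetical inputs: meetings verified from the leak, and meetings declared
    # in the EU Transparency Register (both would normally be loaded from CSVs).
    verified = pd.DataFrame({
        "official": ["Official A", "Official B", "Official C"],
        "date":     ["2015-01-22", "2016-03-10", "2016-06-01"],
    })
    declared = pd.DataFrame({
        "official": ["Official A"],
        "date":     ["2015-01-22"],
    })

    # Left join, then keep only verified meetings with no matching declaration.
    merged = verified.merge(declared, on=["official", "date"], how="left", indicator=True)
    undeclared = merged[merged["_merge"] == "left_only"].drop(columns="_merge")

    print(undeclared)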

What other interesting things did you find? How did you approach it?

Karrie: With this type of investigation, you want to start at the top. You want to try to find the prime ministers and other top figures. We saw that Uber did a lot of planning around Davos, the World Economic Forum in 2016. We had their spreadsheets and their planning docs, and we also went through all their correspondence and text messages over that time period. That was fascinating. We saw that there was an undeclared meeting between Joe Biden and Travis Kalanick, who was CEO at the time. There was another meeting with Enda Kenny, the former Irish prime minister. They also met with the president of Estonia and Benjamin Netanyahu, who was then the prime minister of Israel. They also had meetings with the prime ministers of Luxembourg, Norway and the Netherlands. The communications helped us see their relationships develop.

Emilia: In total, we had six world leaders in these communications with Uber. The files gave you a snapshot into seeing how Uber was planning to go around the world and expand. We also saw how Uber managed to thwart government regulations and law enforcement with a kill switch. This meant the authorities wouldn't be able to gather information on them.

What has been the impact of this investigation so far?

Emilia: We saw taxi drivers in Europe going out and demonstrating. There have also been demands for more investigations into lobbying efforts.

Karrie: It would be fantastic if they could harmonise lobbying regulations across Europe so that we could look and see what these multinationals are doing in multiple countries and see how their tactics are changing from country to country. I also hope that one of the impacts is going to be that you have more people like Mark MacGann coming forward. As head of Uber's lobbying for Europe, he sat down and thought that maybe he had done harm and wanted to rectify it several years later. He became a whistleblower and came forward and allowed us to do these compelling stories.

What advice do you have for journalists about to embark on a data-intensive investigation like the Uber Files?

Emilia: You need a lot of coffee, patience and a good playlist to help you through it. When you have 124,000 files, like the Uber Files, it is easy to feel overwhelmed by the data. The first thing is to put the mess in order and try to identify what this is about. Spend some time diving into the data. Use keywords to search and identify the key documents. Then you need to identify the stories and the journalistic questions you want to answer. Based on those questions, ask yourself what relevant data you have to work with. It is also important to understand the problems with this data. Next, you can come up with a plan and get started. I also advise journalists to devote a lot of time to fact-checking their investigation.

Finally, as data journalists, what is in your toolbox?

Karrie: We regularly use Excel and Google Sheets. I also use a bit of Python, Jupyter Notebook and pandas. I often structure information in my head and write it down.

Emilia: What matters is that you are developing a mindset critical for analysing the data. Balancing automation and manual work with different tools and skills is essential. One person may fact-check with different tools than another person. The magic happens when you combine multiple skills in a data journalism team.

Latest from DataJournalism.com

How do you inject data storytelling into radio packages? Radio broadcast journalists can find this challenging given they only have a few minutes to tell the story. In our latest long read, Robert Benincasa, a computer-assisted reporting producer in NPR's Investigations Unit, provides detailed examples and treatments for merging audio and data journalism. Read the long read here.

How can data journalism combat the spread of disinformation covering the Russia-Ukraine war? In our latest long read by Sherry Ricchiardi, she explores the digital tools to help journalists fight back. Read the article here.

Movies are a great source of inspiration when it comes to seeing how data impacts daily life. And for data journalists, entertainment can be an excellent way to learn without the number crunching bogging us down. On the heels of NICAR's mailing list post about this very subject, Andrea Abellan came up with her own favourite films to share with you. Read the blog here.

Want to be on the podcast or have an idea for an episode? We want to hear from you. Don't forget to join our Discord data journalism server for the latest in the field. You can also read all of our past newsletter editions or subscribe here.

Onwards!

Tara from the EJC data team,

bringing you DataJournalism.com supported by Google News Initiative and powered by the European Journalism Centre.

PS. Are you interested in supporting this newsletter or podcast? Get in touch to discuss sponsorship opportunities.

Eye on data journalism in Iran https://datajournalism.com/read/newsletters/eye-on-data-journalism-in-iran Wed, 12 Oct 2022 13:44:00 +0200 Tara Kelly https://datajournalism.com/read/newsletters/eye-on-data-journalism-in-iran Welcome back to the latest Conversations with Data newsletter! This issue celebrates our 50th podcast episode. Since launching in February 2020, we've reached 35,000 listeners and ranked in Apple's top 200 for educational podcasts in dozens of countries around the world.

Now on to the newsletter!

Since the death of Mahsa Amini, a 22-year-old woman arrested in September by the Iranian morality police for allegedly not following the Islamic Republic’s strict dress code, the world’s attention has turned to Iran.

With internet shutdowns and limited access to freedom of information, what does investigative data reporting look like for journalists covering Iran? To better understand this, we spoke to Marketa Hulpachova from Tehran Bureau, an independent investigative outlet focusing on data journalism.

As a data journalist, she talks to us about how Tehran Bureau approaches data journalism in a closed-information society like Iran. She explains how she uses publicly available company data to show networks and map patterns revealing corruption within the top ranks of the Iranian regime.

Listen to the podcast on Spotify, SoundCloud, Apple Podcasts or Google Podcasts. Alternatively, read the edited Q&A with the panel below.

What we asked

Could you tell us about Tehran Bureau?

We are an independent news platform focused on Iran, founded in 2008. I joined in 2009 when massive protests took place in Iran. I'm now head of investigations. Tehran Bureau quickly became a source of information for anyone who wanted English-language news on Iran that wasn't in the mainstream media. We had this unprecedented partnership, first with PBS/Frontline and then with The Guardian. Since 2016, we've been on our own. Over the past five years, we have morphed into an investigative outlet focusing only on data journalism as it pertains to Iran's economy.

Our audience is anyone in the international community focused on Iran and the people inside the country. We publish both in Farsi and in English and seek to inform journalists interested in covering corruption, business and the economy in Iran. We also target internationally-based organisations that focus on Iran, especially in the field of corruption. This transparency issue, as it pertains to Iran, is a global issue. We see so many crossovers with networks of corruption that go far beyond Iran's borders.

What makes the current protests across Iran different from past civil unrest?

This is the first time that women's rights -- and women's rights alone -- have taken front and centre stage in these protests. I'm incredibly awed and proud of these very young women who are ripping off their headscarves and just risking everything to get some freedom, to take control of their narrative. They're doing it not only for all of us women who've been to Iran and worn mandatory hijab, but for anyone who's concerned with authoritarianism and how much it encroaches on your personal rights.

What do you think is next for Iran?

No one can know what will happen if the whole thing falls apart. There's a lot of fear because there hasn't been any space given to a real conversation of what any democratic or slightly more democratic alternative for Iran would look like. So I would like to stick to what I can see the regime doing. One thing they're already doing is saying there may be some leeway on hijab. They've got incredible flexibility when they want to because all they're concerned with, I think, is staying in power. And they will do anything to do that because they have so many economic interests in doing so.

Another thing that is quietly happening is this transition towards networked authoritarianism or digital authoritarianism. The idea is to replace this much-hated morality police with artificial intelligence. This technology could pick people out of the crowd and show a person in violation of this dress code, for example. This is not very dissimilar to what you see happening in China with facial recognition software.

Tehran Bureau published a piece about building a surveillance state inside Iran using Chinese technology. Iran is not the only country where they're doing it, but it's one of the countries where they've received the most domestic support to build it as fast as possible. People are aware that if nothing changes inside Iran, this is where it's headed.

Speaking of digital authoritarianism, Iran has also experienced internet shutdowns during protests. What does this mean for your reporting as a data journalist?

During an internet shutdown, as data journalists, our biggest fear is the loss of information. When internet disruptions start happening, we start downloading everything we can. And it's a big concern for us. How do we obtain all these archives and ensure they don't get destroyed -- either in a shutdown or in the case of chaos or anarchy. We encourage everyone in the community working on Iran to do the same because you never know when it's all going to be lost. In the worst-case scenario, we want to ensure that this trove of information is at least partially preserved.
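
As one small example of what "downloading everything we can" might look like in practice, here is a minimal Python sketch that saves a list of pages to disk; the URLs are placeholders, and a real preservation effort would be far more thorough (assets, WARC files, mirrored databases and so on).

    import pathlib
    import requests

    # Placeholder list of pages to preserve before a possible shutdown.
    urls = [
        "https://example.com/registry/company/1",
        "https://example.com/registry/company/2",
    ]

    out = pathlib.Path("archive")
    out.mkdir(exist_ok=True)

    for i, url in enumerate(urls):
        try:
            response = requests.get(url, timeout=30)
            response.raise_for_status()
            (out / f"page_{i:05d}.html").write_bytes(response.content)
        except requests.RequestException as err:
            print(f"Failed to fetch {url}: {err}")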

Could you talk to us about your approach to data journalism at Tehran Bureau?

The data that we use is associational and involves no numbers. We are showing networks about how individuals and companies are related to each other. The data we look at is reams and reams of business registration documents. And then, we start identifying patterns. One of the biggest patterns in the Iranian economy is that there is only one shareholder for hundreds of companies. This is not normal, but that's the case in Iran. The association charts to prove this are massive. The story of corruption in Iran is overwhelming and so brazen. The challenge is always to make this into smaller news stories. We also aim to find a way to follow the news cycle to resonate with a wider audience.

You uncovered the financial portfolio of Iran's supreme leader. Could you tell us how you went about this investigation?

As a data journalist, it is sometimes easy to lose yourself in the patterns you identify. You go down a rabbit hole and aren't sure where it will lead. Sometimes you discover a treasure trove. This story examines companies associated with the supreme leader's office -- a power vortex of all this corruption. The way we began this story in the first place was that we noticed certain companies had the same auditor, and that auditor is listed in the business registration documents. After we found out who the auditor was, we realised that this was the official auditor of the supreme leader's office. By identifying that detail, we could discover thousands of associated companies. The story debunks the supreme leader's claim that he has no links to the economy.
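
The pattern described here -- many companies sharing a single auditor or shareholder -- is the kind of thing that can be surfaced with a simple aggregation once the registration documents have been structured. Below is a minimal sketch with invented names; it illustrates the general idea rather than Tehran Bureau's actual workflow.

    import pandas as pd

    # Invented business-registration rows: one line per company filing.
    filings = pd.DataFrame({
        "company":     ["Co A", "Co B", "Co C", "Co D", "Co E"],
        "auditor":     ["Audit X", "Audit X", "Audit X", "Audit Y", "Audit X"],
        "shareholder": ["Holding 1", "Holding 1", "Holding 2", "Holding 3", "Holding 1"],
    })

    # Count how many companies each auditor and each shareholder is linked to.
    auditor_counts = filings.groupby("auditor")["company"].nunique().sort_values(ascending=False)
    shareholder_counts = filings.groupby("shareholder")["company"].nunique().sort_values(ascending=False)

    print(auditor_counts.head())      # an auditor tied to an unusual number of firms
    print(shareholder_counts.head())  # a single shareholder behind many companies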

What is one overall challenge you face as a journalist when covering Iran?

One thing we have struggled with in the past is that we cannot openly cover what we think the story is without sounding like we are too much in favour of regime change and not willing enough to work with the regime on some sort of reform. That is not something we're hearing from the Iranian people right now. We are hearing that they are very sick and tired of what's going on. But in the international community, among media and civil society not based in Iran, there's a much more careful approach regarding what happens next. Because of that, even outside the country, there are certain red lines in what you can report on and what you can say without sounding like you're too much of an activist or too much in favour of regime change. Then you get labelled as an opposition media outlet -- but what we're trying to do is very much be independent and objective.

Finally, what's next for Tehran Bureau?

Resources permitting, I would like to uncover the international dimension of all these corrupt networks I've been talking about. We already know where to look. We have significant amounts of information on the topic. But one of the big challenges for us is legal liability. I'm sure you've heard other journalists talk about SLAPPs -- strategic lawsuits against public participation -- and other ways to put so much legal pressure on small news organisations that they cannot proceed with an investigation. It can also mean shutting down altogether due to the huge amount of financial pressure. To continue the story, especially if we're going to start targeting individuals and entities based in places like Europe or the United States or anywhere outside of Iran, you're exposing yourself to more legal risk.

Latest from DataJournalism.com

How can data journalism combat the spread of misinformation covering the Russia-Ukraine war? In our latest long read by Sherry Ricchiardi, she explores the digital tools to help journalists fight back. Read the article here.

Want to be on the podcast or have an idea for an episode? We want to hear from you. Don't forget to join our Discord data journalism server for the latest in the field. You can also read all of our past newsletter editions or subscribe here.

Onwards!

Tara from the EJC data team,

bringing you DataJournalism.com supported by Google News Initiative and powered by the European Journalism Centre.

PS. Are you interested in supporting this newsletter or podcast? Get in touch to discuss sponsorship opportunities.

The state of FOI in Europe https://datajournalism.com/read/newsletters/the-state-of-foi-in-europe Thu, 29 Sep 2022 17:05:00 +0200 Tara Kelly https://datajournalism.com/read/newsletters/the-state-of-foi-in-europe Welcome to the latest Conversations with Data newsletter!

The latest episode explores the state of freedom of information in Europe with a panel of journalists talking about their first-hand experience in the UK, Ireland and Romania.

We sat down with Jenna Corderoy, an investigative reporter at openDemocracy; Ken Foxe, a journalist and co-director of Right to Know; and Attila Biro, co-founder of Context Investigative Reporting Project Romania.

We examine the different challenges and experiences of requesting FOI and provide some advice for journalists new to the field.

Listen to the podcast on Spotify, SoundCloud, Apple Podcasts or Google Podcasts. Alternatively, read the edited Q&A with the panel below.

What we asked

Jenna, talk to us about your role at openDemocracy.

Jenna: I'm a reporter with openDemocracy's investigations team. openDemocracy is an independent global media organisation. We're known for reporting on money and influence in UK politics, but we also specialise in reporting about LGBTQ rights, the environment and policing. I specialise in getting documents under freedom of information laws and often appeal against refusals. There'll be a few times when I go to court and argue my case in front of a judge. But I also report on the state of freedom of information in the UK and how it is regularly undermined.

What do you think about the state of freedom of information in the UK?

Jenna: The fate of freedom of information in the UK has worried me for quite a long time now. In fact, at openDemocracy, we published two major reports detailing the problems we are currently facing. We found that fewer and fewer freedom of information requests are being granted in full by central government departments and that government departments frequently ignore them or don't respond in a timely manner.

The Information Commissioner's Office, the information regulator here in the UK, is overworked and underfunded. That means when you want to make a complaint about a certain government department or another public authority, you complain to the Information Commissioner's Office, but they take a very, very long time to process these complaints. When you've got lengthy delays and public authorities refusing to provide information despite the public interest in disclosure being so significant and powerful, it undermines this fundamental right.

In your experience, how has the UK's FOI situation changed over the years?

Jenna: When I got into journalism about ten years ago, I thought freedom of information was fantastic. I would send a freedom of information request, and on the whole, I would be very, very successful in obtaining the information I wanted. I sought things like emails and communications within the government. It tended to be on time. But over the years, it has become significantly worse, and I still am quite unsure why that's the case. It's not just me that's complaining. It's lots of other people, too. It's a very serious problem here in the UK right now.

Ken, tell us about your role at Right To Know in Ireland.

Ken: I'm a freelance reporter, and I've worked as a journalist for over 20 years. I work with Right To Know, which campaigns for greater transparency in public life in Ireland. About 10-15 years ago, I began to specialise in the use of FOI and access to information in Ireland.

How does Ireland's FOI compare to the UK's?

Ken: I follow openDemocracy's work quite closely on access to information, and it seems the delays and difficulties around filing FOI requests in the UK aren't replicated in Ireland. I've had some experience making FOI requests in the UK, and I find it much more difficult. One of the annoying things about FOI is that sometimes when you're an expert in your own country, it doesn't necessarily translate into accessing records in another jurisdiction, as there can be subtle differences in how it works. So I suppose, in general, the picture isn't as bleak in Ireland, and we have a review of our FOI act underway.

Ireland has the same FOI request timelines as the UK. It's 20 working days to deal with the request, but the ability to continually extend that -- as experienced in the UK -- doesn't happen here. You only have one chance to extend in Ireland, and if they don't answer, you can move on to the next stage, which is an internal review. They only have three weeks to deal with that, and if they don't deal with it in time, you can go straight to our Information Commissioner. So it is easier to push the process along in Ireland than in the UK.

What other FOI differences exist between Ireland and the UK?

Ken: One of the aspects of Irish FOI that makes it a little bit different to the UK is that you're able to ask questions. That makes it a little bit easier for a new requester in Ireland, whereas in the UK your request has to be for existing records. It doesn't sound like a big difference, but it is, because sometimes the record that would answer your question might not yet exist.

Another aspect that I suppose is challenging in Ireland is that we have a lot of public bodies that have only partial inclusion in our FOI. So again, in the UK, you can access a lot of information from the police, but in Ireland, you can only access information relating to finance, procurement and human resources. You can't get anything whatsoever related to operational policing in Ireland.

What are the challenges Irish journalists face with FOI?

Ken: One of the challenges is an inconsistency in decision-making. For instance, you could have one public body you deal with who does a terrific job by taking the requests very seriously. They phone you up and help you find the information. Or you can have the opposite experience where a public body literally will do everything in its power to make it difficult for you by invoking rarely used sections of the FOI act. While I have experience dealing with these tactics, I worry that members of the public don't know how to navigate this behaviour.

How much are Irish journalists using FOI in their reporting?

Ken: FOI in Ireland has probably become underused. Even ten years ago, I think a lot more journalists used it than do now. That is not necessarily the fault of FOI. It's more about how newsrooms have changed and how much discretionary time reporters have. You need to be able to dedicate about five hours a week to it, even just to keep up with the bureaucracy and the paperwork of it. I feel that Irish journalists aren't being given that time anymore. It means many evergreen requests are repeatedly made for the same material that works every year. And though it is interesting, it's certainly not bringing a huge degree of transparency to public life here in Ireland.

Tell us about your role at Context in Romania.

Attila: I'm an investigative reporter with Context. I've been a journalist working with FOI since 2004. We are a new investigative platform established a few months ago focusing on corruption and money laundering at the national and cross-border levels. We are part of the international network of the Organised Crime and Corruption Reporting Project, one of the world's biggest investigative journalism networks.

What does the FOI situation look like in Romania?

Attila: Romania has a nicely drafted Freedom of Information Act that came into effect in 2001. However, the reality of working with FOI in the country is a different story. Politicians and their minions have managed to curtail the law. They use secondary laws to ignore or block our FOI requests. There is no ombudsman for FOI in Romania. However, you can go to court, which is what we have done after the government rejected an information request regarding hospital data in Romania.

The law gives public institutions in Romania 10 to 30 working days to fulfil your request. A rejection usually comes within five days, but otherwise you normally hear back in 30 days. There is no punishment for public institutions if they don't reply to an FOI. We have noticed public institutions in Romania and Brussels weaponise data protection legislation against journalists to block them from accessing data about EU funds and expenditures. They have taken a good piece of legislation that should protect people and instead are using it against them.

At Context, how do you approach FOI requests?

Attila: Usually, we request data for stories where we already have leads. However, government officials sometimes send us badly scanned printed-out Excel sheets where not all the information is shown. When that happens, that tells me I am on to a story because it seems they are trying to hide something.

What advice do you have for journalists new to FOI?

Jenna: My advice for journalists who are about to embark on such a project is to Google to see whether the information that you want is already out there. I would also say the more research you do before submitting a request, the more informed you become. This research can help you when drafting your FOI request.

If you send the same request to many institutions like universities here in the UK, you've got to get organised. Use a spreadsheet, and list when you sent your request and when you expect the results. Remember that you must constantly chase down requests when you don't get an answer within 20 working days. You can also complain to the information commissioner in the UK if you haven't heard back.
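
If you prefer a script to a spreadsheet, the same bookkeeping can be done in a few lines of Python; the 20-working-day deadline follows the UK timeline Jenna mentions (bank holidays are not accounted for here), and the request log is invented for illustration.

    import numpy as np
    import pandas as pd

    # Invented log of requests: which authority they went to and when they were sent.
    requests = pd.DataFrame({
        "authority": ["University A", "Department B", "Council C"],
        "sent":      pd.to_datetime(["2022-09-01", "2022-09-15", "2022-10-03"]),
    })

    # UK FOI: a response is due within 20 working days of the request.
    requests["due"] = requests["sent"].apply(
        lambda d: np.busday_offset(d.date(), 20, roll="forward")
    )
    requests["overdue"] = requests["due"] < np.datetime64("today")

    print(requests)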

Ken: In Ireland, all public bodies are obliged to publish a list of all the FOI requests they receive every three months. I recommend journalists look at what people have previously requested. This will help you develop ideas and get into the FOI mindset. I would also recommend reading the FOI act in your country. Journalists need to familiarise themselves with the law and understand what records they can get and what they can't.

Attila: To combat the negative responses to your FOI request from the government, make sure you reference relevant legislation or previous answers from successfully submitted requests. I try to make it hard for the public official to refuse me the information by referencing these details. I would suggest you do your research and find out what they did in the past before drafting your FOI request.

Latest from DataJournalism.com

How can data journalism combat the spread of misinformation covering the Russia-Ukraine war? In our latest long read by Sherry Ricchiardi, she explores the digital tools to help journalists fight back. Read the article here.

A journalist's portfolio is akin to one's pride. But what happens when broken links and defunct code riddle your life's work of interactive content? Professor Bahareh Heravi explores the possibilities for saving your content in her latest long-read article. Read the article here.

Want to be on the podcast or have an idea for an episode? We want to hear from you. Don't forget to join our Discord data journalism server for the latest in the field. You can also read all of our past newsletter editions or subscribe here.

Onwards!

Tara from the EJC data team,

bringing you DataJournalism.com supported by Google News Initiative and powered by the European Journalism Centre.

PS. Are you interested in supporting this newsletter or podcast? Get in touch to discuss sponsorship opportunities.

Making sense of data collection during conflict https://datajournalism.com/read/newsletters/making-sense-of-data-collection-during-conflict Wed, 24 Aug 2022 12:52:00 +0200 Tara Kelly https://datajournalism.com/read/newsletters/making-sense-of-data-collection-during-conflict Welcome to the latest Conversations with Data newsletter!

Before we kick off today's issue, our colleagues from Structured Credit Investor are still hiring! They're looking for an excellent writer with a background in finance to join their firm as a reporter. You'll be writing real-time stories about the European and US CLO markets and tracking new issue news, deal pricing and secondary market trading activity. Interested in applying for this role? Be sure to send your cover letter and CV to the attention of Mark Pelham at [email protected].

Now on to the newsletter!

Since the beginning of the Russia-Ukraine War, journalists have witnessed, verified and reported on atrocities across the country, remotely and in the field. But for data journalists looking to delve deeper into the data and find new sources, understanding data collection and where it comes from is essential.

To help journalists meet the challenge, we spoke with guests from data journalism and the humanitarian field with experience using data to cover the war:

  • Peter Bodnar, data journalist at Texty.ua
  • Dada Lyndell, data journalist at The Insider
  • Karina Shedrofsky, head of research at OCCRP
  • Claudia Manili, senior data analyst from ACAPS.org

This conversation is an edited recording from our event in June 2022 with guest host Marianne Bouchart, the executive director of The Sigma Awards. Listen to the podcast on Spotify, SoundCloud, Apple Podcasts or Google Podcasts. Alternatively, read the edited Q&A with the panel below.

What we asked

Could you talk to us about the start of the war? How did you anticipate it beginning?

Dada: When the war began, I was still in Moscow. I didn't want to leave the country until the very end. Every day before the war started, we discussed at The Insider whether or not it would eventually begin. We collected and used data to analyse what was happening and spoke to our sources to determine if or when it would happen. We participated in research led mainly by the Conflict Intelligence Team, a group of Russian OSINT investigators. They told us that Russia had been bringing troops and weaponry to the borders since November 2021. On February 24th, we understood only a matter of hours beforehand that it was about to begin.

Peter: Our team at Texty.ua has been writing about Russia waging war against Ukraine for the last eight years. I was a part of it for the past five years. The full-scale invasion wasn't such a big surprise for us. We expected it one way or another. We started understanding that something was going to happen at the beginning of this year when first reports showed Russian troops concentrating around the Ukrainian borders. This was when we decided to create a project tracking the amount of military equipment in Russian military bases around the Ukrainian border.

What action did you take to protect your digital and physical security?

Dada: I always speak to people about digital and physical security because I'm kind of paranoid. I started working not only with the newsrooms that I know in Russia but with all the activists preoccupied with their digital security. This was the time to advise people about digital security, what to do with no Internet connection or if the police take our phones.

Several days after the war began, Russia passed new legislation preventing you from calling it a war. They also said you would be accused of state treason if you collected data about the Russian military. This meant that newsrooms doing investigative journalism from inside Russia had to leave the country because it was very, very dangerous for us to stay there. As a result, most journalists in Russia had to leave, as many were afraid for their lives and their freedom.

Peter: The start of the war was very messy and unexpected. Even though we had plans and understood that something like that could happen, not everything went according to plan. The first thing we did was move most of our critical digital infrastructure into the remote cloud because we had some computing and data storage on our local computers in our office. We also ensured the digital protection for our computers and accounts, access to databases, and everything we needed for our work, which is critical for us.

We also created a plan for how our team members could evacuate from here. We are working from Kyiv, the capital of Ukraine. We also had some backup plans on what to do and where to go in case the internet and mobile connections disappeared entirely. Thankfully, the internet connection was stable throughout the last four months.

Could you talk us through some examples of your work from the war?

Peter: We published an article tracking military bases around the Ukrainian border. We used satellites with active radar to identify the area covered by metal objects on the territory of Russian military bases around Ukrainian borders. We also published a piece describing what a battalion tactical group is -- a basic unit of the Russian army in this invasion. Overall, our work involves classifying, storing, and making information useful for the public. We also welcome other teams reusing our work.

Dada: At The Insider, the team created a YouTube channel from a small ad-hoc studio. One of our first videos went viral with millions of views. We felt this channel could help show what is happening in Ukraine given half of Russians don't believe this is a war.

Karina: At OCCRP, we have taken a different approach to the general news cycle for the Ukraine war. We have created the Russian Asset Tracker, a project to track down and catalogue the vast wealth held outside Russia by oligarchs and key figures close to Russian President Vladimir Putin. This is an ongoing project with a fact-checked database of these assets. We've recently added another section on reported assets that haven't necessarily been verified or fact-checked. More than 50 news organisations from all over the world have participated and will continue to participate in this project.

Could you tell us more about the Russian Asset Tracker? How does this project mirror what OCCRP has done in the past?

Karina: The Russian Asset Tracker is similar to other work OCCRP has done in the past. For instance, following the money, getting public records around the world and trying to uncover assets that are quite difficult to uncover. One distinction is the sense of urgency we all faced after Russia invaded Ukraine. Within two weeks, we managed to uncover billions of dollars of assets, working endless hours, barely sleeping, to get this information out. In terms of all of the other leaks that have been released since the war began, we don't know the agenda of who is actually leaking this information. We are trying to go through this with that lens and decide what is worth looking into and covering. And it's just vast, vast amounts of data.

Tell us about ACAPS.org and its datasets covering Ukraine.

Claudia: ACAPS.org comprises analysts and data collectors active in the humanitarian sector. Our scope covers natural disaster response and conflicts. What we try to do is inform the humanitarian community about what is happening in specific situations. We have teams around the world focussing on crises. We have one team in Ukraine and one focusing on global analysis. I am a senior data analyst on our global team, but I was helping with data collection for the Ukraine data sets. We have been collecting data about Ukraine since late March. At that point, we decided to build two data sets -- one covering civilian infrastructure damage and the other covering humanitarian access events. For the data on civilian infrastructure damage, what we track is pretty clear -- any damage related to civilian infrastructure. Meanwhile, the humanitarian access data set is more related to how humanitarian actors can access a specific place.

What are the missing data narratives that the media hasn't yet covered?

Peter: In my opinion, the humanitarian impact of the war hasn't been covered. We aren't showing the impact on a local basis in specific Ukrainian cities. We aren't seeing stories explaining the problems with access to fresh water, infrastructure damage and how the daily life of ordinary people has changed during the war.

Claudia: One of the narratives missing for us is how humanitarian access improves. It is quite easy to report on the damaged infrastructure of a bridge if destroyed, but it is not so easy to track when things have improved.

Finally, what tips and advice can you give data journalists new to covering this conflict with data?

Dada: Be sure to triple-check everything and always check where your data source has come from. Many newsrooms still don't know how to use satellite imagery and need to learn how to do this and verify videos and images using OSINT skills. It has also become very difficult to maintain objectivity. Many people report on stories that involve picking out certain information to confirm a particular point of view. Always try to be objective if you consider yourself a journalist.

Peter: Try to familiarise yourself with local, historical and cultural contexts when covering any conflict. This can help your data storytelling and ensure your reporting is relevant and doesn't misrepresent what is happening on the ground.

Latest from DataJournalism.com

A journalist's portfolio is akin to one's pride. But what happens when broken links and defunct code riddle your life's work of interactive content? This is the dilemma news organisations and journalists face thanks to the rise of data journalism -- where complex interactive storytelling and dynamic data visualisations rely on distributed digital infrastructures. Professor Bahareh Heravi explores the possibilities for saving your content in her latest long read article. Read the article here.

With summer here, there’s no better time to close the laptop and catch up on your reading. Our data journalism team has cherry-picked a collection of books that will refresh your expertise and reignite your passion for the field. Read the blog here.

Want to be on the podcast or have an idea for an episode? We want to hear from you. Don't forget to join our Discord data journalism server for the latest in the field. You can also read all of our past newsletter editions or subscribe here.

Onwards!

Tara from the EJC data team,

bringing you DataJournalism.com supported by Google News Initiative and powered by the European Journalism Centre.

PS. Are you interested in supporting this newsletter or podcast? Get in touch to discuss sponsorship opportunities.

]]>
Innovative storytelling for war coverage https://datajournalism.com/read/newsletters/innovative-storytelling-for-war-coverage Wed, 10 Aug 2022 11:58:00 +0200 Tara Kelly https://datajournalism.com/read/newsletters/innovative-storytelling-for-war-coverage Welcome to the latest Conversations with Data newsletter!

Before we kick off today's issue, our colleagues from Structured Credit Investor are hiring! They're looking for an excellent writer with a background in finance to join their firm as a reporter. You'll be writing real-time stories about the European and US CLO markets and tracking new issue news, deal pricing and secondary market trading activity. Interested? Send a cover letter and CV to Mark Pelham at [email protected]

Now on to the newsletter!

Since the beginning of the Russia-Ukraine War, journalists have tried to inform people about the region's geography and historical context, along with the latest developments in the field. Though audiences have shown deep concern for the war, its implications and civilians impacted by the invasion, interest has begun to wane. So how can journalists use this moment to spur innovation, fight war fatigue amongst readers and bring about new forms of storytelling? 

To explore this topic further, we sat down with Gianluca Mezzofiore from CNN and Sam Joiner from The Financial Times. Drawing on their expertise, we explore how newsrooms innovate and develop fresh visual storytelling in their war reporting.

This conversation is an edited recording from our event in June 2022. Listen to the podcast on Spotify, SoundCloud, Apple Podcasts or Google Podcasts. Alternatively, read the edited Q&A with Gianluca Mezzofiore and Sam Joiner below. 

CNN has a long track record of war reporting. How is this conflict different from others the network has covered?

Gianluca: As you said, CNN has a long track record of war reporting. The network has been covering wars for decades. As digital investigators, we stand on the shoulders of some great war reporters. CNN's mission is to "Go There". It's in our DNA, and the war in Ukraine is no exception. I think the Russian-Ukrainian War has marked a shift.

CNN has had a lot of teams on the ground since the beginning of the war -- even before it started. There was even a team in Russia, on the border with Ukraine, when the rockets were being fired on February 24th. But with this war, the magnitude of open source videos and images is unprecedented. There's no comparison to Syria. Many people rightly asked why we were seeing so much open source coverage of the war in Ukraine as opposed to Syria. But the reality is we have never seen this volume of video during a war before. This is something experts have noted, along with CNN staff internally.

How did your open source investigative work evolve when covering this war in Ukraine?

Gianluca: As open source investigators, our work began even before the war. We were examining TikTok videos showing Russian troops amassing on the Ukraine-Russia border. This showed that Russia was serious about the invasion. After the war began, we worked around the clock verifying and geolocating incidents in videos. We were also debunking false flag operations by Russia intended to justify the war. The worst of these videos showed sabotage and missiles being fired. CNN also had a lot of reporters in different cities across Ukraine. The intelligence we were getting from open sources was unimaginable. Open source investigation really helped the network's Ukraine coverage. That marked a significant shift in how we covered this war at CNN.

How has the Ukraine open source investigation work shaped mainstream media?

Gianluca: Before Ukraine, social discovery work was a sideline to the main focus of the coverage. It has become the central point for every story coming out of Ukraine. This war changed the attitude of heritage media towards this kind of coverage. Almost every news organisation now has to deal with open source material. For open source investigators, this work is nothing new. Organisations like Bellingcat, for instance, have been doing this since 2016.

What challenges did you encounter at The Financial Times when this war first broke out?

Sam: The rolling nature of the conflict doesn't lend itself to the kind of visual journalism we would like to produce. We usually spend one to two weeks turning pieces around; sometimes, it can be significantly longer. My initial challenge was to work out how we would contribute to The Financial Times' coverage of it. For the immediate, day-to-day coverage of a war, we usually rely on maps to visually communicate what's happening. My challenge was to think about how we could tap into that and tell a bigger story within the rolling conflict. And initially, that wasn't easy. But as the war went on, I started hearing more people talk about the second phase of the conflict and the fact that Putin's initial plan had failed. That allowed us to assess what we called phase one of the war. We had to deliver that story quite quickly and published a piece focusing on the first part of the war within two weeks.

Could you talk to us more about the Financial Times story focusing on the first phase of the war?

Sam: The story aimed to explain visually how Russia's mistakes in Ukraine, alongside the Ukrainian resistance, contributed to the war stalling. We published this about three weeks into the conflict when it became clear that Putin's approach and attempt to take Kyiv had not worked. We based the whole story on a map. Our readers' understanding of Ukraine and its geography wasn't great then, but it developed as the conflict unfolded.

The piece takes readers in and shows the territorial gains made by the Russians. We show that they were targeting densely populated areas. We also show the vital urban regions targeted around Ukraine at the start of the war. We also did open source investigation work to show footage from the ground. The piece's overall narrative focused on this first part of the war and showed how the Ukrainian resistance and Russia's mistakes had completely changed its course. It wasn't the war that many thought would unfold. We interviewed many military analysts, former four-star NATO generals, and people on the ground to explain what was happening.

How do you innovate with your storytelling when covering such sensitive topics? 

Gianluca: As Sam noted, the rolling nature of the conflict made it quite hard to innovate. Before Ukraine, I worked on a lot of long-term projects that spanned one to two months and had quite high production values. For instance, I worked on investigating the conflict in Ethiopia. However, our Ukraine coverage has been quite compressed. For example, for the Mariupol maternity hospital attack piece, we worked with Forensic Architecture for one week, night and day. This is an example of what you can achieve in a short amount of time.

Sam: Our work involves applying visual storytelling techniques to the live, current story. Our visual storytelling team has existed for less than a year, so this was an opportunity to apply these techniques to the biggest story of the moment. Often visual storytelling teams are great at telling the story you weren't expecting to read. But when something of this magnitude happens, you're suddenly thrust into the limelight and able to use visuals to tell the biggest stories of the day. For instance, this piece on Russia's mistakes was one of the most read stories of the year for the Financial Times.

Due to the graphic nature of some of the visuals presented in our work, we decided to introduce a "highly sensitive toggle option" at the FT. This toggle option allows readers to see still images instead of the full video content. 

What are the best practices that you follow within your team? 

Gianluca: When you have a fast-moving breaking news story like this, you are tempted to deliver quickly and maybe cut corners. However, I'm pretty rigorous with our team about verification. We need to verify the facts and ensure a video was filmed where it claims to have been. The amount of misinformation about this war is absolutely staggering. On both sides of the conflict, we continue to see a lot of claims and propaganda. It doesn't matter who the source is. What matters is the metadata.

Secondly, it is essential to communicate what OSINT is to people unfamiliar with this reporting. You need to be fair and kind and explain how this can contribute to incredible reporting. It's also important to emphasise how this kind of intel can help journalists on the ground and further their reporting in the field. 

Sam: From my perspective, dividing the work and having specific roles on a project helps. If you give people specific tasks to focus on, you reduce the chance of mistakes. The other best practice is to speak to experts. We're experts in the OSINT techniques, but not necessarily in the subject. Speaking to experts is vital. You need to talk to them all the time and then speak to them again. And people are keen to help. You don't have to know everything.

Finally, what OSINT or visual tools do you recommend? 

Gianluca: I recommend Bellingcat's recently updated spreadsheet that lists many open source tools. First Draft (now the Information Futures Lab) also has two toolkits -- one is a basic beginner's guide to open source intelligence, and the other is more advanced. I would also recommend Benjamin Strick's YouTube playlist of geolocation and OSINT tutorials.

Sam: Mapbox has been useful for us. It's a brilliant tool for intuitive mapping and works well across devices. It's very smooth, powerful and highly customisable. Figma is a design tool we use a lot at the FT. You can storyboard what you want to do, which is very helpful when figuring out what story you want to tell.

The critical reporting techniques you need for this visual storytelling are not that dissimilar to the ones you need for great reporting. Visuals can't carry the content alone. The content has to be good enough on its own, and you have to do the reporting. Make sure you are speaking to the right people for the story you are trying to tell.

Latest from DataJournalism.com

A journalist's portfolio is a point of pride. But what happens when broken links and defunct code riddle your life's work of interactive content? This is the dilemma news organisations and journalists face thanks to the rise of data journalism -- where complex interactive storytelling and dynamic data visualisations rely on distributed digital infrastructures. Professor Bahareh Heravi explores the possibilities for saving your content in her latest long read article. Read the article here.

With summer here, there’s no better time to close the laptop and catch up on your reading. Our data journalism team has cherry-picked a collection of books that will refresh your expertise and reignite your passion for the field. Read the blog here.

Want to be on the podcast or have an idea for an episode? We want to hear from you. Don't forget to join our Discord data journalism server for the latest in the field. You can also read all of our past newsletter editions or subscribe here.

Onwards!

Tara from the EJC data team,

bringing you DataJournalism.com supported by Google News Initiative and powered by the European Journalism Centre.

PS. Are you interested in supporting this newsletter or podcast? Get in touch to discuss sponsorship opportunities.

]]>
Navigating the Russia-Ukraine War with OSINT technologies https://datajournalism.com/read/newsletters/navigating-the-russia-ukraine-war-with-osint-technologies Wed, 20 Jul 2022 12:35:00 +0200 Tara Kelly https://datajournalism.com/read/newsletters/navigating-the-russia-ukraine-war-with-osint-technologies Welcome to the latest Conversations with Data newsletter brought to you by our sponsor FlokiNET.is, a web hosting company that was established in Iceland to provide safe harbour for freedom of speech, free press and whistle-blower projects.

Since the beginning of the Russia-Ukraine War, propaganda, misinformation, disinformation, and alleged war crimes have been rapidly shared online and across social platforms.

To understand how journalists are navigating this information war, we spoke with Eoghan Macguire, an editor at Bellingcat, Hayley Willis, a visual investigations reporter at The New York Times and François D'Astier, a journalist with AFP's digital verification team.

The panel explains how OSINT tools and techniques are used to verify and investigate the conflict, from using satellite imagery to weapons analysis and geolocation data.

Listen to the podcast on Spotify, SoundCloud, Apple Podcasts or Google Podcasts. Alternatively, read the edited Q&A with Eoghan Macguire, Hayley Willis and François D'Astier below. This conversation is an edited recording from our event in June 2022.

What we asked

What are the biggest challenges you've encountered when verifying information about this war in Ukraine?

Eoghan: I find it is very tricky to verify instances of potential civilian harm when they take place in rural areas with very few identifiable features. In most instances, if you have a strike in a city or a town where you do have identifiable features, you can do a geolocation or chronolocation investigation. After that, the tricky part becomes assigning responsibility. You can combine OSINT reporting and on-the-ground reporting to verify this. Of course, it is also possible to verify incidents with just OSINT, but it takes a lot longer.

Hayley: I would say the volume of content coming out of this conflict is completely overwhelming. You're in hundreds of Telegram channels with hundreds of posts being sent a minute. The content people can dig into for this war is endless. Keeping up with that is a challenge. On top of that, there was another conflict in Ukraine back in 2014, which can make it difficult to verify what is from then and what is from now. Deciding what is most relevant and what is old is the biggest challenge.

François: The speed at which videos are being posted on Telegram makes it difficult to verify who shot a video or image. It's also hard to decipher who first posted the content. This is happening at a much faster rate than a few years ago. For instance, back in 2019 most of AFP's verification work involved investigating content from Facebook and Twitter. You could find people more easily and understand the source of content much more quickly. The other challenge is that things can take a lot of time, especially if you need to speak to an expert. For instance, when verifying a strike, it can take time to hear back from a weapons expert on whether a missile is Russian or Ukrainian. This can slow down your verification timeline.

How do you manage your mental health when being exposed to violent content?

François: One of the hard parts when working in verification is to learn to disassociate and look after your mental health. Some of the images and videos are pretty rough to look at. And we aren't even on the ground, so you can imagine what reporters in the field are facing. It can be difficult to watch hours of disturbing footage, but this is useful and important work. That means you have to learn the coping mechanisms to handle this.

OSINT experts often have to use translation tools or rely on people with hard-to-find language skills. How do you go about that?

François: I definitely did not learn Russian. For very basic claims of information, I use translation tools. However, when it comes to verifying very specific signs or more complicated claims, I would go to AFP's crisis cell and ask them for more in-depth translation help.

Hayley: Google Translate is an open source investigator's best friend. Most journalists have a specific beat or region they cover, which means they probably speak the language. I always say that for visual investigations our beat is visual evidence. That means we cover a variety of regions and topics, so we don't always speak the language. The New York Times is also lucky enough to have a Ukrainian journalist on staff. Of course, there are other reporters throughout the world who speak numerous languages, too.

Eoghan: At Bellingcat we have quite a few Russian and Ukrainian speakers who we can turn to. I don't speak Russian either, so Google Translate is also my best friend.

As an OSINT expert, what are your go-to tools for verification?

Eoghan: State of mind is more important than learning any one particular tool for this kind of visual investigation work. You need obsessive attention to detail and the ability to dig in and solve a problem. You also need to have the skills to sift through social media -- whether it is wading through hundreds of Telegram channels or searching for content on Twitter.

What satellite imagery tools do you use?

Eoghan: At Bellingcat, we subscribe to Planet, a paid service that provides daily satellite imagery. I also encourage people to use NASA FIRMS data. This can help you establish whether a fire took place at a certain time or place and helps with verifying a video of a particular strike. These tools can help you dig into more detail and verify a claim further.

Hayley: At The New York Times, we work with Planet and Maxar, which are paid service providers for satellite data. However, there is publicly available satellite imagery as well. For instance, Google Earth, Satellites.Pro, Sentinel Hub and Google Street View.

What other tools do you use?

Hayley: For searching on Telegram, I use Telegago. You can search for private groups and channels. For any investigation involving chronolocation and geolocation, a simple spreadsheet is essential. I use Google Sheets to keep track of what we are investigating.

Eoghan: One other tool I rely on is called SunCalc. It helps with chronolocation and geolocation -- it can help you figure out when something happened, or at least get you closer to understanding when it happened.

François: For verifying old images, we use Google Image Reverse Search. We also use InVID for verifying videos.

How are misinformation and disinformation evolving on social media when it comes to the war in Ukraine?

François: When it comes to misinformation and disinformation spreading online, I find that it starts on Telegram, then filters through Twitter and mostly ends up on Facebook. You will see screenshots from Telegram groups on Twitter and Facebook mentioning these false videos. AFP also has a team on the ground in Ukraine who are collecting testimony and who can assist us with verification.

Hayley: Telegram is definitely where we are finding most of our content for verification. However, often we have found misinformation and disinformation first gets shared in very private groups and then spreads to Telegram groups. For instance, someone shares a video with a friend on WhatsApp and then that gets forwarded to another friend and so on. Later this will end up on Telegram and other social media channels.

Eoghan: We also notice Telegram has primarily been the platform for Bellingcat when covering this war. In addition to Twitter and Facebook, we've also seen content being posted on VK, the Russian version of Facebook. Another platform is TikTok. Bellingcat has been working with the Centre for Information Resilience, and all of our civilian harm data is on their Russia Ukraine Monitor Map. Leading up to the conflict, they were finding a lot of troop movements within Russia on TikTok.

Finally, how important is methodology with the work you do?

Eoghan: I would say transparency is baked into the DNA of Bellingcat. We're all about showing the method. We'll always try to be open with that and explain the methodology so people can retrace it. We aim to be fully transparent. That usually leads to very long articles, but I think that's useful in terms of providing a public service to people.

François: We try to explain step by step what we did for the investigation as much as we can. Hopefully, we have open sources or at least identifiable sources. In the best-case scenario, the audience should be able to redo the exercise and arrive at the same conclusion. I agree that sometimes it makes for reading that's less artistic than the usual AFP report, but it is still necessary.

Hayley: I agree with everything that was said. In terms of methodology and transparency, something we're very careful about is being transparent about what we don't know. In a conflict, it's practically impossible to know everything. There are so many facets to what's going on. This was particularly the case for our article verifying the Bucha massacre in Ukraine in March. We don't know how those people were killed and by whom. What we do know is that Russia's claim that none of these people was killed while its forces were there is false. More investigation is needed into who did this.

Latest from DataJournalism.com

A journalist's portfolio is a point of pride. But what happens when broken links and defunct code riddle your life's work of interactive content? This is the dilemma news organisations and journalists face thanks to the rise of data journalism -- where complex interactive storytelling and dynamic data visualisations rely on distributed digital infrastructures. Professor Bahareh Heravi explores the possibilities for saving your content in her latest long read article. Read the article here.

With summer here, there’s no better time to close the laptop and catch up on your reading. Our data journalism team has cherry-picked a collection of books that will refresh your expertise and reignite your passion for the field. Read the blog here.

Want to be on the podcast or have an idea for an episode? We want to hear from you. Don't forget to join our Discord data journalism server for the latest in the field. You can also read all of our past newsletter editions or subscribe here.

Onwards!

Tara from the EJC data team,

bringing you DataJournalism.com supported by Google News Initiative and powered by the European Journalism Centre.

PS. Are you interested in supporting this newsletter or podcast? Get in touch to discuss sponsorship opportunities.

]]>
The rise of data journalism in Asia https://datajournalism.com/read/newsletters/the-rise-of-data-journalism-in-asia Wed, 29 Jun 2022 14:31:00 +0200 Tara Kelly https://datajournalism.com/read/newsletters/the-rise-of-data-journalism-in-asia Welcome back to the Conversations with Data newsletter!

In this issue, we travel to Asia with Adolfo Arranz, senior graphics editor from Reuters Graphics and Pei Ying Loh, co-founder of The Kontinentalist. The week's episode explores how data journalism is booming in Asia.

We hear about the pair's data storytelling process and the importance of working with partner organisations connected to data and local communities. They also provide an overview of data accessibility in the region and highlight some of the most notable work today, bringing China, Laos and Singapore to the forefront of the discussion.

Listen to the podcast on Spotify, SoundCloud, Apple Podcasts or Google Podcasts. Alternatively, read the edited Q&A with Adolfo Arranz and Pei Ying Loh below.

What we asked

Set the scene for us in Asia. How vibrant is the uptake of data journalism throughout the region?

Adolfo: Data journalism wasn't that popular here in Asia three or four years ago. Suddenly, it is booming and a big trend. The uptake is now happening throughout newsrooms.

Pei Ying: Four years ago, when I started in data journalism, the field was nascent. There were only a couple of people doing this kind of work whom we could look up to and admire, and the South China Morning Post was definitely one of them. But increasingly, especially in the last two years, data has become the new kid on the block. Maybe that is because of COVID-19. It seems everybody is very interested in doing data storytelling now. A few years ago, very few regional data resources existed that I could turn to. In our universities, no one taught any form of data journalism or data storytelling at all. But now, it is being taught at polytechnics and universities, which is great.

Could you talk to us about how open data is in your respective countries?

Adolfo: This is something that has changed a lot recently. When I came to Hong Kong several years ago, collecting data was very difficult. In my case, the Chinese language made it even harder. This has changed a lot in the past few years. However, the main problem with China is that the data is not consistent.

Pei Ying: Singapore's government is probably one of the few in the region that collects data obsessively about nearly everything in our lives. So data scarcity is not the problem. Instead, it's about how accessible that data is and how often it gets opened up. We don't have any open information laws or freedom of information acts in Singapore. It is up to the government's discretion what they choose to make public. Oftentimes, when they do, it is a summarised PDF highlighting one singular trend. And even when they release it in a workable format like a CSV file, it's not granular enough for us to conduct a proper investigation.

Where can you get data for your stories if Singapore has no freedom of information laws?

Pei Ying: People tend to turn to more citizen-based or grassroots-type data sources. This is why I think there are some organisations like Open Development Mekong that are so important for journalists to work with. We've worked with them on one of our latest data stories. These development organisations approach other organisations and academics to try and open up datasets for the public to use. I think it's essential that organisations like this exist to help not just journalists but everyone in general.

Many of us have heard of Reuters Graphics and the South China Morning Post, but not necessarily The Kontinentalist. Could you tell us more?

Pei Ying: The Kontinentalist's mission is to bring Asia to the forefront of global conversations. We are a data storytelling studio. In addition to journalism, we also do client work, although that's not hosted on our publication. We find that media discussions around Asian topics are sometimes a little problematic. Asia is always viewed through a certain lens -- dictatorship, authoritarianism, natural disasters, poverty, and so on. We wanted to talk about Asia in slightly more empowering terms and exercise decolonisation in the concepts and words we use.

At the start, we produced very straightforward journalism with no calls to action at the end of our pieces. However, two years ago, our team decided to be cause-driven in the content we create by focusing on social justice issues, climate change, and cultural and societal issues. We now always end with a call to action encouraging people to support a particular initiative or investigate the topic further. We also often partner with non-profit, cause-driven organisations like UNHCR, Doctors Without Borders and, most recently, Oxfam.

Pei Ying, The Kontinentalist published a story explaining how Laos turned to intense infrastructural development and foreign direct investments to accelerate economic growth. Could you tell us about the data storytelling process for this article?

Pei Ying: Our partner Open Development Mekong, who we've worked with in the past, pitched this story to us about foreign direct investment in Laos. Open Development Mekong is dedicated to opening up information and access to information in the Mekong countries. This is a "follow the money" type story with the aim to understand in a very micro way, not just which countries are investing in Laos, but who is investing. For instance, what organisations and companies are getting involved in these infrastructural developments in Laos.

The process for this story was incredibly tedious and took nearly a year. The major component was compiling the data. We had to identify many major data sources that already had their own repositories for tracking infrastructural development and foreign investment in the region. Even though a handful of organisations do this very well, they all collected the data slightly differently. They have different categories and sometimes spell project names differently. We had to compile all of this and clean it. Next, we compiled a list of companies named in these projects and began researching them. We searched for information like where they were founded, who owns these companies and who the stakeholders are. This allowed us to develop a complete picture of who's connected to what.

In the end, designing and writing it took perhaps 1% of the time invested in this project. But we're pleased with the outcome. We used the survey template in Flourish, which resulted in a network graph -- one of the main data visualisations in the story. Laos' landscape was also one of the key things we wanted to show. It's a very mountainous country with lots of flowing rivers, which explains why it focuses on major hydropower development investment.

Adolfo, you previously worked at the South China Morning Post (SCMP) and now have joined Thomson Reuters' Graphics team. Let's talk about a visual piece you did at the SCMP before you left.

Adolfo: Life in Hong Kong's Shoebox Housing is a project we began last August. It took almost a year to complete. The most important part of the project involved doing fieldwork and visiting these shoebox apartments to experience how people were living. After a few visits, we had collected photos, videos and many notes. Afterwards, we held several team meetings and came up with as many ideas as we could. It wasn't easy, because we had to balance the amount of information and feeling with building a narrative and articulating a story. It was a very exciting project because the topic was very interesting. By using illustrations, we gave some drama to the story, which allowed us to explain the problem better. We considered using video, but decided illustrations were a better digital medium and more in keeping with SCMP's style.

What coding skills do you have?

Adolfo: I have minimal coding skills. The team at SCMP is mostly graphic designers with artistic backgrounds. As I said, we are known for our illustrations in our visual storytelling.

Pei Ying: To be a data journalist, you don't need to know how to code. I personally don't write any code. In terms of tools, I use Google Sheets, Flourish and Figma.

What advice do you have for people entering data journalism?

Adolfo: I think you need to be very interested in the field and to have a passion for visual data. It helps if you also have a passion for telling stories. You need to be very curious and passionate. Otherwise, I think it will be challenging because it is a lot of work. And, of course, you need to enjoy the job.

Pei Ying: I think passion is a prerequisite because it's a long process that is also frustrating at the same time. But it's also rewarding. One piece of advice I would give is to be bold about experimenting and do a lot of practice. You can watch all the videos on DataJournalism.com, but nothing can teach you better than applying these concepts in real life. And even if you don't work for a newspaper, start with your own data. Journalism is a field that requires a lot of brainstorming and iteration. Sometimes we must be ready to give up our ideas and not be as emotionally invested in them when they get killed. Iterate, experiment, and get feedback.

Finally, what are some bold predictions you have for the industry?

Adolfo: For me, it is not easy to predict the future. With technology advancing so rapidly, however, I believe we will see new narratives for journalism. Maybe we will see the uptake of virtual reality -- or not. Who knows? Sometimes these things can be a trend and then fizzle out after a few years. I think the future of visual journalism and data visualisation is to do simpler, more digestible things.

Pei Ying: When I started doing data journalism, the big thing was scrollytelling. This is still a big thing, but there's been a clear shift towards doing less and delivering shorter articles. This is about making the visuals more meaningful. We see that with The New York Times and their mobile-friendly pieces. We will probably see more machine learning and even AI informing the data collection or the design of a piece. The Pudding is already doing this with machine learning-driven stories where you can interact with a bot or with the tools they've built and generate something for yourself. We will probably see more journalists harnessing these technologies to gather data or render it in ways that surface more insights.

Latest from DataJournalism.com

Data journalists get their ideas in various ways — from questions and tip-offs to news events and data releases. But if you're new to the field, sometimes finding inspiration can be a challenge. If you're looking for data journalism ideas, read Paul Bradshaw's guide to generating them — and the types of stories they might produce. Read the full article here.

Did you miss the "Covering the Russia-Ukraine War With Data" half-day event with DataJournalism.com and The Sigma Awards? Watch the entire conference here. You can also look out for upcoming podcasts based on the three sessions covering data journalism from the Russia-Ukraine War, OSINT technologies and innovative storytelling for conflict reporting.

Want to be on the podcast or have an idea for an episode? We want to hear from you. Don't forget to join our Discord data journalism server for the latest in the field. You can also read all of our past newsletter editions or subscribe here.

Onwards!

Tara from the EJC data team,

bringing you DataJournalism.com supported by Google News Initiative and powered by the European Journalism Centre.

PS. Are you interested in supporting this newsletter or podcast? Get in touch to discuss sponsorship opportunities.

]]>
Breaking into data journalism https://datajournalism.com/read/newsletters/breaking-into-data-journalism Wed, 01 Jun 2022 11:39:00 +0200 Tara Kelly https://datajournalism.com/read/newsletters/breaking-into-data-journalism Welcome back to our Conversations with Data newsletter! After a long hiatus, we are excited to be back with a new podcast episode.

The latest Conversations with Data podcast features Paul Bradshaw, a professor from Birmingham City University, Michelle McGhee, a journalist-engineer from The Pudding and Carmen Aguilar Garcia, a data journalist from Sky News. This episode is from a Discord live chat held in May 2022, where we explored how to break into data journalism.

Drawing on the panel's varied experience, the trio provided helpful advice and learning resources for those fresh out of university, as well as those in web development or journalism moving laterally into the field.

Listen to the podcast on Spotify, SoundCloud, Apple Podcasts or Google Podcasts. Alternatively, read the edited Q&A with Paul Bradshaw, Michelle McGhee and Carmen Aguilar Garcia below.

What we asked

Paul, tell us how the data journalism educational experience has evolved over the years.

Paul: Data journalism is much more popular than it has ever been. For a period of time, a lot of journalism students perhaps thought data journalism skills were nice to have, but that they would be okay without them. I think that's changed in the last few years. In my case, the course has become more popular than other courses and more international. It's also been interesting to see the field spread across different parts of the world. I've had students coming from South America and Europe in previous years. This year, I've got many interested students from Asia.

Paul, talk to us about how some of your students have moved into data journalism roles. What do these people seem to have in common?

Paul: In terms of the students who work in the industry, it is very difficult to pick out a common feature. We have students who come with journalism experience, and that helps. We also have students who come with no journalism experience but have a technical background in web development. We've also had students do well coming straight out of university. So it's challenging to pin down a particular quality because the variety of jobs and organisations hiring is so wide. It's not just news organisations telling stories with data. You've got charities and data visualisation studios as well. But one thing I would say is that a mix of technical skills and editorial skills certainly helps. It's not just about doing something technically, but having ideas and being able to communicate them well.

Michelle, you've got a technical background. How did your university background tie into a career in data journalism?

Michelle: I guess I'm slightly puzzled by it because I had no idea that this was the career I wanted when I was in university. I started doing things that interested me and meeting people I thought were interesting. I believe that always pays off no matter what -- that's especially the case in this field where people come from many different backgrounds.

Carmen, tell us about your move into the data journalism industry.

Carmen: I started my career as a TV reporter, and I always thought that that was what I wanted to pursue. But when I moved to Chile, I jumped into the digital world. While working for a 24-hour news station, I worked on a data project for the local elections. My editor and I took an online course, and then we built a dashboard and did feature stories about the different aspects of local government in Chile.

I enjoyed the journey, but it was also challenging because I didn't have the right data skills. When I finished that project, I knew this was the kind of journalism I wanted to do. So I started taking online courses, and during my third online course, I told myself it was time to enrol in a master's for my career. And that's why I came to Birmingham City University to study with Paul Bradshaw. I got my job at Sky News while finishing my master's project.

Paul, You teach the Data Journalism MA programme at Birmingham City University. What advice do you have for students studying data journalism? How can they make the most of it?

Paul: My advice is to take advantage of the opportunities you have that you won't get in a newsroom. When you're studying, you can afford to make more mistakes. You can experiment more. And actually, employers will be interested in that because they don't have a chance to take as many risks as you might take on your course. Don't be afraid to take risks. One thing to always remember about a good course is that your mistakes will be part of what you learn and what you talk about in interviews.

The other important thing is to work on projects that will showcase what you can do. Build a portfolio, choose projects carefully in terms of how they will develop you, and build contacts, knowledge and skills. Be mindful of how these projects will show something that will really grab the attention of people. That's something you won't necessarily often get a chance to do.

Michelle, you visualised every spell in the Harry Potter books and put it on GitHub. How did you use that project to help you get a job at The Pudding, where you currently work as an engineer-journalist?

Michelle: I was a software engineer and was thinking about transitioning careers because that career path wasn't entirely aligned with my interests. I discovered the world of data journalism through several publications, including The Pudding. I set out to emulate work I thought was cool and that covered topics interesting to me. The Harry Potter piece was a learning project I used to replicate various things I had seen and found interesting. I worked on the piece for a month or so, and I learned a lot.

At that time, I was a patron of The Pudding through Patreon. The Pudding has a Slack channel for all the people who are patrons. I reached out to a member of The Pudding team at the time, and I just asked for feedback. I asked how they thought I could make it better. It felt a little nerve-wracking at the time because I admired them. But that's also why I was interested in what they thought of it. The person I sent it to gave me some feedback and was nice and generous.

They also made it clear that I should pitch a story to The Pudding, which I later did with this same person. I worked with them, and a year and a half later they offered me a full-time job. I look at that as a moment where I was able to experiment and create a project that led me to reach out directly to someone I admired and form a connection. That is why I have my current job, and I'm grateful for that.

Michelle, how did your software development experience help you transition into data journalism?

Michelle: I came into the field with a pretty strong foundation, having studied computer science at university. However, almost none of it involved web development. I had taken one course covering HTML, CSS and JavaScript. When I started, I had to learn more web-based data storytelling techniques and the basics of web development. While I had some foundation, I needed to strengthen it, especially for D3 and data visualisation on the web. Luckily, there are so many great resources for self-teaching. Amelia Wattenberger makes some of my favourite blog posts and tutorials on data viz, D3 and web development tools. In the early days, I had those tabs open all the time. It took a lot of practice and time to get comfortable.

Carmen, tell us about starting up a data team at Sky News. How did the Master's programme help you navigate the newsroom?

Carmen: When I joined Sky News, I was the first data journalist working there. So I had a big challenge because I needed to show them what data journalism is and how it adds value to Sky News' traditional reporting. One of the things I based my strategy on was collaboration. This meant working with other reporters, designers and developers to find and uncover new stories that would have been difficult for them to do without me. By working together, I showed we could publish impactful stories.

Another element that played a big role was visualisation. In broadcasting, editors want visual elements. I made sure my data visualisations helped us create more visual stories across the newsroom. Apart from collaboration and finding new stories, I spent time training journalists, designers and developers in the newsroom. The design team now regularly works with Datawrapper and Flourish. I'm very proud that the team is no longer just me, and that we work across the newsroom with broadcast, digital and social media.

What advice do you have for pitching data story ideas and managing expectations when you approach an editor?

Paul: The most frequent thing I say to my students is: tell me in one sentence what is happening. Don't tell me what you're going to do. The editor cares about what is going to come out at the end. What's the result? What's the story, and why does it matter? Emphasise who is affected and the human dimension of your story. It might be technically impressive and exciting as a process, but will you have something at the end that someone in the street will be surprised by? Is it going to tell us something new? Does it shine a spotlight on important issues, tell us the scale of a problem, or show that something is getting worse or better? Keep it simple. Tell the editor who is doing what and what is happening.

Carmen: After the pandemic, I would say that editors are more interested in data stories than ever before. The challenge is managing their expectations with the time and resources needed to publish the story. I don't have a magic formula for that, but one thing I have learned from working with developers is to try to budget more time than you think you will need. That is not always possible if it is a reactive story. I suggest adding a few more days to the deadline for stories with a longer lead time.

Michelle: Based on my experience reviewing pitches to The Pudding, we sometimes receive pitches where people haven't prototyped their idea yet. Or perhaps they haven't done the initial data analysis to see if it is actually interesting. Of course, some stories take lots of time and need buy-in before you invest in them. But I like to see freelancers do the low-hanging-fruit data exploration and check whether the data is available. Those are clear steps that help prove the point and bolster the pitch. My advice is not to wait for approval before getting your hands dirty with the low-hanging fruit.

Name one tool you could not live without for your data journalism work.

Michelle: I could not live without the Svelte JavaScript framework. Our team at The Pudding has recently fallen in love with this. It is a lot more readable, and it has a lot of excellent features for DataViz.

Carmen: I use Excel and Python daily. But there isn't just one tool I'd choose, as it depends on the story. Each story is different and requires specific tools. For me, the story determines my tool of choice.

Paul: I'd say it's pretty difficult to think of a tool that I couldn't live without. It's difficult to do a story without using spreadsheets at all. OpenRefine is very useful for data cleaning. OutWit Hub is very useful for scraping. If I didn't have R, I would use Python. If I didn't have Python, I'd use JavaScript. There's always another way of achieving the same kind of results. But the one tool I think you really need is a telephone. You need to speak to people to determine the human impact of those numbers and why they matter.

What approach do you take to interviewing the data for a story?

Paul: I would always recommend treating data as a source the same way you would treat a human source. Data can be biased, and it may have been collected for a particular purpose. I would be just as sceptical of the data as you are of humans. When you interview your data, it might tell you something, but ultimately you're relying on something having been measured in the first place. There are many blind spots in data, which reflect power imbalances in society. I would speak to people involved in that particular sector -- not just experts, but people who work in that field and are responsible for it. Try to get an overarching view of the system.

Finally, what are some of the best resources for those new to learning data journalism?

Michelle: The Pudding has a resource page if you are interested in making web interactives similar to what we publish. The page includes many videos and tutorials showing you how it is done.

Paul: My advice is to pick the area you're interested in and focus on that. If you're interested in visualisation, I would read The Wall Street Journal Guide to Infographics. If you're interested in the analysis, Jonathan Stray has written The Curious Journalist's Guide to Data. Darrell Huff's book How to Lie With Statistics is a classic. I also recommend listening to the More or Less: Behind the Stats podcast. I've also written a few books about data journalism, which, hopefully, people will find helpful.

Latest from DataJournalism.com

Data journalists get their ideas in various ways — from questions and tip-offs to news events and data releases. But if you're new to the field, sometimes finding inspiration can be a challenge. If you're looking for data journalism ideas, read Paul Bradshaw's guide to generating them — and the types of stories they might produce. Read the full article here.

If you have been wondering whether you should learn to program for data journalism, this latest blog post by Simona Bisiani explains what to know for those curious about R or Python. This blog provides a valuable comparison between the two languages for those who've already learned one of them. Read the full blog here.

It's no secret that data journalism has had to adapt and change in recent years. To get a better sense of the field's current state, we launched The State of Data Journalism Survey 2021. This involved asking journalists about their work, tools, and thoughts on the future of data journalism. Read the full blog here or watch our panel discussion at International Journalism Festival in Perugia in April 2022.

Want to be on the podcast or have an idea for an episode? We want to hear from you. Don't forget to join our Discord data journalism server for the latest in the field. You can also read all of our past newsletter editions or subscribe here.

Onwards!

Tara from the EJC data team,

bringing you DataJournalism.com supported by Google News Initiative and powered by the European Journalism Centre.

PS. Are you interested in supporting this newsletter or podcast? Get in touch to discuss sponsorship opportunities.

]]>
Inside Outlier Conference 2022 https://datajournalism.com/read/newsletters/inside-outlier-conference-2022 Wed, 02 Feb 2022 15:52:00 +0100 Tara Kelly https://datajournalism.com/read/newsletters/inside-outlier-conference-2022 Welcome back to our Conversations with Data newsletter!

In our latest episode, Mollie Pettit from Data Visualization Society talks to us about using human-centred design for her work.

Drawing on her circuitous career path from geology to data science, data visualisation and developer relations, she provides learning resources and useful advice for those new to the field.

We also hear about her favourite data projects and her inside take on this week's Outlier Conference happening 4-5 February.

Listen to the entire podcast on Spotify, SoundCloud, Apple Podcasts or Google Podcasts. Alternatively, read the edited Q&A with Mollie Pettit below.

What we asked

How did your career path in data visualisation begin?

I have a very circuitous career path. I studied Mathematics and Geology, and then I worked in geology for a bit. That experience helped me realise this wasn't the career path for me. I decided to lean into my mathematics background by completing a data science bootcamp. That's how I transitioned into the data science field. I got a data scientist job at Data Scope Analytics, where I worked for a couple of years. That's where my love for human-centred design began. I learned how to pull various design thinking exercises into keeping humans at the centre of decisions. I also discovered my love for data visualisation in this role, which inspired me to learn JavaScript and D3. I eventually went freelance to focus on data visualisation.

While working as a freelancer remotely, I wasn't meeting other people in the field. This led me to start a data visualisation community group. I later co-founded the Data Visualization Society, along with Amy Cesal and Elijah Meeks. Next, I began at Netflix as a senior data visualisation engineer, where I worked on data-viz heavy web apps for internal decision-making purposes.

In 2021, I started a new role as a data visualisation developer at Observable, where I later transitioned to become the developer relations manager. The role allowed me to combine my development and community building skills. I am currently organising the upcoming Outlier Conference as the Data Visualization Society's events director, a role I have held since the conference began.

You call yourself a human-centred designer. Could you give us a brief overview of the concept?

Human-centred design is a great phrase. It's about designing with the humans, the viewers, the users at the centre of the thinking. It is rooted in empathy and about understanding the needs of who you're designing for. Human-centred design is about seeking to understand what people say and do, and how they think and feel. We can try to do our best to anticipate and make guesses about what people want. We all do this naturally by filling in the gaps and making assumptions. You might get some things right, but you will likely get plenty of things wrong.

This is where ideation, prototyping and iteration come in. So the first step involves ideating on the ways that you accomplish your goal. You can also flesh this out with user interviews and observations. Next, you choose a direction and build a low-fi version of something. You put it in front of someone and see how they use it and what they get from it. It is essential not to give them hints or clues. Instead, ask them to think aloud and say what they see or notice. It is vital to watch how they interact with it. This helps you to understand if you're going in the right direction -- is it confusing or not intuitive? But it can also tell you things that you've done right.

Who is the Data Visualization Society for? Are data journalists part of this community?

Data Visualization Society is a place where all data practitioners can intersect and connect. This includes data journalists, data designers, data visualisation engineers, data scientists, and anyone who uses visualisation in their work. The main driver for starting the Data Visualization Society is that we found that there were a lot of data visualisation communities, but they were very disparate and separated. We wanted to create one where people could connect across those fields. So, yes, it is a home for data journalists and many others.

Tell us about the upcoming Outlier Conference happening this week.

The goals of Outlier Conference have always been to create a space where attendees can make connections, inspire others and learn from others, all while keeping accessibility and inclusion at the heart of planning decisions. This year the conference is happening on February 4th and 5th. We have an exciting lineup of curated speaker sessions and unconference sessions where attendees create the agenda themselves.

The conference aims to find ways for people to connect. In addition to attendees chatting in Slack, we also plan to have speed meeting networking sessions to help you make new connections and friends. The agenda goes back and forth between curated content with the speaker lineup and the unconference sessions. Last year we had talks, workshops, discussions, panels, games, or even virtual karaoke sessions. A range of ticket prices is available to accommodate everyone from the community.

Tell us about a favourite data project you worked on.

I'm most proud of the Illinois Traffic Stops project that I did with the ACLU, because it was part of an effort that led to actual policy changes. The purpose of the site was twofold. First, it aimed to serve as a resource for the public to learn about law enforcement practices. Second, it provided a tool for law enforcement agencies to make informed improvements around racial disparities for the good of their officers and the people they serve. The site was not meant to be a finger-pointing exercise. Instead, its purpose was to help law enforcement think about making things better at their agencies. Overall, the data showed that in some law enforcement agencies minority drivers are treated significantly differently, pointing to racial bias.

You learned D3 and also taught it. What advice and resources can you recommend for data journalists looking to do the same?

The first resource I used to learn the building blocks of D3 was Scott Murray's book Interactive Data Visualization for the Web. I can't recommend it enough. It might not go into all of the fancy things you can do, but it sets the stage nicely. The next thing to do is pick a project you'd be excited to visualise with D3; you'll keep learning through that process. If you want to do something super specific, you can often find an example of someone who's already done it and shared it online. I'd also add that D3 is a very powerful tool, and some data journalists use it expertly. However, not every data journalist necessarily needs to learn it.
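
For a sense of what those building blocks look like, here is a minimal sketch of a D3 scatterplot. It is illustrative only -- the data array and pixel dimensions are invented for this example, not something Mollie described.

import * as d3 from "d3";

// Invented example data: one object per point.
const data = [{x: 1, y: 3}, {x: 4, y: 7}, {x: 8, y: 2}];

// Create an SVG canvas and scales that map data values to pixel positions.
const svg = d3.select("body").append("svg").attr("width", 400).attr("height", 300);
const xScale = d3.scaleLinear().domain([0, 10]).range([40, 380]);
const yScale = d3.scaleLinear().domain([0, 10]).range([260, 20]);

// Bind the data to circles: one circle per data point.
svg.selectAll("circle")
  .data(data)
  .join("circle")
  .attr("cx", d => xScale(d.x))
  .attr("cy", d => yScale(d.y))
  .attr("r", 4);

Selections, scales and data joins like these are the core concepts Murray's book walks through; everything more elaborate in D3 builds on them.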

What other tools can you recommend for data journalists who don't have the time or interest to learn D3?

D3 allows you to create super custom visualisations, but not everyone needs that all the time. It takes a long time to make things with D3, so it's not good for the exploration side of things. For people who want to get more into data visualisation but don't need to learn something so advanced, I recommend giving Observable Plot a try. It is a free, open-source JavaScript library that helps you quickly visualise tabular data. Michael Bostock created D3, and he also created Observable Plot last year. The whole point of Observable Plot is that you can use it anywhere you use JavaScript. It lets you create a visualisation very simply, with just a line of code.
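
As a rough illustration of that one-line idea, here is a minimal sketch using Observable Plot outside a notebook. The dataset and its field names (surveyRows, age, income) are hypothetical placeholders, not something from the interview.

import * as Plot from "@observablehq/plot";

// Hypothetical tabular data: an array of plain objects, e.g. rows parsed from a CSV.
const surveyRows = [
  {age: 25, income: 32000},
  {age: 40, income: 51000},
  {age: 58, income: 47000}
];

// One line builds a scatterplot; append the returned element to the page.
document.body.append(Plot.dot(surveyRows, {x: "age", y: "income"}).plot());

In an Observable notebook, Plot is available without the import; on a regular web page, the library can be loaded from npm or a CDN.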

Finally, who do you admire most in the data visualisation field?

There are so many impressive people in the data visualisation world. Gabrielle Merite creates really cool, unique projects. She brings the human back into the data. Duncan Geere and Miriam Quick have done some inspiring data sonification and data visualisation work. From a functional approach, I like Ian Johnson and Mike Freeman. I admire the elegant and functional ways that they tackle visualisation problems.

Alberto Cairo is always inspiring and I love reading his books. Jer Thorp is another person I follow and I'm very excited for his talk at Outlier this year. I find Jessica Hullman and Matthew Kay's work on researching uncertainty to be so valuable.

And finally, Nadieh Bremer and Shirley Wu are two designers who I admire for their creative ways of visualising data. I'm missing plenty of awesome people, but these are the names that first come to mind.

Latest from DataJournalism.com

How can we use data to create stories that resonate with audiences and make a difference? In our latest long read, Sherry Ricchiardi explains what we can do to humanise data and give a voice to the people behind the datasets. Read the full article here.

There's no question that the future of data journalism is bright. But where did it all begin? Professor and veteran journalist Brant Houston provides a historical look at the field from CBS News' attempt to predict a US presidential election up until today. Read the full article here.

Apply for Google News Initiative's latest fellowship

Applications are open for the Google News Initiative Student Fellowship 2022! It offers paid summer placements to journalism, technology, multimedia and design students as well as recent graduates across Europe who want to gain valuable work experience. Among the 30 participating newsrooms, 11 are looking for candidates to work on data-related projects. The application deadline is 15 March 2022, 23:59 CET. Find out more here.

As always, don't forget to let us know who you would like us to feature in our future editions. You can also read all of our past editions here or subscribe to the newsletter here.

Onwards!

Tara from the EJC data team,

bringing you DataJournalism.com, supported by Google News Initiative.

PS. Are you interested in supporting this newsletter or podcast? Get in touch to discuss sponsorship opportunities.

The history of data journalism https://datajournalism.com/read/newsletters/the-history-of-data-journalism Wed, 12 Jan 2022 14:56:00 +0100 Tara Kelly https://datajournalism.com/read/newsletters/the-history-of-data-journalism Welcome to the first Conversations with Data newsletter of 2022!

There's no question that the future of data journalism is bright. But where did it all begin? Veteran journalists Brant Houston and Stephen Doig share their firsthand experience of working in the early years of data journalism.

The pair identify the key data journalism stories and books that shaped the industry. We also hear what is next for the field and some helpful advice on making it in data journalism.

Professor Brant Houston holds the Knight Foundation Chair in Investigative and Enterprise Reporting at the University of Illinois. He is also the editor of the online newsroom at Illinois, CU-CitizenAccess.org.

Stephen Doig is a journalist, professor of journalism at Arizona State University, and a consultant to print and broadcast news media on data analysis for investigative work.

You can listen to the entire podcast on Spotify, SoundCloud, Apple Podcasts or Google Podcasts. Alternatively, read the edited Q&A with Brant Houston and Stephen Doig below.

What we asked

Talk to us about how you two met.

Brant: Back in the 80s, I was working on a story that required me to use statistical software. I called up Steve to ask him a question about it. Later we met up at conferences and spoke. There were very few people doing data journalism at the time. It was a lot like rock n' roll. However, thanks to NICAR and the Investigative Reporters and Editors network, there was a very collaborative and cooperative feeling among fellow journalists working with data. It was always word of mouth because you didn't know who was working on what.

Brant, you wrote a piece for DataJournalism.com on the history of data journalism. Give us a brief overview.

Brant: Someone from The CBS Network got a bright idea that there were computers, and if you fed in a bunch of data into the computer, you might be able to predict who won that election that year before all the results came in. From what I could tell from my research, they had a pretty good idea of who won. It was Eisenhower. But they froze. They were like, "We're not sure if we're right. This is taking a big risk." That was what I call the false start.

If you jump ahead 15 years, you get up to reporter Philip Meyer really taking apart the assumptions of what caused a race riot in Detroit. Local Detroit folks said it was all these outside people who came in. But it turned out no, they had problems within the city and the people rioting were actually people who lived there. Another start was Philip Meyer's book on precision journalism. That's when things really started rolling.

What are some of the most defining pieces that shaped data journalism?

Brant: Philip Meyer's Detroit riot story was remarkable. Elliot Jaspin's piece on the school bus drivers and criminals also moved the industry forward. I would also include Steve's 1992 story on Hurricane Andrew. Through mapping, it showed how lax zoning, inspection and building codes had contributed to the destruction. "The Color of Money", the Atlanta Journal-Constitution's series on mortgage discrimination, was also a breakthrough moment because the field was getting recognised for what it could bring to investigative journalism.

Steve: The Atlanta Journal-Constitution mortgage story was a Pulitzer-Prize winner that had a huge data element to it. It made editors sit up and pay attention. It also helped catch the attention of all those other investigative reporters out there who were used to looking at documents, shoe-leather reporting and interviewing. They saw data as another tool they needed to learn.

How do you believe COVID-19 has impacted data journalism?

Brant: I think the pandemic has helped people know the revolution is here and that it's occurred. I think it's been a great marketing campaign for why journalists should use data. It's completely global now, and when you suddenly get a global pandemic, everyone can see all these journalists are using data. It may encourage many others to use data, but I think it's been waiting for somebody to advertise the power and importance of this reporting. And here it is. It's data every day, a visualisation every day.

Steve: I would say 20 years before the pandemic came along, we were already at the point in data journalism that any serious newsroom had people like Brant or me doing some of this. In the 80s and 90s, innovation in data journalism was happening in mid-sized metros like the Miami Herald and the Atlanta Journal-Constitution.

National newspapers like The New York Times were much slower. When they needed to do a data-led project, they would hire a consultant. But at some point, The New York Times looked around and said, "Hmmm. We need to get some of that." So they cherry-picked from great teams all around the country and brought them all in. They were instantly able to put together a fabulous data journalism team. I think the pandemic has raised the visibility of data journalism, not only within the newsroom but also to the consumers.

Both of you are professors who teach data journalism and have worked in the field. What advice do you have for those starting out?

Steve: The thing I try to get across to students is don't feel like you have to have mastered it all to start doing it. Get out. Pick an easy thing like a daily story where all you need to do is put something in a different order. And if you have learnt how to get your list into Excel, you hit the sort button and all of a sudden you're doing journalism. You've moved it from boring alphabetical order to the best on the top and the worst on the bottom. Suddenly, there is journalism at each end of it. My advice is don't be intimidated by it. You'll grow additional skills when you need them. You don't have to master it all to get started.
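
Steve's "sort it and look at both ends" step really is that small. A toy sketch, with made-up school pass rates standing in for whatever list you have pulled into a spreadsheet:

```javascript
// Made-up data: the kind of list you might export from a spreadsheet
const schools = [
  { name: "North High", passRate: 71 },
  { name: "East High", passRate: 94 },
  { name: "South High", passRate: 58 },
  { name: "West High", passRate: 83 }
];

// Best on top, worst on the bottom -- the story sits at both ends
const ranked = [...schools].sort((a, b) => b.passRate - a.passRate);

console.log("Best:", ranked[0].name, ranked[0].passRate);
console.log("Worst:", ranked[ranked.length - 1].name, ranked[ranked.length - 1].passRate);
```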

Brant: One of the approaches I have is to start with the story. Journalists are so busy and hyper. They want things to be applied as quickly as possible. So Steve's suggestion that you put something into columns and rows and put it in a different order to see the story is a great idea. Some journalists love numbers, and they're going to find lots of stories. But for someone who's very busy, my suggestion is to find a story with a database and see if it helps you.

What technologies are you most excited about?

Steve: I think the tools will continue to proliferate to deal with problems. A tool that delivers really good text analysis hasn't been created yet. With it, we'll be able to essentially read a long text and find deeper patterns in it. Transcription and translation software have both improved. Another technology I'm excited about is virtual reality. In science fiction movies, you see a person sorting through data that's floating in front of them. I think that we may start seeing some of those patterns once virtual reality matures. Another is augmented reality, where a reporter is out in the field wearing glasses and is somehow able to access additional information about the thing they are covering, right there and then.

Brant: Steve is right about the need for better software when moving unstructured data to structured data. Another thing that is happening in the industry is artificial intelligence. We're getting more sensible about how much we can do or not do with it. It's going to get better. Throwing algorithms at vast amounts of data to help us see possible patterns will help. Of course, we still have to have a journalist with brains looking at what comes through. But I think we will see a lot happening with A.I. that will aid and abet us. The other part that will be exciting, but maybe difficult, is how we translate a lot of what we're doing when the starting point is a mobile phone or another mobile device. The question is how do you present data on a smaller screen in a more effective way? And how do you make that data even more interactive so that people can be informed by the data?

What other innovative tools are moving forward in the field?

Brant: There's a lot of innovation in the presentation of data, but I've always been fascinated with how you gather the news -- using the data, using these tools. I think we'll see a lot more integration, with people in the field feeding back into a database and the database feeding back to them too. This is something that happens in the utility industry all the time, where there's the mothership back at headquarters and the crews are out in the field doing the work. I can see that process speeding up while reporters are in the field.

Steve: That brings up sensor journalism, the idea of having access to live data that is being gathered -- not just CCTV every 10 steps in London. I'm talking about weather sensors, all the little devices that are scattered around your city that are capturing the air quality, the temperature and the noise levels. It will become even more important to be able to pull that data in and produce useful information for finding or telling stories. The technology is cheap enough now for newsrooms to use their own sensors and handle the data collection. This will become easier to measure over time.
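
As a very rough sketch of what "pulling that data in" could look like, the snippet below polls a hypothetical air-quality endpoint and surfaces the worst readings first. The URL, field names and threshold are all invented; they are not a real service or an official standard.

```javascript
// Hypothetical endpoint returning [{ sensorId, pm25, recordedAt }, ...]
const ENDPOINT = "https://example.org/api/air-quality/latest";
const PM25_ALERT = 35; // illustrative alert level only

async function flagBadAir() {
  const response = await fetch(ENDPOINT);
  const readings = await response.json();

  // Keep only the readings worth a reporter's attention, worst first
  return readings
    .filter(r => r.pm25 > PM25_ALERT)
    .sort((a, b) => b.pm25 - a.pm25);
}

flagBadAir().then(alerts => console.table(alerts));
```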

Latest from DataJournalism.com

How can we use data to create stories that resonate with audiences and make a difference? In our latest long read, Sherry Ricchiardi explains what we can do to humanise data and give a voice to the people behind the datasets. Read the full article here.

There's no question that the future of data journalism is bright. But where did it all begin? Professor and veteran journalist Brant Houston provides a historical look at the field from CBS News' attempt to predict a US presidential election up until today. Read the full article here.

As always, don't forget to let us know who you would like us to feature in our future editions. You can also read all of our past editions here or subscribe to the newsletter here.

Onwards!

Tara from the EJC data team,

bringing you DataJournalism.com, supported by Google News Initiative.

PS. Are you interested in supporting this newsletter or podcast? Get in touch to discuss sponsorship opportunities.

Uncovering systemic inequality with data https://datajournalism.com/read/newsletters/uncovering-systemic-inequalities-with-data Wed, 08 Dec 2021 07:30:00 +0100 Tara Kelly https://datajournalism.com/read/newsletters/uncovering-systemic-inequalities-with-data Welcome to the latest Conversations with Data newsletter!

Have you had a chance to take our State of Data Journalism 2021 Survey? You can win some fantastic prizes like a trip to Perugia for the IJF conference or Amazon gift cards. Take the survey in Arabic, English, Italian, and Spanish. The findings will be shared with all survey respondents.

Now on to the podcast!

Using data to investigate systemic inequalities is a powerful way for journalists to tell stories with an impact. In this week's issue, we caught up with Sinduja Rangarajan, Bloomberg's senior investigative data reporter, to discuss her experience in covering a range of topics, including immigration issues in the United States. She talks to us about the importance of community and the power of bringing your lived experience to work.

You can listen to the entire podcast on Spotify, SoundCloud, Apple Podcasts or Google Podcasts. Alternatively, read the edited Q&A with Sinduja Rangarajan below.

What we asked

Tell us about your career path into data journalism. How did that come about?

When I decided to be a journalist, I already had some background in data in another profession. But my primary reason for being a journalist was to write about systemic inequities and to write stories that mattered the most, particularly thinking about the idea of community. Data was just a way to tell those stories in a more powerful way. I had these skills already, which made it easy for me to pivot.

I started as a data reporter at Reveal and did data journalism at Mother Jones. I have approached my reporting to uncover systemic inequities using data and grounding it with human stories. I've done investigations around disparities in the tech workforce. I've also written stories about immigration and inequities that resurfaced during the pandemic. Most of it is driven by data, but it has a lot of narrative components to it as well.

Let's hone in on your immigration reporting. You covered a story looking at the rejection of H-1B visas for highly skilled immigrants in the United States during the Trump administration. Tell us about how that story came about.

Journalism is such a community-driven profession. Fundamentally it's about telling stories about what's happening around you. In my case, it just happened that I'm an immigrant. My husband's an immigrant here in the United States, and he is on an H-1B visa, a short-term visa for highly skilled immigrants. The H-1B visa has a painful and annoying application process. You have to renew those visas, and they don't come with much long-term stability. But at the same time, the renewals would usually go through.

Many of our family and friends were also on H-1B visas, and I was seeing things and hearing things that I'd never heard before. People who had stayed forever in the United States on those visas and held the same job were packing up their bags within three days because their visas wouldn't get renewed. This came completely out of the blue for them. I began looking into this more thoroughly.

What did your investigation reveal?

Former President Trump had enacted a lot of policies using memos. He and his political advisor Stephen Miller started cracking down on H-1B visas. Their political reasoning was that these people were taking away American jobs, and they were not really high skilled. But my investigation found that they were turning away a lot of really highly qualified candidates on H-1B visas. These were people with PhDs from Stanford University or master's degrees from elite universities.

I determined that by building my own database. I knew that a lot of people were filing lawsuits. Many were asking for appeals to the United States Citizenship and Immigration Services Appeals Board when their applications were getting denied. By looking at those appeals and lawsuits, I got a sense of who was getting rejected and how many lawsuits were being filed. Historically, those numbers were at a high. The data also told me that people were getting denied, they would go and appeal to that board, and the agency's own internal board would reverse the decision at a historic rate. And if you filed a lawsuit, the judge would say you had a legitimate case.

In most cases, they would get the approval. This all pointed to the fact that these memos and the rules that Trump had created were misapplying the law. They were designed to track and crack down on immigrants. He was obviously doing it on the border, but he was also doing it with highly skilled immigrants -- and this was the missing piece of the jigsaw puzzle my story focused on.

You also wrote another story for Mother Jones about three Afghan families who managed to win the diversity green card visa lottery to the United States as American troops were pulling out of Afghanistan. How did that story come about?

I knew about the closure of American consulates and visas overseas due to the pandemic. As a result, many different types of visas were delayed or not processed. When President Biden announced pulling forces from Afghanistan, and news broke over the Taliban taking over, I began to wonder what this meant for the people who won the green card lottery visa in Afghanistan. Then I realised that these people from Afghanistan who won the lottery weren't going to come to the United States. They were in an extremely difficult situation, especially because of the news the Taliban were taking over.

The rest of it was finding the source. I reached out to a few of my past sources. One attorney put me in touch with a few Afghan families. I spoke with them through a translator, and I wrote the story. A lot of this immigration data is public. At that point, it was just a matter of digging and finding the relevant numbers and context for the story. Next, I pulled those numbers from the U.S. State Department. I also worked with an attorney filing a lawsuit who knew the numbers.

How does this lived experience help your reporting?

Bringing your own personal background to work can be really helpful. I'm not an immigration reporter. That's not my beat, even. However, I uncovered this story because of what was happening in the community. I noticed most reporters in the country didn't pay attention to this issue because there was so much going on with immigration at the time. There are some terrific immigration reporters in the United States. I've noticed that some of the people who do the best immigration work understand that particular community well and have some connection with that community.

An example that comes to mind is Aura Bogado at Reveal, a former colleague of mine. The empathy and the humanity that she brings to her stories mean she can challenge many widely accepted myths about certain people and point out that they're not true. I think it's very similar to the way I approached the H-1B visas and another story I did on H-1B doctors during the pandemic. Those kinds of stories come about because nobody needs to explain the context or background to you. You are already connected to that community. That means you can understand their position when they make an argument. You can build trust with your sources very easily because they believe you will be fair to them and their life story.

As a data journalist, what new skills are you most interested in learning next?

I would love to learn Ruby on Rails or Django. I want to skill up with my JavaScript and CSS because those languages have changed significantly. I used to write code in JavaScript and build tiny apps and use CSS. I've built websites as side projects when I was in school, but things have changed so much. It's so much more sophisticated, and therefore, a whole other challenge is just staying on top of things. You have to pick and choose what you want to keep on top of and lean on others for everything else.

Would you say you spend most of your time working with data analysis?

Yes, absolutely. Data analysis ties very well with investigative reporting. It supports and is the foundation for a project or an investigative story. You analyse the data. You have the findings. You talk about systemic inequity in a particular way. Then you find characters impacted by the systemic inequity that the data has already borne out. Finally, you write the story. Because I've chosen investigative reporting as one of my paths, I've also stuck with data analysis.

What advice do you have for budding data journalists?

There are different ways to do data journalism and carve out a path for yourself. What's worked for me is leaning into a platform's strengths, especially when you're in the earlier parts of your career. At that stage you have so much energy and enthusiasm, but little power in the organisation you're working in. I love doing data analysis, but I was pushed to do my own stories and get out of my comfort zone because that's what was needed to get my stories to the finish line. I couldn't wait for another reporter to finish their story and come and work on the data analysis of my story. So then I learnt writing and investigative reporting. At Mother Jones, I got to do personal essays and different kinds of stories on a deadline. I learnt how to do things quickly.

I think every platform has its strengths, and at every stage in your career, you would probably have to get out of your comfort zone. My advice would be to go with the flow and try to get out of your comfort zone and get as many skills as you can. You can either be a super-specialist, and if life's taking you there, great for you. But if you are in a place that allows you to do different kinds of stories or report and write along with data analysis, then that's a different path as well. And that's the path that I've taken that's working for me.

Latest from DataJournalism.com

Machine learning can help journalists analyse massive datasets and pinpoint misclassifications for their investigative reporting. Contributor Monika Sengul-Jones explains how with a series of in-depth case studies from Buzzfeed News, Grist, ICIJ and more. The piece also includes a visual explainer on machine learning in journalism. Read the full article here.

DataJournalism.com recently launched The State of Data Journalism 2021 Survey. We want to hear from you! Take the survey in Arabic, English, Italian and Spanish. Participants can win Amazon gift cards or a trip to Perugia, to attend the International Journalism Festival. Fill out the full survey today!

As always, don't forget to let us know who you would like us to feature in our future editions. You can also read all of our past editions here or subscribe to the newsletter here.

Onwards!

Tara from the EJC data team,

bringing you DataJournalism.com, supported by Google News Initiative.

PS. Are you interested in supporting this newsletter or podcast? Get in touch to discuss sponsorship opportunities.

Vaccinating Europe's undocumented: A policy scorecard https://datajournalism.com/read/newsletters/vaccinating-europes-undocumented-a-policy-scorecard Thu, 25 Nov 2021 15:00:00 +0100 Tara Kelly https://datajournalism.com/read/newsletters/vaccinating-europes-undocumented-a-policy-scorecard Welcome to the latest Conversations with Data newsletter!

What are the challenges in vaccinating undocumented people in Europe, and how do country policies differ? This is a critical issue Netherlands-based investigative outlet Lighthouse Reports aimed to examine in its latest cross-border investigation.

The crowdsourced open data project involved data journalist Eva Constantaras and data scientist Htet Aung developing a policy scorecard by working with researchers and crowdsourcing data from 18 European countries. What's more -- data journalists and immigration reporters covering these countries can still take part in the investigation.

You can listen to the entire podcast on Spotify, SoundCloud, Apple Podcasts or Google Podcasts. Alternatively, read the edited Q&A with Eva and Htet below.

What we asked

What inspired Lighthouse Reports to examine the vaccination policies for Europe's undocumented people?

One of the original missions of Lighthouse Reports was to improve the quality of migration reporting in Europe. Our reporters in our network had surfaced this issue that kept coming up: Politicians across Europe realised that anti-immigration policies were undermining public health and social services overall -- but they also made for really good politics. Anti-immigrant rhetoric was really popular, and it was playing well in countries experiencing a rise of populism.

But at the same time, these governments realised that denying healthcare to undocumented people is a pretty bad policy. What our reporters were asking us was how was this playing out in real life? Are governments actually taking care of the healthcare needs of undocumented people or not? That was the kernel of the investigation, and that was our starting point.

How difficult was obtaining this data, and how did this tie in with the policy scorecard?

I think I have a longer list of the data we could not collect than what we were able to collect. So, for example, governments make it very difficult even to estimate the number of undocumented people within the borders of any specific country. Because we don't know how many undocumented people there are, we also don't know how many undocumented people are getting vaccinated. We don't know how many undocumented people are being hospitalised, nor how many have died during the pandemic.

Knowing that those were kind of our limitations on the data that we wouldn't be able to collect, our question was what data are we able to collect that would inform migration reporting, and that would be meaningful to newsrooms? That's what led us to this policy scorecard idea. So what policies are governments proposing in terms of the COVID-19 vaccination, and are undocumented people included in those policies or not?

How did you establish the criteria for the scorecard?

To figure out what should be included in the scorecard, we sat down with PICUM, the Platform for International Cooperation on Undocumented Migrants. This is an umbrella organisation that advocates for undocumented people. We mainly spoke with their advocacy officer Alyna Smith. We wanted to find the barriers to undocumented people getting vaccinated and work out how to measure those barriers. They put us in touch with service providers in about five different countries to have these initial conversations. On the ground, when an undocumented person wants to get vaccinated, what could get in their way? And how could policies address those barriers? That's how we came up with these categories.

Talk to us more about these scorecard categories.

The first question is quite obvious: Are there even any policies in place? Is there transparency around a country's vaccine policy? Our starting point was can we find this information out or not? Our second big question was, are undocumented people included in these policies or not? We found out from our interviews that in many countries they don't even mention how undocumented people are going to be treated. Instead, service providers and undocumented people have to read between the lines to determine if they can get vaccinated.

That's how we came up with other categories. If you think about it practically, if you're an undocumented person and you can't get vaccinated in a country, it might be because they require a national ID when you register. If you don't have a national ID, you can't register. It might include things like are there ways to register if you don't have internet access? Can you just make a phone call? That would be a big barrier. So that's how we came up with the marginalised access category. And then finally, are there guarantees that if you do get vaccinated that you won't get deported or reported to the authorities?
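
To make the structure concrete, here is one way such a scorecard could be represented in code. The categories loosely echo the ones described above, but the country, the individual questions and the scoring rule are entirely invented; Lighthouse Reports' real criteria live in its published documentation.

```javascript
// Invented example: one country's yes/no answers, grouped into categories
const scorecard = {
  country: "Exampleland",
  categories: {
    transparency: { policyPublished: true, coversUndocumented: false },
    registration: { noNationalIdRequired: false, phoneRegistrationPossible: true },
    marginalisedAccess: { freeOfCharge: true, outreachProgrammes: false },
    safeguards: { noReportingToImmigrationAuthorities: false }
  }
};

// A naive score: the share of questions answered "yes"
function score(card) {
  const answers = Object.values(card.categories).flatMap(c => Object.values(c));
  return answers.filter(Boolean).length / answers.length;
}

console.log(`${scorecard.country}: ${(score(scorecard) * 100).toFixed(0)}% of criteria met`);
```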

Tell us more about the process of running the investigation. Who did you work with and how did you work together?

I spoke with Paul Bradshaw, who ran Help Me Investigate through Birmingham University. He's done a lot of crowdsourcing of data projects. I also spoke with a data journalist in Argentina, who has done groundbreaking data investigations using crowdsourcing for data collection. After speaking with them both, I designed the project based on their experience and their advice.

We had one researcher for each European country. Once we recruited our base of volunteer researchers, we had an online orientation session where we explained the overall objectives of the project. We took them through the structure of the scorecard, so our different categories and our different questions under the categories. Using Google Sheets, we gathered all of these documents, all the references, and they identified which of the documents were going to help them answer the questions in our research survey.

Then we brought everybody together again for a scoring sprint. We had a Google Form that our data scientist Htet helped us build. Once all the researchers were finished, we went back and did a lot of data cleaning, and we learnt the hard way that some of the questions we had asked were much too specific or too broad. We went through about a month or two of cleaning the data and organising the spreadsheets before we were able to turn it over for processing.

What are some of the trends you've seen policywise from this investigation so far?

One thing I found uniformly surprising is how risk-averse the policies are. Across Europe, countries tended to be more transparent about issues that are considered uncontroversial. For example, it was pretty easy to find most countries' ID policies, residency policies and privacy policies because these are politically low-risk. But as soon as it came to anything that might cause some sort of populist backlash -- issues like whether undocumented people would have access, whether the vaccine would be free for them, or whether they had the same choice of vaccines -- governments tended to go quiet.

Any other surprising findings?

All of these countries that you think of as reasonably progressive were fairly silent on these issues. For example, Germany is regarded as having a fairly robust social welfare state. However, it was very difficult to evaluate how well they were taking care of these more vulnerable people in society. For me, journalistically, it brings up a lot of questions that I would want to FOI for more data.

We also found that some countries had a more positive story than we would have expected. For instance, Belgium, which tends to be viewed as fairly hostile to undocumented people, has a pretty decentralised approach. For example, we found groups providing services that run a very robust programme in Brussels to vaccinate undocumented people.

What are some of the lessons learned from this investigation?

I think one of our big lessons learnt is that we probably should have run through the process with a couple of researchers from start to finish. This would have allowed us to eliminate some of the questions that were not relevant or that our researchers could not answer. Most of our researchers didn't necessarily have a specific background in covering migration issues or detailed policy work. Instead, most of them had more of a general data journalism background.

This meant we had to repeat the process quite a few times until we were happy with the completeness of our research. It would have been better to go a little more thoroughly through the research and identify more potential places where we could find the research, which would have sped up the process.

How can interested journalists get involved?

This policy scorecard is an open-source cross-border investigation, and all of the documentation and data is freely available on our Github. Send us your pitches if you are a data journalist or an immigration reporter interested in doing a case study or a deep dive into an immigration health policy. We have several stories that have been published (or are about to be) looking at specific European countries such as Ireland, Greece, Germany and Portugal. We are particularly interested in hearing from journalists in Poland, the Czech Republic and Slovakia.

Latest from DataJournalism.com

This week DataJournalism.com launched The State of Data Journalism 2021 Survey. We want to hear from you! Take the survey in Italian or English. Participants can win Amazon gift cards or a trip to Perugia, to attend the International Journalism Festival. Fill out the full survey today!

Newsletters are a great way to keep up to date with what's happening in the field of data journalism. But with so many out there, how do you know which ones you should really subscribe to? Simona Bisiani from DataJournalism.com analysed over 100 newsletters on data and journalism. Here is our selection.

Our next conversation

Our next conversation will feature Sinduja Rangarajan, a senior investigative data reporter at Bloomberg. She is an award-winning journalist based in the United States with a background in investigative reporting, data and collaborating with academics. Previously, she worked for Mother Jones and Reveal. Drawing on her current and past stories, we will discuss how journalists can use data to power their immigration reporting in the United States.

As always, don't forget to let us know who you would like us to feature in our future editions. You can also read all of our past editions here or subscribe to the newsletter here.

Onwards!

Tara from the EJC data team,

bringing you DataJournalism.com, supported by Google News Initiative.

PS. Are you interested in supporting this newsletter or podcast? Get in touch to discuss sponsorship opportunities.

Exploring data journalism in Brazil https://datajournalism.com/read/newsletters/exploring-data-journalism-in-brazil Wed, 10 Nov 2021 14:47:00 +0100 Tara Kelly https://datajournalism.com/read/newsletters/exploring-data-journalism-in-brazil Welcome to the latest Conversations with Data newsletter.

In this week's conversation, we focus on the data journalism scene in Brazil. As shown in many countries, data journalism in Brazil has served to increase quality and credibility, combat disinformation, and build trust with audiences.

To better understand Brazil's media landscape, we caught up with Natália Mazotte, a Brazilian data journalist, consultant and social entrepreneur. She speaks to us about the Coda.Br, Latin America's largest data journalism conference and the chapter she wrote in the second Data Journalism Handbook -- now out in Portuguese.

You can listen to the entire podcast on Spotify, SoundCloud, Apple Podcasts or Google Podcasts. Alternatively, read the edited Q&A with Natália Mazotte below.

What we asked

How did you first become interested in data journalism?

To explain my involvement with civic tech and data journalism, I have to return to my childhood. My father was an electrical engineer and had a business in the 90s involving technologies. This meant I was exposed to computers at a very early age. Few people had access to personal computers back then in Brazil. During this time, I developed a huge love for technology that followed me into adulthood. Meanwhile, my mom is a cultural producer. As I grew up in theatres, this shaped my passion for storytelling.

Then I went to college to study journalism. At the end of 2008, I did an exchange programme in the US and came across data journalism entirely by chance. I found an IRE workshop with David Donald; he was giving this workshop on computer-assisted reporting. I realised that there was this whole new universe within journalism where I could explore my passion for technology and stories.

Tell us about your journey to founding the School of Data Brazil.

When I came back to Brazil, I started researching and studying everything I could about data journalism. The IRE materials were essential because we didn't have materials in Portuguese, and it made me realise that communities matter when we are learning something new. In 2011, one of my old professors generously opened up some space in one of her classes at the Federal University for me to teach data journalism. After teaching this class, I met the founder of Open Knowledge Brazil, an NGO that focuses on open government and data literacy.

I volunteered to start the School of Data project, which launched at the end of 2013. We developed partnerships with important players in the media philanthropy landscape and trained more than 60,000 students face-to-face and online. We produced dozens of online tutorials in Portuguese and fostered communities of journalists. This was the most important project I've run so far, and it also opened the door for other opportunities in my career.

Talk to us about the data journalism scene in Brazil.

Today data journalism is a very strong field with a vibrant community here in Brazil. But back in 2014, it was just beginning. The School of Data and the Brazilian Association of Investigative Journalism were decisive in this change. Almost all the big newsrooms in the country now have a journalist specialising and working with data. We have dozens of courses and tutorials about the topic in Portuguese and even some graduate courses like the one I developed at Insper University.

It is still a growing field, but I've seen more and more job opportunities for journalists who have some data experience. Data journalism has not yet become mainstream in Brazilian newsrooms; it's still in a silo. We still have work to do, and I hope that one day the practice of working with data will be so widespread amongst journalists that it will no longer make sense for us to talk about data journalism as a field. It will be just journalism.

You've moved between civic tech, academia and journalism. How have these experiences advanced your career?

That's an interesting question. Working with technology and transparency has given me a vision that journalists usually don't get in traditional careers. I realised by being amongst programmers that the practice of collaboration is quite common. I believe it is also essential to be open about our projects and methodologies. After all, collaboration is at the heart of some of the major technology projects of our time.

But journalists used to be the complete opposite: super secretive, closed, competitive. This is the usual ethos of journalism, and I didn't feel I fit in. Since I spent most of my career doing journalism projects at NGOs, I was able to try another ethos. Programmers influenced me, and this made a huge difference for me. The very fact that I wanted to document and share what I was learning about data journalism put me in a place of reference that I didn't expect to reach. So I engaged in advocacy for more transparency and open data, mostly in governments, and for data literacy amongst journalists. And that led me to lead Open Knowledge Brazil. So it was a very unusual path for a journalist.

You helped start up a digital magazine called Gênero e Número. What's the focus of the publication, and what motivated you to launch this?

The initiative to start this project came from Giulliana Bianconi, the current executive director of Gênero e Número. She wanted to create a journalistic project about gender. She came to talk to me, and at the time, I was running School of Data Brazil, where I was developing workshops and classes but not publishing data-driven stories. And I missed that. I wanted to publish stories with the knowledge I was sharing. So we started to develop what became a digital magazine to cover gender issues beyond the anecdotes -- contextualised with data.

In our first year, we got some funds from the Ford Foundation, and we managed to publish stories about women in sports, the gender gap in politics and so on. We used to publish nine or 10 data-driven stories per month. But then the magazine became another thing. Now it is an organisation, and it has different projects. I'm glad that I had a chance to start this initiative because it's brilliant, and it's going super well right now.

What are your favourite examples of data journalism in Brazil?

This is difficult. We have a lot of excellent examples, and I must say that the data-driven stories produced by Brazilian journalists have reached the level of the best in the world. No wonder we always see Brazilian projects shortlisted in the main international data journalism awards. A recent example that has received several awards is this project made by the Brazilian fact-checking agency Lupa with the support of the Google News Initiative. The piece is called "At the Epicentre". The story shows what would happen if your neighbourhood was at the epicentre of the coronavirus pandemic. No Epicentro was first published in Portuguese and then translated and published in English by The Washington Post.

We have several newsrooms doing a great job investigating and telling data stories in Brazil. Another example is by Gabriela Caesar at Globo. She produced a very interesting data journalism piece to monitor the votes of city councillors in some Brazilian cities, and it made it easier for citizens to follow the work done in city councils. A data journalist's role is not just about producing an incredible data visualisation. It is also important to capture and organise the available data and make it more accessible to a wider audience. Gabriela does this with incredible competence.

You wrote a chapter in the latest Data Journalism Handbook and also led on the Portuguese translation. When is it coming out?

Liliana Bounegru and Jonathan Gray have done a fantastic job with this second handbook: The Data Journalism Handbook: Towards A Critical Data Practice. We are launching the Portuguese version of the handbook at the Coda.Br conference, the Brazilian data journalism conference happening this week. On Thursday, November 11, we will be talking about the handbook in a session with Jonathan, Liliana, and Cédric Lombion, the School of Data programme director. The Portuguese PDF version of the Data Journalism Handbook is available from today on DataJournalism.com.

What advice do you have for aspiring data journalists keen to break into the field?

This is a question that I get a lot. The first thing you should do is join a community of data journalists. My second piece of advice is to work on a project and pick a topic that you like. The project will allow you to apply your skills and learn by doing. The third is to keep a steady learning routine, reviewing free resources. My final tip is to see if this is a real passion for you. Explore whether this is something that you really want to do and go for it. Choose your community, develop a project and keep a steady learning routine, and you will move forward in this ecosystem.

Latest from DataJournalism.com

Data journalism isn't just for hard news. From fashion to culture, travel and food, journalists can power lifestyle coverage with data too. Journalist Sabrina Faramarzi explains how she found fertile ground for data storytelling in her passion: lifestyle. She also points to some practical techniques and formats to get you started. Read the full long read here.

Breaking into the data journalism field can often be challenging. In Duncan Anderson's blog post, learn from those who did it. Explore experiences from academia and the newsroom with interviews from Prof Bahareh Heravi, Michelle McGee, Florian Stalph, Roberto Rocha and Natalie Sablowski. Read the full blog here.

Join our next Discord chat

What are the challenges in vaccinating the undocumented in Europe, and how does data play a role? Tune in live to our upcoming Discord chat with Eva Constantaras and Htet Aung as they go deep into Lighthouse Reports' cross-border data journalism investigation. Eva Constantaras is a data journalist specialising in building data journalism teams in the Global South. Htet Aung is a data scientist with a specialism in machine learning and AI. Join us on Discord on Wednesday, November 17, at 3 pm CET. Add it to your calendar.

As always, don't forget to let us know who you would like us to feature in our future editions. You can also read all of our past editions here or subscribe to the newsletter here.

Onwards!

Tara from the EJC data team,

bringing you DataJournalism.com, supported by Google News Initiative.

PS. Are you interested in supporting this newsletter or podcast? Get in touch to discuss sponsorship opportunities.

Inside the Pandora Papers https://datajournalism.com/read/newsletters/inside-the-pandora-papers Wed, 27 Oct 2021 14:55:00 +0200 Tara Kelly https://datajournalism.com/read/newsletters/inside-the-pandora-papers Welcome to the latest Conversations with Data newsletter!

Last week we hosted a live Discord chat about the Pandora Papers with Pierre Romera, Chief Technology Officer of the International Consortium of Investigative Journalists (ICIJ).

In our hour-long conversation, he took us behind the scenes of the Pandora Papers investigation. He explained how hundreds of journalists trawled through almost 12 million leaked files revealing hidden wealth, tax avoidance and, in some cases, money laundering by some of the world's rich and powerful.

With questions from the audience, we discussed everything from data tools to digital security along with the processes of running this massive investigation.

If you missed the live chat, listen to the edited podcast on Spotify, SoundCloud, Apple Podcasts or Google Podcasts

Alternatively, read the edited Q&A with ICIJ's Pierre Romera below.

What we asked

Tell us about the Pandora Papers. Why are they called that, and why do they matter?

The Pandora Papers is the biggest collaboration ever in the history of journalism, with hundreds of journalists working on publishing the best stories you can imagine. This is a very impactful investigation based on almost 12 million documents leaked from 14 offshore providers -- firms that offer to create offshore companies. The project is called the Pandora Papers because this collaboration builds upon the legacy of the Panama and Paradise Papers, and the ancient myth of Pandora's Box still evokes an outpouring of trouble and woe. Inside this huge amount of data, we found a lot of hidden elements that make Pandora the perfect analogy.

Talk to us about your role at ICIJ. What does a chief technology officer do in an investigative newsroom?

ICIJ is a very small organisation with only 40 people. What makes it stand out as a news outlet is that half of our organisation comprises data and tech people. We have about 20 data journalists, developers, system administrators and UX designers. That is a lot of people that use technology to do journalism. A big part of my work is to coordinate this effort and help produce a platform for the investigation, dig into the data, help the journalists find stories, and get in touch with the source. ICIJ is a very tech-centric organisation, and so that's why we need so many resources, people and lean technology. I believe that's also why we can produce these kinds of investigations unlike others before us. We have the team to do it.

Talk us through the process behind the Pandora Papers investigation.

It all started two years ago when we got in touch with the source. In this case, the source is anonymous and wants to stay anonymous. That was a source we knew about, and when we had the chance to talk with that person, we started to discuss the files and understand what they could send to ICIJ. Progressively, we began to meet with the source and get the files. That was the very first step of the investigation.

Once we had the files, that's when everything began. We had to make everything available for all the journalists. We started by putting the data on our servers securely, and we started to index it. We put it into a search engine to make it easy to search the huge amounts of text in these documents. Then, before starting to investigate, our team started to create what we call a country list. The country list is at the very beginning of every ICIJ investigation. This is the point where we look into the files and try to look for matches and occurrences related to each country in the world.

After we have a good list and enough information to share with our partners, we contact them and tell them, "Well, we have information related to that person, so maybe you could be a good partner for this project." That's when the partner joins the project. Once we have them on board, we train them to use our platform and use the security layers we have. We also try to point them to interesting stories inside the documents.

Then we start a very long process of sharing the findings in our digital newsroom. This is called the iHub. This is where we try to sort out some structured data from the documents. As you know, all those documents are unstructured -- they are PDFs, Excel docs and other files -- so we need to find a way to extract sense from them. That's what we call structured data. That's also a very long process that lasts until publication.

Then, in the run-up to publication, we fact-check everything. We submit it for legal review, and we ensure that everything we publish is bulletproof and verified. That's a very important part of the process, and it's also our secret sauce to avoid being sued and publishing wrong information. Finally, one month before publication, we reach out for comments from the people involved in our story. Then we publish.

What are the essential tools you used to coordinate this investigation?

One of the most important platforms at the centre of everything is called Datashare. It is a tool to distribute text extraction from documents to many servers. So imagine that you have a leak that is as big as the Pandora Papers. With Datashare, you can ask dozens of servers to work on the document to put them into a search engine and extract the text from the document. It is also able to provide an interface to explore and search the documents.
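
Datashare itself is open-source software, so the real pipeline is far more involved than anything shown here. Purely to illustrate the underlying "extract text, index it, search it" idea, here is a toy in-memory version -- not how Datashare actually works, and the documents are invented.

```javascript
// Toy full-text index: maps each word to the set of documents containing it
const index = new Map();

function addDocument(id, text) {
  for (const word of text.toLowerCase().match(/[a-z0-9]+/g) ?? []) {
    if (!index.has(word)) index.set(word, new Set());
    index.get(word).add(id);
  }
}

function search(query) {
  // Return only the documents that contain every word in the query
  const sets = query.toLowerCase().split(/\s+/).map(w => index.get(w) ?? new Set());
  return [...sets.reduce((a, b) => new Set([...a].filter(id => b.has(id))))];
}

addDocument("contract-001.pdf", "Shell company registered in the British Virgin Islands");
addDocument("email-042.eml", "Invoice for the shell company consultancy fee");

console.log(search("shell company")); // -> ["contract-001.pdf", "email-042.eml"]
```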

For the leaked documents, we also have another very important platform called the iHub. The iHub is our digital newsroom and a platform that allows journalists to share findings, leads, testimony, videos -- whatever they produce related to the investigation. On the iHub, we follow the philosophy of radical sharing. This means we encourage all partners to share everything they found. Even if it's bad or there is not enough proof, we encourage them to post it. And that's also one of the reasons why ICIJ can publish stories in many countries simultaneously, even if there is some censorship. If a journalist cannot publish a particular story in their country because of political pressure, other journalists will be able to publish it somewhere else.

How important is digital security for ICIJ?

Digital security is a very important part of our work, and it's one of the only mandatory things to know before joining an investigation. You have to be able to encrypt your emails, and we use PGP to do that. Once a user can use PGP, they send us their key, we import it into the system, and we can communicate with them securely.
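
For readers who have never touched PGP, the sketch below uses the open-source openpgp.js library to generate a key pair and encrypt a message to the public key. It is only a minimal illustration of the concept, not ICIJ's actual setup; in practice most people use a mail client plugin or GnuPG rather than writing code.

```javascript
import * as openpgp from "openpgp";

async function demo() {
  // Generate a throwaway key pair (in real life, each partner creates their own)
  const { publicKey } = await openpgp.generateKey({
    userIDs: [{ name: "Example Reporter", email: "reporter@example.org" }]
  });

  // Encrypt a message so only the holder of the matching private key can read it
  const encrypted = await openpgp.encrypt({
    message: await openpgp.createMessage({ text: "The findings are on the iHub." }),
    encryptionKeys: await openpgp.readKey({ armoredKey: publicKey })
  });

  console.log(encrypted); // ASCII-armoured ciphertext, safe to paste into an email
}

demo();
```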

Confidentiality is also very important to ICIJ, and it is a part of our security. Every media organisation signs a non-disclosure agreement, and we try to ensure that this privacy remains until publication. So far, we've been very good with that, and I think it's also a reason why our security stays strong throughout the investigations. Just one day before publication, we got a DDoS attack on our website. Luckily, our system team was able to block the attack.

How did ICIJ's work on the Pandora Papers differ from the Panama Papers investigation?

The big difference is the size of the project, but the methodology is almost the same. The steps were the same -- we indexed the data, built the countries list, and contacted all the partners. All of those processes were already applied to the Pandora Papers and all the investigations we do. That's also one of the reasons why we were able to involve so many reporters: we didn't try to reinvent the wheel. Instead, we tried to capitalise on the methodology we had already implemented at ICIJ.

From a reporting point of view, one of the Pandora Papers' challenges was to keep the audience's attention on offshore finance. We needed to find groundbreaking stories to keep readers interested.

What did you learn from this investigation?

Even if ICIJ has a lot of experience with huge collaborations, we were not prepared to coordinate so many journalists. That was hard, and it takes time and a lot of resources, and ICIJ is about the same size now as it was one or two years ago. We also had to improve our platforms to work with so many journalists, which was one of the lessons we learnt from this investigation. Even if the technology we used for the Pandora Papers was very similar to our other ICIJ investigations, improvements were still necessary. For instance, we still had to improve the servers to accept a lot of requests from users and handle a significant amount of data.

We also learned a lot about the fact-checking process during this investigation. There are a lot of stories that we did not publish because we realised that we didn't have enough proof. We may have more proof in the future to release more big stories in the coming weeks. We realised that there is a lot of potential inside this leak that could carry on for many years after publication. It seems we will continue to produce stories for a while with this investigation.

Finally, what advice do you have for aspiring data journalists?

My number one piece of advice is always to learn to program. If you want to work on interactive visualisation or interactive storytelling, learn JavaScript, learn D3. But suppose you are more interested in data scraping and data analysis. In that case, you should learn Python or R. I guess that will make a massive difference between you and other journalists who don't have any programming skills. I also think that will help you to work with developers. If you can better understand their work, the collaboration will be much easier.

It is also essential to develop a data mindset in your own newsroom. For instance, when there is a story, are your colleagues interested in having some sort of data angle? If you want to develop your own skills in data journalism, you also have to educate your colleagues to ask you for help or think from a data perspective.

Latest from DataJournalism.com

Breaking into the data journalism field can often be challenging. In Duncan Anderson's blog post, learn from those who did it. Explore experiences from academia and the newsroom with interviews from Prof Bahareh Heravi, Michelle McGee, Florian Stalph, Roberto Rocha and Natalie Sablowski. Read the full blog here.

Do you want to broaden your horizons in data journalism while on the go? We’ve got a suggestion for you! Stay up to date with our list of the best data journalism podcasts you should be listening to right now. Some top picks are from podcast hosts Andy Kirk, Simon Rogers and Alberto Cairo. Read Andrea Abellan's full blog here.

Apply for EJC's latest journalism grant

The Global Health Security Call is supporting up to 20 journalistic projects to be delivered by freelancers or teams of freelance and/or staff journalists. The call will provide grants of up to $7,500 USD per project and is aimed at journalists publishing stories in opinion-forming media organisations across France, Germany, the UK, the Netherlands, Italy, Norway and Sweden. The grant is powered by the European Journalism Centre and funded by The Bill and Melinda Gates Foundation. Applications close on 5 November 2021 at 17:00 CET. Find out more here.

Photo credit: Mansi Thapliyal

As always, don't forget to let us know who you would like us to feature in our future editions. You can also read all of our past editions here or subscribe to the newsletter here.

Onwards!

Tara from the EJC data team,

bringing you DataJournalism.com, supported by Google News Initiative.

PS. Are you interested in supporting this newsletter or podcast? Get in touch to discuss sponsorship opportunities.

Eye on Africa's data journalism eco-system https://datajournalism.com/read/newsletters/eye-on-africas-data-journalism-eco-system Wed, 06 Oct 2021 14:44:00 +0200 Tara Kelly https://datajournalism.com/read/newsletters/eye-on-africas-data-journalism-eco-system Welcome back to the Conversations with Data newsletter!

Last week we hosted a live Discord chat with Code For Africa's Tricia Govindasamy and Jacopo Ottaviani. The pair spoke to us about the state of data journalism in Africa by citing some compelling projects from the region, including WanaData, InfoNile and Mapping Makoko.

If you missed the live Discord chat, listen to the podcast on Spotify, SoundCloud, Apple Podcasts or Google Podcasts

Alternatively, read the edited Q&A with Code For Africa below.

What we asked

Let's start with the general media landscape in Africa. What makes it different from Europe?

Jacopo: If we compare it to the European media landscape, the average age in African newsrooms is often lower. This means our African colleagues are more open to experimenting with techniques they have never seen before. There is very fertile ground for innovation in Africa. But again, I think we should focus on individual countries and single initiatives and explore them very carefully, because the situation changes dramatically from country to country.

Tricia: Something about media in Africa that may be different is that radio journalism is still quite big. I'm not talking about podcasts. I'm just talking about plain old-fashioned radio. Many are grassroots radio stations run by journalists as well as citizen reporters. They may not have formal journalism backgrounds, but they are passionate about it and just go for it.

Could you give our listeners a quick overview of the data journalism scene in Africa?

Tricia: Data journalism is growing in Africa, especially since the advent of COVID-19. Journalists now realise that data is really important and why they need to develop data literacy skills. There's generally a lack of digital literacy skills amongst the journalists we work with. Many of them have very little formal journalism education, and many journalism schools don't cover data journalism as part of their curriculum. You also have those in newsrooms who have never had any experience working with data journalism. Code For Africa has a programme called academy.Africa, which offers data journalism courses for free. Hopefully, this can bridge the gap.

Talk to us about cross-border investigations in Africa.

Jacopo: I think cross-border journalism is becoming bigger and bigger in Africa. We're trying to invest a lot into that because we strongly believe it can make a difference. Why so? Because topics are cross-border nowadays -- transnational trade, international exchange, climate change and even the pandemic. Another important issue is African migration. Most migrants in Africa move from one African country to another. African journalists can and do collaborate. We worked with a network of water journalists in East Africa called InfoNile. This specialised network is made up of journalists from 11 countries, all in the Nile Basin region, tackling water stories.

What African countries are leading the way when it comes to providing access to information and open data?

Tricia: Statistics South Africa is our national statistical department. I believe they are at the forefront in Africa in terms of open data and sharing the data. Most of the data is available online and you can register for a free account. They provide a database where you can access a lot of survey data. You can download that into a spreadsheet format. My colleagues in Nigeria often tell me that they can walk in at any time (or they have the right to walk in at any time) to a government office and request data. However, the data may not always be available in a spreadsheet.

Jacopo: In terms of data liberalisation, the types of challenges we sometimes have in Africa are very similar to the challenges we have in Italy, where data is not always available. There is an issue with the census. According to The Economist, almost half of the continent's population is counted by a census older than 2009. In many regions these are not fresh figures about Africa's population. This is a problem for journalists, but it's also a problem for policymakers who have to shape their policies with data that is not up to date. If public institutions could invest more in the census, that would be great for everyone, including the media. Of course, it's challenging and not straightforward.

Talk to us about WanaData.

Tricia: WanaData is a pan-African network of 500 women in journalism and data science working in Africa. We have chapters in different countries, and due to lockdown it's all virtual. So during these meetups, we do a training session or a talk related to data or data journalism. Any female journalist, data scientist or technologist is welcome to join. One of the projects I like to highlight is about the COVID-19 outbreak. We've just awarded data fellowships to 12 journalists from four different African countries. In total, they will produce about 40 stories on the vaccination rollout in Africa. These fellowships provide grants to write these stories. They are mentored by Code For Africa. Data analysts will assist them in finding data and visualising their data. They are trained in using tools like Flourish and uploading data on Code For Africa's open data portal called Open Africa.

What other examples have come from WanaData?

Tricia: We have a tool called GenderGapAfrica, which takes data from the World Economic Forum's Global Gender Gap Report. The particular data set we examine covers the gender pay difference in African countries. We look at how much a woman gets paid and how much a man gets paid for a similar type of work in a particular country. So we have created this tool. Users go onto the website, select the country they're from, their gender and how much they're getting paid, and it then shows roughly how much more or less someone of the other gender gets paid for similar work.
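
To make the mechanics concrete, here is a minimal Python sketch of the kind of lookup GenderGapAfrica performs. The wage-ratio figures, function names and countries below are invented placeholders for illustration, not the tool's actual code or the World Economic Forum's data.

# Hypothetical sketch of a GenderGapAfrica-style lookup; all figures are placeholders.
WAGE_RATIO = {
    "Kenya": 0.68,          # illustrative share of the male wage a woman earns for similar work
    "Nigeria": 0.72,
    "South Africa": 0.75,
}

def pay_for_other_gender(country, gender, salary):
    """Estimate what the other gender earns for similar work in the same country."""
    ratio = WAGE_RATIO[country]
    if gender.lower() == "female":
        return salary / ratio    # estimated male pay
    return salary * ratio        # estimated female pay

print(round(pay_for_other_gender("Kenya", "female", 1000), 2))  # roughly 1470.59

The real tool wraps the same kind of lookup in a web interface, so the reader only ever sees the country, gender and salary inputs and the resulting difference.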

Tell us about the Mapping Makoko project. Why does mapping a forgotten community like Makoko matter?

Jacopo: Makoko is one of Africa’s floating inner-city slums, with a third of the community built on stilts in a lagoon off the Lagos mainland. The rest of the settlement is on swampy land with little sanitation and few public services. It is estimated that around 300,000 people live there today, but the lack of data means the actual population is unknown and largely politically unrecognised.

Before the Mapping Makoko project began, the area didn't feature much on existing maps, and there was very little information on its geography, population density or land ownership. The team knew that an accurate map is fundamental to planning and improving the living conditions of those who live in Makoko. Instead of doing it ourselves, we trained locals to fly the drones and map the community. The team used OpenStreetMap to draft a layout of the area complete with landmarks and streets.

Finally, if you had a magic wand and could change one thing about journalism in Africa what would it be?

Tricia: I would want there to be more initiative from newsrooms to give their employees time to attend training sessions for data journalism.

Jacopo: I would love to see stronger business models to make media more sustainable and more open to innovation. This would enable more newsrooms to experiment. Many institutes focus on media sustainability, but I wish more resources existed to expand their work.

Our next conversation

Our next conversation will feature Sinduja Rangarajan, Data and Interactives Editor at Mother Jones. She is an award-winning journalist based in the United States with a background in investigative reporting, data and collaborating with academics. Drawing on her current and past stories, we will discuss how journalists can use data to power their immigration reporting.

Latest from DataJournalism.com

Do you want to broaden your horizons in data journalism while on the go? We’ve got a suggestion for you! Stay up to date with our list of the best data journalism podcasts you should be listening to right now. Some top picks are from podcast hosts Andy Kirk, Simon Rogers and Alberto Cairo. Read Andrea Abellan's full blog here.

Our latest long read article features case studies examining how a team of researchers, journalists and students used digital recipes to delve into COVID-19 conspiracy content sold on Amazon, the world's largest online retailer. Written by Jonathan Gray, Marc Tuters, Liliana Bounegru and Thais Lobo, this walk-through piece serves as a useful guide for researchers and journalists seeking to replicate a similar collaborative investigation using digital methods. Read the full article here.

As always, don't forget to let us know who you would like us to feature in our future editions. You can also read all of our past editions here or subscribe to the newsletter here.

Onwards!

Tara from the EJC data team,

bringing you DataJournalism.com, supported by Google News Initiative.

PS. Are you interested in supporting this newsletter or podcast? Get in touch to discuss sponsorship opportunities.

Humanising climate data for solutions reporting https://datajournalism.com/read/newsletters/humanising-climate-data-for-solutions-reporting Wed, 22 Sep 2021 15:37:00 +0200 Tara Kelly https://datajournalism.com/read/newsletters/humanising-climate-data-for-solutions-reporting Welcome back to the Conversations with Data newsletter!

We hope you had a relaxing and restful summer and are ready to dive back in! At DataJournalism.com, we are excited to bring back our monthly live Discord chats. Our next live conversation will take place next Wednesday at 3 pm CEST with Code For Africa. Add the Discord chat to your calendar.

Now on to the podcast!

Climate data journalism offers a vital opportunity for organisations and people looking to spur action, find solutions and hold power to account. But what do data journalists need to be mindful of when covering this pressing issue?

To better understand this, we spoke with data journalist Clayton Aldern from Grist, a nonprofit, independent media organisation dedicated to telling stories of climate solutions and a just future. In this episode, he explains how climate investigations can rely on data analysis and machine learning for solutions reporting. We also hear about his go-to coding languages and digital tools for wrangling climate data.

You can listen to the entire podcast on Spotify, SoundCloud, Apple Podcasts or Google Podcasts. Alternatively, read the edited Q&A with Clayton Aldern below.

What we asked

Tell us about Grist and its approach to solutions journalism when covering climate justice reporting.

There is often this tension in media between solutions journalism and perhaps the more traditional vein of journalism known as accountability reporting. I don't think there needs to be that tension there. At Grist, we're interested in illustrating what a better future looks like. By extension, we're interested in people reaching out toward that future, realising that it's possibly within grasp if one indeed puts a little effort in. So, I think where accountability reporting or investigative journalism and solutions journalism meet is indeed at that point of desire.

Talk to us about the data-led investigative piece you worked on covering abandoned wells that is now a finalist in the ONA Awards.

This piece was a collaboration with my Grist colleague Naveena Sadasivam and The Texas Observer's Christopher Collins. During the middle of the investigation, I joined them to provide some data support, which became an excellent analytical and visual aid to the piece. But the two of them were responsible for the deep investigative dives here looking at abandoned oil and gas wells in the Permian Basin region of the United States.

As oil and gas companies weathered volatile oil prices last year, many halted production. More than 100,000 oil and gas wells in Texas and New Mexico are idle. Of these, there are about 7,000 "orphaned" wells that the states are now responsible for cleaning up. But statistical modelling by Grist and The Texas Observer suggests another 13,000 wells are likely to be abandoned in the coming years. A conservative estimate of the cleanup cost? Almost $1 billion. And that doesn't consider the environmental fallout.

Clayton Aldern is a climate data reporter at Grist. He is also a research affiliate of the University of Washington's Center for Studies in Demography and Ecology, and with the housing scholar Gregg Colburn, he is the author of the forthcoming book Homelessness is a Housing Problem.

How did you use machine learning in this story?

This is a perfect problem for machine learning classification. There are two classes of wells in the data: inactive wells, which simply aren't producing at any given time, and properly orphaned wells, which have been inactive for a number of years or even up to a decade. There is a clear distinction between these two as categorised by the states. We looked at the data to see if we could build a model that distinguished between orphaned and inactive wells.

It's important to remember there are a whole lot of characteristics that describe wells. There's the depth, the type and the region. You can imagine building a logistic regression model that seeks to distinguish between the two classes and then asking how well it does on a chunk of the data it's never seen before. We sought to distinguish between inactive wells and orphaned wells as categorised by the state. We found that there's this subpopulation of inactive wells that are statistically indistinguishable from orphaned wells.
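
A minimal sketch of that workflow, assuming a hypothetical wells.csv with depth, type, region and status columns, might look like this in Python with scikit-learn. It illustrates the approach Clayton describes rather than reproducing Grist's actual model.

# Sketch of the classification approach described above, not Grist's actual model.
# "wells.csv" and its column names are hypothetical.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

wells = pd.read_csv("wells.csv")
X = pd.get_dummies(wells[["depth", "well_type", "region"]])     # well characteristics
y = (wells["status"] == "orphaned").astype(int)                 # 1 = orphaned, 0 = inactive

# Hold back a chunk of data the model has never seen, then check how well it does on it.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))

# Inactive wells the model scores as very likely to be orphaned form the kind of
# subpopulation the investigation flagged as statistically indistinguishable.
wells["p_orphaned"] = model.predict_proba(X)[:, 1]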

What has been the impact of this investigation?

Well abandonment and well plugging is a hot-button issue in the United States right now. If you think about the infrastructure negotiations, for example, it's come up again and again. Why is that? There are many wells in the country, and there aren't great rules on the books. Some of the plugging cost and bonding formulas were written decades ago. Since then, drilling technologies have really improved, allowing for more of this activity. This isn't just an issue for Texas and New Mexico. It's also an issue for California, Colorado and the Dakotas, where there's a lot of shale drilling.

As soon as we published this investigation, many advocates, environmentalists and environmental lawyers and indeed oil and gas industry folks came out of the woodwork and said, "Hey, we're working on this issue in this other state. Can you apply your model to that state? We would like to know what the coming wave of abandonment looks like in our environment." This is an open source model that anybody can use. It takes a little bit of technical know-how. And obviously, you need a data set. But this analysis is replicable in other contexts.

Tell us about your coding skills. What tools do you rely on regularly?

With my background in computational neuroscience, I learned Matlab. I don't use any Matlab these days, but it was my entrée to programming. I'd say I spend about a third of my day in R for most of my scripting. I use Python for web scraping. Then there's room for some D3. Those are the three Musketeers of data journalism: Python, R and D3. At Grist, we run our website on WordPress. Unfortunately, there's no great D3 integration for WordPress. That's why we use another JavaScript library called amCharts, which makes responsive, interactive design easy. For most of my exploratory work, I'm a big fan of Tableau.

Finally, what advice do you have for journalists looking to specialise in climate data reporting?

My advice for folks looking at the climate crisis is not to forget where you've come from. If you come from a traditional journalism background where you wrote about people, you ought to be doing the same thing when you're writing about the climate crisis. I think there's this tendency to imagine data journalism as a kind of island that sits off the coast of journalism. Some see it as this special thing that the programmers do, and then you end up with these nice, immersive visual features. That's all well and good, but you still want to use your traditional investigative reporting toolkit.

You should still be doing archival research, filing public records requests, and most importantly, you should still be talking to people. Remember to examine how the dataset impacts communities. The climate crisis is a story of environmental injustice and climate injustice. If you are missing those angles, you are either missing the lead or burying it.

Join our live Discord chat on 29 September at 3pm CEST.

Our next conversation will be a live Discord chat on Wednesday, 29 September at 3 pm CEST with Tricia Govindasamy and Jacopo Ottaviani from Code For Africa. Tricia is an award-winning South African scientist specialising in Geographic Information Systems and data science. Jacopo is a computer scientist and senior strategist who works as Code for Africa’s Chief Data Officer. We will discuss what the data journalism and media landscape look like in Africa. We will also hear about some regional data projects related to climate change and gender data. Add it to your calendar here.

Latest from DataJournalism.com

Our latest long read article features case studies examining how a team of researchers, journalists and students used digital recipes to delve into COVID-19 conspiracy content sold on Amazon, the world's largest online retailer. Written by Jonathan Gray, Marc Tuters, Liliana Bounegru and Thais Lobo, this walk-through piece serves as a useful guide for researchers and journalists seeking to replicate a similar collaborative investigation using digital methods. Read the full article here.

News Impact Summit is happening tomorrow! Register today!

Want to make your journalism more diverse and inclusive? Join us tomorrow for European Journalism Centre's News Impact Summit on Thursday, 23 September. Learn from industry experts about innovative approaches to bringing about equality in the newsroom. Register here.

As always, don't forget to let us know who you would like us to feature in our future editions. You can also read all of our past editions here or subscribe to the newsletter here.

Onwards!

Tara from the EJC data team,

bringing you DataJournalism.com, supported by Google News Initiative.

PS. Are you interested in supporting this newsletter or podcast? Get in touch to discuss sponsorship opportunities.

Inside the OpenLux investigation https://datajournalism.com/read/newsletters/inside-the-openlux-investigation Wed, 08 Sep 2021 15:32:00 +0200 Tara Kelly https://datajournalism.com/read/newsletters/inside-the-openlux-investigation Welcome to the latest Conversations with Data newsletter brought to you by our sponsor FlokiNET.is, a web hosting company that was established in Iceland to provide safe harbour for freedom of speech, free press and whistle-blower projects.

FlokiNET.is offers DataJournalism.com members a special package with free shared hosting webspace and a free domain name. To access the 5% discount on all their products, use the promo code OpenLux.

Shrouded in secrecy, Luxembourg is a global financial hub known as the crossroads of Europe. After the LuxLeaks investigation in 2014 revealed its close ties to offshoring and tax evasion, journalists at Le Monde sought to find out if that was still the case.

This led to the OpenLux investigation, a cross border collaboration with Le Monde, OCCRP and numerous other news organisations around the world. Instead of relying on leaked documents, the investigation began by scraping an open database provided by the Luxembourg government using Python.

To find out more, we spoke with OCCRP editor Antonio Baquero and Le Monde journalist Maxime Vaudano about the hidden side of the Luxembourg offshore industry.

They explained the data collection and analysis involved along with the benefits of using OCCRP's Aleph, an investigative data platform that helps reporters follow the money.

You can listen to the entire podcast on Spotify, SoundCloud, Apple Podcasts or Google Podcasts. Alternatively, read the edited Q&A with Antonio Baquero and Maxime Vaudano below.

What we asked

Talk to us about OpenLux and how the investigation began.

Maxime: Luxembourg is not completely new for journalists interested in tax avoidance. There was a big scandal in 2014, the LuxLeaks, based on leaked tax rulings from the accounting firm PricewaterhouseCoopers (PwC). We wanted to know whether this scandal and the regulation that came afterwards had any impact on Luxembourg. Was it still a tax haven or had it become something else?

This investigation was not a leak (as in the LuxLeaks). Instead, it relied on open source data. This came about because a few years ago the European Union voted for a new regulation that asked all the EU members, like Luxembourg, to publish an online register saying who really owns the companies in every country. We call this the ultimate beneficial owner. It's a great way to bring transparency to the corporate world. There was this register that was published online in 2019 and we just used it by scraping the data.

We did some data analysis, gathered a dozen partners from different countries and tried to figure out what we could do with that. What did all those names mean? What did this mean for the Luxembourg economy? Could we prove that Luxembourg was still a tax haven or that it had moved to something else? That was pretty much the starting point.

OpenLux was a cross-border investigation. How did the different newsrooms work together?

Antonio: From the beginning, the data belonged to Le Monde. At OCCRP, we were focused on criminality and we found many interesting profiles from the data. We shared all of them with Le Monde and with the other media partners. We invited any partner if they saw any interesting topic or name to create a group and to start a project together. It was amazing because even in cases where Le Monde, or other media, were not interested in an aspect of a story, they helped us.

There were some cases where OCCRP wouldn't publish that story, but we helped the other media partners. At the end of the project, personally, I was exhausted because it took a year and it wasn't an easy project. But I was really satisfied, not only by the final result but by the path we took to deliver that result. We built a project based on friendship and true cooperation between journalists from all over the world.

Talk to us about the data scraping process.

Maxime: We had a developer at Le Monde who did most of the scraping. He used Python to scrape the register. It's quite easy to scrape compared to other websites or registers because there are no real technical difficulties. Each company in Luxembourg has a different number, so you just type the number "B123" or "B124" and you can obtain information on that company. We automated the code so that it went through the register and gathered the information on each company.

Then the challenge was to keep it updated, because the register is updated every day. We had to scrape it very regularly to gather new names and to determine when a name was wiped from the register. If someone was no longer a beneficial owner, their name disappeared from the register forever. Being able to scrape it every two to four days made it easy for us to build a history for our investigation. And then there was the big challenge of what to do with all this data.
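
The basic loop is simple enough to sketch. The endpoint URL and output fields below are placeholders rather than the register's real interface, but the idea is the same: request each company number in turn, save a snapshot, and repeat the run every few days to build a history.

# Hypothetical sketch of the scraping loop; the URL and fields are placeholders.
import csv
import time
import requests

BASE_URL = "https://example-register.lu/company/{number}"   # placeholder, not the real endpoint

with open("register_snapshot.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["company_number", "http_status", "page_length"])
    for i in range(123, 130):                                # e.g. B123, B124, ...
        number = f"B{i}"
        response = requests.get(BASE_URL.format(number=number), timeout=30)
        writer.writerow([number, response.status_code, len(response.text)])
        time.sleep(1)                                        # throttle requests politely

Comparing successive snapshots then shows which beneficial owners have appeared or disappeared between runs.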

How did you sort through this massive amount of data?

Antonio: That was the big contribution of OCCRP because we have a tool called Aleph, where you can put in every kind of data that you want. It makes it very easy to organise this data and to cross-check it with other data sets. We put the data in Aleph and with a few manipulations, we were able to cross-check it with other registers in other countries or with names of politically exposed persons. So it makes this process of selecting interesting names very easy compared to going through every name individually. That was the first part of the data storytelling process.
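
Aleph does this matching at scale, but the underlying idea can be illustrated in a few lines of pandas. The file and column names here are hypothetical, and real matching also needs fuzzy comparison and transliteration of names; this is a sketch of the cross-checking step, not Aleph's actual code.

# Illustration of the cross-checking idea, not Aleph itself; file and column names are hypothetical.
import pandas as pd

owners = pd.read_csv("beneficial_owners.csv")   # columns: company, owner_name
peps = pd.read_csv("pep_list.csv")              # column: name

owners["key"] = owners["owner_name"].str.strip().str.lower()
peps["key"] = peps["name"].str.strip().str.lower()

matches = owners.merge(peps, on="key", how="inner")          # owners who appear on the PEP list
print(matches[["company", "owner_name"]].drop_duplicates())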

Tell us how Le Monde handled its approach to finding relevant stories from this data.

Maxime: As a data journalist, I'm very used to working with a quantitative approach. For Le Monde, we found out that there were many very rich French families in the data set. Instead of focussing on one or two families, we invested a lot of time trying to map all the assets of the richest families. We were able to prove that 37 of the 50 wealthiest families in France were in the data set and owned assets in Luxembourg.

It was more striking for us to show that most rich French families have assets in Luxembourg than to focus on one family and take a name-and-shame approach. We still had to give names so that the public, the reader, could understand what it's about. But it's more striking to have big numbers and to be able to determine whether there is a trend. This data was so rich that we could do this.

How did data visualisation come into play for this investigation?

Maxime: We didn't have a big focus on data visualisation for Le Monde's OpenLux publication, but during the process of working on and investigating the stories, it was important. For example, we used a mind-mapping tool to reconstruct the structures of the companies, because they are usually very complicated, with subsidiaries in numerous countries. We were able to rebuild the structure from zero by looking at the documents. Using this data visualisation tool helped us understand what we were looking at. Because it's so complicated for the public to digest, it was not worth publishing the raw visualisation. But it was still very useful for us to have a clear picture of what we were examining.

Antonio: For most of the stories, we wrote articles. But there's one story where we decided to use a visualisation. The story aimed to explain how heads of state from all over the world used Luxembourg to hold real estate across Europe. We created an interactive map so the reader can see a map of Europe, the properties and who owns them.

What did you learn from this investigation?

Antonio: For me, it was the first big investigative project that I worked on as a coordinator in OCCRP. So for me, I learned how important it is to be fair and to be frank when you try to cooperate with others. When you work with others on a project like this, you need to commit to sharing everything. I also learned how important it is to ask for help. This is not a competition amongst journalists to see who is the most intelligent. This is a collaboration and if you don't know something, ask for help.

I also learned how important it is not only to have journalists in other countries but also journalists with different skills, especially relating to data, financial documents or company records. It is really essential to realise you need others to make your work the best journalistic work in the world.

Latest from DataJournalism.com

Our latest long read article features case studies examining how a team of researchers, journalists and students used digital recipes to delve into COVID-19 conspiracy content sold on Amazon, the world's largest online retailer. Written by Jonathan Gray, Marc Tuters, Liliana Bounegru and Thais Lobo, this walk-through piece serves as a useful guide for researchers and journalists seeking to replicate a similar collaborative investigation using digital methods. Read the full article here.

Drones aren't just for photojournalists. Data journalists can also take advantage of them for their stories. Monika Sengul-Jones explores how to boost your storytelling with this technology, as well as the potential pitfalls for using them. She also provides a guide for journalists getting started. Read the full article here.

Data journalism training opportunity

Are you a freelance journalist reporting on development issues? Do you want to gain data journalism skills? Then the data bootcamp for freelancers is for you! Organised by the Freelance Journalism Assembly, this interactive 20-hour, two-week virtual training will teach you how to find, clean and analyse data. You'll also learn how to create data storytelling formats. Apply for one of the 25 scholarships. Deadline: 10 September, 24:00 CEST.

Our next conversation

Our next conversation will feature data journalist Clayton Aldern from Grist, a nonprofit, independent media organisation dedicated to telling stories of climate solutions and a just future. Clayton is a writer and data scientist currently working at the intersection of climate change, environmental degradation, neuroscience, and mental health. We will discuss how to best cover environmental issues and climate change through data journalism.

As always, don't forget to let us know who you would like us to feature in our future editions. You can also read all of our past editions here or subscribe to the newsletter here.

Onwards!

Tara from the EJC data team,

bringing you DataJournalism.com, supported by Google News Initiative.

PS. Are you interested in supporting this newsletter or podcast? Get in touch to discuss sponsorship opportunities.

Exploring the sound of data https://datajournalism.com/read/newsletters/exploring-the-sound-of-data Wed, 25 Aug 2021 05:00:00 +0200 Tara Kelly https://datajournalism.com/read/newsletters/exploring-the-sound-of-data Welcome to our latest Conversations with Data newsletter.

This week's episode features data journalists Duncan Geere and Miriam Quick, the co-hosts of Loud Numbers, the new data sonification podcast. The pair speak to us about what sonification means for data storytelling and what kind of stories work best for this medium.

You can listen to the entire podcast on Spotify, SoundCloud, Apple Podcasts or Google Podcasts. Alternatively, read the edited Q&A with Duncan Geere and Miriam Quick below.

What we asked

Talk to us about Loud Numbers, the data sonification podcast you launched.

Miriam: Loud Numbers is a data sonification podcast created by Duncan and me. Data sonification is the process of turning data into sound, and we take it a step further by turning those sounds into music. In each episode, we introduce a data story, explain how we sonified it, and then play the sonification we’ve created.

Duncan: To us, sonification is the art and the science of turning data into sound. There are loads of different ways to do that -- from the simple pinging of a smartphone notification all the way up to a complex eight-part symphony where the data governs how all the different melodic lines interact with each other.

Talk to us about your career and how it intersects with sonification.

Miriam: I've been a data journalist and a researcher since about 2011, but I started working with data and data visualisation through music. I did a PhD in Musicology at King's College London and used quite a lot of data in my doctorate. I've always been involved in music and data. For several years, I've been keen to work on a project that turns data into music in a systematic and musically exciting way.

Duncan: I started out in tech writing. I later moved into data journalism and began covering science and environmental topics. As I began to write about more complicated subjects, I realised that I wanted more tools at my disposal than just words. In my 20s, I did some DJing and was in a couple of bands. I thought it would be really cool to combine these things. I'd come across a couple of inspiring examples of sonification, but I wondered why there wasn't a sonification podcast out there. After waiting to see if anyone would develop one, I realised that I had to just do it myself.

What other sonification work had you done prior to this?

Miriam: I had some experience with sonification before. I'd done a project called Sleep Songs, a collaboration between myself and the information designer Stefanie Posavec. We measured our own and our husbands' breathing rates while asleep and took that data. Stefanie turned it into a visual artwork. I turned it into two pieces of music where the rhythm of the inner parts corresponds to the changes in our breathing over the eight hours we were asleep.

How much music theory is required for data sonification?

Duncan: My musical knowledge is limited. It depends on what kind of sonification you want to do. If you want to sonify something that is very musically complex, then you'll definitely benefit from having some theoretical background. But you do not necessarily need it. There's a lot that can be done just with tempo. For example, you can show how frequently events happened throughout history, or simply trigger specific audio clips at different volumes at different times. There's a fantastic sonification piece called "Egypt Building Collapses" by Tactical Technology Collective. You don't need any music theory or even code for that. You just line up the sound files on your timeline and play them at different volumes.

Let's talk about your Tasting Notes episode where you sonify the experience of beer tasting.

Duncan: For this episode, we had 10 different beers, which made 10 different pieces of music. So one per beer. Each one has 10 different parameters associated with it around taste, aroma and look. We got these numbers from Malin Derwinger, a professional beer taster in Sweden. We asked her to taste 10 different beers and give us her scores for them. The louder the sound associated with each parameter, the stronger that taste or that aroma.

Miriam: We're not creating these with the intent that people will be able to read specific numbers out of them. We wanted there to be an intuitive connection between the sounds we use to represent the tastes and the tastes themselves. For example, the fizziness has an upward sweep because of its bubbly sound, and the sweetness is a pleasant, harmonious chord because sweetness is a pleasant sensation. Bitterness is hard-edged. For malts, we used a guitar chord because we associate beer with being in a bar, listening to a band.

What data stories work best for sonification?

Miriam: The data stories that work best are those based on time-series data. It works particularly well for showing a clear and simple trend -- something that gets larger or smaller over time. All but one of our Loud Numbers sonifications use time-series data. The exception is the beer episode. Stories about speed and pace work well too. There's a good New York Times video from 2012 about Olympic men's 100 metre finish times. It uses sonification to play a note every time one of the athletes crosses an imaginary finish line. It aims to show how narrow the margins are in sprint races from 1896 to 2012.

Duncan: There's also a fantastic New York Times article from 2017 about the speed of fire of automatic weapons. This is obviously a highly emotionally charged subject. Instead of using a gun sound, which I think would be quite tasteless, they use a very minimal sound to signify the speed at which these different weapons fire. I thought that was very interesting because one of the powers of sonification is that it can carry emotional weight in a way that a standard bar chart doesn't. You can create a sonification so that it gets so loud that it hurts. You can never make a bar chart where the bar is so long that it hurts. Sound reaches people in a way that is much harder to do with traditional visuals. It's hard to match the emotional intensity that you can reach with sound.

What other fields are using sonification?

Duncan: There are people working with sonification in journalism, but also in science, particularly earth sciences and astronomy. There are loads of astronomers who work on education, including partially sighted astronomers who use sonification to understand trends in complex datasets. Artists are also creating installations that involve sonification. Others using sonification are from the worlds of traditional music and computer-based music.

What is your technical process and methodology for sonifying data?

Duncan: For most of our sonifications, the tech stack is pretty simple. We use Google Sheets to get the data together and analyse it. Then we load that data into a piece of software called Sonic Pi. That's where it becomes sound. Sonic Pi is a coding language based on Ruby that allows you to automate the most tedious parts of the sonification process, like figuring out which data values to turn into which sound values.

So, if the data value is eight, what does that mean in terms of volume, pitch or whatever else you're deciding? That process is called parameter mapping, and it's a really important part of sonification. Then we run the data through the code and it generates a series of notes or volumes. What comes out of it is more or less a complete piece of music or, worst case, a collection of sounds. Then we polish it up and turn it into an actual track in Logic Pro, which is a digital audio workstation. The sonification work happens in Sonic Pi, but Logic Pro is where it becomes music.
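
As a rough illustration of parameter mapping, here is a small Python sketch that linearly maps each data value onto a MIDI pitch and a volume. Loud Numbers do this step inside Sonic Pi, and the ranges and toy data below are arbitrary choices made only for the example.

# Toy parameter-mapping sketch; Loud Numbers do this in Sonic Pi rather than plain Python.
def scale(value, lo, hi, out_lo, out_hi):
    """Linearly map value from [lo, hi] onto [out_lo, out_hi]."""
    return out_lo + (value - lo) * (out_hi - out_lo) / (hi - lo)

data = [2, 5, 8, 3, 9, 1]                           # hypothetical time series
lo, hi = min(data), max(data)

for value in data:
    pitch = round(scale(value, lo, hi, 48, 84))     # MIDI notes from C3 to C6
    volume = scale(value, lo, hi, 0.2, 1.0)         # quiet to loud
    print(f"value={value} -> note={pitch}, amp={volume:.2f}")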

What challenges have you encountered with creating sonifications?

Miriam: From the very start of this podcast we wanted the tracks we made to sound like music and not like experiments. The issue we had is that data and music don't really follow the same rules; music has its own rules and structures. So when you translate data into sound without thinking carefully about the system, it can easily sound random, a little meandering or meaningless. We thought this was not enough. We wanted to make tracks that actually sound decent. You've got to think carefully to come up with systems that optimise the storytelling potential of the data or showcase the trend you want to reveal. A lot of it is trial and error, but a lot of it is really about simplification of the data.

What is next for data sonification? Do you think it will become mainstream?

Duncan: I think it will become one of the tools that people use in their toolbox. It will definitely become more acceptable, particularly as stories become more multimedia. This is because sonification really does have the potential to tell data stories powerfully and emotionally. It also conjures up what you might call new virtual worlds, particularly when it's combined with video, animation, and graphics. But sonification also works by itself.

Miriam: One thing that may increase the acceptability of sonification is the explosion in the popularity of podcasts in recent years. People are getting more used to absorbing information through audio alone. At the moment, sonification does have a novelty appeal. Some mainstream news organisations are starting to use this medium. The BBC and The New York Times have been using it for quite a few years now. The Economist have a COVID-19 podcast that includes a regular sonification feature. I think that it's going to become more and more familiar territory.

Latest from DataJournalism.com

If you have ever wondered why you should include data in your reporting, read our latest blog post. DataJournalism.com's Andrea Abellan outlines 10 reasons to invest in telling stories with data. We also link to some useful resources to help you get started.

Drones aren't just for photojournalists. Data journalists can also take advantage of them for their stories. Monika Sengul-Jones explores how to boost your storytelling with this technology, as well as the potential pitfalls for using them. She also provides a guide for journalists getting started. Read the full article here.

Data journalism training opportunity

Are you a freelance journalist reporting on development issues? Do you want to gain data journalism skills? Then the data bootcamp for freelancers is for you! Organised by the Freelance Journalism Assembly, this interactive 20-hour, two-week virtual training will teach you how to find, clean and analyse data. You'll also learn how to create data storytelling formats. Apply for one of the 25 scholarships. Deadline: 10 September, 24:00 CEST.

Our next conversation

Our September conversation will feature journalists Antonio Baquero from OCCRP and Maxime Vaudano from Le Monde. They will speak to us about working on OpenLux, a collaborative international investigation on the hidden side of the Luxembourg offshore industry. We will hear how they uncovered this story with Python and OCCRP's Aleph, an investigative data platform that helps reporters follow the money.

As always, don't forget to let us know who you would like us to feature in our future editions. You can also read all of our past editions here or subscribe to the newsletter here.

Onwards!

Tara from the EJC data team,

bringing you DataJournalism.com, supported by Google News Initiative.

PS. Are you interested in supporting this newsletter or podcast? Get in touch to discuss sponsorship opportunities.

Inside The Economist's "Off the Charts" newsletter https://datajournalism.com/read/newsletters/inside-the-economists-off-the-charts-newsletter Wed, 11 Aug 2021 07:00:00 +0200 Tara Kelly https://datajournalism.com/read/newsletters/inside-the-economists-off-the-charts-newsletter Welcome to our latest Conversations with Data newsletter.

This week's episode features Marie Segger, a data journalist at The Economist. She speaks to us about launching "Off the Charts", a newsletter taking us behind the scenes of The Economist's data team. She also tells us about her learning and career path into data journalism, along with her thoughts on the best way for data teams to collaborate.

You can listen to the entire podcast on Spotify, SoundCloud, Apple Podcasts or Google Podcasts. Alternatively, read the edited Q&A with Marie Segger below.

What we asked

Talk to us about your journey into data journalism.

The first time I came across data journalism was at a Google event in Berlin many years ago. I heard a talk from the Knight Lab on how to create different data stories with interactive tools. At that time, I had applied to study for a Master's degree in Digital Journalism at Goldsmiths, University of London. Led by Miranda McLachlan and Andy Freeman, the programme is half journalism and half computing. The computing side was split into different modules. We learned basic HTML, how to build a website and use WordPress. We also worked with Google Sheets and Excel and learned how to scrape data from websites. The course was practical, where you learned by doing.

Talk to us about The Economist's "Off the Charts" newsletter. Why did you launch it?

We launched the newsletter in February 2021, and it has been a very exciting journey. We developed plans and prototypes with the newsletter team. The motivation to launch it was partly because our team really enjoyed other data newsletters out there. Some of them include "Conversations with Data", Sophie Warnes' "Fair Warning", Gavin Freeguard's "Warning: Graphic Content," and Jeremy Singer-Vine's Data is Plural. Before this, The Economist published Medium articles talking about our work behind the scenes.

The newsletter is similar to that. We had some really interesting pieces. One of them is called "Mistakes, we've drawn a few" by Sarah Leo. This was one of our most popular pieces looking at lessons learnt from our errors in data visualisation. This educational content about our work makes it more accessible to our audience. I found this industry very difficult to break into and understand. That's why I love it when people make an effort to be more transparent.

What is the workload for this type of weekly newsletter?

The good thing is we are always rotating the workload. The data journalism team is made up of 15 people. It's a different person writing the newsletter every week. I write the introduction and edit the copy for each issue.

Who is on the team? What skills do they have?

We have a handful of data journalists, some visual journalists and two interactive data journalists. The data journalists gather and analyse the data and write the text. The visual data journalists take that data and create a design for our different sections: Graphic detail or Daily charts. The interactive data journalists create really beautiful interactives.

How have you worked with other editorial teams within The Economist?

The data journalists contribute to other sections besides Graphic detail and Daily charts. We work across the paper and collaborate with other people. During the pandemic, we've done a lot of stories with our health correspondents. I am hoping to work on a climate story this year. For that, I plan to collaborate with our main climate correspondent.

How has the pandemic shaped your work, and what are some shining examples to come out of this?

We've seen huge interest in our journalism during the height of the pandemic, especially during the first lockdown. My colleague James Tozer was the first to gather excess mortality statistics. On the back of that, we started our Excess Mortality Tracker. Another colleague, James Fransham, looked at Google mobility data and people's behaviour during the lockdowns. After that, Google made the data public. The Economist also launched "The Jab", a podcast exploring the global vaccination race.

How do you come up with data stories?

I think there are two different ways of coming up with data stories. One is to find the data, analyse it and find a story in it. The other is to notice something in real life or be inspired by a recent news event. The next step is to look for the data. Some people in data journalism think that one way is more valid than the other. But I think both are excellent ways to find data stories.

What data journalists do you look to for inspiration?

There are so many really inspiring people in our field. Mona Chalabi is obviously one of the greatest inspirations. She's a trailblazer who has done some fantastic work. I also admire Jane Bradley, who is an investigative reporter for The New York Times. I follow academics, too. I reviewed Carl Bergstrom and Jevin West's book, "Calling Bullshit", which is about detecting shoddy data and lies. It is very similar to Tim Harford's Data Detective books, another person I follow.

In 2019 you spoke at a BBC conference in Manchester on breaking into data journalism. What advice do you have for fellow journalists trying to do the same?

I'm passionate about trying to show that there are so many different ways into data journalism. Getting into journalism full stop is hard. It's not always a straight line. I didn't graduate and then start as a data journalist at The Economist. Millennials have very high expectations, and we expect a lot from ourselves. It's essential to take your own time with it.

I always recommend networking, which can be difficult during this pandemic. I started by going to a brilliant meetup called JournoCoders in London. It's organised by Max Harlow, a developer who works for the Financial Times, and Leila Haddou, an investigative journalist. Hacks/Hackers is another meetup I recommend. This isn't purely a data journalism meetup. Instead, you'll find a mix of developers and journalists from different organisations giving presentations.

Latest from DataJournalism.com

Drones aren't just for photojournalists. Data journalists can also take advantage of them for their stories. Monika Sengul-Jones explores how to boost your storytelling with this technology, as well as the potential pitfalls for using them. She also provides a guide for journalists getting started. Read the full article here.

If you have ever wondered why you should include data in your reporting, read our latest blog post. DataJournalism.com's Andrea Abellan outlines 10 reasons to invest in telling stories with data. We also link to some useful resources to help you get started.

Our next conversation will feature data journalists Duncan Geere and Miriam Quick, the co-hosts of Loud Numbers, the new data sonification podcast. The pair will speak to us about what sonification means for data storytelling, how they got started and what stories work best for this medium. Prime yourself by reading the long read data sonification article they wrote earlier this year.

As always, don't forget to let us know who you would like us to feature in our future editions. You can also read all of our past editions here or subscribe to the newsletter here.

Onwards!

Tara from the EJC data team,

bringing you DataJournalism.com, supported by Google News Initiative.

PS. Are you interested in supporting this newsletter or podcast? Get in touch to discuss sponsorship opportunities.

Build a data hypothesis for your next story https://datajournalism.com/read/newsletters/build-a-data-hypothesis-for-your-next-story Wed, 28 Jul 2021 01:30:00 +0200 Tara Kelly https://datajournalism.com/read/newsletters/build-a-data-hypothesis-for-your-next-story Welcome to our latest Conversations with Data newsletter.

This week's episode features data journalists Eva Constantaras and Anastasia Valeeva, who joined us for a live Discord chat earlier this month. Drawing on their vast global experience of teaching data journalism, we discussed the power of using hypothesis-driven methodologies to tell data stories about hidden or forgotten communities.

You can listen to the entire podcast on Spotify, SoundCloud, Apple Podcasts or Google Podcasts. Alternatively, read the edited Q&A with Eva Constantaras and Anastasia Valeeva below.

What we asked

Talk to us about how you define what a data hypothesis is.

Eva: A hypothesis is an affirmative statement that can be proven true or false with data. So instead of starting with a general overall question, you start with your theory and then use data to prove your theory true or false. I adopted this approach because a few years ago, I came across Mark Lee Hunter's book, "Story-Based Inquiry", which is actually a manual for investigative journalism but has much wider applicability. He had one tip in that manual that really addressed many of the concerns I had been having when I was teaching data journalism. He says a hypothesis virtually guarantees that you will deliver a story and not just a mass of data.

What I saw a lot of in data journalism stories was less story and more data. So instead of delivering a concise, coherent set of information that could help citizens make better decisions in their lives, a lot of data journalism was presenting this mass of data. To apply this approach, you must do your research, build a carefully constructed hypothesis and then focus your analysis on answering questions that can prove that hypothesis true or false. In a nutshell, that's the idea of developing a data-driven hypothesis and leading a story-based inquiry approach when teaching data journalism.

How can a data hypothesis-based approach help unearth those untold stories about forgotten communities?

Anastasia: I found this data hypothesis approach to be very helpful, not only for teaching data journalism but also for mentoring journalists. This is because it helps divide the whole process of working on a story into measurable steps. You can prove or disprove each question that ultimately builds up into a broader hypothesis. When Eva and I have taught data journalism together around the world, the main challenge has been to explain to journalists what a data hypothesis is.

When data journalism is taught, often skills are the focus, and less emphasis is placed on critical thinking with data. But this is exactly where journalism comes in. We try to explain to our students that these data can help you find systemic biases and prove inequity in society. This very often depends on the statistics and how they are provided. Often the data is not very granular, but even if you have a sex-disaggregated data set, you can build a hypothesis. One could be females being disadvantaged in certain areas or males being disadvantaged in other areas.

What motivated you to create a manual for teaching data journalism?

Eva: What Anastasia just mentioned was what I was finding to be true. When data journalism was being taught, journalists were learning how to build maps using California wildfire data or using OpenRefine to clean data on candidates running for New York City Council. So journalists were being taught entirely out of context, with data that didn't relate to their own work. I believe there was a backlash against data journalism because donors and media houses were investing a lot of money in what I call the "data journalism bootcamp model": in one week, we're going to introduce a bunch of data journalism tools to you, and then you take them back to your newsroom and produce brilliant data journalism pieces.

I think part of that came out of Meredith Broussard's concept of "technochauvinism", the idea that technology will somehow solve all of our issues. So my starting point was, yes, the tools are important. But what is most important is the process. The journalists need to learn how to follow a research methods process to realise the story they've wanted to investigate.

How do you begin this process of teaching data journalism with this methodology?

Eva: I would start with journalists who are really passionate about their beat. Technical skills aren't that important. We would then discuss, based on the data we have, what hypothesis can be developed. And from that hypothesis, then we would then break it down into question categories. So once you have a hypothesis, how are we going to measure the problem with data? How are we going to measure what populations have been impacted by data? How are we going to measure the cause of the issue with data? How are we going to measure the solution? Once we bring in all those data sets and have all of those interview questions for our data, we're almost guaranteed to find a story at the end of the process.

The reality is that many of these newsrooms will not have the resources to have someone do data visualisation, have another person do the analysis, and another do a regression model. So with the resources we have, what kind of data-driven stories can come out of it? My methodology is based on how we use data to uncover these systemic issues, whether it's health care inequity or education inequities. In that case, it's less important to have granular data and more important to have very clear in your mind how you're going to approach this question and what specific area you're going to tackle through a hypothesis.

Anastasia: We worked in a critical period in Kyrgyzstan and Albania, where we completely fleshed out this programme, and it became a 200-hour course. So that's a five-week full-time course in total. So far, we've both taught it in other countries like Pakistan and Jordan. We've done part of it in Myanmar. The idea is that we're adapting to the realities of the data environment. We're creating a process for motivated journalists who really are passionate about the topic they're covering to have a way to produce data stories independently after they go through this programme. That's also why I stayed in Kyrgyzstan because I felt that even 200 hours is not enough, and I really wanted to make it work.

How do you adapt your data journalism training for audiences in the Global South?

Eva: At present, I'm teaching a data journalism course for the Mekong region. We are working with two organisations, Thibi and Open Development Cambodia, data consultants based in Myanmar and Cambodia, respectively. We spent quite a lot of time localising the manual. We localised it for the country or the region, and then by the sector. You can't learn data journalism from start to finish if you're jumping around from economic data to education data to environmental data. Early on in the process, we choose a sector or thematic focus, and then we localise all the content.

Half of the content is devoted to the journalism side -- the analytical skills and research methods. The other half is tool-based. We realised that without localising it, these journalists would never be able to go through the entire data pipeline and data storytelling process and understand how to do it independently. That's why it is worth the investment to slow down the training so that they're able to practise each of the skills.

Anastasia: Localising the content for the context is very important. It takes several days to study the local agenda, the local data sets and to use meaningful examples for the training. So it's not an easy process. I usually go to the local statistical agency website and start digging through to see what data is available and how it can answer some of the issues discussed in the news agenda.

This is how ideas are born. Once you show these journalists what is possible and apply it to their news coverage, they start thinking this way. Many people come to data journalism thinking that they will work with big data or do interactive visualisations. But what we teach is not really about that. It's more about finding the systemic biases and inequities in society.

What digital security precautions do you take when training data journalists in high-risk countries?

Eva: One thing that we've found -- and I first discovered this in Kenya -- is that telling an investigative story with data, especially data that's been made public by the government, can actually be safer than traditional forms of investigation. We try to stay away from leaked data and avoid having to hack sites to get the data we need. Instead, we work with data that's in the public domain or that we can get through FOIA requests. There are two advantages to that. The first is that there are very few data journalists in the country, which means nobody has actually explored this data. The second is that if journalists are analysing publicly available data, the stories they find can be much safer to publish, because it's much more difficult for the government to discredit a source that originally came from the government itself.

When you pitch a data journalism story to an editor, how much of your hypothesis should you explore and develop on your own before sending in your pitch?

Eva: That's a good question. Often the biggest barrier for data journalists to do data journalism in the newsroom is editors. We see many problems with editors trying to embrace this idea of long-form or explanatory reporting around a specific beat or subject. Try to demonstrate to the editor that you have an evergreen topic, your initial hypothesis and several other hypotheses that you could explore going forward. For instance, that could be a series on crime or unemployment. I would try to show the editor that not only do you have a sound hypothesis, but you also have a plan for how your time investment is going to pay off for them in the long term. Demonstrate that it's not just a one-time burst of effort for one high-profile story, but that you have a plan for making these stories sustainable in the long term.

Anastasia: I would say it really depends on the context. For instance, in Central Asia, oftentimes, the editor would not have worked with data journalism before. So it is important to ensure everybody understands what it is and is on the same page to create a dialogue between the journalist and the editor. So that's why we also try to engage editors in some of our training sessions. This allows them to understand the methods of data journalism and the amount of time required to tell a data story. Journalists need to have done their background analysis -- a key part of Eva's manual.

Finally, what are some of the big misconceptions you encounter from journalists new to data storytelling?

Eva: The most common misconception is that the tool is going to do the thinking for you. Often there's this notion that if you just combine the data and the technology, somehow, a story is magically going to emerge. So I think going in with the attitude that somehow data will make the storytelling process easier is a pretty common misconception. Another common mistake I see is wanting to jump straight to the visualisation. It's important to understand you need to have a solid analysis underlying your visualisation beforehand. We have to instil the idea that data can be quite tedious but rewarding because of the stories you uncover. Data journalism is a lot of trial and error -- learning these tools and doing a lot of research to pick up the skills you need to tell the story you want to tell.

Anastasia: One big stereotype about data journalism is that it always involves working with big data. In our training, we spend some time discussing what big data is and what it isn't. When we teach data journalism, we often start by working with small data. Another area of concern is when people download a dataset, start digging into it and immediately begin drawing random conclusions. In the end, this doesn't make a story; all it does is provide a description of the data set. That is why the hypothesis method is useful -- it shows you how to interpret the story and to lead the exploration with questions, not just randomly perform calculations on the data set. Attaching human meaning to the data is a skill to be learnt.

Latest from DataJournalism.com

If you have ever wondered why you should include data in your reporting, read our latest blog post. DataJournalism.com's Andrea Abellan outlines 10 reasons to invest in telling stories with data. We also link to some useful resources to help you get started.

Spreadsheets are the backbone of data journalism. To help journalists unlock hidden stories from those imprisoned cells, read Abbott Katz's long-read article. Drawing on data from ProPublica and the European Centre for Disease Prevention and Control (ECDC), he provides mini-tutorials to help you use spreadsheets at speed. Read the full article.

Our next conversation

Our August conversation will feature Marie Segger, a data journalist at The Economist. She will speak to us about launching "Off the Charts", The Economist's data newsletter. We will also hear about her learning and career path into data journalism as well as her thoughts on the best way for data teams to collaborate.

Latest from European Journalism Centre

GNI Startups Lab Europe is open for applications! Created in partnership with the Google News Initiative (GNI), Media Lab Bayern, and the European Journalism Centre (EJC), the programme supports early stage news organisations with workshops, coaching, high-profile networking opportunities and up to €25,000 in grants for running revenue-generating experiments. Are you ready to grow your digital news business? Apply by 20 September 2021.

As always, don't forget to let us know who you would like us to feature in our future editions. You can also read all of our past editions here or subscribe to the newsletter here.

Onwards!

Tara from the EJC data team,

bringing you DataJournalism.com, supported by Google News Initiative.

PS. Are you interested in supporting this newsletter or podcast? Get in touch to discuss sponsorship opportunities.

Delta variant vs. the vaccine rollout https://datajournalism.com/read/newsletters/examining-the-delta-variant-against-the-vaccine-rollout Wed, 14 Jul 2021 13:43:00 +0200 Tara Kelly https://datajournalism.com/read/newsletters/examining-the-delta-variant-against-the-vaccine-rollout Welcome to our latest Conversations with Data newsletter.

Confusion and uncertainty continue to grow over the numerous variants spreading across the world. To help us navigate the latest vaccine research and help explain how journalists can best cover this ever-complicated narrative, we invited vaccinologist Dr Melvin Sanicas back to the podcast.

In this episode, he covers everything from how the Delta and Delta plus variants differ to the pros and cons of mixing vaccines. He also explains how boosters work and why journalists should avoid using pre-prints in their reporting.

You can listen to the entire podcast on Spotify, SoundCloud, Apple Podcasts or Google Podcasts. Alternatively, read the edited Q&A with Melvin Sanicas below.

What we asked

Talk to us about the Delta variant and the Delta plus variant. How are they different?

Delta was designated as a variant of concern because of evidence of increased transmissibility. The increase in the reproduction number compared with the Alpha variant (B.1.1.7) is estimated to be around 55 percent. Given the increase in transmissibility, the Delta variant is expected to rapidly outcompete other variants and become the dominant variant over the coming months.

Delta plus is a sublineage of Delta. In India, researchers observed this K417N mutation, another kind of mutation on top of the mutations of Delta. So they called it Delta plus. While it has not yet been designated as a variant of concern by either the WHO or the CDC, Indian health authorities are closely monitoring this Delta plus variant because it's been reported in 11 countries, while the Delta variant has been reported in over 98 countries. It's not yet clear if this Delta plus carries additional risks or is associated with increased transmissibility like the Delta variant.

What new symptoms are associated with the Delta variant?

Researchers in the UK have reported a shift in symptoms that may be associated with the Delta variant. There's an application being used in the UK called the Zoe App. People who have downloaded it have reported symptoms like headaches, sore throat, runny nose and fever, similar to what people may experience with a bad cold. This is also similar to what doctors in the U.S. have encountered so far. Physicians are seeing more upper respiratory complaints such as congestion, runny nose and headaches, which were not very common with previous versions of SARS-CoV-2. It's not yet clear why cold-like symptoms are increasingly being reported or whether there is a link, if any, to the Delta variant. But this is something we are closely following.

Why do we see an increase in cases if certain countries in Europe have fully vaccinated half of their populations?

We tend to automatically think that this is because of the virus or the variant. But that's just one of the many reasons why cases are increasing. We should remember that this respiratory virus spreads mainly between people who are in close contact with each other, or through aerosols or droplets in enclosed spaces. An infected person needs to bring the variant to another person; the virus doesn't spread on its own.

The countries seeing an increase in cases have moved out of restrictions even before fully vaccinating 50 per cent of their population. Because of social mixing and mobility, the number of gatherings has also increased. So we were expecting -- and are now seeing -- an increase in cases. But what's important to remember is that hospitalisations and deaths are not increasing as much as the number of cases. That shows us that the vaccines are effectively taming COVID-19. The vaccines remove COVID-19's ability to make people severely ill, put them in hospital or kill them.

How much more severe is the Delta variant compared to the Alpha variant?

It is definitely more transmissible. We've seen a lot of data on that now. But in terms of severity, more research is needed, although there are indications that the Delta variant may cause more severe disease. A study published in The Lancet in June looked at the impact of the Delta variant in Scotland, where it had become the dominant strain. It found that the risk of hospitalisation from COVID-19 was roughly doubled for patients infected with Delta compared with those infected with the Alpha variant. Researchers in the UK are also seeing similar trends in the numbers of patients showing, as we discussed earlier, different symptoms. So it might be the case, and I'm sure we will see a clearer picture in the next few months.

Talk to us about the vaccine efficacy rates of the different vaccines.

This is the million-dollar question. The vaccines we have at the moment appear to offer good protection against the Delta variant. Most virologists and vaccinologists agree that fully vaccinated individuals likely face little risk with the Delta variant. Moderna, for example, announced last week that the vaccine is effective against the Delta variant.

Similarly, promising results have been found with both the Pfizer-BioNTech and the AstraZeneca vaccine. In fact, an analysis released in June by Public Health England (PHE) found that two doses of the Pfizer vaccine were 96 percent effective against hospitalisation from the Delta variant and two doses of the AstraZeneca vaccine were 92 percent effective. A previous analysis from the PHE also found that a single vaccine dose was less effective against symptomatic illness. The message here is that two doses are needed.

In Israel, where 57 percent of the population is fully vaccinated, a recent spike in COVID-19 cases was reported with the Delta variant, including infections amongst vaccinated individuals. But this did not mirror an increase in hospitalisations and deaths. As for Sinovac, the vaccine still offers protection. In the Guangdong province, where the first cases of the Delta variant were reported, none of those who were vaccinated developed severe symptoms. And all of those severe cases in Guangdong were from unvaccinated people. So that's also a good thing.

Sputnik V's developers released a press release last week saying that the vaccine is 90 percent effective against the Delta variant. This is slightly lower than the reported vaccine efficacy against the original version of SARS-CoV-2. So the high-level message is that vaccines work. Two doses work better than one, and the more people who get vaccinated, the better for everyone. There is likely not much cause for worry amongst people who are fully vaccinated. But outbreaks can happen in places with low vaccination rates, and at the moment, most countries in the world have very low vaccination rates.

The UK plans to roll out boosters in September. Explain to us what boosters are and why they matter.

Vaccines protect us from dangerous pathogens, and once you have had your shot for a particular disease, you might think that you're always safe from it. But that's not necessarily the case. For some diseases, you need another shot to build strong immunity, and for others, your protection wears off over time. Some viruses, like the flu, change or mutate over time, making your vaccine less effective. For most vaccinations, you need an extra dose of the vaccine, known as a booster, to help your immune system remember the pathogen. For COVID-19, we still do not know whether we need booster shots, but elderly or immunocompromised people may need boosters in the future.

What research is out there about mixing vaccine doses?

A study in Spain called Combivacs showed that vaccinating people with both the Oxford-AstraZeneca and Pfizer-BioNTech COVID-19 vaccines produces a potent immune response. A similar study in the UK called Com-Cov analysed combinations of the same two vaccines. It found that people in the mix-and-match groups had higher immune responses but also experienced higher rates of common vaccine-related side effects, such as fever, compared with those who received two doses of the same vaccine.

Giving people first and second doses of different vaccines makes sense, but we do not know what will happen if people need a third dose to prolong immunity. Will it work as well? We are not sure. But this is already being done in many countries, including Germany. In fact, German Chancellor Angela Merkel took Moderna after receiving AstraZeneca as the first dose. This is now more accepted because we have some scientific data to show that this is OK.

What are the biggest blunders you are seeing with COVID-19 reporting in the media?

The one thing I've seen a lot lately is the kind of reporting which says, "28 fully vaccinated doctors in Indonesia dead". However, the article fails to account for the denominator of, say, 180,000 vaccinated doctors. Vaccines don't protect everyone from Covid; we never said that Covid vaccines protect 100 percent of the population against infection.

And then, once a person is infected, there are other factors involved. Is this person healthy? Does this person have chronic medical conditions? Does this person have access to good health care? Or is this person a doctor in a small town where he has no choice but to work 72 hours straight? These things are not presented clearly. Or if they are presented, it is at the end of five paragraphs, and people don't read up to the fifth paragraph.

I saw another article saying, "Countries using Chinese vaccines continue to see Covid cases". That is a misrepresentation of what's happening. Yes, countries using Chinese Covid vaccines have cases, but you also see Covid cases in the United States, Israel and the UK, which do not use Chinese vaccines.

What final piece of advice do you have for journalists?

I think, for the most part, journalists are keeping themselves up to date with whatever new research is out there on COVID-19. One important thing to remember is not to report on preprints -- publications that have not yet been peer-reviewed. Preprints are useful, but they can change. Last year, many journalists were reporting on preprints as if they were peer-reviewed publications. At the moment, I don't see a lot of that, which is a good thing.

Latest from DataJournalism.com

Spreadsheets are the backbone of data journalism. To help journalists unlock hidden stories from those imprisoned cells, read Abbott Katz's long-read article. Drawing on data from ProPublica and the European Centre for Disease Prevention and Control (ECDC), he provides mini-tutorials to help you use spreadsheets at speed. Read the full article here.

Upcoming conversations

Did you miss our live Conversations with Data Discord chat last week with Eva Constantaras and Anastasia Valeeva? Worry not -- the edited version will be out in our next issue. The pair discussed building a hypothesis for your data storytelling and the different training opportunities around the world for data journalists.

Our August conversation will feature Marie Segger, a data journalist at The Economist. She will speak to us about launching "Off the Charts", The Economist's data newsletter. We will also hear about her learning and career path into data journalism as well as her thoughts on the best way for data teams to collaborate.

As always, don't forget to let us know who you would like us to feature in our future editions. You can also read all of our past editions here or subscribe to the newsletter here.

Onwards!

Tara from the EJC data team,

bringing you DataJournalism.com, supported by Google News Initiative.

PS. Are you interested in supporting this newsletter or podcast? Get in touch to discuss sponsorship opportunities.

Closing the data literacy gap https://datajournalism.com/read/newsletters/closing-the-data-literacy-gap Wed, 23 Jun 2021 13:46:00 +0200 Tara Kelly https://datajournalism.com/read/newsletters/closing-the-data-literacy-gap Welcome to our latest Conversations with Data newsletter.

This week's Conversations with Data episode features Data Literacy's CEO, Ben Jones, who joined us for our first live Discord chat earlier this month. He spoke to us about how data journalism can help close the data literacy gap. We also heard from him about his new book, "Learning to see data: How to interpret the visual language of charts".

You can listen to the entire podcast on Spotify, SoundCloud, Apple Podcasts or Google Podcasts. Alternatively, read the edited Q&A with Ben Jones below.

What we asked

What's the best way to describe the term data literacy?

That's a great question. I think different people understand it in different ways. One of the textbook definitions you can find online is that data literacy is the ability to read, understand, create and communicate data as information. I like that definition quite a lot because it covers a lot of ground. But I also really like a definition from one of the early research papers on this topic, published in 2013 by Javier Calzada Prado and Miguel Ángel Marzal. They define it a little differently and look at it from the point of view of someone trying to understand how information is collected, stored and communicated. They say that data literacy can be defined as the component of information literacy that enables individuals to access, interpret, critically assess, manage, handle and ethically use data. That was helpful for me as I began to launch my business and tried to understand how I could help other people effectively speak the language of data.

How did your career intersect with data journalism?

I started in engineering back in the late 90s at UCLA in the Los Angeles area where I grew up. I have a degree in mechanical engineering, and I spent the early years of my career designing products for automotive and medical devices. Then gradually, I started to use statistics and data visualisation more and more to the point where I really saw that it was a very powerful medium to both learn about the world and communicate that to others. I started to write a blog called DataRemixed.com. That allowed me to begin getting connected with others who were excited and passionate about data visualisation, including many data journalists I greatly admired.

I think the big connection happened when I was asked to move up to Seattle to work for Tableau Public, a free offering for anyone who wants to tell interactive data stories on the web. Of course, a huge contingent of Tableau users was at news organisations from all around the world. That led me to many different data journalism conferences where I started to present, train and work with journalists who were preparing interactive graphics about everything from elections to the Olympics to everything in between. And that really led me to learn a lot from them.

Tell us how you've been inspired by data journalists and their work.

Yes, I was inspired by so many data journalists. A handful come to mind -- the team that runs La Nación's data blog in Buenos Aires, which includes Momi Peralta, Gabriela Bouret and others. This is just a really inspirational team. They started off with a small number of people who were diving into data for the first time. But they were under a lot of pressure and were uncovering corruption in the political climate in Argentina. They were finding things that were embarrassing for politicians. There were even examples where the government was one of their financial sponsors and pulled ads -- if I recall correctly. But they were very dedicated. They would request data from the government, which would deliver it in boxes of paper, and they would gather as many people as they could to sift through them for stories and digitise them. They were devoted to it, they went after it, and they published really important articles that exposed corruption -- sometimes at great risk to themselves. So if that's not inspirational, I don't know what is.

Journalism has faced a lot of economic challenges in the digital world. One way publishers have approached this is through the attention economy and clickbait. How do you see this impacting the data journalism space?

That's certainly a risk. If the charts and graphs only show one side of the story or only one very biased angle to the story, people might walk away with a very skewed view of the world. Unfortunately, people are much more likely to want to share that. There is a profit motive there, and it runs counter to the best journalists I know who believe in journalism as a public good and try to tell accurate stories, perhaps that embrace multiple views that don't show an overly biased perspective.

Readers who are highly data literate are more likely to see through that and recognise the bias. At least that's my hope: that if we can help raise data literacy levels across the board, people will be more likely to recognise when someone might be misleading them or telling an overly biased version of the story. It's a connected challenge, and not in any way dissimilar from the broader human challenge of communicating thoughts, feelings and facts. Charts are just one outworking of that overall communication challenge.

What advice do you have for people to understand data that is inherently uncertain?

There are ways to communicate uncertainty. A chart with crisp, clean lines that places points in precise spots conveys a notion of precision. But it's possible to make charts fuzzier. For instance, you could use error bars or ranges to blur the lines a little bit. Studies show that sketch styles with wavy lines can be perceived as less accurate or precise. In a situation where the data is highly uncertain, it might be appropriate to use design techniques like these to convey that.

Oftentimes those techniques can seem more confusing or intimidating to a layperson who may not be well versed in statistical methods. Some journalists are trying to find ways to translate those techniques into forms that a reader without that technical education can understand. But I do think it's a big challenge.
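To make the idea of ranges and error bars a little more concrete, here is a minimal sketch in R with ggplot2. The data frame, column names and values below are invented purely for illustration and are not taken from any project discussed in this interview.

# Minimal sketch: showing point estimates together with their uncertainty ranges.
# All data here is invented for illustration.
library(ggplot2)

estimates <- data.frame(
  group    = c("A", "B", "C"),
  estimate = c(34, 28, 19),   # invented point estimates (%)
  moe      = c(3, 3, 2.5)     # invented margins of error
)

ggplot(estimates, aes(x = group, y = estimate)) +
  # Error bars make the range around each estimate visible to the reader
  geom_errorbar(aes(ymin = estimate - moe, ymax = estimate + moe), width = 0.2) +
  geom_point(size = 3) +
  labs(
    title = "Estimates with margins of error (illustrative)",
    x = NULL, y = "Estimate (%)"
  ) +
  theme_minimal()

On a line chart, swapping geom_errorbar() for a shaded geom_ribbon() achieves a similar effect for time series: the band signals to readers that the line is an estimate, not an exact value.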

At Data Literacy, have you developed any courses specifically for data journalists?

We don't have courses specifically designed for data journalists right now, but I designed our courses right after I left Tableau. Because of that, there are many journalism-style examples peppered throughout. I've found that's actually interesting for people in business, too. They like that it's data about their world instead of about a fake store that's not even a real company. For instance, we talk about deforestation and then learn to visualise that data. I believe that crosses over well into the journalism world. I imagine our courses will be more interesting to someone in data journalism, and that's probably because I've been so influenced by it.

How should those who want to become data journalists approach training?

Should data journalists, or those who want to learn the discipline, seek out training that's specific to the field? I think the answer is yes. The best learning you're going to get is from individuals like Cheryl Phillips at Stanford University and Christian McDonald at The University of Texas at Austin. Those are people who aren't just teaching you the techniques and tools of the field; they're also teaching you the context and the challenges of the newsroom. I had a chance to visit the University of Missouri, and they have a fabulous data journalism programme. I still think you're probably going to get a lot of value out of studying data journalism at the university level, because that's where many senior and experienced journalists go to learn.

But there's also probably some value in just learning the tools through the training you'll find online. It's just going to be up to you to translate what you learn, which may be about something that isn't really applicable to the news cycle. You're going to say to yourself, 'Well, OK, how can I translate this?' So I wouldn't say don't take data science or data analytics courses that aren't geared towards journalists; you will probably find some better tool-based training over in that world. But I do think it's helpful to then go beyond that and look at resources from organisations like Hacks/Hackers, so you can talk to people who are applying those tools in the newsroom. There are so many great, generous individuals in data journalism who are happy to share their knowledge.

Latest from DataJournalism.com

How can journalists use a hypothesis-driven methodology to build a succinct narrative that serves forgotten or overlooked communities? Eva Constantaras and Anastasia Valeeva share their expertise. Read the full piece here.

How can data journalism be used to ensure the accuracy and impact of war reporting? Sherry Ricchiardi provides a journalist's guide for using data to report on conflict-affected regions. Read the full long read article here.

Our next conversation

Our next Conversations with Data podcast will be a live Q&A with Eva Constantaras and Anastasia Valeeva on Tuesday 6 July at 3pm CEST / 9 am ET. The pair will discuss the power of building a hypothesis for data journalism, a topic covered in our latest long read article. The conversation will be our second live event on our Discord Server. Join the live recording on Discord and share your questions with them.

As always, don't forget to let us know who you would like us to feature in our future editions. You can also read all of our past editions here or subscribe to the newsletter here.

Onwards!

Tara from the EJC data team,

bringing you DataJournalism.com, supported by Google News Initiative.

PS. Are you interested in supporting this newsletter or podcast? Get in touch to discuss sponsorship opportunities.

Designing better data visualisations https://datajournalism.com/read/newsletters/designing-better-data-visualisations-conversations-with-data-issue-75 Wed, 26 May 2021 16:29:00 +0200 Tara Kelly https://datajournalism.com/read/newsletters/designing-better-data-visualisations-conversations-with-data-issue-75 Welcome to our latest Conversations with Data newsletter.

We are excited to announce the launch of The Data Journalism Handbook: Towards a Critical Data Practice published by Amsterdam University Press, supported by Google News Initiative and the European Journalism Centre. The handbook provides a rich and panoramic introduction to data journalism, combining both critical reflection and practical insight.

Edited by Liliana Bounegru and Jonathan Gray, it offers a diverse collection of perspectives on how data journalism is done around the world and the broader consequences of datafication in the news, serving as both a textbook and a sourcebook for this emerging field. Read the full book on DataJournalism.com.

Now on to the podcast!

In this week's Conversations with Data episode, we spoke with Maarten Lambrechts, a freelance data visualisation consultant based in Belgium. He talks to us about his journey into the world of data visualisation and how he helps other organisations like the World Bank communicate their numbers by evaluating, designing and developing static and interactive visualisations.

You can listen to the entire podcast on Spotify, SoundCloud, Apple Podcasts or Google Podcasts. Alternatively, read the edited Q&A with Maarten Lambrechts below.

What we asked

Tell us about your career. How did you find your way into data journalism?

Like many people in the field, my career path is filled with twists and turns. I graduated with a degree in Bioengineering, Forestry and Nature Conservation. Then I worked as an engineer for a couple of years before moving to Latin America. I lived in Bolivia for over two years where I worked as an agricultural economist. I also started blogging to keep my friends and family at home informed. That's how I learnt how websites work. I also started writing for a small magazine here in Flanders about my experience abroad. After coming back to Belgium, I became a webmaster. Because of my background in engineering, I had an affinity for numbers. That was when data journalism started to take off. I began experimenting with data journalism and making visualisations. I was later hired as a data journalist by De Tijd, a newspaper based in Flanders. After working a couple of years there, I decided to go freelance.

Most recently the World Bank hired you to help create the World Development Report 2021. Tell us more about that.

The World Development Report is one of the World Bank's flagship publications. It is translated into many languages and is published in print and as a PDF. Because this year's topic focused on data, they wanted a modern digital version as well. The report's tone of voice is a bit lighter, and readers will find visualisations and animations in the articles. I developed three of the stories, which involved building the storyline, designing the visualisations and programming the stories.

Tell us about those stories you worked on.

The stories focused on competition, the new data economy and how governments could take action to have a healthy ecosystem in the data market. One story I wrote examined how countries regulate the flow of data across borders. Some countries have decided that whenever data is produced, there should always be a local copy within the country. Others have a much more liberal approach that does not control what data is flowing in and out of the country. These different models have different advantages and disadvantages. Another story was about building trust and regulating the data economy. More specifically, it looked at cybersecurity and how countries are governing the protection of privacy and trying to enable the data economy by having good open data laws.

Describe your workflow. What software and tools did you use?

For the visualisations, I used the R programming language and the ggplot2 package. I made very rough sketches in Google Slides to show how the storyline would unfold. It was a collaborative process. For instance, we had many discussions with the authors of the chapters to develop this, and they had a big say in how it should look and how the story would flow. To develop the stories, we had a React-based template.
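For readers curious what that R and ggplot2 step can look like in practice, here is a small hypothetical sketch -- the data frame, counts and labels are invented for illustration and are not Maarten's actual code -- showing a static chart built up layer by layer before being handed to a web template.

# Hypothetical sketch of a layered ggplot2 chart; the data and labels are invented.
library(ggplot2)

data_flows <- data.frame(
  regime      = c("Open transfer", "Conditional transfer", "Local copy required"),
  n_countries = c(42, 31, 17)   # invented counts, for illustration only
)

p <- ggplot(data_flows, aes(x = reorder(regime, n_countries), y = n_countries)) +
  geom_col() +            # layer 1: the bars
  coord_flip() +          # layer 2: flip the axes so the labels stay readable
  labs(
    title = "How countries regulate cross-border data flows (illustrative)",
    x = NULL, y = "Number of countries"
  ) +
  theme_minimal()         # layer 3: a clean theme

ggsave("data_flows.png", p, width = 7, height = 4)   # export a static version

Each + adds one more layer to the plot, which is why rough design sketches can be translated into code step by step and iterated on quickly.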

Who else did you collaborate with on the report?

For the data-heavy visualisations, I worked with Jan Willem Tulp, a freelancer in The Netherlands. We split up the chapters in the book and gave each other some feedback. We also worked with Beyond Words, a data visualisation agency specialising in creating narratives with data. They worked on the more conceptual stories and designed the styling and provided the template to build our stories.

How do you think journalists should handle missing data?

I think one important aspect is the metadata -- the data about the data. Sometimes we must be more explicit about the metadata and maybe even show it. I'm now involved in a new project where we are not just thinking about visualising the data, but also visualising the metadata. For instance, showing how many missing years we have in the data or whether the data is from a government database or a survey. Being more explicit about the metadata can help you as the data journalist and the reader. Data journalists should ask themselves what data is missing and whether they can still draw conclusions from this and tell their story.

What story are you most proud of?

My favourite is a project I worked on a couple of years ago. It's called Rock n' Poll. It's an interactive story where you learn how political polls work and what that means for the uncertainty that is inherent in the polling results. At the time that I built it, I was working at a newspaper and I was really annoyed about how journalists were reporting on political polls. They were focussing on very small differences, well within the margin of error. I wanted to develop something that could explain that uncertainty without using any statistical formulas.

Finally, what advice do you have for journalists who want to begin coding?

One of the best decisions I made in my career as a data journalist and data visualisation expert was to learn ggplot2 as part of the tidyverse. It allows you to build visualisations layer by layer. Once you understand how the system works, you can get really creative and quickly design data visualisations. If you're an Excel user and you're interested in learning to code, start with the tidyverse in R. It's a set of packages that all work well together, and I found it easier to work with and understand than base R. As an exercise, I recommend you take all of the data operations you'd normally perform manually in Excel and do them in the tidyverse. This gives you a script that performs all the manual manipulation you would otherwise do in Excel, and that you can rerun in R at any time.
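As a concrete illustration of that exercise, here is a short, hypothetical tidyverse sketch -- the file name and column names are placeholders invented for the example -- doing the kind of filtering, grouping and summarising you might otherwise click through manually in a spreadsheet.

# A rerunnable sketch of typical manual spreadsheet steps; the file and columns are placeholders.
library(dplyr)
library(readr)

budget <- read_csv("municipal_budget.csv")    # placeholder file name

summary_by_region <- budget %>%
  filter(year == 2020) %>%                    # like filtering rows in a spreadsheet
  group_by(region) %>%                        # like setting up a pivot table
  summarise(
    total_spend = sum(spend, na.rm = TRUE),
    n_projects  = n()
  ) %>%
  arrange(desc(total_spend))                  # like sorting a column

write_csv(summary_by_region, "summary_by_region.csv")

Because the steps live in a script rather than in manual clicks, the whole pipeline can be rerun whenever the source data changes.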

Latest from DataJournalism.com

How can data journalism be used to ensure the accuracy and impact of war reporting? Sherry Ricchiardi provides a journalist's guide for using data to report on conflict-affected regions. Read the full long read article here.

Our next conversation

Our next Conversations with Data podcast will be a live Q&A with Ben Jones, founder of Data Literacy, an organisation aiming to help you learn the language of data. He will reveal his top tips and tricks for becoming an effective data storyteller. The conversation will be our first live event on our Discord Server! Join the conversation to take part and find out more.

As always, don't forget to let us know who you would like us to feature in our future editions. Share with us your thoughts on our Discord Server. You can also read all of our past editions here or subscribe to the newsletter here.

Onwards!

Tara from the EJC data team,

bringing you DataJournalism.com, supported by Google News Initiative.

PS. Are you interested in supporting this newsletter or podcast? Get in touch to discuss sponsorship opportunities.

Visual storytelling inside The Pudding https://datajournalism.com/read/newsletters/visual-storytelling-inside-the-pudding Wed, 12 May 2021 13:57:00 +0200 Tara Kelly https://datajournalism.com/read/newsletters/visual-storytelling-inside-the-pudding Welcome to our latest Conversations with Data newsletter.

In this week's Conversations with Data episode, we spoke with Jan Diehm, a senior journalist engineer at The Pudding. She talks to us about her creative process for designing visual stories, what coding languages she can't live without and how to pitch The Pudding.

You can listen to the entire podcast on Spotify, SoundCloud, Apple Podcasts or Google Podcasts. Alternatively, read the edited Q&A with Jan Diehm below.

What we asked

Tell us about your career. How did you develop an interest in visual journalism?

I always knew I wanted to do journalism. I was in a high school journalism class where I did the design for the newspaper. I was also a photographer. That was my first step into the visual side of journalism. I went to college to study photojournalism, but then the design side had a stronger pull. So I ended up doing a lot of newspaper page design at college, and my first job out of school was at the Hartford Courant doing the sports pages. I really just enjoyed the puzzles of fitting visuals together. After a few layoff situations -- because that's what happens in the industry -- I had to reinvent my brand and vision of visual journalism. That eventually led me to data journalism and developing and designing stuff for the web.

Who founded The Pudding and how did it come about?

The name comes from the old adage, "the proof is in the pudding". In 2017, Matt Daniels, Ilia Blinderman and Russell Goldenberg wanted it to be called "The Proof". But the name was already taken. They were joking around and said, "what if we just called ourselves The Pudding?" It was just a joke, but then it stuck. That name reflects the playfulness of the publication. It's inviting and unintimidating, but then it's attached to the rigour and investigative intentions.

What advice do you have for journalists pitching The Pudding?

My first piece of advice is don't be intimidated. The second is to make sure your idea is looking for a deeper truth. When you're new to data journalism, you want to explore the data and every single aspect of it. You want to know what its shape is and what it looks like. Those raw data explorations don't work well for our platform, because we want to find the deeper truth -- the kernel that really connects it to how people live their day-to-day lives.

The first questions you ask are the whats and the wheres. At The Pudding, we're trying to answer the how and why questions with the data. So I think we're trying to add that qualitative piece in a lot of our reporting in the projects that we do. Taking that next step and asking yourself, "Why is this important? Why am I intrigued by this idea?" If you're really passionate about it and it excites you, that comes through in your pitch as well. At The Pudding, you'll see a lot of personal stories that draw on our own lived experiences that lead us down a path.

What story are you most proud of?

I think the piece examining the size of women's jeans pockets is definitely what I refer to as my magnum opus. It is what I have become most known for on the Internet and what people ask me about the most. There wasn't a data set there, which meant Amber Thomas and I had to go and make it. We couldn't go online and just look at the product descriptions of all these jeans to get pocket measurements, because they weren't there. We both physically went to stores and measured the pockets, and we developed a system for measuring before we went out. As we were two white ladies, how we looked afforded us the ability to go collect that data safely. After we got the data, we were able to put it online in a piece that really resonated.

What is your creative process for coming up with story ideas at The Pudding?

We each keep a document that we call our personal backlog -- it's just the most random ideas that come into our heads at any point in time. I'm a millennial, a child of the Internet, so a lot of my ideas come straight from there. We all consume a lot of different media and different things, and the more we consume, the better our output.

As a team, we do a process that we call "storytime". It involves bringing those little snippets of ideas and saying, "Hey, this is an idea. What do you guys think of it? How can we make it better? What's the next approach to it?" It's about seeing if this idea excites the team. You can see it in people's body language or how quick they are to respond to an idea. That's the best gauge.

What programming languages do you know? How did you learn them?

I know HTML and CSS, which I taught myself. But I came into the data visualisation side of it through D3. I learnt D3 before I learnt JavaScript by taking some classes at General Assembly when I was living in New York City, including a D3 data visualisation night class. That involved applying what I learnt in class on the job, and I also learnt from the people I worked alongside. I now know JavaScript well. Recently at The Pudding, we've started using Svelte, a new framework developed by Rich Harris at The New York Times. It's an awesome framework for beginners because the syntax just makes more sense -- it feels like you're writing something closer to plain English than you would in other frameworks.

Finally, what is next for The Pudding?

We're currently hiring for a managing director role for our editorial operation. Right now we're juggling different projects at the same time and have no centralised system for ensuring our pieces meet an editorial standard and have somebody to look over them. This role would also involve managing our freelance pool, which will allow us to grow as a platform beyond our core team and be a voice and a source for other storytellers as well. We are trying to keep the momentum going and continually innovate. That involves trying to figure out what's next: what's the next big tool or medium for telling visual data stories? I don't know if we have it yet, but we're always looking for it.

Latest from DataJournalism.com

How can data journalism be used to ensure the accuracy and impact of war reporting? Sherry Ricchiardi provides a journalist's guide for using data to report on conflict-affected regions. Read the full long read article here.

Our next conversation

Our next Conversations with Data podcast will feature Maarten Lambrechts, a freelance data visualisation consultant based in Belgium. He will speak to us about his work as a consultant and how he helps organisations communicate their numbers by evaluating, designing and developing static and interactive visualisations. Most recently, he has worked with Eurostat, Thomson Reuters, Google News Initiative and The World Bank. Before he turned freelance, he worked as a data journalist at Flemish newspaper De Tijd. Check out his portfolio here.

As always, don’t forget to let us know what you’d like us to feature in our future editions. You can also read all of our past editions here. You can also subscribe to this newsletter here.

Onwards!

Tara from the EJC Data team,

bringing you DataJournalism.com, supported by Google News Initiative.

P.S. Are you interested in supporting this newsletter as well? Get in touch to discuss sponsorship opportunities.

Tackling quantitative imperialism https://datajournalism.com/read/newsletters/tackling-quantitative-imperialism Thu, 29 Apr 2021 15:01:00 +0200 Tara Kelly https://datajournalism.com/read/newsletters/tackling-quantitative-imperialism Welcome to our latest Conversations with Data newsletter.

Our team at DataJournalism.com recently launched a data journalism Discord server for our community. We are looking for data journalism enthusiasts to test out the BETA version. So far we've had over 300 users join our server. Come share your latest piece, connect with your peers or tell us what topics you're interested in reading or listening to! It will only take a minute to create a Discord account. Sign up here!

Now on to the podcast!

In this week's Conversations with Data episode, we spoke with Professor Deborah Stone, a renowned policy expert and political scientist at Brandeis University. She speaks to us about her latest book, "Counting: How We Use Numbers to Decide What Matters". She explains how policymaking is shaped by the worship of numbers and why journalists and policymakers should be sceptical of "quantitative imperialism".

You can listen to the entire podcast on Spotify, SoundCloud, Apple Podcasts or Google Podcasts. Alternatively, read the edited Q&A with Professor Deborah Stone below.

What we asked

Tell us about your career path. How did you first become interested in social science?

When I started out, I really grooved on math and science. Math and science came easily to me, unlike literature, history and social studies. Besides, I was searching for the meaning of life. In biology, I looked through a microscope and I was just enchanted that there was so much pattern in the world. I thought the answers were to be found in science and math.

When I got to college, I majored in Biology. But my friends told me that the best professor was the teacher who taught Introduction to Government. I had no interest in politics, but I took it. And it turned out the whole first semester was Political Theory. We started with Plato and we marched on up through all the political theorists. I thought "This is wonderful. This is a subject that's about how to make justice in the world." That seemed like the most noble aspiration you can imagine.

I went to graduate school in political science. As I came out of graduate school, a new field called public policy emerged: the study of what government does to make life better for people, to solve problems and improve people's well-being. It combined the philosophical quest for justice and goodness with the practical "how do we accomplish it?" So that is how I came to do what I do.

In the 1960s, quantitative analysis became central to Political Science. How did that sway you to write this book?

I had my first run-in with what I call "quantitative imperialism" when I took my undergraduate Introduction to Economics class. I remember the professor put up a whole bunch of supply and demand curves on the blackboard and talked to us about how people decide what to buy or how much to buy according to the price, and that everybody works to maximise their self-interest and get the most for the least. I raised my hand and said, "I don't think price is the only thing people think about. I think that's really an oversimplification." And the professor said, "Yes, I agree with you that it's an oversimplification, but if we strip things down and we make this simplification, it can lead to some very powerful predictions."

Something rubbed me the wrong way about the idea that you could ignore things that are really important to people and then say that you're making powerful predictions. As the decades progressed, we went to what people are now calling "market fundamentalism". We've organised entire societies and political economies on the idea that people pursue their self-interest and care only about maximising either profit or personal gain.


What did you hope to achieve by writing this book?

It's a bit grandiose, but I wanted to launch a counter-revolution against quantitative imperialism. I think there's so much worship of numbers, especially in public policy. Government calls for "evidence-based" policymaking by which they mean numbers. Journalists call for facts by which they mean numbers. At universities, most professors tell students, if you don't do statistics, you won't get a job or you're not doing real science.

A lot of mathematicians and economists, even ones who are very critical of how we use numbers, claim numbers are the most powerful instrument we have to understand reality -- as if they were objective and really could tell us the facts and the real patterns in the world. I don't think that you can get much understanding of human character by counting things. I don't think you can get much understanding of human relationships by counting. Sure, we have algorithms that can fix people up on blind dates, and a lot of them work out into very happy marriages. But algorithms can't fix marital problems.

Who is the audience you intended to reach?

The book aims to encourage a little bit more scepticism about all these claims. I think that you'll be a better data journalist and a better data scientist if you know where your data come from, meaning the raw numbers, how things get counted. The book is really for everyone interested in public affairs. Everyone who enjoys reading the news and wants to understand it and make sense of information about how we live our lives. I want no one to be intimidated by fancy statistics.

In the book, you examine early childhood development and how we learn to count. Why is that important?

That's really the key to my book. I want people to understand the mental process behind counting. I'm not talking about PhD brains here. I'm talking about pre-school, because we all learn to count usually before we go to kindergarten. I often ask people if they can remember learning to count. I haven't found anyone who remembers learning how to count. Some people remember learning how to read, but no one remembers learning how to count or learning how to talk. It's something our parents teach us and the adults around us teach us, but we just absorb it somehow.

When kids learn to count and when they learn to talk, they're really learning how adults categorise things. They're learning what fits under the word cookie, for example, and what things should be counted, therefore, as a cookie. This is why we can't talk or think about things without categorising them. So I would argue that categorising is the most powerful tool we have for understanding the world, not counting, because categorisation comes before counting.

In the book, you talk about how numbers have the aura of power. Tell us more.

I'm a political scientist, so I see power everywhere. I think that's maybe how I look at numbers a little bit differently from economists and mathematicians, because I'm looking for how numbers exert power. Hannah Fry says numbers are a source of comfort. And I think in the COVID-19 pandemic that has been true. They offer an illusion of certainty. They offer us hope of control and certainty in numbers.

Another way numbers exert power is that they include and exclude people. So all these scoring systems for job hiring, for giving insurance, bank loans, giving pay raises and promotions -- all of those boost some people up and leave some people out. Numbers can be used to draw lines and include and exclude people.

When it comes to race, how would you redesign the categories in the U.S. Census if you could?

This is a giant conundrum and I don't have a good answer for it. I've thought about it a lot. I've read everything I could find about the census -- but here's my dilemma. Race is the quintessential thing that can't be counted. It's an idea that some people have about other people. It's a social construct. It's not a thing. And there's a simple reason why we can't count people's race: sex. Even if you assign a race to two people, what do you do when those two people have kids? How do you count their kid?

In the United States, we have this sickening tradition of the one-drop rule that was applied to people who at the time were called Negroes. If they had one drop of Negro blood, they would be classified as Negro and therefore banned from all things that whites could do. But all people are mixtures of backgrounds themselves and then they produce kids. A mathematical formula for race makes no sense.

One of the striking things I found out in researching this book is how predictions are made that by the year 2042 the United States will no longer have a white majority -- it will be "minority-majority". All those predictions are based on a little categorising decision that the U.S. Census Bureau makes when it produces projections: it takes the children of people who are in "mixed-race marriages" and counts them as the same race as the non-white parent. So the more racial integration we have -- and what could be a better sign of racial harmony and integration than intermarriage -- the more the proportion of minorities increases. This way of categorising who counts as a minority therefore sends out a frightening message to whites, and it is driving white supremacists crazy. That is the real danger of counting race. I think it can't be done. Given human reproduction, there's no correct way to do it, so I am not sure there's a good way to do it.

But here is the other side of my dilemma. In the United States, we have counted race since the beginning and we have made laws and policies and continue to do that on the basis of race. The effects of race-based policymaking and decision making are playing out and will play out forever, into the future. That's why we have so much wealth inequality between the races in the United States -- because blacks weren't allowed to own homes and they weren't given mortgages. I think that the dilemma is how can you begin to remedy these inequities unless you take account of race.

Finally, what should data journalists be mindful of in their data storytelling?

Don't take the numbers for granted. Ask three questions about the numbers. What was counted and what was not counted? Who counted? And why did they count? I think those are the same questions you'd ask as a journalist: what happened, who did it and why.

Latest from DataJournalism.com

Discord, Twitch and Clubhouse are just a few of the social media platforms currently generating hype. Aside from being fun, how can they be used for your data journalism studies? We've tried some of them out and here's what we discovered. Read the full blog post written by Andrea Abellán here.

Our next conversation

Our next Conversations with Data podcast will feature Jan Diehm. As a senior journalist-engineer at The Pudding, she will speak to us about her creative process for designing compelling visual data essays -- most notably her famed piece with Amber Thomas examining the size of women's jeans pockets. She will also tell us how to pitch a story to The Pudding.

As always, don’t forget to let us know what you’d like us to feature in our future editions. You can also read all of our past editions here. You can also subscribe to this newsletter here.

Onwards!

Tara from the EJC Data team,

bringing you DataJournalism.com, supported by Google News Initiative.

P.S. Are you interested in supporting this newsletter as well? Get in touch to discuss sponsorship opportunities.

]]>
The story behind Datawrapper https://datajournalism.com/read/newsletters/the-story-behind-datawrapper-the-charting-tool-speeding-up-data-journalism Wed, 14 Apr 2021 14:39:00 +0200 Tara Kelly https://datajournalism.com/read/newsletters/the-story-behind-datawrapper-the-charting-tool-speeding-up-data-journalism Welcome to our latest Conversations with Data newsletter.

In this week's Conversations with Data episode, we caught up with data visualisation designer and blogger Lisa Charlotte Rost. She speaks to us about the DataVis Book Club she hosts on behalf of Datawrapper and explains why data visualisation has become more prominent since the pandemic.

You can listen to the entire podcast on Spotify, SoundCloud, Apple Podcasts or Google Podcasts. Alternatively, read the edited Q&A with Lisa Charlotte Rost below.

What we asked

Tell us about your career. Was there a pivot moment when you fell for data visualisation?

I'm coming from a design background and I studied visual communication for six years. I focussed on print design, learning how to lay out books and magazines. And because of that interest in magazine design, I went to New York City to do an internship at Bloomberg Businessweek at the end of my studies. That was 2013. I wanted to figure out how to design magazines. But on the first day, the team sat me down and asked me to join the graphics department. I thought, what the heck is that? I had no idea. At first, I asked if they meant illustrations, but they said, "No, we mean data visualisations -- like pie charts, bar charts, maps, etc." And I thought, OK, sure, why not. I like numbers.

It was the best internship ever. So after graduating, I was sure I wanted to go more in this direction. I applied for jobs at newspapers and worked mostly for German ones. And then in 2016, I got an OpenNews fellowship with the NPR Visuals team in Washington, DC. In 2017, I got lucky again and the current CTO of Datawrapper convinced me to join his company. I've been there ever since and have been very happy.

Tell us about Datawrapper and how it began.

The idea for Datawrapper came about in 2012 when a journalist named Mirko Lorenz teamed up with developer Gregor Aisch. Mirko was doing a lot of data training and didn't have an easy way to develop charts, graphs and maps. So that's how it began. We work together to create the best charting tool for everyone who wants to show their data in beautiful charts and maps.

How important is localisation to Datawrapper? What languages is it available in?

Datawrapper's tools were initially available in German and English. It's in German because it was built by a German founder, and in English so that people all over the planet can use Datawrapper. But the more customers we got from different countries, the more they asked for more languages. Recently we added Spanish, French, Chinese and Italian to our UI languages. So you can now use Datawrapper in six languages.

Speak to us about the DataVis Book Club that you started on behalf of Datawrapper. Who is it for and how can journalists get involved?

I started the DataVis Book Club in 2018 and it's for everybody who wants to learn about data visualisation and maybe bought a book or two but can't find the motivation to read it. The book club exists to motivate people by offering a deadline and then opening an interesting discussion with fellow readers and the author of the book.

So far we've read 10 books, and we are discussing our 11th book at the end of April. We read everything from data visualisation research papers to books by authors Andy Kirk, Alberto Cairo, RJ Andrews and Cole Nussbaumer Knaflic. The book club's format is a shared notepad -- almost like a Google Docs document -- where you go to a URL and everyone starts typing in different fonts and discussing the book at the same time. That's been a welcome change given the huge amount of online audio and video conferencing since the pandemic.

How do you decide what book to pick?

Sometimes I decide because I'm excited about a new book. But I also started doing surveys to ask people what they want to read next. We have participants from all over the world, so it can take time for the book to become available in all of those different regions.

What about any recent trends you've noticed since COVID-19 in the data visualisation world?

I think the whole data visualisation field just got way more important during the last year. Datawrapper got 10 times as many views in 2020 as in 2019. So many of our users put our charts on their homepages. And normally you wouldn't see so many charts, maps and tables on the homepage of a news organisation. But in March, April and May 2020, you had these data visualisations constantly explaining the latest numbers. So that's great for the data visualisation scene, but it also comes with some challenges.

For example, many of these COVID-19 charts, maps and tables are live updating, which is an interesting phenomenon, because now we see lots of people expecting them to live update. They didn't expect that in 2019 or 2018. Datawrapper supports live updating, and we've seen a four-fold increase in our users using this feature since the pandemic.

You mentioned you worked in the United States and Germany in the data visualisation world. Did you spot any striking differences?

The data visualisation scene in the U.S. is definitely bigger than in Germany. A few years ago, I would have said the U.S. is ahead when it comes to innovation and experimentation. They try out new things, which is then copied by news organisations in Europe. I'm not sure if that's the case anymore. We've seen some excellent dashboards about COVID-19 in the last year coming from European news organisations. I think it is becoming more balanced, which is nice to see. A lot of data journalism teams in U.S. newsrooms pitch and do their own stories and have their own bylines. Some news organisations in Europe and even in parts of the U.S. still treat the data team as a service desk, but that is changing.

Finally, what's next for your book club and for Datawrapper?

We are currently reading Jonathan Schwabish's book "Better Data Visualizations: A Guide for Scholars, Researchers, and Wonks". After that, we plan to read "Data Sketches" by Nadieh Bremer and Shirley Wu.

As for Datawrapper, we've already had a busy year. We launched our new website to explain to people what Datawrapper is. We also launched our new accessibility feature. It's now possible to write an alternative description of a chart so screen readers can pick that up and explain to blind people what a chart is about. We are also redesigning our My Teams and My Charts pages. This will make it easier for people to collaborate, access and find old charts. We are also hiring and expanding our team!

Latest from DataJournalism.com

By popular demand, DataJournalism.com has launched a data journalism Discord server for our community. We are looking for data journalism and data visualisation enthusiasts to test out the BETA version. So far we've had 213 user applications and our server is off to a great start. Want to take part? It will only take a minute to create a Discord account. Don't miss out! Sign up here!

Discord, Twitch and Clubhouse are just a few of the social media platforms that are currently generating hype. Aside from being fun, how can they be used for your data journalism studies? We’ve tried some of them out and here’s what we discovered. Read the full blog written by Andrea Abellán here.

Dr Amelia McNamara guides us on a journey through the history of handmade data visualisation. She provides an in-depth tutorial for journalists and cites a variety of resources to help you experiment with hand-drawing visuals for your next story. Read the long read article here.

Our next conversation

Our next Conversations with Data podcast will feature Professor Deborah Stone, a renowned political scientist at Brandeis University. She will speak to us about her new book Counting: How We Use Numbers to Decide What Matters.

As always, don’t forget to let us know what you’d like us to feature in our future editions. You can also read all of our past editions here. You can also subscribe to this newsletter here.

Onwards!

Tara from the EJC Data team,

bringing you DataJournalism.com, supported by Google News Initiative.

P.S. Are you interested in supporting this newsletter as well? Get in touch to discuss sponsorship opportunities.

]]>
Better data visualisations https://datajournalism.com/read/newsletters/better-data-visualisations Wed, 31 Mar 2021 13:42:00 +0200 Tara Kelly https://datajournalism.com/read/newsletters/better-data-visualisations Welcome to our latest Conversations with Data newsletter.

Before we jump into this week's conversation, we have some exciting news! We're setting up the first community for data journalists on Discord and are looking for beta testers! Want to take part? Fill out this form!

Now, on to the podcast.

In our latest Conversations with Data episode, we caught up with economist and data visualisation expert Jonathan Schwabish. As the founder of the data visualisation and presentation skills firm, PolicyViz, and a senior fellow at the Urban Institute, he speaks to us about his latest book, "Better Data Visualizations: A Guide for Scholars, Researchers and Wonks".

You can listen to the entire podcast on Spotify, SoundCloud, Apple Podcasts or Google Podcasts.

What we asked

Tell us about your career and how you first became interested in data visualisation?

Like everyone in the data visualisation field, I didn't actually start in the data visualisation field. I began my career as an economist and earned my PhD in Economics. I worked for a non-profit in New York City but then hit my stride personally and professionally in Washington, D.C. at the Congressional Budget Office (CBO). I spent about a decade there where I worked primarily on what we call the CBO long term model focusing on retirement policy and Social Security.

In 2010, I was looking to spark things up. I realised that a lot of the work we were putting out at the CBO was not getting picked up by members of Congress, our main audience. I started thinking about why that was. I then stumbled into the world of graphic design and data visualisation to improve our charts and graphs at the CBO. Next I started teaching data visualisation and later founded PolicyViz.

Seven years ago, I moved over to the Urban Institute, which is a non-profit research institution in Washington, D.C. I spend half my time on research and the other half in the communications department, where I work on data visualisation. My whole journey into the field of data visualisation was spurred by this observation that we as analysts, as researchers, as economists don't really do a good job of communicating our analysis. We bury it in text and in tables that nobody reads and only other economists can understand. And I believe there has to be a better way.

Tell us about your consultancy firm PolicyViz.

PolicyViz is my one-man consulting shop. It began as a blog back in 2013. I wanted to be more a part of the conversation in the data visualisation field as I was coming at it from the perspective of statistics, mathematics and economics, rather than from a design or computer science background.

Since then, PolicyViz has expanded into multiple things. The site has my books, an online shop and the PolicyViz Podcast. I also offer workshops and consulting in data visualisation and presentations. A few years ago I launched HelpMeViz where people can submit their visualisations to help improve them.

Talk to us about your new book, "Better Data Visualizations: A Guide for Scholars, Researchers, and Wonks".

The book comes from my experience in teaching and holding workshops over the past few years. There are really two learning goals that I try to get across in the book. The first is about best practices. The beginning of the book talks through a number of guidelines covering the basic principles, such as using active titles and good annotation. I also explore some things that you should do in all of your visualisations. But the meat of the book walks through more than 80 different graph types. I came to the field of data visualisation relying on line charts, pie charts, bar charts -- the economist's bread and butter -- but there are a lot of other graph types out there. And sometimes those other graph types are useful to help engage readers, which could be a goal in and of itself. And sometimes those graphs are better at visualising the data than the standard line chart is.

There is also a section on qualitative data visualisation. This is important because I think a lot of people who are working with qualitative data rely on the word cloud. But there are lots of other ways that you can visualise qualitatively. There's a chapter on tables and another on building a data visualisation style guide.

And then I wrap the whole book in a chapter on redesigns. This is where I do about a dozen graphs and redesign them. This involves pulling all the information in the book together to say, here's the best practice to make this graph clear. For example, I'm going to remake this graph as a dot plot or slope chart or whatever it is to demonstrate how I could try some of these alternative graph types. This is done in order to make the graph a little more effective and better at communicating the message that the author wants to get across.

You talk about the do's and don'ts of data visualisation in your book. Are there any things you see that make your blood boil?

For me, the only rule of data visualisation is that your bar charts need to start at zero. Otherwise, you're overemphasising the difference. One of my most hated things that people do is adding that third dimension -- 3D graphs -- when you just don't need it. At this point, it looks kind of like ClipArt. Another thing that really bugs me is the bar chart where people break the bar. Any time you can make arbitrary decisions to make your data look the way you want it to look or make your graph seem more convincing, I think that's a dangerous game. More generally, journalists need to be mindful of people and organisations misrepresenting data or lying about data in their work. It's particularly important for journalists to be honest with the data, to be objective with the data, and to put responsible use of the data first.
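As a concrete illustration of that baseline rule, here is a minimal sketch in Python with matplotlib -- the tool choice and the numbers are hypothetical additions for illustration, not from the book -- comparing a truncated axis with one that starts at zero:

```python
import matplotlib.pyplot as plt

# Hypothetical values: three regions with very similar results.
regions = ["North", "Central", "South"]
values = [96, 98, 101]

fig, (ax_truncated, ax_zero) = plt.subplots(1, 2, figsize=(8, 3))

# Misleading: truncating the axis makes a roughly 5% spread look enormous.
ax_truncated.bar(regions, values)
ax_truncated.set_ylim(95, 102)
ax_truncated.set_title("Truncated axis (misleading)")

# Honest: bars encode length, so the baseline has to be zero.
ax_zero.bar(regions, values)
ax_zero.set_ylim(0, 110)
ax_zero.set_title("Baseline at zero")

plt.tight_layout()
plt.show()
```

The same data are plotted twice; only the baseline changes, which is exactly the kind of arbitrary decision the quote warns against.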

In your book, you discuss the data visualisation style guide. Tell us about that.

One of the earliest style guides that I ever saw was from the Dallas Morning News. They published their style guide in 2005. I have a collection on my website of style guides that I have seen. A lot of them I draw from media organisations. Each chapter of my book uses a different style guide. This is done to demonstrate how you implement these styles. One is kind of loosely based on The Washington Post. Another is based on the Texas Tribune. The Minneapolis Star Tribune has a really nice style guide. The Urban Institute also has a very detailed style guide.

For journalists, it helps with the branding of the news organisation. For instance, you could show me any graph from The Economist without the magazine and I would know it comes from The Economist. Same thing with the Financial Times. It has the background colours, the font, the look. You can easily identify it.

The people that I work with are researchers and analysts who don't really care about the font or colour. That is why it is important to give them a style guide and the tools associated with it -- perhaps an Excel template or an R theme. This allows them to focus more on the analysis and the story and the graph type, and then they can just click a button or run a little script that applies those styles. One thing I would say is that it is important for the style guide to be a living, evolving document that can change as people develop their aesthetic style.
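To make that "little script" idea concrete, here is a minimal sketch in Python with matplotlib; the fonts, colours and numbers are hypothetical placeholders rather than any organisation's real guide, and the same idea works equally well as an Excel template or an R theme:

```python
import matplotlib.pyplot as plt
from cycler import cycler

# Hypothetical house style: fonts and colours defined once, in one place,
# so analysts can focus on the analysis and the story rather than styling.
HOUSE_STYLE = {
    "font.family": "sans-serif",
    "font.size": 11,
    "axes.titlesize": 14,
    "axes.spines.top": False,
    "axes.spines.right": False,
    "axes.prop_cycle": cycler(color=["#1b6ca8", "#e8702a", "#6aa84f"]),
    "figure.facecolor": "white",
}

def apply_house_style():
    """Apply the shared style guide to every chart produced by a script."""
    plt.rcParams.update(HOUSE_STYLE)

# Usage: call once at the top of an analysis script, then chart as normal.
apply_house_style()
plt.plot([2018, 2019, 2020, 2021], [3.1, 3.4, 2.8, 3.9], marker="o")
plt.title("Styled with the shared guide (placeholder data)")
plt.show()
```

Keeping the style in one shared file also makes it easy to update as the guide evolves.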

What are some of your favourite books in the field you'd recommend for data journalists getting started?

I have a few that I can recommend. I like all the books by Alberto Cairo, including his most recent book, "How Charts Lie". I'd also recommend "Storytelling With Data" by Cole Nussbaumer Knaflic. That's another great book for introductory data visualisation techniques. "Info We Trust" by RJ Andrews is more of a historical look at data. "R for Data Science" by Garrett Grolemund and Hadley Wickham is a must-have if you're coding in the R programming language.

Finally, what is next for you at PolicyViz and the Urban Institute?

I'm just about to finish up a daily video series that I've been posting since early January. It's called One Chart At A Time. It'll be 56 videos when it's all done. Each video covers a different graph type and features different people from the data visualisation and data journalism fields. I asked people to do a five-minute video about a specific graph type. So every day I've been posting those over the last couple of months, and that project is about to wrap up this May. It is a really exciting project with so many different people and personalities from a variety of fields talking about specific graphs.

I will also publish a new report on racial equity in data and data visualisation. It builds on a paper that I published back in August with a colleague of mine, Alice Feng, at the Urban Institute, but this is a much longer, in-depth report. In summary, it's a combination of the conceptual reasons why we should be having these discussions and some practical considerations -- guidance rather than hard rules.

Latest from DataJournalism.com

By popular demand, DataJournalism.com is launching a data Discord community server. We are looking for data journalism and data visualisation enthusiasts to test out the beta version. Sign up here for early access!

Dr. Amelia McNamara guides us on a journey through the history of handmade data visualisation. She provides an in-depth tutorial for journalists and cites a variety of resources to help you experiment with hand-drawing visuals for your next story. Read the long read article here.

Sonification is all around us! So how can you use music and sound to enhance your storytelling? Data journalists Duncan Geere and Miriam Quick explain how by walking us through some compelling examples from journalism, science and civil society. Read the entire long read article here.

Our next conversation

Our next Conversations with Data podcast will feature Lisa Charlotte Rost from Datawrapper. As a data visualisation designer and blogger, she will speak to us about the rules of data visualisation and the DataVis Book Club she runs on behalf of Datawrapper.

As always, don’t forget to let us know what you’d like us to feature in our future editions. You can also read all of our past editions here. You can also subscribe to this newsletter here.

Onwards!

Tara from the EJC Data team,

bringing you DataJournalism.com, supported by Google News Initiative.

P.S. Are you interested in supporting this newsletter as well? Get in touch to discuss sponsorship opportunities.

]]>
Finding data outliers for solutions journalism https://datajournalism.com/read/newsletters/finding-data-outliers-for-solutions-journalism Thu, 18 Mar 2021 13:40:00 +0100 Tara Kelly https://datajournalism.com/read/newsletters/finding-data-outliers-for-solutions-journalism Welcome to our latest Conversations with Data newsletter.

It's no secret that media trust is at an all-time low. Dwindling readership and disillusioned audiences are just some of the symptoms of continual doom-and-gloom reporting. But one way to engage with audiences is to introduce solutions stories to your news outlet: investigating and explaining, in a critical and clear-eyed way, how people try to solve widely shared problems.

To help better understand this, we caught up with investigative data journalist Matthew Kauffman from Solutions Journalism Network in our latest Conversations with Data podcast. He explains how data journalists can apply a solutions-oriented mindset to their work and how to identify "positive deviants" -- outliers in data that might point to places that are responding to social problems.

You can listen to the entire podcast on Spotify, SoundCloud, Apple Podcasts or Google Podcasts. Alternatively, read the edited Q&A with Matthew Kauffman below.

What we asked

What led you to become an investigative data journalist?

I was always interested in journalism. In high school, I worked on the school paper where I had written some pieces that got some praise from teachers and fellow students. I had thought about a career in law and politics. However, I found journalism appealing in the sense that you could explore issues and educate people without having to sacrifice your own thoughts and beliefs.

I spent many years on the investigative desk at the Hartford Courant. I often thought of it as more of a projects desk than strictly investigative -- doing deeper dives into issues. You can't view the world without recognising there are issues that deserve a spotlight. There are inequities out there and having the opportunity to bring those to light was valuable. I don't see myself as an advocate, but it felt meaningful when I wrote about stories where there seemed to be problems -- people took notice and change happened.

As for data, I was always comfortable with numbers and math. A lot of journalists are not. I could use an Excel spreadsheet, and suddenly I was the data journalist on staff. I have certainly come to realise the power of data in journalism by providing hard evidence to go along with anecdotal information. It makes for a really strong combination when you have the human element through anecdotes, but then hard systemic evidence provided through data.

[Solutions Journalism Network](www.solutionsjournalism.org) was founded eight years ago by journalists from The New York Times. The organisation trains and connects journalists to cover what’s missing in today’s news: how people are responding to problems.

How would you define solutions journalism?

The concept of solutions journalism is this idea that journalists can and should apply the same kind of rigorous, clear-eyed examination of responses to problems as we do to the problems themselves. So it's not picking winners. It's not advocacy. But we're very good at digging into systems and seeing what's going on and certainly identifying problems.

And if you have in particular a case where an issue is well known locally, for instance, you've already done 10 stories on how terrible the opiate crisis is in your community, it's important to consider whether another story about how awful things are is necessary. I think there's better value in saying, well, what have other communities done? Are there other places that have bent the curve on this? And if so, are there specific policies that they put in place to kind of move the needle? Have they found a successful response? So that's the concept of solutions journalism.

It's not a replacement for accountability journalism or reporting on problems. But I think it is a really strong partner in going beyond simply reporting on the problems and showing how success in other communities might be replicable. We call it 'hope with teeth'. Readers and viewers want journalism that sort of goes beyond doom and gloom. Studies do show readers respond to stories that provide a possible path forward.

How does data tie into solutions journalism?

Data are a terrific partner for solutions reporting. When the impact of a response can be measured with numbers, data are the path to that. It's not enough for a town official to claim a terrific response on COVID-19 or homelessness or the achievement gap in schools without showing evidence. For instance, here's the response that was put in place. Now ask what difference it has made.

I use an example of hospitals in California that put policies in place to try to improve maternal health during pregnancy, essentially to avoid deaths, specifically by addressing postpartum bleeding. OK, that policy sounds like a good idea and seems like it will work, but do we have evidence that it works? And the data provide that evidence: while maternal mortality was going up in most of the country at the point at which these provisions went into place in hospitals, in California the curve was bent. You can see in the data that suddenly the line stops going up and starts going down.

Of course, there are always issues with causation and correlation. Just because something happens after something else doesn't mean the first thing caused it. But that seemed pretty solid evidence that the changes that were made led to this better outcome. And these changes could be made at hospitals in any state.

Is solutions journalism only for more seasoned journalists?

No, I think one of the nice things about it is that there isn't a steep learning curve. It's just having a mentality about how you approach stories. You approach stories saying, I want to fully understand this issue: who it has affected, how bad it is, what caused it? Those questions take you from the past up to the present -- how did we get here and where are we? And it's just adding that additional part -- where could we go from here? Problems are rarely isolated to one single institution, one single school, one single town or one single country. If you expand your horizons beyond how we got here and where we are and think about the places that have found a path forward -- that's all it takes. Any journalist at any news outlet of any size anywhere can adopt and pursue solutions journalism.

What are the biggest stumbling blocks journalists encounter who are new to solutions journalism?

I think the biggest stumbling block or hesitancy is this concern about, 'gee, this sounds a little like advocacy to me'. The first time I ever heard of solutions journalism, I happened to be at a conference in Boston and there was a panel I sat in on with someone from the Solutions Journalism Network talking about this. And I went to them afterwards and I said, 'This sounds really intriguing, but how is this not advocacy?' And so I think a lot of people need to get comfortable with that first step. Remember, it's not about saying here's the right way. It's not about picking winners. It's just that there are other places that have approached the same problems that I'm writing about. So here's what they did. And then it involves putting on your investigative cap, as all journalists have, and really digging in. Did it make a difference? Can you tie what they did to this better response? And then the key thing is making sure and being comfortable that this isn't public relations, it's not advocacy. It's as simple as having a mindset to produce better journalism.

How does reporting on the limitations of a particular solution come into play?

Reporting the limitations is a critical, essential part of solutions reporting. It's not a solutions journalism story if you don't explore what the limitations of a response are. For instance: 'Hey, this worked in this community, but it's also really, really expensive.' Or: 'It worked in this community, but this community has great public transportation, which was essential to the response, so it won't work in a community that doesn't have that.' Or: 'It works for this demographic, but they really haven't solved the problem for that demographic.'

Solutions Journalism Network has launched a new fund to support reporting projects that pursue solutions stories by investigating “positive deviants” in datasets. Find out more here.

How do you do a sanity check when using data in your solutions reporting?

The first step is to think about what questions I have that these data can answer. And if you're looking for a positive response or "success", it is important to ask what "success" means in this case. It doesn't necessarily mean whatever place has the highest number, because there are all sorts of reasons why one community, school, hospital or institution might have a higher performance metric than another. So maybe you are looking at improvement over time. Or perhaps my measure of success is equity across different groups. So the first thing to do is interview yourself and ask: what am I looking for? What to me would be evidence of a successful response?

Interviewing your data is absolutely critical. Ask yourself if you can trust the data. Where did these data come from? Do they seem to be reliable? If I'm comparing multiple places, am I sure that these data are comparable, that they were captured in a similar way? And then when you do the data analysis, apply a smell test. For instance, if something went up 17,000%, that doesn't sound right. Did I do the math wrong, or is there missing data for certain years? If so, then I can't work with this. Even before you analyse the data, doing that interview and asking whether the data appear clean is critical. Remember that all of the caveats that apply in data journalism certainly apply to solutions journalism. The analysis simply gives you a lead -- a place to investigate further and see if it belongs in your story.
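As a rough illustration of that workflow, here is a minimal sketch in Python with pandas -- the file name, column names and thresholds are hypothetical -- that checks for missing years, applies a smell test to year-on-year changes, and surfaces candidate "positive deviants" as leads to investigate rather than conclusions:

```python
import pandas as pd

# Hypothetical tidy dataset: one row per place per year with an outcome rate,
# where a lower rate is better (e.g. a mortality rate).
df = pd.read_csv("outcomes_by_place.csv")  # columns: place, year, rate

# Interview the data first: are any years missing for a place?
expected_years = set(range(df["year"].min(), df["year"].max() + 1))
gaps = df.groupby("place")["year"].apply(lambda years: expected_years - set(years))
print(gaps[gaps.map(len) > 0])  # places with missing years to resolve before comparing

# Smell test: year-on-year changes that look implausible (e.g. +17,000%).
df = df.sort_values(["place", "year"])
df["pct_change"] = df.groupby("place")["rate"].pct_change() * 100
print(df[df["pct_change"].abs() > 200])  # suspicious rows to check by hand

# Candidate positive deviants: places whose rate improved the most over time.
# These are leads to report out, not proof that any particular policy worked.
change = (
    df.groupby("place")["rate"].agg(first="first", last="last")
    .assign(improvement=lambda d: d["first"] - d["last"])
    .sort_values("improvement", ascending=False)
)
print(change.head(10))
```

The point of the sketch is the order of operations: check completeness and plausibility before ranking anything, and treat the ranking as a starting point for reporting.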

Finally, what kind of training does the Solutions Journalism Network provide?

We have training in 17 different languages. We have journalists and regional managers in Africa, Europe and the United States. We have a ton of webinars and training. You can visit Solutionsjournalism.org, Learning Lab or the Hub. We do live web-based trainings. It's best to start with our Solutions Journalism 101 course. Those are held live once a month. If you don't want to do a live webinar, you can find asynchronous training on our site, too. We then have a more advanced training that gets into some more details about solutions journalism and the different ways to frame a story looking at different slices of a story. That lends itself to a solutions resource I'm currently developing. I also recommend checking out the Solutions Story Tracker with more than 1,000 stories on how people are containing COVID-19, coping, and caring for each other during the pandemic.

Latest from DataJournalism.com

Sonification is all around us! So how can you use music and sound to enhance your storytelling? Data journalists Duncan Geere and Miriam Quick explain how by walking us through some compelling examples from journalism, science and civil society. Read the entire long read article here.

Calling all data journalism, tech and design students! Are you looking to kickstart your career this summer? European Journalism Centre and Google News Initiative just opened applications for their paid journalism fellowship. Apply now! The deadline is 25 April 2021.

Our next conversation

Our next Conversations with Data podcast will feature economist and data visualisation expert Jonathan Schwabish. He is the founder of the data visualisation and presentation skills firm, PolicyViz, and a Senior Fellow at the Urban Institute, a nonprofit research institution in Washington, DC. In addition to his research at the Income and Benefits Policy Center, he is a member of the Institute’s Communication team where he specialises in data visualisation and presentation design. He will speak to us about his latest book Better Data Visualizations: A Guide for Scholars, Researchers, and Wonks.

As always, don’t forget to let us know what you’d like us to feature in our future editions. You can also read all of our past editions here. You can also subscribe to this newsletter here.

Onwards!

Tara from the EJC Data team,

bringing you DataJournalism.com, supported by Google News Initiative.

P.S. Are you interested in supporting this newsletter as well? Get in touch to discuss sponsorship opportunities.

]]>
Investigating crime and corruption with data https://datajournalism.com/read/newsletters/investigating-crime-and-corruption-with-data Wed, 03 Mar 2021 12:48:00 +0100 Tara Kelly https://datajournalism.com/read/newsletters/investigating-crime-and-corruption-with-data Welcome to our latest Conversations with Data newsletter.

In this week's Conversations with Data podcast, we caught up with investigative journalist Pavla Holcová. She is a regional editor for Central Europe at Organized Crime and Corruption Reporting Project (OCCRP) and is the founder of the Czech independent outlet Investigace.cz. She talks to us about using data to investigate the February 2018 murder of her colleague Slovak investigative journalist Ján Kuciak and his fiancee Martina Kušnírová.

You can listen to the entire podcast on Spotify, SoundCloud, Apple Podcasts or Google Podcasts. Alternatively, read the edited Q&A with Pavla Holcová below.

What we asked

What led you to become an investigative journalist?

I used to work in Cuba with human rights defenders and independent journalists. We conducted training there to help journalists understand the difference between fact-based reporting and opinion-based reporting. One day we were detained by the Cuban police and ended up in the same jail cell as investigative journalist Paul Radu, the executive director of OCCRP. Because we knew we couldn't talk about the job we were doing in Cuba, we started to talk about his job. He explained how he was an investigative journalist doing cross-border project journalism. And that was the moment in a Cuban jail when I decided that once I quit my job, this is what I would like to do. That was in 2010 and I founded Investigace.cz in 2013. A year later our publication joined OCCRP, where I am now also a regional editor for Central Europe.

What happened to investigative journalist Ján Kuciak and his fiancee in February 2018?

Ján Kuciak and his fiancee were shot dead on the evening of the 21st of February 2018. It happened a couple of weeks before their wedding. The police started to investigate and discovered that he was shot because of the work he had done. As an investigative journalist, he had published a couple of stories that uncovered the corrupt nature of the system in Slovakia, particularly the judiciary. I'm talking about tax fraud and about a group of people who were basically untouchable. They were never prosecuted. Ján exposed how that was possible and how the system of corruption in Slovakia worked.

How did you and Ján work together?

A year before he was shot, we decided to work together on a story that exposed the ties of the Italian mafia to the then prime minister of Slovakia, Robert Fico. He was shot about a week before we were due to publish the story. So the first lead was that the Italian mafia had ordered his killing. But later it turned out that the lead suspect, and the person with the highest motivation to silence Ján and his investigative work, was Marian Kočner. He was a well-known Slovak businessman who felt threatened by Ján's reporting. The court first ruled that there was a lack of evidence to send him to prison for ordering the murder. But there's an appeal and the case is still ongoing. He is currently serving a 19-year sentence in prison for another conviction -- the forging of promissory notes.

What did Ján's death reveal about Slovakia?

This case revealed how the system truly worked and how corrupt the system was. We are now able to prove it and expose it because we got access to 70 terabytes of data that we now call the Kočner Library. It's the evidence that was collected by Slovak police and it includes the files and all the annexes regarding the murder of Ján Kuciak and his fiancee Martina Kušnírová.

It was a very special situation in Slovakia because elections were happening in the country in 2020. It meant the political party that had created the corrupt system could potentially lose power. We obtained those 70 terabytes of data in late November 2019, and we knew at the time that we only had about four months to work with the data and to get this project started.

How did you obtain that 70 terabytes of data?

We did not ask for permission from the police. We got it from a source. We got it legally and it was not a leak. We know who the source is, even though we won't really reveal the source's identity.

What stories came out of this data?

The stories from the data set really changed Slovakia. The impact was huge. First, we described to the public how Ján and Martina were murdered. Then we exposed the world of Marian Kočner -- how he used honey traps and blackmail to escape justice. He was bribing the general prosecutor and had good connections to the police force. They were giving him information and neglecting his cases. He was even telling corrupt judges how to rule in his cases. Those people are now in jail because of the information from this data set and the public pressure. A big revolution is happening in Slovakia. It's not so visible now because of the coronavirus situation, but the country is changing completely.

What kind of journalist was Ján Kuciak?

He was one of the most modest journalists I've ever worked with. We were not just colleagues, we were also good friends. So for me, his death was a shock. But in Slovakia, before he was murdered, he was not well known. He was only 27 years old and he was much more into analysing the data than relying on secret sources and then publishing what they told him. He was much more into using the public registries, collecting the data, putting them together, analysing them and then connecting the dots. He was also helping fellow journalists. If someone from another news outlet had a problem understanding a big financial scheme, he was the one who helped them piece it together.

Image: A memorial for Ján Kuciak and Martina Kušnírová next to the statue of Saint John Paul II in Prešov. Photo by Petinoh, own work, CC BY-SA 4.0.

How did privacy concerns impact the stories you told from the data?

That was actually my role -- to make sure that we didn't expose random bystanders in the conversations, or indeed the victims of Marian Kočner. One of the criteria for being selected to the team was that the journalists had to uphold the highest ethical standards. They needed to take into account that we might expose people who were not part of the system and that they shouldn't do that. But we also needed to coordinate with the authorities to ensure that we didn't jeopardise the police investigation. So when we started to work on a story, it was my task to coordinate with the lawyers and with the police just to check that we wouldn't share information that shouldn't be revealed.

What about the safety of the journalists working on this?

We definitely needed to consider personal safety. One of the team members received a bullet in his mailbox. He got police protection and everyone was alerted to what was happening. But in the end, we didn't find who sent him the bullet. The team did manage to keep their sanity and work-life balance in spite of this.

We also established a safe room in Bratislava, the only place where journalists could access the data. We provided a set of computers that were stripped of all functionalities and could only work if the librarian started the system, which connected to the data. And then anyone who wanted to work in the library needed to come to that location to download the data onto an encrypted USB stick and bring it home to work on. This meant that if anyone stole it, they would not be able to access the data.

How does OCCRP hope to keep the memory of Ján Kuciak alive?

We shouldn't just remember him once a year. We should always remember that his murder changed Central Europe in an unprecedented way. The impact of his murder was huge. We want to finish all of the stories he started and we are still not done with the data. Following in the footsteps of journalists who are threatened or who have been murdered, and finishing their stories, will show that killing someone doesn’t mean killing their stories.

Latest from DataJournalism.com

The News Impact Summit has released an introduction to R programming language training video by data journalism trainer Jonathan Stoneman. Master the basics such as how to set up the R console and import data. You'll also learn how to pivot and filter data and use the R package ggplot. Watch the full video here.

Wikidata can be a useful resource for journalists digging for data on a deadline. Monika Sengul-Jones explains the joy and perils of using the searchable data trove for your next story. Read the long read article here.

Our next conversation

Our next Conversations with Data podcast will feature investigative data journalist Matthew Kauffman from Solutions Journalism Network. He leads a data reporting project helping newsrooms pursue solutions reporting by identifying positive deviants -- outliers in data that might point to places that are responding to social problems.

Before joining the Solutions Journalism Network, Matthew spent 32 years writing stories for the oldest newspaper in the United States, The Hartford Courant. He is also a two-time Pulitzer Prize finalist and teaches data journalism. He will speak with us about the power of combining data with solutions journalism to tell compelling stories that engage audiences.

As always, don’t forget to let us know what you’d like us to feature in our future editions. You can also read all of our past editions here. You can also subscribe to this newsletter here.

Onwards!

Tara from the EJC Data team,

bringing you DataJournalism.com, supported by Google News Initiative.

P.S. Are you interested in supporting this newsletter as well? Get in touch to discuss sponsorship opportunities.

]]>
What's on at #NICAR21 https://datajournalism.com/read/newsletters/whats-on-at-nicar21 Wed, 17 Feb 2021 12:10:00 +0100 Tara Kelly https://datajournalism.com/read/newsletters/whats-on-at-nicar21 Welcome to our latest Conversations with Data newsletter.

With the NICAR 2021 conference just around the corner, we caught up with Investigative Reporters and Editors' interim executive director Denise Malan in our latest Conversations with Data podcast. She tells us all about the upcoming programme for one of data journalism's most sought-after events, happening next month. She also provides some useful advice for journalists new to data on how to take their first steps in the field.

You can listen to the entire podcast on Apple Podcasts, Google Podcasts, Spotify, or SoundCloud. Alternatively, read the edited Q&A with Denise Malan below.

What we asked

Tell us about your work at IRE and your background in data journalism.

My background is in newspapers and doing data journalism. I joined IRE (Investigative Reporters & Editors) in 2013 and have had a variety of roles there. I am currently the interim executive director while we're in a leadership transition searching for a new executive director. My regular role now for about the past year has been as the deputy executive director.

We host 80-90 events a year, including our conferences, workshops, in-house custom training and bootcamps. I lead the training team that puts on all of those events. I also ensure our curriculum and our content is strong and evolving and that we're keeping up with the needs of journalists.

What is NICAR and how did it begin?

NICAR stands for the National Institute for Computer-Assisted Reporting, which is the data journalism wing of IRE. NICAR, as a conference, started in 1993. So we've been going for almost 30 years now, bringing the data journalism community together and helping them train each other. It used to be just a couple hundred journalists and then it really started growing in 2013. By 2018 we had over 1200 journalists at NICAR in Chicago. The real trademark of NICAR is hands-on data skills training where you can come into our computer labs and get to learn from expert data journalists.

What is on the agenda at NICAR 2021?

The NICAR 2021 conference will be held on 3-5 March 2021. For the first time ever, we will be virtual with almost 170 sessions on the agenda. That includes over 80 live sessions, which will be panels, tool demos, networking sessions, and a few happy hour sessions. We will have some meditation sessions and a DJ. We're also going to have one of the most anticipated sessions every year -- lightning talks. These are quick, quirky and fun 5-minute presentations that are submitted and then voted on by attendees. We will have 10 of them this year.

To adapt to the virtual world, our data sessions will be in virtual hands-on labs that include almost 90 recorded tutorials covering data journalism. Topics include spreadsheets, data wrangling, Python and web scraping. Each lab also includes information on how to set up your home computer, since everyone will be doing this from home, as well as all the data files that you'll need to follow along with the tutorials. All of the live sessions will be recorded and you'll have access to them for a year.

Who is eligible to attend NICAR 2021?

We require IRE membership to attend all of our conferences. To qualify, you need to be a working journalist or a freelancer -- anyone who's substantially engaged in journalism. You can also be a journalism educator or student. We have several tiers of membership. Event registration is $175 for professionals and educators. The early bird registration fee was $125, but that deadline has now passed. Students can register at any time for $50. Fellowships are still available to help journalists attend the virtual event.

What data journalism services do IRE offer for journalists?

IRE has a data services desk run by our trainers. We can help you clean data, analyse it and visualise it. We have a number of different skills in our team that can help with all of those. We do giant web scraping projects, which involve converting PDFs and cleaning the data. This is especially useful for journalists who don't necessarily have those kinds of skills in-house. These are done for an hourly fee that's on a sliding scale depending on the size of your organisation. If you are in a small organisation, it is very affordable. To find out more, visit IRE's website.

Are the terms "NICAR" and "computer-assisted reporting" still relevant today?

The National Institute for Computer-Assisted Reporting (NICAR) -- that name came up in the 90s. That's what we used computers for -- data journalism at that time. It looked very different in 1993 than it does today. Our trainings were also named computer-assisted reporting bootcamps up until just a couple of years ago when we decided to rename them to data journalism bootcamps. But the name computer-assisted reporting still sticks around and it's nostalgic for a lot of people. Of course, we use computers for basically all reporting now. But whatever you call it -- data journalism, data-driven journalism, computational journalism -- it all falls under this umbrella of what we do at NICAR.

Journalists, students and educators of colour; women who are early-career or students; and educators who teach data or investigative reporting are eligible for fellowships to help offset the cost of #NICAR21 registration. Apply today.

You've worked at IRE and as a data reporter for a newspaper. How did you get your start in data journalism?

Most of my career was at the Corpus Christi Caller-Times in Texas. We did a lot of data-driven work and we worked really hard to train our entire newsroom in spreadsheets. This is actually how I became involved with IRE. My bosses sent me to an IRE bootcamp where I learnt the real way to do spreadsheets. I had been mostly self-taught prior to that. The training took me really far in my career. And now I've been teaching those bootcamps for the last seven to eight years.

Where should journalists start when they are new to data?

At IRE we meet a lot of new data journalists and the advice that we like to give them is to start small. It can be very overwhelming when you want to learn data journalism -- you're looking at all of these different tools and you may think you need to become a mega programmer. I tell people you don't have to do all that, just take a breath and start with what you need. And in a lot of cases, spreadsheets can take you really far and you can learn those in just a few hours.

So my advice is to start with something that really interests you and your community. When you run into a challenge that you really do need to overcome, you approach it in small steps. If you need to visualise data, then you learn a visualisation tool. If you need to scrape a website, then you learn a little programming. Just don't think that you have to learn it all, or learn all of it at once.
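For a sense of how small that first bit of programming can be, here is a minimal scraping sketch in Python with requests and BeautifulSoup; the URL and table structure are hypothetical, and a real site's terms and robots.txt should be checked before scraping:

```python
import csv

import requests
from bs4 import BeautifulSoup

# Hypothetical page with a single HTML table of local figures.
URL = "https://example.org/local-statistics"

response = requests.get(URL, timeout=30)
response.raise_for_status()

soup = BeautifulSoup(response.text, "html.parser")
table = soup.find("table")
if table is None:
    raise SystemExit("No table found on the page")

# Pull each row's cells into a list of lists.
rows = []
for tr in table.find_all("tr"):
    cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    if cells:
        rows.append(cells)

# Save as CSV for analysis in a spreadsheet -- often all a first story needs.
with open("local_statistics.csv", "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(rows)

print(f"Saved {len(rows)} rows to local_statistics.csv")
```

Roughly twenty lines like these, paired with a spreadsheet, are often enough for a first data-driven local story.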

The other advice I have is to read data journalism constantly. Get to know the people who are doing this really well. Go to all of the trainings that you possibly can and really try to absorb the knowledge from our data journalism community.

Finally, what are some of the most inspiring data-led pieces you've come across recently?

The New York Times and The Atlantic covid tracking projects have been just an incredible public service over the last year because they are collecting data and standardising it and making it available to the public and to journalists.

Latest from DataJournalism.com

Wikidata can be a useful resource for journalists digging for data on a deadline. Monika Sengul-Jones explains the joy and perils of using the searchable data trove for your next story. Read the long read article here.

The latest edition of the Verification Handbook is available in German. A big thanks to Landesanstalt für Medien NRW and Marcus Engert of Buzzfeed Germany for making it possible. The handbook is edited by Craig Silverman, published by European Journalism Centre and supported by the Craig Newmark Philanthropies. You can also download the PDF version in English, Arabic, Turkish and Italian.

Our next conversation

Our next Conversations with Data podcast will feature investigative journalist Pavla Holcová from OCCRP. She is a regional editor for Central Europe and is founder of independent outlet investigace.cz. Pavla has contributed to major cross-border projects such as the Panama Papers, the Paradise Papers, the Russian Laundromat, and the Azerbaijani Laundromat.

She will talk to us about using data and digital forensics techniques to investigate the February 2018 murder of her colleague Slovak investigative journalist Ján Kuciak and his fiancee.

As always, don’t forget to let us know what you’d like us to feature in our future editions. You can also read all of our past editions here. You can also subscribe to this newsletter here.

Onwards!

Tara from the EJC Data team,

bringing you DataJournalism.com, supported by Google News Initiative.

P.S. Are you interested in supporting this newsletter as well? Get in touch to discuss sponsorship opportunities.

]]>
Vaccines and variants: What you need to know https://datajournalism.com/read/newsletters/vaccines-and-variants-what-you-need-to-know Wed, 03 Feb 2021 17:51:00 +0100 Tara Kelly https://datajournalism.com/read/newsletters/vaccines-and-variants-what-you-need-to-know Welcome to our latest Conversations with Data newsletter.

In this week's Conversations with Data podcast, we spoke with vaccinologist Dr Melvin Sanicas about the new COVID-19 variants from South Africa, Brazil and the United Kingdom. He explains to us how epidemiological surveillance works and how these new variants could impact the current vaccine rollout programme.

You can listen to the entire podcast on Apple Podcasts, Google Podcasts, Spotify, or SoundCloud. Alternatively, read the edited Q&A with Dr Melvin Sanicas below.

What we asked

Talk to us about these new variants from South Africa, Brazil and the United Kingdom. How do they come about?

The appearance of variants is not at all surprising because viruses mutate. That's what they do. When a virus particle enters a host cell, it converts that cell into a viral factory, churning out thousands of new virus particles in a relatively short period of time. And because replication is not perfect, there will be some changes, some mutations. Many of these mutants may be genetically different from the original virus but not exhibit any biologically important differences. Others can be inferior, which means that mutations made these viruses less able to replicate.

In a few cases, however, the mutations may confer what's called a selective advantage. This means that a particular mutant might be able to infect a person more readily or replicate more inside the body, or even just leave a person's body more easily. These types of changes make a virus more likely to survive and reproduce, which can be worrying in the case of more dangerous viruses.

So in general, coronaviruses mutate less rapidly than the flu, for example, or HIV. But SARS-CoV-2 has been mutating throughout the pandemic. In fact, early in the pandemic -- I think it was around April or May last year -- one particular variant, referred to as D614G, quickly replaced the initial SARS-CoV-2 strain throughout the world. It contains a change in the virus spike protein that allows the virus to be more easily transmitted from host to host. And because of this biological difference, the variant spreads rapidly. But if you are wearing the right kind of mask properly, washing your hands regularly and keeping your distance from people, this variant cannot infect you.

Talk to us about how viruses are sequenced. What does that involve?

A genome is an organism's genetic material. It's essentially the instruction manual which contains all the information needed to make and maintain the virus. Virus genome sequencing is essential in understanding the spread and evolution of SARS-CoV-2. So how is this done? I'm not involved in a genomic sequencing lab at the moment. The last time I did this was several years ago for a different pathogen. But basically, someone swabs the patient's nose and we pull genomic material out of the sample using a little handheld DNA sequencer. This is typically smaller than a smartphone and connects directly to a laptop via a USB.

The data analysis software then does the sequencing of the genome and uploads the sequences directly into international databases. These databases can be accessed by people in the viral genomics community. This means scientists anywhere in the world working on viral genomics can have access to the widest array of data as possible. And by looking at the genomic sequence of the virus, we can have an idea where their version of the virus came from. We can also understand how the virus is spreading because the genomic sequence looks a little different as the virus mutates and spreads in different geographic areas. We can even tell if this virus is from Europe, Wuhan or even Singapore, for example.

Not every country conducts genomic surveillance on a regular basis. Why is that?

I think it's really more of a question of resources. For instance, should they increase genomic sequencing capacity or buy more PPE or buy more vaccine? And this is because genomic sequencing is considerably more expensive than a COVID diagnostic test, for example. It's not feasible to isolate every positive COVID-19 case and screen for the new variant. It's important for researchers to work with local clinical epidemiologists to build a monitoring strategy. If all countries with high transmission do genome sequencing or do more genome sequencing than what they are doing at the moment, I'm sure each country will be able to identify a new variant.

How do these new variants impact the current vaccines?

Changes in the spike protein of the virus could potentially alter the effectiveness of the newly developed vaccines. Most of the major vaccines in development at the moment focus on the spike protein. Mutations in the gene encoding the spike protein could, of course, alter its structure. Studies have been designed to explore the effectiveness of the existing vaccines against the new variants. Pfizer and Moderna released preprints from their studies recently. They showed that even though there are some reductions in neutralisation, overall, the antibodies produced by the Pfizer and Moderna vaccines are still able to neutralise these new variants.

The Johnson & Johnson vaccine is a one-dose shot. Should countries be opting for that one instead of the other two-dose vaccines?

Governments should opt for whatever vaccine has been reviewed and approved by the national regulatory authorities. If we want to stop the pandemic, we need to scale up vaccination efforts now. If this vaccine from Johnson & Johnson has an efficacy similar to the Oxford-AstraZeneca vaccine, with only one dose, surely it's a great addition to the covid vaccine family. But realistically, it's not a matter of choosing one over the other because no single manufacturer can produce doses for the whole world. Some will have to use vaccine A and some will have to use vaccine B, others will have to use vaccine C and so on and so forth. And every single day we wait for vaccine J to come into the picture is an opportunity for the virus to spread, mutate and create more variants.

And as an example, the full effects of Pfizer's two-dose vaccine are only expected to be seen two weeks after the second dose. But recent data from Israel has already shown a significant drop in infections even before this point. They have seen a 60 percent reduction in infections three weeks after the first doses were given. So I think we have to roll out vaccines as early as possible and not wait for the perfect vaccine.

What should journalists be mindful of when interviewing different experts and scientists about COVID-19?

I understand why everyone is saying things about the pandemic or the virus because we are all affected and we are entitled to an opinion. But if people's opinions are not based on what they were trained to do, or based on their experience, these opinions add to the confusion. And people who are not doctors or scientists might think that all doctors or scientists are COVID experts. But this is not the case.

For immune system questions, of course, you have to ask immunologists. For the virus itself, you have microbiologists and, more specifically, the virologists. For managing COVID-19, you have to speak to infectious disease specialists. For questions on community transmission and public health advice, you have the epidemiologists and public health physicians. And it's also good to know that within epidemiology itself there are specialisations. For COVID-19, we need to speak to those with infectious disease epidemiology expertise because not all epidemiologists are the same.

How do vaccinologists like yourself fit into the picture?

Vaccinology is a subspecialty of microbiology. It's a combination of immunology, microbiology, infectious diseases and public health. We've studied bits and pieces of these different subspecialties of medicine. We can understand what's happening and we can get a bigger picture of the vaccines, of the various treatments and even of public health.

Latest from DataJournalism.com

To mark Data Privacy Day, DataJournalism.com assembled a special digital security guide by a panel of experts to help journalists stay safe online. The article includes useful advice from Tactical Tech, Tutanota, IVPN, Maldita Tecnologia, Fastmail, SheHacks_KE, consultant Chris Dufour and journalist Henk Van Ess. Read the full article by Andrea Abellan.

Our next conversation

Our next Conversations with Data podcast will feature Denise Malan, interim executive director of IRE, a US-based nonprofit membership association for journalists. She was a newspaper journalist for more than a decade, using data to shed light on local issues in government, schools, the environment and other beats. She will talk to us about the virtual NICAR 2021 event happening on March 3-5.

As always, don’t forget to let us know what you’d like us to feature in our future editions. You can also read all of our past editions here.

Onwards!

Tara from the EJC Data team,

bringing you DataJournalism.com, supported by Google News Initiative.

P.S. Are you interested in supporting this newsletter as well? Get in touch to discuss sponsorship opportunities.

Privacy Day 2021: What journalists need to know https://datajournalism.com/read/newsletters/privacy-day-2021-what-journalists-need-to-know Thu, 28 Jan 2021 13:04:00 +0100 Tara Kelly https://datajournalism.com/read/newsletters/privacy-day-2021-what-journalists-need-to-know Welcome to our latest Conversations with Data newsletter.

Today is Data Privacy Day 2021 and to celebrate the occasion we are proud to announce our latest goodie bag offers solely for DataJournalism.com members:

  • 1-year free subscription with Tutanota
  • 6-month free secure VPN subscription with IVPN

Take advantage of these latest offers by signing up today and access them via your profile!

Sign up for FREE now.

To mark the day, we assembled a special digital security guide by a panel of experts to help journalists stay safe online. Read the intro to our Q&A below or the full article here.

Topics covered:

  1. The biggest threat to internet privacy for journalists.
  2. How to genuinely investigate anonymously online.
  3. The most important aspects of privacy for media professionals to safeguard.
  4. Keeping your personal information secure using cloud storage.
  5. Must-have privacy tools for journalists.
  6. Advice for media professionals to protect their sources and data.
  7. How to balance the promotion of their work with their online privacy and safety.

Our panel of privacy and security experts:

What is the biggest threat for journalists when it comes to internet privacy?

Viktor Vecsei (IVPN): The biggest threats journalists face vary – it depends on which country they live in, the issues they focus on and the type of adversaries they might face. Each person’s situation is unique. At least basic privacy protection measures must be in place to avoid personal threats from readers who disagree with their mission, harassment by government officials or targeting by disinformation campaigns.

Two distinct areas are important to consider: protecting their identity when doing investigative work or research and protecting their personal privacy when publishing materials and disseminating them on social media. Each requires different tools and techniques and they need to consider how, what and when they access and share to minimise threats.

Sasha Ockenden (Tactical Tech): We are all immersed in technology and data – and the pandemic has only exacerbated this. The major privacy issue for all of us, including journalists, is how to compartmentalise our private and professional activities and make sure that the tools we use for one do not affect the other. For journalists in particular, given the potential consequences of sensitive information being exposed, it is more important than ever to understand how data is collected, stored and (ab)used.

The fast-changing nature of online tools and platforms, and the ways they are regulated in and across different jurisdictions, can make it hard to keep on top of. This is especially the case for those who consider themselves less tech-savvy. Tactical Tech’s Data Detox Kit provides clear suggestions and concrete steps to keep control of all aspects of your online life, make more informed choices and change your digital habits in ways that suit your private and professional lives.

Designed by Easelly: https://www.easel.ly/journalism

Henk Van Ess (Journalist): The biggest threat is the journalist themselves. Those who say "I have nothing to hide" will inevitably be trolled, embarrassed, cloned, or worse: hacked.

Chris Dufour (Digital security consultant): There is no single "big threat" in terms of a specific piece of malware, hacking technique, or attacker. That's the threat: the internet is iteratively changing and evolving daily, sometimes hourly. As such, it can be virtually impossible to fully secure oneself, and even if you could, there are corollary vulnerabilities in the form of those around you and the information they share about you: your family, friends, coworkers. I believe the biggest threat is the individual's degree of skill and time spent securing themselves against the attack and undue influence.

Valentin Franck (Tutanota): There are several threats to privacy in today’s internet. Journalists are affected by those in particular because they are more likely to hold sensitive information than the regular internet user. First of all, large parts of the internet are tracked by private companies with the primary objective of user profiling in order to sell targeted advertisements. The amount of information gathered by those companies is enormous.

The exposed position of journalists, and the fact that they can be multipliers, means a wide range of actors have an interest in learning about and shaping their thinking and interests. Also, state actors might force private companies to help gather information on a person of interest.

Read the full article here.

Latest from DataJournalism.com:

Did you know a free membership with DataJournalism.com comes with access to free and heavily discounted digital products?

Take advantage of these latest offers by signing up today and access them via your profile!

  • 1-year free subscription with Tutanota
  • 6-month free secure VPN subscription with IVPN

These are in addition to our existing digital privacy-related partners who've already generously given our members offers: 1password, FastMail, ProtonMail and Flokinet.

Sign up now, it's FREE!

Disinformation comes in all languages. That's why the latest Verification Handbook For Disinformation And Media Manipulation is now available to download in Arabic, English, Italian and Turkish.

A special thanks to the partner organisations who translated the handbook: Al-Jazeera Media Institute for the Arabic translation; Teyit for the Turkish translation; Slow News, Facta and Pagella Politica for the Italian translation.

The handbook was edited by Craig Silverman, published by the European Journalism Centre and supported by the Craig Newmark Philanthropies.

Our next conversation

Our next Conversations with Data podcast will feature vaccinologist and physician Dr Melvin Sanicas who will discuss the new COVID-19 variants in South Africa, Brazil and the United Kingdom. He will also explain how epidemiological surveillance works and what these new variants mean for the vaccine rollout programmes around the world.

As always, don’t forget to let us know what you’d like us to feature in our future editions. You can also read all of our past editions here.

Onwards!

Tara from the EJC Data team,

bringing you DataJournalism.com, supported by Google News Initiative.

P.S. Are you interested in supporting this newsletter as well? Get in touch to discuss sponsorship opportunities.

Digital verification for human rights advocacy https://datajournalism.com/read/newsletters/digital-verification-for-human-rights-advocacy Wed, 13 Jan 2021 16:36:00 +0100 Tara Kelly https://datajournalism.com/read/newsletters/digital-verification-for-human-rights-advocacy Welcome to the first Conversations With Data newsletter of 2021.

With misinformation already dominating the headlines this year, there's no better time to inject open source information into your newsgathering.

Our latest Conversations with Data podcast features an interview with Sam Dubberley from Amnesty International. He talks to us about managing the Digital Verification Corps and the challenges of using open source information for human rights research and advocacy. He also provides some useful resources for journalists new to the field.

You can listen to the podcast on Spotify, SoundCloud or Apple Podcasts. Alternatively, read the edited Q&A with Sam Dubberley below.

What we asked

Talk to us about your career. How did you get started in open source investigation work?

I began my career working in newsrooms. In 2002, I joined the European Broadcasting Union, where I worked as a subeditor, producer, editor, and later ran the newsdesk. I worked in breaking news and saw journalism transition from traditional news reporting to social media newsgathering.

In 2013, I did a fellowship at Columbia Journalism School's Tow Center for Digital Journalism, researching the impact of user-generated content in the newsgathering process. I examined what verification meant along with the ethical questions around using this kind of content. What did it mean for people around the world who now have increased connectivity, the power of a camera in their homes and a power to share that with the world? What did it mean for human rights?

In 2016, Amnesty International asked me to set up a programme called the Digital Verification Corps to help the organisation conduct research using open source information. There was a realisation that this power to tell stories and gather information and testimony digitally was something that the human rights movement had to engage with. I am also a fellow of the Human Rights Centre at the University of Essex where I am a research consultant for their Human Rights Big Data and Technology Project.

Tell us more about the Digital Verification Corps at Amnesty.

The Digital Verification Corps began as a network of volunteers from four universities worldwide, trained in open source investigation methods that help them verify data from social media platforms. Today, the initiative includes 100 participating students from seven universities. These include the Human Rights Centres at the University of California, Berkeley (US), the University of Essex (UK), the University of Pretoria (South Africa), and the University of Toronto (Canada).

The students, who have a background in human rights law, volunteer to join the programme. We teach the students to apply new, digital methodologies when working with human rights documentation and evidence. The technical training they receive allows them to verify the authenticity, location, and time of videos and photographs from social media – skills that have proven valuable for future research. The outcome of their efforts then helps Amnesty monitor and report on human rights violations.

What are some of the biggest challenges you face in your work as head of the evidence lab in the crisis response team at Amnesty?

There are so many different ways to approach that question. Finding the information and ensuring what you are saying is accurate is very challenging. Another hurdle is persuading researchers who are used to doing testimonial interviews to integrate new forms of data into their work. The impact of seeing the worst that humanity does to each other is also challenging. The traumatic side of the work is a very real one.

Scale has also become a big issue. For example, we put out a report in October 2020 around the 2019 protests in Chile. We used open source information, as well as field testimony, to show who was responsible for injuries. It involved going into great detail and identifying police officers who were on the streets and seeing the same person in video after video. You also have to be meticulous when you're doing that kind of research. To verify the videos, you have to go through them frame by frame and extract every small piece of information you can.

What resources do you recommend for journalists interested in open source investigation?

This week Amnesty is releasing two online courses on Advocacy Assembly, the multilingual training platform for journalists and activists. The courses provide an introduction to the methods and best practices of open source research, and the ways these new methods of information gathering can be used for human rights reporting and documentation. You can sign up for Open source investigations for human rights: Part 1 and Open source investigations for human rights: Part 2 in English, Arabic, Persian and Spanish.

While the content is very much focused on digital verification for human rights work, it is also relevant for journalists. Both courses feature a mix of expert voices, including archivist Yvonne Ng from WITNESS, Amnesty's weapons and military operations expert Brian Castner, and lawyer Lindsay Freeman from UC Berkeley Human Rights Center.

Amnesty also has a website called Citizen Evidence, where we place case studies when we publish a large report that involves open source information. The aim is to show a glimpse of how we do this work behind the scenes.

Sam Dubberley is co-editor of the book ‘Digital Witness: Using Open Source Information for Human Rights Investigation, Documentation, and Accountability’.

Tell us more about "Digital Witness", the book you co-edited.

Human rights NGOs around the world now need to know how to build on the best information and integrate it into their research because it is so powerful. Through the Digital Verification Corps, we realised these resources did not exist. That's why Alexa Koenig, executive director of the UC Berkeley Human Rights Center, Daragh Murray from the Human Rights Centre at the University of Essex, and I came together and decided to publish a book about this. It teaches the methods and best practices of open source research and features contributions from some of the world's most renowned open source investigators. It also covers a comprehensive range of topics, including the discovery and preservation of data and ethical considerations, to provide readers with the skills needed to work in an increasingly digitised and information-saturated environment.

What advice can you give journalists new to open source information?

My one piece of advice is not to be intimidated by it and just go for it. I can imagine open source information can seem a bit mystical -- like there are some magical techniques behind it all. Remember that it is just another way of doing research. As a journalist, it is just another way of looking for the who, what, where and when. So you can do it. It involves applying critical thinking skills in a slightly different way.

For instance, are there farmers protesting in India? Well, there's a lot of open source information around what's going on there. Use the resources available to build on what is already out there. Given this space is so new, the opportunities are vast.

Finally, what are the challenges you see going forward in the sphere of digital verification in human rights?

Researchers and journalists have managed to understand how to search YouTube and Facebook. But the big tech companies are always changing things without consultation. So that becomes a problem. Yet people find ways around them all of the time. Remember that it is not so much the techniques that are important -- it is the way you think about the approach. It is how you develop an open source investigation mindset.

Everyone says geolocation is so important for open source research. Well, sometimes you can't geolocate something. So what else can you look at to show that something is authentic and shows human rights abuse? Focus on the mindset rather than relying on tools, given these are always changing.

Latest from DataJournalism.com

DataJournalism.com celebrates some of the most impactful and inspiring coronavirus-related data stories of 2020. Take a look at the full list curated by Andrea Abellan.

Finding hidden data inside the world's free encyclopedia is no easy task for journalists. On the heels of its 20th anniversary, Monika Sengul-Jones explains how to navigate the often unwieldy world of Wikipedia. Read or listen to the full article here.

Our next conversation

Our next Conversations with Data podcast will feature an expert talking about the new COVID-19 variants in South Africa and the United Kingdom along with how epidemiological surveillance works. We will also explore what this new variant means for vaccine programmes around the world.

As always, don’t forget to let us know what you’d like us to feature in our future editions. You can also read all of our past editions here.

Onwards!

Tara from the EJC Data team,

bringing you DataJournalism.com, supported by Google News Initiative.

P.S. Are you interested in supporting this newsletter as well? Get in touch to discuss sponsorship opportunities.

Navigating vaccine hesitancy https://datajournalism.com/read/newsletters/vaccine-hesitancy Wed, 16 Dec 2020 12:01:00 +0100 Tara Kelly https://datajournalism.com/read/newsletters/vaccine-hesitancy Welcome to the final Conversations with Data newsletter of 2020.

With COVID-19 vaccine programmes rolling out in the United Kingdom and the United States this month, vaccine misinformation continues to be a growing problem globally. 

To help journalists better understand this, our latest Conversations with Data podcast features an interview with Professor Heidi Larson, the founder and director of the Vaccine Confidence Project at the London School of Hygiene & Tropical Medicine. In addition to vaccine hesitancy, she talks to us about her new book, "Stuck: How Vaccine Rumors Start and Why They Don't Go Away".

You can listen to the podcast on Spotify, SoundCloud or Apple Podcasts. Alternatively, read the edited Q&A with Professor Heidi Larson below.

What we asked

Tell us about your career. How did you become interested in vaccines as an anthropologist?

I didn't set out in anthropology to study vaccines, but I was always interested in health and ended up spending most of my career working in health. I had worked a lot on AIDS and saw how powerful non-medical interventions were. I later moved into vaccines and headed Global Immunisation Communication at UNICEF. After I left UNICEF, I set up the Vaccine Confidence Project to measure vaccine confidence globally. This involved looking at the types of concerns, developing an index and designing some metrics around this complicated space.

Based on the Vaccine Confidence Project's newly published survey findings, what trends are you seeing around the world?

We just released our global trends in vaccine confidence research in The Lancet in September 2020. The new study mapped global trends in vaccine confidence across 149 countries between 2015 and 2019 using data from over 284,000 adults. The respondents were surveyed about their views on whether vaccines are important, safe, and effective. 

What's interesting is that we see Europe got a little bit better, particularly Finland, France, Ireland, and Italy. We see countries in Northern Africa becoming more sceptical. We see countries in South America that have historically been the poster children of vaccine enthusiasm wearing at the edges. The research shows how diverse it is regionally and it gives an interesting map.

Professor Heidi Larson is an anthropologist and the founding director of the Vaccine Confidence Project. She headed Global Immunisation Communication at UNICEF and she is the author of "Stuck: How Vaccine Rumors Start and Why They Don't Go Away".

Tell us about your book "Stuck". Who did you have in mind when writing the book?

I had the whole world in mind when writing this. I was playing with "stuck" as in an injection, but also "stuck" in the conversation. The book reflects on the last 10 to 20 years of my own research and examines how the issues surrounding vaccine hesitancy are, more than anything, about people feeling left out of the conversation. "Stuck" provides a clear-eyed examination of the social vectors that transmit vaccine rumours, their manifestations around the globe, and how these individual threads are all connected.

How likely is it that these new COVID-19 vaccines will stamp out this virus?

Well, it depends on how many people are willing to take the vaccines. I hear estimates from 60 to 70 percent are needed. Some say a bit more. Some say a bit less for the amount of uptake to actually have any kind of population benefit in protecting people against COVID-19. If it doesn't reach that, many individuals will benefit from it, but it won't have that same kind of population benefit.

The surveys we're seeing globally are not showing those high levels of acceptance. You don't know what people will do until the vaccine is ready. I mean, look at political opinion polling. People can change their minds there. We still don't have the final information on these vaccines. We've got some indication of how effective they might be. But we're still waiting for more information. And it's going to come.

Many people are concerned about how quickly this vaccine has been developed. But that doesn't mean it hasn't gone through the same safety measures and three-stage trialling as other vaccines, right?

That's true. But as you say, it's a different context. When the West Africa Ebola outbreak happened, the global health and scientific community realised there was no emergency funding for research trials. There was emergency funding for control measures in treating and isolating it, but there wasn't for trials. Because of that, a funding mechanism, the Coalition for Epidemic Preparedness Innovations (CEPI), was ready to fund these COVID-19 trials, which we've never had before. This meant trials could get up and running much quicker.

We also have a lot of new technologies that were not used in previous vaccine development, which have allowed different kinds of processes to move quickly. Administrative processes have shortened a bit. Safety measures have not. That is the one piece of the development that has not shortened. Nobody wants to compromise safety. It would be bad for industry. It would be bad for government. It would be bad for populations. It's in nobody's interest to make a vaccine that is not safe.

What can journalists do to stop the spread of vaccine misinformation?

I think that we have to be careful with deleting misinformation without putting something else in that space. The reason people migrate to misinformation is that they aren't finding the answers to their questions. So we should use and listen to misinformation to understand what kinds of issues people are trying to find information about. Because if people are all migrating to certain pieces of misinformation, they're not getting what they need somewhere else.

So let's not delete it without giving an alternative story. I think that from a journalistic point of view, that's really important. Talk about why the development timeline was shorter. Just keep people engaged in a positive, informed way and don't leave the space empty. Otherwise, they'll go right back to where they got the misinformation in the first place.

What role do social media platforms have in fighting vaccine misinformation?

I think one of the bigger challenges is the posts on social media that are not explicitly misinformation. They're seeding doubt and dissent. They're asking questions. They're provoking a highly sceptical population. And that's a much harder thing to handle. You can't delete doubt. That's one of the things I'm working very closely with Facebook on -- looking at different ways to rein it in, or at least mitigate its impact. Before we wag our fingers at any one social media platform or all of them, I think we certainly, as a global health community, need to do a better job of engaging publics and getting them the right information so they're not going off looking for another story.

Finally, what other credible resources can you recommend for journalists?

You can visit our website VaccineConfidence.org. Another excellent resource is the Vaccine Safety Net, a global network of websites, established by the World Health Organisation, that provides reliable information on vaccine safety. It reviews and approves websites, and ours is one of the sites it has reviewed and deemed credible.

What I like about the Vaccine Safety Net is that it gives you a choice, and people want a choice. That's been a real problem in general with pro-vaccine rhetoric. It's very homogenous compared to the anti-vaccine movement, which has a lot of different flavours and colours. And so the public, with their multitude of different concerns and anxieties, have a lot to choose from.

Don't miss our latest long reads on DataJournalism.com

Finding hidden data inside the world's free encyclopedia is no easy task for journalists. In Monika Sengul-Jones' long read article, she explains how to navigate the often unwieldy world of Wikipedia.

Our next conversation

Our first Conversations with Data podcast of 2021 will feature an interview with Sam Dubberley, head of the Crisis Evidence Lab at Amnesty International. He will talk to us about his work managing the Digital Verification Corps and the evolution of open source investigations for human rights advocacy.

Sam is a fellow of the Human Rights Centre at the University of Essex where he is a research consultant for their Human Rights Big Data and Technology Project. He is also the co-editor of the book "Digital Witness: Using Open Source Information for Human Rights Investigation, Documentation, and Accountability".

As always, don’t forget to let us know what you’d like us to feature in our future editions. You can also read all of our past editions here.

Onwards!

Tara from the EJC Data team,

bringing you DataJournalism.com, supported by Google News Initiative.

P.S. Are you interested in supporting this newsletter as well? Get in touch to discuss sponsorship opportunities.

Taking stock of open data https://datajournalism.com/read/newsletters/open-data-watch Wed, 09 Dec 2020 13:07:00 +0100 Tara Kelly https://datajournalism.com/read/newsletters/open-data-watch Using publicly available data to tell stories is becoming more and more important for journalists today. But finding open data in the first place can be a challenge given not all countries share national data in an accessible way.

In this episode of the Conversations with Data podcast, we caught up with Shaida Badiee from Open Data Watch, an NGO focused on monitoring and promoting open data in national statistical offices. As the co-founder and managing director of the organisation, she tells us about its newly released Open Data Inventory -- an annual index assessing the health and openness of national statistical data across 187 countries.

You can listen to our entire podcast on Spotify, SoundCloud or Apple Podcasts. Alternatively, read the edited Q&A with Shaida Badiee below.

What we asked

Tell us about yourself and your work at Open Data Watch.

I'm the co-founder and managing director of Open Data Watch, an international NGO working to support and persuade countries and agencies to do better at making data more open and accessible. Our focus is on data that is needed to guide and monitor sustainable development. We have been in operation since 2014.

I started working on data several decades earlier, in 1977, when I joined the World Bank to help with economic and financial data management for a programme that later became the World Development Report (WDR). I started as a summer temp and grew in that job as I found my passion for data. In 1995, I led a large change management project which created the development data group of the World Bank. I took over that work and ran that department. I left in 2013 when I maxed out on the number of years that I could be at the World Bank.

After 36 years at the World Bank, what persuaded you not to retire?

In 2010, I had an exceptional opportunity to manage the World Bank's open data project, which was initiated by the president of the bank at that time, Bob Zoellick. Usage of the bank's data skyrocketed almost overnight. It set an example for other agencies and countries to follow, too. I learnt a lot from helping the bank to open its data and saw first-hand the complexities it faced. It will take a while before countries and agencies around the world can do the same. That's why I decided to start up Open Data Watch as an independent NGO to monitor progress on open data and support countries through this transformation and help them realise the benefits of open data.

Shaida Badiee is managing director of Open Data Watch. She has been an active member of the UN Secretary General’s advisory group on data revolution. She co-chairs the Sustainable Development Solutions Network (SDSN) and has played a key role in the startup of the Global Partnership for Sustainable Development Data.

Tell us about your organisation's 2020 Open Data Inventory.

The Open Data Inventory (ODIN) is an evaluation of two aspects of official data -- the coverage and openness of the data provided by national statistical offices on their websites. The inventory collects a great deal of information to build a profile of open data in each country, which is also used as input to calculate the scores. Countries' data are scored from 1 to 100 for coverage. ODIN includes 22 data categories grouped under social, economic, financial and environmental data. Openness is measured against international standards. It serves as an open data benchmark for national statistical offices, where users can see how countries are doing, how they compare with others, and what needs to change for them to do better. This is our fourth round of ODIN and this year we are covering 187 countries.
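
For readers who want to see the general shape of such an index, here is a minimal, hypothetical sketch of how per-category coverage and openness marks could be rolled up into a single country score. The category names, marks and simple averaging below are invented for illustration; ODIN's actual methodology is more detailed and is documented on the Open Data Watch website.

```python
# Toy aggregation only -- not Open Data Watch's real ODIN methodology.
# It just shows the idea of combining coverage and openness marks per
# data category into an overall country score.

categories = {                       # hypothetical marks out of 100
    "social":        {"coverage": 70, "openness": 55},
    "economic":      {"coverage": 80, "openness": 65},
    "environmental": {"coverage": 40, "openness": 50},
}

def country_score(cats: dict) -> dict:
    coverage = sum(c["coverage"] for c in cats.values()) / len(cats)
    openness = sum(c["openness"] for c in cats.values()) / len(cats)
    return {"coverage": round(coverage, 1),
            "openness": round(openness, 1),
            "overall": round((coverage + openness) / 2, 1)}

print(country_score(categories))
# -> {'coverage': 63.3, 'openness': 56.7, 'overall': 60.0}
```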

Why does the world need to take stock of open data?

Open data directly brings a lot of value -- social and economic value on its own -- to help governments with their daily work and daily decisions. It helps citizens to know the state of their lives and assists with international development and research. So just tracking how countries are doing on open data is really important. It is also essential to understand who is left behind and what the good practices are. Open data also indirectly tells us a lot about some of the prerequisites, for example, transparency, accountability, good data governance, and data stewardship. So when you track open data, it also gives you a view of the good practices that countries have adopted and should keep improving over time. And that's what we're trying to do with ODIN.

Talk to us about the results from this year's Open Data Inventory. Did you come across any surprising findings?

Open data for official statistics are on the rise. About 75 percent of countries have increased their score. So that's really wonderful news to hear. And the other surprise is that the top performers used to be all high-income countries. But we see that they are diversifying. The top list now has countries like Palestine and the United Arab Emirates. In this year's index, Mongolia and Slovakia entered the top 10 countries in the world for the first time.

Another surprise is that countries in Africa and the Pacific Islands made the most significant improvements. We also see that the openness score is still a challenge for many low-income countries, which need much more technical and financial support to increase their capacity. Everyone is welcome to visit the ODIN website to view the results and send us comments and criticisms. We want to hear all feedback.

The Open Data Inventory (ODIN) measures how complete a country’s statistical offerings are and whether their data meet international standards of openness. Visit Odin.OpenDataWatch.com for more.

From Open Data Watch's perspective, does a strong culture of open data have any connection with a country's ability to report on COVID-19 data?

Well, we still have to do more research on this for hard evidence. But from what we see right now, comparing the openness scores in the latest ODIN results with how countries are reporting COVID-19 data, we do see that there is a relationship there. Logically, when you have a good open data practice, that means you have your house in order: you have procedures in place for data dissemination and you have sorted out your legal and technical issues. So when it comes to emergencies like COVID-19, you can use those competencies and resources. But when the system is weak, it's much harder to publish and disseminate data in time for decision-makers.

Finally, what's one or two misconceptions about open data you come across and wish to clear up?

"Open by default" is often misunderstood as everything must become open and countries get really nervous when they hear that. However, that's not what it's about. "Open by default" is about countries having a negative list of what data sets, for reasons of privacy, security or data quality issues are not to be made open, which implies that everything else should be open by default. Countries should not resist and actually embrace it.

The other misconception is that open data is just for rich countries. And we just saw from the 2020 ODIN results that that is not true. Lower-income countries are making very good progress. And the last misconception I often come across is that people think open data is a one-off project. But open data is a continuous process: a process of modernising systems, meeting needs and demands, and adapting, adjusting and modernising with technology and new findings.

Don't miss our latest long reads on DataJournalism.com

Finding hidden data inside the world's free encyclopedia is no easy task for journalists. In Monika Sengul-Jones' long read article, she explains how to navigate the often unwieldy world of Wikipedia.

In a fragmented world, can you keep all your news sources and items in one place? Yes, you can. As long as you are intentional, consistent, and use a few simple techniques and tools, things can only get better. Read the full article written by journalist and analyst George Anadiotis.

Our next conversation

In the next episode of our Conversations with Data podcast, we will speak with Professor Heidi Larson, an anthropologist and Director of The Vaccine Confidence Project. She will speak to us about misinformation and vaccines in light of the latest COVID-19 vaccine news. She will also talk to us about her new book, "Stuck: How Vaccine Rumors Start -- and Why They Don't Go Away".

As always, don’t forget to let us know what you’d like us to feature in our future editions. You can also read all of our past editions here.

Onwards!

Tara from the EJC Data team,

bringing you DataJournalism.com, supported by Google News Initiative.

P.S. Are you interested in supporting this newsletter as well? Get in touch to discuss sponsorship opportunities.

Mind the map https://datajournalism.com/read/newsletters/mind-the-map Wed, 25 Nov 2020 10:00:00 +0100 Tara Kelly https://datajournalism.com/read/newsletters/mind-the-map Welcome back to the latest edition of our Conversations with Data newsletter.

In this episode of Conversations with Data, we spoke with science journalist and author Betsy Mason about how journalists can accurately use maps to help visualise stories. She also explained how bad design, geographical quirks and perceptual illusions can confuse the public, particularly when it comes to election maps.

You can listen to our entire podcast on Spotify, SoundCloud or Apple Podcasts. Alternatively, read the edited Q&A with Betsy Mason below.

What we asked

Tell us about your career. How did you first become interested in maps?

This is a question I often get and it is surprisingly hard to answer because I think I've just always had an affinity for maps. When I was little, I could spend hours poring over road atlases. Before I became a science journalist, I was a geologist. I got a master's degree in geology and ended up working for an oil company in Houston for almost two years. While I was there, I learnt that there was this thing called science writing. So I applied to UC Santa Cruz’s graduate science communication programme, switched careers and have really enjoyed it since.

How did this interest turn into you writing a book?

My obsession with cartography and maps began when I started writing about them at WIRED Magazine as a science editor. I started a blog about maps called "Map Lab" with my friend and colleague Greg Miller. The idea was that we would learn all about maps, and how to make them, in public, and our readers would learn along with us. The more I learnt, the more I loved maps. We eventually moved that blog to National Geographic, where it's called All Over the Map, and then did a book together of the same name, published in 2018.

Betsy Mason is a freelance writer and editor specialising in science and cartography based in the San Francisco Bay Area.

Why do maps matter to humankind?

I do think that there's definitely something special about maps that uniquely draws us, humans, in. Maps give us an understanding of our place in the world. I believe having a representation of the world that we can place ourselves in is important. It helps us be less inward thinking, less focussed on ourselves and our immediate surroundings. It's about perspective.

In your book, you explain how our brains are built for maps to absorb visual stories. Tell us more about that.

I think that our brains relate to maps differently than to other data visualisations and that we more readily form emotional connections with maps. This is backed by both science and my experience of seeing how others interpret them. Maps are this perfect combination of constraint and creativity. They're based on a very real framework defined by actual geography, but they leave just enough space for creativity and artistry that hits the right spot for human brains.

You also focus on the history of maps in your book. What do maps tell us about the past and why do they matter?

I'd say maps are an important way to record history because they can be so specific to the time and place that they were made. They can reveal so much about the person who made them, their view of the world, what was important to them, what they left out. So they can record history in a way that I think has these sort of different angles that you don't get in other ways.

In this visually stunning book, award-winning journalists Betsy Mason and Greg Miller explore the intriguing stories behind maps from a wide variety of cultures, civilizations, and eras.

You wrote an article days before the US election entitled "Election maps are everywhere. Don't let them fool you". What was your perception of how news organisations used maps to communicate the election results?

Well, as expected, the cable news stations were on for hours and days on end with their magic wall maps, which are all made with extremely bright reds and blues -- I could see the chromostereopsis effect. I could see all the problems. We've got a situation where rural, less populated areas tend to vote Republican and are therefore coloured red, while concentrated urban centres tend to be more Democratic and are coloured blue. So when you look at the map, it's just a sea of red with dotted blue islands. But in this election, more people voted Democrat than Republican. That's not the way the map looks, though. I think that's particularly problematic right now, when we have people suggesting that the election was rigged and that there is actually more support for President Trump than there is, because it does look like that on the map.

What about election maps published by other news organisations?

Print news organisations like The New York Times and The Washington Post do a lot better. They have different kinds of maps. They use a more even brightness of red and blue. They're not always shading the entire state or county in the colour of the winning party. They also do a thing called normalisation on vote share. So if a county or state is extremely Republican, it'll be a much darker red than if it just barely went Republican. That gives you a much better impression of where the country actually stands.
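
As a rough illustration of that normalisation idea, the sketch below shades a hypothetical county by its vote margin instead of flat red or blue, so narrowly contested places come out pale and landslide places come out saturated. The counties, shares and colour ramp are invented for the example; newsroom cartographers use far more carefully designed scales.

```python
# Toy sketch of shading by vote share instead of winner-take-all.
# Counties, shares and the colour ramp are invented for illustration.

def margin_to_colour(dem_share: float, rep_share: float) -> str:
    """Return a hex colour: deeper blue or red for bigger margins, pale near 50/50."""
    margin = dem_share - rep_share            # positive = Democratic lead
    intensity = min(abs(margin) / 0.4, 1.0)   # saturate at a 40-point margin
    if margin >= 0:
        r, g, b = 230 - int(160 * intensity), 230 - int(140 * intensity), 255
    else:
        r, g, b = 255, 230 - int(160 * intensity), 230 - int(160 * intensity)
    return f"#{r:02x}{g:02x}{b:02x}"

counties = {"County A": (0.52, 0.46), "County B": (0.31, 0.67), "County C": (0.49, 0.49)}
for name, (dem, rep) in counties.items():
    print(name, margin_to_colour(dem, rep))   # pale blue, strong red, near-white
```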

Those newspapers that I mentioned and some other publications have got really great cartographers on staff, and they're always innovating new ways to try to make more accurate representations of the election. But it's a really interesting and difficult problem. It is also true that when you're mapping by state and it's representing the Electoral College, that sort of unevenness is representative of the Electoral College system. So it's a more accurate representation of this weird way that we elect a president that isn't a direct, popular vote. So it doesn't really matter how Republican Utah or Wyoming are. It just matters that they're Republican and all the electoral votes go that way.

As far as journalists and their mapping skills go, do you ever see a story and wonder why a map wasn't included?

More often than not, I think "this should not have been a map". One of the things journalists should do is think about whether or not a map is necessary. Generally, there are two kinds of maps when it comes to journalism. There are maps that you use to display information or an idea, and then there are maps that are used with data journalism and GIS analysis to reveal the story and to find the patterns. And obviously those kinds of maps are integral to the story.

What should journalists be mindful of when using maps in their stories?

Throughout history, maps have been used to convince people of things that were true and things that weren't true. And maybe it comes back to that connection with maps that I was talking about earlier. But I think people are very prone to believe things that they see on maps. If it's on a map, it just looks and feels more true. So one thing journalists need to keep in mind is that the same data set can be mapped many different ways that will show or convey a different message. And that's not to say that journalists should just pick the way to map the data that matches the story they're telling. The point is to use maps carefully and responsibly and understand that they're not just illustrations to break up the text.

Can you recommend any useful resources for journalists wanting to learn more?

It's a perfect time to be a journalist who wants to make maps because it's more possible than it's ever been. For journalists who just want to start incorporating maps or maybe just playing around with maps, a lot of different resources exist out there. I can recommend two free online courses. Dr Anthony Robinson teaches one at Penn State University, and it's called Maps and the Geospatial Revolution. You'll learn about the basic principles of cartography, map design, how to use colour, and how to make maps using the software platform of Esri, the biggest mapping software company. There's also Alberto Cairo's course about data visualisation for journalists through the Knight Center. While that course generally focuses on data visualisation, it has a lot about mapping in there, too.

Latest from DataJournalism.com

The final day of News Impact Summit's data journalism event takes place tomorrow. Thursday's programme includes an impressive line-up of speakers:

  • Julia Angwin, The Markup
  • Matthew Kauffman, Solutions Journalism Network
  • Kiko Llaneras and Mariano Zafra, El Pais
  • Karoline Albuquerque, Julliana de Melo Correia e Sá & Ciara Carvalho, Jornal do Commercio

Want to attend? Register here.

In a fragmented world, can you keep all your news sources and items in one place? Yes, you can. As long as you are intentional, consistent, and use a few simple techniques and tools, things can only get better. Read the full article written by journalist and analyst George Anadiotis.

Our next conversation

In the next episode of our Conversations with Data podcast, we will speak with Shaida Badiee, the co-founder and managing director of Open Data Watch, a non-profit organisation working at the intersection of open data and official statistics. She will speak to us about the 2020 release of the Open Data Inventory, which aims to assess the coverage and openness of official statistics to help identify gaps, promote open data policies, improve access, and encourage dialogue between national statistical offices and data users.

As always, don’t forget to let us know what you’d like us to feature in our future editions. You can also read all of our past editions here.

Onwards!

Tara from the EJC Data team,

bringing you DataJournalism.com, supported by Google News Initiative.

P.S. Are you interested in supporting this newsletter as well? Get in touch to discuss sponsorship opportunities.

Meet the undercover economist https://datajournalism.com/read/newsletters/meet-the-undercover-economist Wed, 11 Nov 2020 13:47:00 +0100 Tara Kelly https://datajournalism.com/read/newsletters/meet-the-undercover-economist Welcome back, data enthusiasts. In this latest edition of our Conversations with Data newsletter, we have some exciting announcements for you:

Now to our podcast. In our latest episode of Conversations with Data, we caught up with the FT columnist and BBC broadcaster Tim Harford about his new book, "How to Make the World Add Up". As an economist and journalist, he talks to us about the power of statistics and why they aren't just a tool for debunking misinformation.

You can listen to our entire podcast on Spotify, SoundCloud or Apple Podcasts. Alternatively, read the edited Q&A with Tim Harford below.

What we asked

Tell us about your career and how it began.

I studied Philosophy, Politics and Economics at Oxford University, which is the classic degree for people who have no idea what they want to do with their lives. I thought I would quit economics, but I was persuaded by a wonderful man called Peter Sinclair not to quit. And my new book is dedicated to Peter, who sadly died this year. Having been persuaded to stick with economics, I ended up teaching in Ireland for a year at University College Cork. This was followed by a master's degree back at Oxford.

After that, I worked in scenario planning for Shell and the World Bank before finally joining the Financial Times in 2006. Around the same time, my book, "The Undercover Economist", was published. A TV show based on the book was broadcast by BBC 2. Shortly after that, I was asked if I would present "More or Less" on BBC Radio 4. It was one of those strange things where, as far as my journalism career went, nothing happened for about 10 years, and then everything happened all at once.

Where did your love for interpreting quirky numbers come from?

Well, I had to learn it after I became the presenter of BBC Radio 4's "More or Less". My training as an economist actually didn't really prepare me to think about statistics. It's a different discipline. Obviously, there are numbers in economics, but it is a different thing. I found myself picking up a lot along the way. And one of the things I think I've learnt is that the difference between good statistical journalism and good journalism, in general, is not as big as some people may think.

"How To Make The World Add Up” by Tim Harford was released in the UK in September 2020. The book will be available in the U.S. and Canada in February 2021 under a different title.

Tell us about your new book. What should readers take away from it?

The book is an attempt to help people think clearly about the world. My argument is that numbers and statistics are a good way to do that. There are a lot of things about the world that we can only understand, patterns we can only perceive, if we have the data -- if we are able to use statistics. They are a tool for showing us the way the world is, in the same way that air traffic control needs radar or an astronomer needs a telescope. We social scientists need statistics. That's the basic argument of the book.

What has disturbed me is so much of the way we think about statistics is in the mode of fact-checking and in particular debunking falsehoods. The most popular book ever published about statistics is reportedly "How To Lie With Statistics" by Darrell Huff. I think it's very striking and very worrying that the most popular take on statistics is from cover to cover a warning against misinformation. We can use statistics to illuminate the world. That's what the book is trying to do and argue for.

Who is the book aimed at?

The book is aimed at helping ordinary citizens have a little bit more confidence in their own judgement and showing them how to apply their own judgement to figure out what's true and what's not.

Tim Harford, “the Undercover Economist”, is a Financial Times columnist, BBC broadcaster, and the author of nine books, most recently “How To Make The World Add Up”. (Photo credit: Fran Monks)

In your book, you talk about the importance of getting the back story. What advice do you have for data journalists?

There are a couple of things to bear in mind. One is that just because you've got a wonderful spreadsheet in front of you doesn't mean that you shouldn't be picking up the phone and talking to people. For instance, talking to experts who might understand the data better than you, or people who created the data. In one of the other chapters, I warn people against premature enumeration. This is when you've got all the numbers and you start analysing, plotting graphs and taking averages. I like to think of this as doing all the cool things that we data journalists like to do with numbers before you actually understand what it is that those numbers are describing. And very often there will be really important facts about the data that are not in the spreadsheet but only in the footnotes. Or they won't be visible at all unless you have a conversation with somebody about how the data were produced.

You continuously work across several projects. Where do you get your ideas from?

The world is full of ideas. I find things that happen in the real world. And because I'm a nerd, I'm always looking for a nerdy perspective on that. My very first book, "The Undercover Economist", is mostly a question of me going, "Ha! When I sit in Starbucks, it looks like this. Why does it look like this? Is there an explanation in economic theory as to why it looks like this?" This has me trying to figure out what it is that I'm seeing in the world, but using this quite nerdy lens. While this is not unique to me by any means, it's not typical for a journalist. And the other thing is, I'm often trying to make connections between one thing and another that are not obvious. I'm thinking to myself, what's the parallel?

Finally, what other upcoming projects are on the horizon for you?

This is book number nine. It's coming out in the U.S. and Canada in February 2021. In the rest of the world, it's called "How to Make The World Add Up", but in the U.S. and Canada, it will be called "The Data Detective". There will be a new 14-part series of the "Cautionary Tales" podcast coming out in January. BBC's "More or Less" will also be back on the air in January. So a lot is going on, and I'm sure a new crazy project will occur to me very soon.

Latest from DataJournalism.com

The final News Impact Summit will be held on 24-26 November. Data Journalism: Build Trust in Media is a free live-streamed event held on YouTube hosted by the European Journalism Centre and powered by Google News Initiative. The full programme includes an impressive line-up of speakers from Solutions Journalism Network, OCCRP, Financial Times, Disclose, Civio, WeDoData, Internews and The Markup. Want to attend? Register here for free.

The FinCEN Files investigation reveals how some of the world's biggest banks have allowed criminals to move dirty money around the world. With more than 85 journalists in 30 countries, ICIJ explains how it investigated leaked documents involving about $2 trillion of transactions using statistical and textual analysis. Read the full article here.

Our next conversation

In the next episode of our Conversations with Data podcast, we will speak with Betsy Mason about election maps and how best to communicate results to voters. Betsy is a freelance writer and editor specialising in science and cartography based in the San Francisco Bay Area. She is co-author with Greg Miller of All Over the Map, an illustrated book about maps and cartography for National Geographic, and the cartography website Map Dragons. Her work appears in numerous publications including National Geographic, The New York Times, Science, Nature, WIRED and New Scientist.

As always, don’t forget to let us know what you’d like us to feature in our future editions. You can also read all of our past editions here.

Onwards!

Tara from the EJC Data team,

bringing you DataJournalism.com, supported by Google News Initiative.

P.S. Are you interested in supporting this newsletter as well? Get in touch to discuss sponsorship opportunities.

Politics and probability https://datajournalism.com/read/newsletters/politics-and-probability Thu, 29 Oct 2020 11:39:00 +0100 Tara Kelly https://datajournalism.com/read/newsletters/politics-and-probability Welcome to our 60th edition of the Conversations with Data newsletter. With the 2020 US presidential elections less than a week away and millions of advanced ballots already cast, the world is waiting to see who will take the White House on November 3rd.

To help journalists navigate the political twists and turns of the election, this week's Conversations with Data podcast features Micah Cohen, FiveThirtyEight's managing editor. He talks to us about the power of probability and the uncertainty in election polling data.

You can listen to our entire podcast on Spotify, SoundCloud or Apple Podcasts. Alternatively, read the edited Q&A with Micah Cohen below.

What we asked

Tell us about yourself and your work at FiveThirtyEight.

I'm the managing editor at FiveThirtyEight where I run the newsroom, plan our coverage and help a really talented group of journalists do the best work they can do. I've been working with FiveThirtyEight since 2010. Before that, I worked at The New York Times when Nate Silver, the founder of FiveThirtyEight, brought the site there. The Times held the licence to FiveThirtyEight for three years, then ESPN bought it in 2013. That was the point at which we began growing and went from a staff of two to 30 people. After I graduated from Tulane University, I went to The New York Times and then FiveThirtyEight.

Talk to us about FiveThirtyEight. How was it founded, and why is it called that?

Nate started blogging about politics during the 2008 cycle, and eventually, he called his blog FiveThirtyEight, which is the number of electors in the Electoral College. That's why, to win the presidency, you need 270 electoral votes, which is a majority of the 538. Nate started it because he was getting frustrated with how the Obama-Clinton Democratic primary was being covered. He started writing about some common mistakes the media was making in dealing with the 2008 delegate race.

Micah Cohen is FiveThirtyEight’s managing editor and previously wrote for FiveThirtyEight at The New York Times.

FiveThirtyEight shows Biden is favoured to win the US election. What exactly does that 'favoured' status mean?

We have a statistical model that takes in polling data and economic indicators and spits out win probabilities for each candidate. Those win probabilities range from 0 to 100 percent, and Biden currently has an 84 percent chance. The issue is how you communicate what that means to people. Even very numerate people don't always have a really good sense of what a probability means. For example, right now, Biden has an 84 to 85 percent chance of winning. A lot of people will just round that up to 100 percent and interpret it as Biden being certain to win the election, even though Trump still has roughly a one in six chance. That's about the same odds as getting the bullet in a game of Russian roulette.
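
To make that concrete, here is a minimal, purely illustrative simulation in Python. It is not FiveThirtyEight's model; the 84 percent figure is simply taken from the quote above and used as an assumed input.

```python
# A minimal sketch (not FiveThirtyEight's model): simulating how often an
# outcome with roughly a one-in-six probability actually happens. The win
# probability here is an assumed illustrative figure, not a real forecast.
import random

random.seed(42)

favourite_win_prob = 0.84   # assumed probability for the favourite
simulations = 100_000

upsets = sum(random.random() > favourite_win_prob for _ in range(simulations))

print(f"Underdog wins in {upsets / simulations:.1%} of {simulations:,} simulated elections")
# Roughly one run in six: likely for the favourite, but far from certain.
```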

FiveThirtyEight's interactive election forecast shows Biden is favoured to win the 2020 US presidential election.

In 2016, many polls showed Hillary Clinton was favoured to win the U.S. presidential election. Are there any indicators that show the 2020 polls are a repeat of what happened in 2016? Has the forecast's methodology changed or the way you communicate about it?

The answer to the first question about the methodology is no. Our forecast works largely the same way it did in 2016 and largely the same way it did in 2012. We make refinements and improvements every election cycle. There are some differences in how the forecast handles economic data, and there's a new uncertainty index along with COVID-specific elements, such as treating mail-in ballots as a source of uncertainty. But the bulk of the forecasting methodology is the same.

The reason we didn't need to really change anything is that we thought the forecast largely worked the way it was supposed to in 2016. It gave Trump about a one in three chance of winning by election day. It identified his most likely route to winning, which was overperforming in the Electoral College relative to the popular vote.

And by the way, all of that was evident in the data. National polls had Clinton up by a few percentage points, and she won by a couple of percentage points. But polling at the time showed Trump performing better in the swing states than he did nationally. So there was reason to think he might overperform in the Electoral College. Now, there were some states where the polls were just off. But that was more of a state-specific problem than a national polling problem. The model is largely the same, but we are communicating what it shows differently this year.

How is FiveThirtyEight communicating that uncertainty differently?

Our forecast starts with a series of maps that are meant to represent the range of likeliest outcomes. So if you go to the page, you'll immediately see it says "Biden is favoured". Then you have a bunch of red and blue maps. As of right now, there are more blue maps (Democrat) than red maps (Republican). So right away, you get a sense that Biden is favoured. But these different maps are meant to show you there is a range of possibilities that are consistent with the data. And that's really the key.

How does this uncertainty impact the way you visually design the forecast?

The whole reason we do a forecast is that polls are our best way of measuring public opinion. They're also our best tool for understanding the state of an election. But they aren't exactly right. There's a margin of error and other sources of error. If the polls were perfect, we wouldn't need a forecast at all. We would just take the polling average in every state and say Biden's going to win this. The forecast is really designed to measure how likely it is that the polls will be wrong.

We are leaning more into what are the sources of uncertainty. What are the chances that the polls are going to be wrong or something unexpected is going to happen? And how could that happen? I think the way we write about the forecast, talk about the forecast and the way our visual journalists design the forecast, leans into that uncertainty and explains the range of possibilities. We're trying to present a more nuanced picture, not a binary one.

There is still a significant share of white women in the suburbs who are on Trump’s side (45 percent), but the chart above shows white men support Trump much more overall. Read the full article.

How do independent voters play out? Is there any data showing how they might impact the election outcome?

Biden is really crushing Trump with independent voters. And that is one way Biden's polling is really different from Hillary Clinton's in 2016. There are way fewer undecided voters in 2020 than in 2016. In the last election, there were a ton of undecided voters up until basically the last couple of days of the campaign.

Every election, we ask what group is going to swing this election and determine the outcome. Is it independent voters, NASCAR dads or soccer moms? The way elections normally work is a candidate will do better or worse largely across the board. Trump right now, according to the polls, is losing by a wide margin. That's in part because independent voters largely favour Biden. But it's also because every other group of voters has soured on Trump a bit. If you look at Trump's approval rating, it is worse now among suburban white women and among Latino voters than it was four years ago. So, yes, Biden is winning independent voters, but he's winning a bunch of other demographic groups, too.

Polling averages are adjusted based on state and national polls, which means candidates’ averages can shift even if no new polls have been added to this page. Read more about the methodology.

Finally, what do you think journalists need to be mindful of when reporting the results on election day?

The first thing I would say is to be careful. Take a moment to think about what you are writing. What are you showing? If it's TV, what do you show graphically? What is the potential to confuse things?

At FiveThirtyEight, we are trying to get our readers at least used to the idea that it might take a while to count the vote and not to assume that if it takes a few days or even a couple weeks to count the vote, that it is inherently a sign of a rigged process or a botched election. There's this pandemic, and the country has to adjust its election administration accordingly. It's therefore going to take us longer to count the vote this year than it has in other years. So that's one thing.

On another note, it is also important to be more careful about how you are talking about the results. Think about how you are showing the results and be much more specific in describing for readers and viewers what kind of results we are getting. And most importantly, explain what we still don't know about the election results and lead with that. Taking those precautions better prepares readers and viewers for what could be a confusing night and keeps everybody on the same page.

Latest from DataJournalism.com

The more journalists know about polls, how they work and how to evaluate their quality, the closer they come to clarity and accuracy in reporting. In Sherry Ricchiardi's latest Long Read article for DataJournalism.com, she provides useful resources and expert advice on how to cover opinion polling data in the upcoming 2020 presidential elections. Read the full article here.

The final News Impact Summit has been pushed back to 24-26 November. Data Journalism: Build Trust in Media is a free live-streamed event held on YouTube hosted by the European Journalism Centre and supported by Google News Initiative. The full programme with confirmed speakers will be announced soon. Register here.

Our next conversation

In the next episode of our Conversations with Data podcast, we will speak with economist and journalist Tim Harford about his latest book "How to Make the World Add Up: Ten Rules for Thinking Differently About Numbers". As one of the UK's most respected statistical journalists, he is best known for his long-running Financial Times column "The Undercover Economist" and his BBC radio show More or Less.

As always, don’t forget to let us know what you’d like us to feature in our future editions. You can also read all of our past editions here.

Onwards!

Tara from the EJC Data team,

bringing you DataJournalism.com, supported by Google News Initiative.

P.S. Are you interested in supporting this newsletter as well? Get in touch to discuss sponsorship opportunities.

FinCEN Files: Q&A with ICIJ's Emilia Diaz-Struck https://datajournalism.com/read/newsletters/fincen-files-q-a-with-icijs-emilia-diaz-struck Wed, 14 Oct 2020 13:45:00 +0200 Tara Kelly https://datajournalism.com/read/newsletters/fincen-files-q-a-with-icijs-emilia-diaz-struck Managing a data-led investigation is no small feat no matter the size of your organisation. But how do you build a collaborative culture with hundreds of journalists across the world and ensure your investigation makes an impact?

In this week's Conversations with Data podcast, we talk to Emilia Diaz-Struck from the International Consortium of Investigative Journalists (ICIJ) about its latest cross-border investigation, the FinCEN Files. As ICIJ's research editor and Latin American coordinator, she explains how the team used data-driven tools to sift through and analyse trillions of dollars in transactions to reveal the role of global banks in industrial-scale money laundering.

You can listen to our entire podcast on Spotify, SoundCloud or Apple Podcasts. Alternatively, read the edited Q&A with Emilia Diaz-Struck below.

What we asked

Tell us about your role at ICIJ.

I am currently based in Washington, D.C. and I oversee all of our data and research work. I work with an amazing team of data journalists, researchers and developers. We mine millions of records for our investigative projects and analyse complex datasets. At the same time, we collaborate with journalists all around the world. I also coordinate our Latin American partnerships and focus on bringing great journalists on board to work on large investigations with us.

In September 2020, ICIJ published the FinCEN Files. Talk to us about this investigation.

The FinCEN Files is an investigation that started after BuzzFeed reached out to us. They had a collection of documents called suspicious activity reports (SARs) that had not previously been made public. These reports are filed by banks to the Financial Crimes Enforcement Network, known as FinCEN, a financial crime agency within the U.S. Department of the Treasury -- hence the name of the project. They flag transactions that banks consider suspicious because they could potentially involve money laundering.

The files were obtained as part of the U.S. congressional investigation into Russian interference in the 2016 U.S. elections. The FinCEN Files investigation looked at 2,100 suspicious activity reports. Among those reports, 98 percent were filed between 2011 and 2017. To put that into perspective, that's only about 0.02 percent of the reports FinCEN received over those years. So the findings are only a small sample of the activity during that period.

ICIJ and 108 media partners spent months investigating secret bank reports shared by BuzzFeed News.

What did these suspicious activity reports reveal?

The 2,100 reports gave us an unprecedented look into banks' anti-money laundering monitoring system. It also allowed us to explore some of that system's failures. For instance, ICIJ found a lag of 160 days between when the suspicious activity started and when banks reported it to the authorities, even though the law says they should usually report it within the first 30 days. The investigation also revealed some reporting system failures. In about half of the reports, banks flagged at least one client without knowing who the person or company behind the transaction was. These reports came from intermediary banks, which are used to pass money through the financial system from bank to bank. Under U.S. law, these correspondent or intermediary banks are the ones sending these reports to FinCEN.

Emilia Díaz-Struck is ICIJ's research editor and Latin American coordinator. She has taken part in cross-border investigations such as ICIJ's Offshore Leaks, Luxembourg Leaks and Swiss Leaks projects; a collaboration between journalists from 5 countries that revealed the illicit trade of coltan; and a collaboration with Costa Rica's La Nación newspaper on a real estate scandal involving a former Venezuelan magistrate who fled ahead of an arrest warrant.

Who took part in the FinCEN Files investigation?

The investigation involved more than 400 journalists in 88 countries and took place over 16 months. Altogether, 110 media outlets participated in the project.

How did you start to collaborate with these global partners?

We always start by figuring out what kind of data we have in front of us. Are there any interesting people or organisations of public interest in the data that allow us to bring partners on board? We also ask ourselves, how global is this data? One of the findings was that the data connected to more than 170 countries. So that's when you start figuring out highlights tied to specific countries. If we noticed Venezuela in the data, we would start involving partners in Venezuela. We continued developing partnerships with other countries based on relevant data showing up in the files.

About 16 months ago, we began planning and held a big meeting in Germany. The aim was to figure out what the documents were about, but also to coordinate. For instance, how are we going to work together? Are there any potential collaborations across countries? Partners also agreed to keep the project confidential, to share findings with everyone and not to keep them for themselves.

Explore Confidential Clients. It shows the notorious names behind billions in suspect money. The categories reflect allegations that often prompted the reports or suspicious activity flagged by banks and are not necessarily indicative of misconduct.

What kind of data-driven tools did you use for the investigation?

Our tech team has developed a number of useful tools for us. One is called Datashare. This open-source tool allows people to explore documents, and all the documents for the FinCEN Files investigation were added there. That helps people explore the data in a secure way. Datashare also does entity extraction, so you can get a sneak peek of the entity names in the documents and run searches, as you would on an open search platform. But this instance was protected and exclusive to the project.
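
For readers curious what "entity extraction" looks like in practice, here is a rough sketch using the spaCy library. It illustrates the general technique, not Datashare's internal pipeline, and the sample sentence is invented.

```python
# A rough sketch of named-entity extraction, the kind of step Datashare
# automates across thousands of documents. spaCy and its small English
# model are assumptions of this example, not necessarily what ICIJ uses.
import spacy

# pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

text = (
    "The intermediary bank filed a suspicious activity report with FinCEN "
    "after transfers linked to a company registered in the British Virgin Islands."
)

doc = nlp(text)
for ent in doc.ents:
    # Each entity comes with a label such as ORG, PERSON or GPE (place).
    print(ent.text, ent.label_)
```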

We also used what we call the Global I-Hub. This collaboration platform is like a secure social network for journalists, with groups and subgroups that allow you to organise the whole investigation, along with a project manager to provide guidance. The complexity of the files meant we had to run a large collaboration. Data-wise, this project involved more than 85 journalists in 30 countries extracting data from the files. While we tried to automate as much as possible, a lot of manual work was still required, and automation alone couldn't give us all the insights we needed. So we created what we call a 'data extraction party', run via the Global I-Hub.

Explore the confidential data at the heart of the FinCEN Files.

Finally, what has been the impact of this investigation?

It is still too soon to say. But so far there have been discussions about regulation. For instance, Elizabeth Warren and Bernie Sanders have made public statements about the FinCEN Files and are looking into the regulatory side of it. We know that in Europe there have also been open discussions examining the current regulation of banks, which is tied to this reporting system. There are questions about whether improvements can be made to the regulatory system in the areas our investigation highlighted. There are many stories in this investigation and still more to come.

Latest from DataJournalism.com

Disinformation has become a major factor in the 2020 American presidential campaign. In our latest long read, journalist Sherry Ricchiardi examines the methods manipulators use to inflame political and social tensions. The piece also focuses on the challenge: 'What can journalists do to avoid being easy marks?' Read the full article here.

The European Journalism Centre and Google News Initiative offer a series of new conferences in October and November to discuss the latest innovations in journalism. Live-streamed on YouTube, the topics will cover Data Journalism as well as Audio & Voice. Register here!

Our next conversation

In the next episode of our Conversations with Data podcast, we will speak with Micah Cohen, the managing editor at FiveThirtyEight. He will talk to us about the uncertainty in polling data and how journalists can report on the upcoming US elections.

As always, don’t forget to let us know what you’d like us to feature in our future editions. You can also read all of our past editions here.

Onwards!

Tara from the EJC Data team,

bringing you DataJournalism.com, supported by Google News Initiative.

P.S. Are you interested in supporting this newsletter as well? Get in touch to discuss sponsorship opportunities.

Storytelling beyond charts and graphs https://datajournalism.com/read/newsletters/storytelling-beyond-charts-and-graphs Wed, 30 Sep 2020 09:46:00 +0200 Tara Kelly https://datajournalism.com/read/newsletters/storytelling-beyond-charts-and-graphs Here is a riddle for you. How do you tell a story for all ages using data without charts and graphs? That was the challenge set by data designer Stefanie Posavec and data journalist Miriam Quick when creating "I Am a Book. I Am a Portal to the Universe."

In this week's Conversations with Data podcast, they tell us about the book's concept and provide some useful advice for successful data design collaborations.

You can listen to our entire podcast on Spotify, SoundCloud or Apple Podcasts. Alternatively, read the edited Q&A with Stefanie and Miriam below.

What we asked

Tell us about your work in data visualisation.

Miriam: I'm a data journalist, and I write data stories for news organisations like the BBC. I work on information design and have done projects for clients as a researcher, a data analyst and a copywriter. I also collaborate with artists like Stefanie to make data art pieces for museums and galleries. I'm interested in working with data about science, the arts and particularly music, because I have a PhD in musicology. I try to create musical visualisations whenever I can. I'm best known for my project Oddityviz with designer Valentina D'Efilippo. The project visualises data from David Bowie's song Space Oddity as a series of 10 engraved records, posters and an animation.

Stefanie: As for me, I'm an information designer, artist and author. My favourite material to work with is data. I work with a lot of different forms, from projects that you can dance through to ones you can hop through. I'm best known for my Dear Data project, a year-long project I worked on with designer Giorgia Lupi. Every week we collected our personal data, drew it on a postcard and sent it to the other person. The project culminated in a book called Dear Data, and later a journal that people can use to do this very personal data gathering and drawing process themselves. The project's postcards and sketchbooks are held in the permanent collection of the Museum of Modern Art, New York.

Talk to us about the concept of your new book, "I Am a Book. I Am a Portal to the Universe."

Miriam: The book is called "I Am a Book. I Am a Portal to the Universe." It was published in the U.K. just this September by Particular Books. The basic idea behind it is that the book itself is a measuring device, so you can use the book to measure things. Each measurement of the book -- from the thickness of its pages to the noise it makes when slammed shut -- embodies a fascinating fact or data point. For example, hold the book up to the sky and see how many stars lie behind its two pages. Or, in the time it takes to turn one page, how many babies are born and how many people die? How many molecules is the book made of? All of these different stories use the book as a physical object.

"A love letter to book design, which - through fascinating data measurements - unlocks an understanding of the world around us."

What audience is the book targeting?

Stefanie: It's an all-ages book geared towards children aged eight and up through to adults. The book aims to reach a broad audience and is for people who might not necessarily pick up a book with the word data or science in the title. I think another critical thing about the book's concept is that we wanted all of the data represented in it to be at one-to-one scale, printed at actual size on the book itself. So there's no abstraction. We both think it creates a very particular effect to have the data represented in this way throughout the book.

What was the process like for coming up with the idea for this book?

Stefanie: We are used to collaborating on projects together and have been working together since 2012. We were sitting in a cafe together discussing ideas for new collaborations, and it came to us. We were given the opportunity by our agents to take this forward into a full book proposal. After we got the book deal, we started with the tone of voice and began researching the data. We also set ourselves a design brief which entailed coming up with design rules, such as no traditional charts or graphics allowed within this book. We didn't want this book to have anything that you might typically see in a traditional information graphic book.

Stefanie Posavec is a designer and artist who uses data as a creative material. Her work has been exhibited internationally at major galleries including the Centre Pompidou, the V&A, the London Design Museum and Somerset House, and is held in the permanent collection of the Museum of Modern Art, New York.

How did this project differ from other collaborations you've worked on together?

Stefanie: I would say the process was quite similar. Most of the time, Miriam does the data and the words, and I do the design. Then there is this fuzzy bit in the middle where it all starts to mix and we begin to have opinions about each other's work. And that's where the really interesting stuff happens.

Miriam: I guess for me, any real difference is that compared to the other projects we've done, this was just so much bigger. It took two and a half years from start to finish, and there was a bigger project management element to it. For me, it was quite a new territory in terms of the material we were working with. I had to become familiar with all the publishing industry terminology quite quickly.

Miriam Quick is a data journalist and researcher who explores novel ways of communicating data. Bylines include the BBC and Information is Beautiful. She co-creates data artworks, exhibited at the Wellcome Collection, Southbank Centre and Royal College of Physicians and internationally.

What advice do you have for designers and journalists collaborating on data projects?

Miriam: One of the things I've learnt as a data journalist is that drawing exploratory plots is useful as part of the data analysis. But also, while you're researching and gathering data, try to understand what insights in the datasets can visually support the story. My tendency early in my career was to give the designer loads of notes, caveats and sources. But I quickly realised that they weren't reading them, so I learnt to slim that down. I think it's a case of striking a balance between giving the designer all the information they need and not overwhelming them with irrelevant information.
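
As a loose illustration of the quick exploratory plot Miriam describes, the sketch below uses pandas and matplotlib on an invented dataset; it is a throwaway chart made during analysis, not a design deliverable.

```python
# A tiny exploratory-plot sketch: a quick look at the data before any design
# work. The dataset here is invented; pandas and matplotlib are assumptions.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    "year": [2015, 2016, 2017, 2018, 2019],
    "measurements": [12, 18, 25, 22, 31],
})

df.plot(x="year", y="measurements", kind="bar", legend=False)
plt.ylabel("measurements collected")
plt.tight_layout()
plt.savefig("exploratory_plot.png")   # rough chart for analysis, not publication
```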

Stefanie: As a data designer, I think it is essential to work with a data journalist you trust. I love this collaboration because working with a data expert like Miriam means she can transform and analyse data in ways that I never would. I respect her expertise, and it makes me happy as a designer because it means that I can focus on the things that I love best, which is that translation of number to form.

Finally, what other upcoming projects do you have in the works?

Miriam: I'm working with the data journalist Duncan Geere on a podcast about data sonification. It's called Loud Numbers and it will be out early next year. The idea is that in every episode we will take a data set and sonify it. We will create music using code to represent the data. The whole episode will explore that data set through sound. You can sign up to our newsletter on LoudNumbers.net or follow us on Twitter or Instagram.
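
For a sense of what sonification can involve, here is a minimal, generic sketch that maps a short data series to pitches and writes a WAV file using NumPy and Python's standard library. It is not the Loud Numbers production workflow, and the data values are invented.

```python
# A minimal data-sonification sketch: map a series of values to pitches and
# write them out as a WAV file. This is a generic illustration, not the
# Loud Numbers pipeline.
import wave
import numpy as np

values = [3, 5, 8, 13, 9, 4, 2, 6]           # the data series to sonify
sample_rate = 44100
note_length = 0.3                             # seconds per data point

lo, hi = min(values), max(values)
tones = []
for v in values:
    # Scale each value to a frequency between 220 Hz (A3) and 880 Hz (A5).
    freq = 220 + (v - lo) / (hi - lo) * (880 - 220)
    t = np.linspace(0, note_length, int(sample_rate * note_length), endpoint=False)
    tones.append(0.3 * np.sin(2 * np.pi * freq * t))

signal = np.concatenate(tones)
samples = (signal * 32767).astype(np.int16)

with wave.open("sonified.wav", "wb") as f:
    f.setnchannels(1)
    f.setsampwidth(2)          # 16-bit audio
    f.setframerate(sample_rate)
    f.writeframes(samples.tobytes())
```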

Stefanie: I am working as an artist in residence for a research project between Warwick University, Goldsmiths and Imperial College looking at the impact of data personalisation. The project is called "People like you" and has been running for a few years now. I am researching how the various stakeholders in a biobank perceive the ‘people behind the numbers’ who consent to their biological samples and data being used and stored for research. This research will inform the creation of a data-driven artwork aiming to communicate these insights to a wider audience.

Latest from DataJournalism.com

Disinformation has become a major factor in the 2020 American presidential campaign. In our latest long read, journalist Sherry Ricchiardi examines the methods manipulators use to inflame political and social tensions. The piece also focuses on the challenge: 'What can journalists do to avoid being easy marks?' Read the full article here.

The European Journalism Centre and Google News Initiative offer three new conferences in October and November to discuss the latest innovations in journalism. Live-streamed on YouTube, the topics will cover Data Journalism, Audio & Voice and Audience. Register here!

Our next conversation

In the next episode of our Conversations with Data podcast, we will speak with Emilia Díaz Struck, research editor and Latin American coordinator at the International Consortium for Investigative Journalists (ICIJ). She will talk to us about the use of data in the FinCEN files, an investigation revealing the role of global banks in industrial-scale money laundering — and the bloodshed and suffering that flow in its wake.

As always, don’t forget to let us know what you’d like us to feature in our future editions. You can also read all of our past editions here.

Onwards!

Tara from the EJC Data team,

bringing you DataJournalism.com, supported by Google News Initiative.

P.S. Are you interested in supporting this newsletter as well? Get in touch to discuss sponsorship opportunities.

Q&A with Alberto Cairo https://datajournalism.com/read/newsletters/q-a-with-alberto-cairo Fri, 18 Sep 2020 18:35:00 +0200 Tara Kelly https://datajournalism.com/read/newsletters/q-a-with-alberto-cairo Well-designed data visualisations have the power to inform and leave a lasting impression on audiences. From cutting through the clutter to simplifying complexity, visual storytelling is no longer an add-on -- it's a necessity.

In this week's Conversations with Data podcast, we spoke with author, educator and data visualisation expert Alberto Cairo. He is the Knight Chair in Visual Journalism at the School of Communication of the University of Miami (UM) and also serves as director of the visualisation programme at UM's Center for Computational Science. He talks to us about the latest edition of his book "How Charts Lie", and provides some useful advice for journalists covering the US elections.

You can listen to our entire podcast on Spotify, SoundCloud or Apple Podcasts. Alternatively, read the edited Q&A with Alberto Cairo below.

What we asked

Tell us about the start of your career. How did you first become interested in data visualisation?

Like most people working in data visualisation, I ended up entering this field by happenstance. As you know, I'm originally from Spain and I studied journalism with the idea of eventually working in radio. I really liked radio, I had an internship at Spain's national public radio, and I even read the news. In the last year of my journalism studies, a professor of mine saw that I could sketch things out. She recommended me for an internship in the graphics department of a newsroom. I knew nothing about information graphics or explanatory graphics at the time. But I got into that internship when I was 22 or 23, I fell in love with it and I learned on the job. I had very good teachers in the newsroom and I have stayed in the field ever since.

At the time, I barely did anything data-related. Instead, I was mostly designing explanatory graphics. This meant using illustrations, 3D models and animation to tell stories. I eventually became El Mundo's head of graphics. This shift towards data visualisation happened around 2009 when I began studying cartography. As it has a high quantitative component, I also started studying statistics. So cartography led me to data visualisation and that's how I began working in this field.

Your latest edition of "How Charts Lie" is coming out soon. Does it cover any new material?

The new paperback edition comes out on 13 October 2020. There's some new material including a new epilogue about COVID-19 and what has happened with graphics covering the pandemic. It is relatively short, about 10 pages or so. But that's the only addition to the book. The rest is identical to the hardcover edition, with the exception that we are using a different colour palette. The publisher decided to change the cover. Instead of being a blue cover, it's a yellow cover, which is very eye-catching. I really like it. It's much brighter and perhaps a little bit less serious. But I don't mind, as the book itself is written in a cheeky tone. The colour palette inside, instead of being based on red and greys, is more orange and reds. And I really like it. So it's essentially the same book, but with those extra 10 pages about COVID-19.

Remind us again what the focus of the book is?

"How Charts Lie" is a book for the general public. It's a book for anybody who wants to become a better reader of charts, meaning statistical graphs and data maps, etc. The title is a provocation that intends to attract people to the content of the book. But the book itself is obviously not a book about how to lie with charts. It's more a book about how to become a better reader of charts. And moreover, a better title of the book could have been, "How we lie to ourselves with charts". The book points out how charts that are otherwise correctly designed, correctly made, may still mislead you if you don't pay attention.

This ties in with the misconception that visualisations are like illustrations, that you can just take a quick look at it and move on. The point that I make in the book is that we need to stop treating maps, graphs and charts as if they were drawings. We need to start treating them as if they were text.

"How Charts Lie" examines contemporary examples ranging from election result infographics to global GDP maps and box office record charts, demystifying an essential new literacy for our data-driven world.

You served as artistic director for "At the Epicentre" -- a brilliant visualisation about the COVID-19 death rate in Brazil. Tell us more about the project.

"At the Epicentre" is essentially a data visualisation that asks the following question: "What if all the people who die of coronavirus in Brazil were your neighbours? How many people around you would disappear?" The visualisation traces a circle around you and shows you how wide that circle could be if all 100,000 people or so were your neighbours. It shows that human beings have a tough time understanding numbers unless those numbers put us at the centre.

I must clarify that I was the art director for this project, but I cannot take credit for the idea. The creators are responsible for the idea. We worked with a team of designers and developers in Brazil as well as Google News Initiative -- Vinicius Sueiro, Rodrigo Menegat, Tiago Maranhão, Natália Leal, Gilberto Scofield Jr., Simon Rogers and Marco Túlio Pires. The data visualisation is available in both Portuguese and English.

Alberto Cairo is a journalist and designer, and the Knight Chair in Visual Journalism at the School of Communication of the University of Miami (UM). He is also the director of the visualisation programme at UM’s Center for Computational Science.

Finally, what advice do you have for journalists and designers covering the upcoming US elections?

I would say journalists should try to de-emphasise horse-race narratives in their reporting. We journalists tend to overemphasise the latest poll. For instance, one may show Trump losing so many points in comparison to the previous poll. Another may show Biden gaining so many points over an earlier one. But this is just another mantra: any single poll is always noisy. What really matters is the weighted average of all those polls. Over a period of time, five or ten polls all pointing in the same direction might be a pattern. But the outcome of a single poll itself means nothing. So if one poll shows Trump gaining four points, that's meaningless, because that may be just noise. Maybe it's a product of polling error. We don't know.
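
The arithmetic behind a weighted polling average is straightforward. The sketch below uses invented poll numbers and weights; real aggregators weight by sample size, recency and pollster quality in far more careful ways.

```python
# A minimal sketch of why an average of polls beats any single poll.
# The leads and weights are invented for illustration only.
polls = [
    # (candidate_lead_in_points, weight)
    (4.0, 1.0),   # older poll, baseline weight
    (7.0, 1.5),   # more recent poll, weighted up
    (5.5, 2.0),   # large sample, high-quality pollster
    (9.0, 0.5),   # small online sample, weighted down
]

weighted_lead = sum(lead * w for lead, w in polls) / sum(w for _, w in polls)
print(f"Weighted average lead: {weighted_lead:.1f} points")   # 6.0 here
```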

I wish that journalists were a little more cognisant of how to think probabilistically. My advice is to learn a bit about probability. And that might be an idea for my next book. All of these polls and forecasts are interesting to read. I'm not against publishing numbers, but I wish that we would emphasise even more than we do right now the level of uncertainty around the estimates we're making.

Latest from DataJournalism.com

The European Journalism Centre and Google News Initiative offer three new conferences in October and November to discuss the latest innovations in journalism. Live-streamed on YouTube, the topics will cover Data Journalism, Audio & Voice and Audience. Register here!

Our next conversation

In the next episode of our Conversations with Data podcast, data designer Stefanie Posavec and data journalist Miriam Quick will talk to us about their new book, "I am a book. I am a portal to the universe." We will also hear about their visual inspirations and passion for pushing the creative boundaries in data storytelling.

As always, don’t forget to let us know what you’d like us to feature in our future editions. You can also read all of our past editions here.

Onwards!

Tara from the EJC Data team,

bringing you DataJournalism.com, supported by Google News Initiative.

P.S. Are you interested in supporting this newsletter as well? Get in touch to discuss sponsorship opportunities.

A.I. adoption in media https://datajournalism.com/read/newsletters/ai-adoption-in-media Thu, 30 Jul 2020 10:35:00 +0200 Tara Kelly https://datajournalism.com/read/newsletters/ai-adoption-in-media From personalisation to content moderation, artificial intelligence touches many facets of journalism. Still, many hurdles exist for A.I. adoption in the newsroom. Among them are a lack of funding for media innovation and severe knowledge gaps around these A.I. technologies.

In this week's Conversations with Data podcast, we spoke with Professor Charlie Beckett, the founding director of Polis, the LSE's media think-tank for research and debate around international journalism. He talked to us about Polis' JournalismAI project and the findings from its global survey of journalism and artificial intelligence.

You can listen to our entire podcast on Spotify, SoundCloud or Apple Podcasts. Alternatively, read the edited Q&A with Professor Charlie Beckett below.

What we asked

Tell us about your journey from journalism to academia.

I was an incredibly traditional journalist back in the last century. I started on local papers and worked my way up through television news at the BBC and so on. Back in 2006, I left Channel 4 News to become director of a new journalism think tank at the LSE called Polis. It was a bit like doing a start-up as I was given this blank sheet of paper to create it. At the time journalism was exploding with all these new technological waves -- from the move online to the explosion of social media. I wrote a book about network journalism back in 2008. My work has been about exploring all these digital technologies and understanding their impact -- all the stuff I couldn't do when I was a journalist. And for me, that was the attraction of A.I., the latest wave of this technologically driven change.

As a media think tank, what is Polis' mission?

Polis is a journalism think tank within the Department of Media and Communications at the LSE. I spend all my time working with journalists or with people who are affected by journalism. It aims to provide a forum to talk about journalism and its impact on society and to do research, of course. But primarily, it is to be a bridge between the researchers, the academics and the journalists themselves. And increasingly, it's become a bridge within journalism. Polis' JournalismAI project is very much about connecting journalists around the world in small and large newsrooms as well as different sectors. We share knowledge and best practice.

How did Polis' A.I. journalism work emerge?

Last year we received a subsidy from the lovely people at Google, which allowed us to do this global survey. The first thing was to find out what was happening. The survey pointed to an incredibly varied and rich picture of all sorts of activity in all sorts of newsrooms, across all or parts of the news process -- all the way from newsgathering to news delivery. The survey also told us where the gaps, the challenges and the problems were for A.I. adoption and adaptation.

Now we are in the second year, where we have been trying to find, not solutions, but at least pathways. People told us that there was a huge knowledge problem as well as an understanding problem. That led us to create a free training course in collaboration with VRT News, supported by the Google News Initiative. People also told us that they were really struggling to find the time and resources to innovate and to prototype new ideas. This led us to create an experimental collaboration project to do precisely that.

Tell us more about this experimental collaboration project.

It's called the JournalismAI project. We've got more than 40 journalists from every kind of news organisation around the world, from Hong Kong to Argentina. They've divided into groups and are working on a series of challenges. Those range from how to get more editorial diversity into content to how to maximise revenue from subscription models. They've only been working for a couple of months, and their energy and thoughtfulness have been inspiring. They're sparking off each other, sharing experiences and coming up with ideas that they wouldn't necessarily be able to pursue in their own newsrooms. We've got a group of fantastic coaches who are editorial innovators to help out too. The idea is that by the autumn they will come up with something. It probably won't be a finished, beautifully produced tool or system. Still, they'll come up with a prototype, or an idea for a prototype or system, that could be useful in many news organisations.

Professor Charlie Beckett is the founding director of Polis, the LSE's media think-tank. He also leads the Polis JournalismAI project, a global initiative aiming to inform media organisations about the potential offered by A.I. powered technologies.

Where should a journalist start if they want to learn more?

Start with our one-hour training course, Introduction to Machine Learning. It is built by journalists, for journalists, and it will help answer questions such as: What is machine learning? How do you train a machine learning model? What can journalists and news organisations do with it, and why is it important to use it responsibly? There are some great books out there, too. Nicholas Diakopoulos' book gives a more academic take on automating the news. There's also Francesco Marconi, who has written a much more newsroom-focused book on A.I. in journalism. People are starting to create courses, but they are more academic at this stage. Journalism schools are still catching up.

Lastly, what is the biggest hurdle to A.I. adoption in the newsroom?

Money. It's not just money in terms of spending the cash to get the shiny new things. It's investing time and money over the long term. It can take three to six months to develop a tool, followed by another six months to iterate on it. And then you still have to pay attention to it. I think the biggest problem is applying those resources and having the general knowledge in the newsroom to do that efficiently. There is no easy answer to that, I'm afraid. We hope that as these technologies become more mature and more adapted, they'll be replicable.

Machine learning has been around for a while, but it is rapidly developing in almost every field. The difference is that pharmaceuticals or fintech have gazillions of dollars to develop these tools and can do it at scale. The news industry is relatively small and has far less money for independent R&D. And that's why -- love them or loathe them -- we should work with the tech companies, universities and other partners who can help news organisations see what might work for them.

Latest from DataJournalism.com

As Black Lives Matter protests spread around the world, journalist Sherry Ricchiardi examines the verification methods and OSINT tools journalists and non-profits like The New York Times and Amnesty International used to portray the movement in the wake of George Floyd's death. The long read article also points to useful resources from the Global Investigative Journalism Network, First Draft News and the Committee to Protect Journalists. Read the article here.

Our next conversation

The Conversations with Data podcast will be back in September. We will discuss social network analysis -- an advanced form of analytics that is specifically focused on identifying and forecasting connections, relationships, and influence among individuals and groups.

As always, don’t forget to let us know what you’d like us to feature in our future editions. You can also read all of our past editions here.

Onwards!

Tara from the EJC Data team,

bringing you DataJournalism.com, supported by Google News Initiative.

P.S. Are you interested in supporting this newsletter as well? Get in touch to discuss sponsorship opportunities.

AI in the newsroom https://datajournalism.com/read/newsletters/ai-in-the-newsroom Thu, 16 Jul 2020 10:53:00 +0200 Tara Kelly https://datajournalism.com/read/newsletters/ai-in-the-newsroom From data mining for investigative journalism to automated content for finance and sports coverage, journalists are no strangers to automation and machine learning. But what are some of the opportunities and challenges that come with A.I. making its way into the newsroom?

To better understand this, we spoke with Professor Nicholas Diakopoulos, an Assistant Professor in Communication Studies and Computer Science (by courtesy) at Northwestern University where he directs the Computational Journalism Lab. He explains how A.I. can assist journalists in discovering, composing and distributing stories and talks about his new book, Automating the News: How Algorithms are Rewriting the Media.

You can listen to our entire podcast on Spotify, SoundCloud or Apple Podcasts. Alternatively, read the edited Q&A with Professor Nicholas Diakopoulos below.

What we asked

Let's start with the basics. How do you define an algorithm?

A straightforward way to define an algorithm is as a series of steps that are undertaken in order to solve a problem or to accomplish a defined outcome. The easiest way to think about this is to compare a recipe to an algorithm. The recipe defines the ingredients and a set of steps that show you how to combine them in order to achieve that delicious dinner that you want to cook.

Of course, there are far more complicated, complex algorithms in computer science. Oftentimes in computational journalism, we are interested in digital algorithms that run on computers and allow us to scale up information workflows, or information recipes. These recipes structure the transformation of information: taking data, making sense of it, visualising it and publishing it. One way to think of computational journalism is to think about what the algorithms are for producing news information.
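
As a toy example of such an "information recipe", the snippet below runs a fixed series of steps that turns a small, invented dataset into a publishable sentence; any real pipeline would load and check real records.

```python
# A toy "information recipe": a fixed series of steps that turns raw data
# into a publishable sentence. The data are invented for illustration.
complaints = [
    {"district": "North", "count": 120},
    {"district": "South", "count": 45},
    {"district": "East", "count": 310},
]

# Step 1: sort the records.  Step 2: pick the largest.  Step 3: write the line.
ranked = sorted(complaints, key=lambda row: row["count"], reverse=True)
top = ranked[0]
print(f"{top['district']} district logged the most complaints ({top['count']}).")
```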

How would you define A.I.?

For A.I., the meaning seems to change over time, depending on who you're talking to. I think a fair definition of A.I. is a computer system that can perform a task that would typically require some level of human intelligence. The current wave of hyped-up A.I. technology is based on machine learning, where a system essentially figures out how to accomplish a task based on a bunch of data that it's been trained with.

Let's take Trint, for example. It's an automated transcription service for transcribing and interpreting audio material. These types of algorithms are trained on hours and hours of recorded audio, together with manually typed transcripts. The algorithm, based on those two types of data, figures out what is the mapping between the sound that someone makes and the word that you would use to transcribe that sound. If you give the computer enough examples of that, then it can do a pretty good job of transcribing a new piece of audio that you feed it.

In your new book, Automating the News: How Algorithms Are Rewriting Media, you explain how A.I. is making its way into the newsroom. Tell us more about this.

There are so many facets of journalism which are touched by A.I. and automation right now. Some journalists are using data mining to find interesting new stories from large datasets. These technologies can also be used to automatically generate content like written articles or videos. Another use case is to implement things like social bots or chatbots where people can interact and chat with A.I. systems to get information.

Given all of the content produced on a daily basis from a large newsroom, journalists might also want to use algorithms to figure out how to optimise click-through rates or distribution patterns around that content. To boost audience views, algorithms can certainly help determine which headline is producing more engagement and amplify the amount of traffic that content can receive online.
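
One common way to compare two headlines is a two-proportion test on their click-through rates. The sketch below uses invented counts and is only one simple approach; production systems typically run such tests continuously and shift traffic as results come in.

```python
# A minimal sketch of a headline test: compare click-through rates for two
# headline variants with a two-proportion z-test. The counts are invented.
from math import sqrt

clicks_a, views_a = 320, 10_000     # headline A
clicks_b, views_b = 410, 10_000     # headline B

rate_a, rate_b = clicks_a / views_a, clicks_b / views_b
pooled = (clicks_a + clicks_b) / (views_a + views_b)
se = sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
z = (rate_b - rate_a) / se

print(f"CTR A: {rate_a:.2%}, CTR B: {rate_b:.2%}, z-score: {z:.2f}")
# |z| above roughly 1.96 suggests the difference is unlikely to be chance.
```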

How are investigative journalists using A.I. in their research and reporting?

One of the areas I'm pretty excited about, and where I've seen it effectively used in journalism, is increasing the scale, the scope and the quality of investigative journalism. There are plenty of examples now of investigative journalists using machine learning approaches to define a pattern in a dataset that they think is newsworthy. They then use that machine learning model to find other instances in the data that are similar to the pattern they want to find. That helps these journalists cover a lot more ground in terms of the stories they're able to find and the scale and scope of the investigation.

Can you share a specific example of this?

A few years ago, The Atlanta Journal-Constitution used this approach to try to find doctors who had been accused or even found to be guilty of sexual misconduct in their practice, but who were still practising medical doctors. Once the journalists knew that there was this pattern, that there were these types of doctors out there, they could look for that pattern in other documents that they could scrape up. They were then able to take what might have been a regional or state level story and turn it into a national-level investigation. So we definitely see the power of A.I. in those types of contexts.
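
In heavily simplified form, the pattern-finding idea looks something like the sketch below: label a few documents by hand, train a text classifier and score new documents. The documents here are invented and the approach is generic; it is not a reconstruction of the AJC's actual methodology.

```python
# A heavily simplified sketch of pattern-finding with machine learning:
# train a text classifier on hand-labelled documents, then score new ones.
# The documents are invented; real investigations use far more data and a
# much more careful modelling and verification process.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

train_docs = [
    "board reprimanded physician for sexual misconduct with patient",
    "doctor disciplined after inappropriate contact, licence retained",
    "routine licence renewal approved without conditions",
    "physician fined for late paperwork and record-keeping errors",
]
labels = [1, 1, 0, 0]   # 1 = matches the newsworthy pattern

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(train_docs)
model = LogisticRegression().fit(X, labels)

new_docs = ["board allowed doctor to keep practising after misconduct finding"]
scores = model.predict_proba(vectorizer.transform(new_docs))[:, 1]
print(f"Probability the new document matches the pattern: {scores[0]:.2f}")
```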

How else can A.I. be used in the newsroom?

A.I. can also have a lot of value in reducing repetitive work. For example, The New York Times and The Washington Post have been using machine learning technologies to help with content moderation. They get enormous volumes of comments submitted to their websites every month. A.I. helps filter through them and weed out comments that are inappropriate or that bring down the level of discourse on the site. Another area where we see A.I. bringing value to journalism is its ability to increase the speed of information. This is particularly valuable for breaking news or finance.

Professor Nicholas Diakopoulos is an Assistant Professor in Communication Studies and Computer Science (by courtesy) at Northwestern University where he directs the Computational Journalism Lab. His research focuses on computational journalism, including aspects of automation and algorithms in news production, algorithmic accountability and transparency, and social media in news contexts. He is author of the book, Automating the News: How Algorithms are Rewriting the Media, published by Harvard University Press in 2019.

What is the biggest misconception people have about A.I. in journalism?

The biggest misconception that I encounter over and over again is that journalists think A.I. is going to take their jobs. Having researched this for several years now, I just haven't found that to be the case. Instead, what I have found is that A.I. creates new jobs and new opportunities rather than destroying them. But it is also changing jobs. For instance, to be an effective journalist you might need to know how to work with data or have the ability to learn new tools. A.I. can't do entire jobs. It can usually do little tasks or pieces of jobs. Even if it could come along and do 10% of someone's job, that's not going to take away their entire job. It just means they have 10% of their hours freed up to do other things that they couldn't do before. Hopefully, A.I. is going to make journalists more effective and efficient and augment their capabilities.

Lastly, what is next for A.I. in the media?

One thing that I'm pretty excited about is the new class of technology related to synthetic media. Most people have probably heard of deep fakes -- automatically generated images and videos created using machine learning models. But fewer people may realise that the same type of technology can also be used to generate text, whether that's news articles, comments or other forms of text. I think it's interesting to consider how synthetic media could be adapted to journalistic requirements. It's going to turbocharge automated content production, in positive ways, with news organisations able to churn out a lot more data-driven stories, but potentially also in negative ways, by creating new opportunities for misinformation.

Latest from DataJournalism.com

We are excited to announce that the News Impact Summits are back for a 2020 online edition! Organised by the EJC and powered by Google News Initiative, the events will cover three topics that are driving innovation in journalism: Data Journalism, Audio & Voice and Audience. The Data Journalism stream will take place on 3-5 November. Register here!

Our next conversation

In the next episode of our Conversations with Data podcast, we will be speaking with Professor Charlie Beckett from the London School of Economics (LSE). As the founding director of Polis, LSE's media think-tank, he also leads the Polis JournalismAI project, a global initiative aiming to inform media organisations about the potential offered by A.I. powered technologies. In addition to discussing automation in news production, we'll also talk about the findings from Polis' global survey of journalism and artificial intelligence.

As always, don’t forget to let us know what you’d like us to feature in our future editions. You can also read all of our past editions here.

Onwards!

Tara from the EJC Data team,

bringing you DataJournalism.com, supported by Google News Initiative.

P.S. Are you interested in supporting this newsletter as well? Get in touch to discuss sponsorship opportunities.

The power of predictive analytics https://datajournalism.com/read/newsletters/power-of-predictive-analytics Wed, 24 Jun 2020 15:51:00 +0200 Tara Kelly https://datajournalism.com/read/newsletters/power-of-predictive-analytics In recent years, data journalists have embraced predictive journalism with open arms. Many news organisations have data teams capable of modelling and telling predictive stories. But incomplete data and poor modelling can hamper any predictive story.

To better understand the challenges, we spoke with Leonardo Milano who leads the predictive analytics team at the United Nations OCHA Centre for Humanitarian Data. He discusses how the humanitarian community is using predictive modelling and projections to anticipate, respond and prepare for emergencies around the world.

You can listen to our entire podcast on Spotify, SoundCloud or Apple Podcasts. Alternatively, read the edited Q&A with Leonardo Milano below.

What we asked

Let's start with the basics. How do you define predictive analytics?

Predictive analytics is about making use of current and historical data to understand what is likely to happen, or to estimate some characteristics of an event that is likely to happen. However, the field is very broad. For instance, predictive analytics is not only about precisely estimating a single number. It also supports decision-making with analytical tools. To be effective, predictive analytics requires the best data and contextual knowledge available on a given issue.
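
At its simplest, that means projecting forward from historical observations. The sketch below fits a straight-line trend with NumPy to invented figures; real humanitarian models combine many indicators, expert review and explicit uncertainty estimates.

```python
# A minimal sketch of the idea: use historical observations to estimate what
# is likely to happen next. The figures are invented for illustration only.
import numpy as np

months = np.arange(1, 7)                                     # six months of history
people_in_need = np.array([1.2, 1.3, 1.5, 1.6, 1.8, 2.0])    # millions

slope, intercept = np.polyfit(months, people_in_need, deg=1)
next_month = 7
projection = slope * next_month + intercept

print(f"Projected people in need next month: about {projection:.1f} million")
```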

Tell us about the Centre for Humanitarian Data.

The Centre for Humanitarian Data is part of the United Nations Office for the Coordination of Humanitarian Affairs (UN OCHA). The overall goal of the centre is to increase the use and the impact of data in the humanitarian sector. We believe that connecting people and data has the potential to improve the lives of the people we are trying to assist in humanitarian emergencies. We manage the Humanitarian Data Exchange (HDX), which is an effort to make humanitarian data easy to find and use for analysis. Launched in July 2014, HDX is a growing collection of datasets about crises around the world. It is accessed by users in more than 200 countries and territories.

How do you define humanitarian data?

Humanitarian data is defined in three different categories. Firstly, humanitarian data is data about the context where a humanitarian crisis is occurring. The second category is data about the people affected by the crisis and their needs. Lastly, it is defined as data about the response from organisations and people seeking to help those who need assistance. For instance, this could be a set of different interventions promoted by the government to respond to a humanitarian crisis. Alternatively, this could be mapping areas with humanitarian access constraints because of insecurity or ongoing conflict.

Leonardo Milano leads the Predictive Analytics team at the United Nations OCHA Centre for Humanitarian Data. His team provides support to the humanitarian community around the development, evaluation, validation and application of data science tools to better inform humanitarian response.

What is the biggest misconception people have about predictive analytics?

The biggest misconception is that people believe predictive analytics in the humanitarian sector is mainly a technical challenge. The technical aspect of developing a model is actually the least challenging part. So first of all, you need to obtain the required data and understand its scope and limitations. This is highly complex in a humanitarian context where data is collected in a very challenging environment. Next, you need to understand the use case and design a model that is fit for purpose. And the last one is communication. There is still a disconnect between policymakers, decision-makers and technical partners.

What tools and advice do you provide for those working in this field?

Our goal is to make all humanitarian data available, easy to navigate and analyse. This includes data coming from more than 1,200 sources. The Humanitarian Data Exchange has more than 100,000 users per month and, as of today, almost 20,000 datasets are shared. About a third of our users are based in field locations. People do download datasets, but they also explore the data through many different data visualisations that our team creates. We also provide other data-related services. For instance, we added a new data grid feature in 2019 to help users understand what data is available and what is missing for the top humanitarian crises.

What should journalists be aware of when referencing models and projections in their stories?

It's important to understand who developed the model and why. First of all, understand the use case and why this model and these projections were produced in the first place. The scope of the projections and any limitations are also essential to grasp. Be sure to read the methodology and footnotes explaining the assumptions and their limitations. These are extremely important because this is where the team that developed the model may have introduced biases that are reflected in the projections.

I understand the Centre for Humanitarian Data is partnering with the Rockefeller Foundation. Tell us about it.

The Centre for Humanitarian Data is partnering with The Rockefeller Foundation to increase the use of predictive analytics to drive anticipatory action in humanitarian response. The work will focus on developing new models, providing a peer review process, and closing data gaps.

Lastly, tell us about the report that you're working on with Johns Hopkins University and Rockefeller Foundation.

The goal of this report is to support our colleagues in the field in estimating the scale, severity and duration of COVID-19 outbreaks in countries with major humanitarian operations. Working with the Johns Hopkins University Applied Physics Laboratory, we have developed an initial COVID-19 model for Afghanistan. We have extended this model to other priority countries such as the Democratic Republic of the Congo, South Sudan and Sudan.

While there are numerous models available on COVID-19, there is very little information available to inform humanitarian response interventions. We aim to provide additional insights into the current COVID-19 crisis in the humanitarian context. The report will be released in the coming weeks ahead of the expected peak of COVID-19 in these countries.

Latest on DataJournalism.com

Journalist Sherry Ricchiardi explores how reporters can tell stories about the impact COVID-19 is having on society's most marginalised and vulnerable groups. From The New York Times to the Associated Press and The Guardian, our latest long read article cites useful resources and examples of data-led storytelling.

Our next conversation

In the next episode of our Conversations with Data podcast, we'll look at machine learning in the newsroom. The discussion will focus on how it can help make data journalists more efficient and speed up news production.

As always, don’t forget to let us know what you’d like us to feature in our future editions. You can also read all of our past editions here.

Onwards!

Tara from the EJC Data team,

bringing you DataJournalism.com, supported by Google News Initiative.

P.S. Are you interested in supporting this newsletter as well? Get in touch to discuss sponsorship opportunities.

Trustworthy data https://datajournalism.com/read/newsletters/trustworthy-data Wed, 10 Jun 2020 12:43:00 +0200 Tara Kelly https://datajournalism.com/read/newsletters/trustworthy-data Analysing and communicating uncertainty in data has never been more important for journalists. But what is the best way to interrogate data and deem it trustworthy?

To help us better understand this, we spoke with Professor Denise Lievesley from the University of Oxford in our latest Conversations with Data podcast. As an experienced statistician in government and academia, she discusses what data journalists can learn from statisticians and the parallels between the two professions.

You can listen to our entire podcast on Spotify, SoundCloud or Apple Podcasts. Alternatively, read the edited Q&A with Professor Denise Lievesley below.

What we asked

You've had an impressive career across academia and government. Tell us about your background.

I would describe myself as an applied social statistician. I've worked in a variety of jobs as a statistician, including as the director of statistics for UNESCO. My role as an applied statistician involved collecting data for the Millennium Development Goals and monitoring progress across the world. I was also the chief executive of NHS Digital (formerly the NHS Health and Social Care Information Centre). My last two jobs have been in academia. I was the dean of social sciences at King's College London. Currently, I'm principal of Green Templeton College, a specialist graduate college of the University of Oxford where the students study medicine or applied social science.

Let's start with the basics. How do you define statistics?

Statistics are described as the science of uncertainty. The difficulty is that statistics has two meanings. It relates to the numbers themselves but also to the science, which is about the understanding of the numbers. One way of thinking about statistics is that it's about understanding why patterns occur and whether they happen by chance or by some other external factor. Statisticians help you understand that uncertainty and draw conclusions from the data.

Tell us about your time at UNESCO and what it was like to set up their Institute for Statistics.

I joined UNESCO in 1999 when the UN agency had a division of statistics. It was viewed as a service to the organisation rather than a profession in its own right, providing external services to the world. I was recruited to dissolve the division of statistics and to set up a new institute. And that is indeed what I did. The UNESCO Institute for Statistics still exists today. Based in Montreal, it is still a jewel in the crown of UNESCO. It collects data and sets the standards for collecting data and works on statistical capacity building concerning education, science, technology, culture, and communications.

How can journalists decide if they can trust data?

There are two ways. The first is by looking at the science that has underpinned the data. How were they collected? Are they likely to be representative? How up to date are they? What is known about the error in the data? These are questions journalists are used to asking when validating if a story is true.

The second way to determine its trustworthiness is to understand why the data was collected in the first place. How was the agenda set for the collection of the data? What are the incentives to report in a particular way? What happens to the government statistician in a country if they produce data that is unpopular? Understanding the job security and the support systems for statisticians is essential, too. Like journalists, statisticians may face pressure to report good news.

Professor Denise Lievesley has been Principal of Green Templeton College, University of Oxford since October 2015. She was formerly Chief Executive of the English Health and Social Care Information Centre and Director of Statistics at UNESCO, where she founded the Institute for Statistics.

In the wake of the death of George Floyd, how can statistics help bring a voice to society's most vulnerable?

One of the reasons I'm a statistician is that I think it gives a voice to the marginalised in our societies. The challenge in doing this is that very often those marginalised people are missing from the data. Often they don't trust the government enough to be prepared to participate and respond in our studies. The participation rate is an important aspect of representativeness in data collection. It is critical that our data reflects inequalities within our societies. We haven't always found it easy to collect high-quality data on really sensitive issues.

One of the things that I take away from the current anguish experienced by a part of the population in the United States is that they haven't had visibility or a voice. Statistics are an important part of giving them that voice. But we have to learn ways in which we can help them. I don't think we've got enough black and ethnic minority staff in our statistical offices to build that understanding.

In what situations would timely data be more useful than data released later on?

As with COVID-19, there are occasions where timely data is absolutely critical in a changing situation. In such scenarios, it may be better to have less than perfect data fast. There are other cases where the data from last week or from last month makes very little difference and it would be better to have data that has been through some greater checks. That could mean more time for reflection or good data analysis. For COVID-19, we need data fast, but we'll also want better data later in order to do a post-mortem. So I think the message for a journalist is that the latest data is not always better. 

What can journalists do to encourage governments to collect better quality data?

There are two things they can do. First of all, I would like to see more journalists get involved in the consultations that take place about what data ought to be collected and what is collected in the first place. One of the problems we've got in many countries of the world is that the agenda about what is collected is only settled by governments. The other thing that journalists can do is push hard for data to be published. Almost all countries have signed up to the UN declaration on statistical ethics. However, not all countries abide by it. 

Latest on DataJournalism.com

Journalist Sherry Ricchiardi explores how reporters can tell stories about the impact COVID-19 is having on society's most marginalised and vulnerable groups. From The New York Times to the Associated Press and The Guardian, our latest long read article cites useful resources and examples of data-led storytelling.

Our next conversation

In the next episode of our Conversations with Data podcast, we'll be speaking with Leonardo Milano, the predictive analytics team lead at the United Nations OCHA Centre for Humanitarian Data. The discussion will focus on how predictive data supports the humanitarian community and their response to future crises.

As always, don’t forget to let us know what you’d like us to feature in our future editions. You can also read all of our past editions here.

Onwards!

Tara from the EJC Data team,

bringing you DataJournalism.com, supported by Google News Initiative.

P.S. Are you interested in supporting this newsletter as well? Get in touch to discuss sponsorship opportunities.

Detecting deepfakes https://datajournalism.com/read/newsletters/detecting-deepfakes Wed, 27 May 2020 11:52:00 +0200 Tara Kelly https://datajournalism.com/read/newsletters/detecting-deepfakes Today's media landscape has never been more chaotic or easier to manipulate. AI-generated fake videos are becoming more prevalent and convincing, while orchestrated campaigns are spreading untruths across news outlets, messaging apps and social platforms.

To better understand this, our latest podcast features a conversation between Buzzfeed's Craig Silverman, the editor of the latest Verification Handbook, and Sam Gregory, programme director at WITNESS. An expert in AI-driven mis- and disinformation, Gregory discusses deepfakes and synthetic media, along with the tools and techniques journalists need to detect fakery.

You can listen to our entire 30-minute podcast on Spotify, SoundCloud or Apple Podcasts. Alternatively, read the edited Q&A below between Craig and Sam.

What we asked

What are deepfakes and who is most impacted by them?

Deepfakes are new forms of audiovisual manipulation that allow people to create realistic simulations of someone’s face, voice or actions. They enable people to make it seem like someone said or did something they didn’t. They are getting easier to make, requiring fewer source images to build them, and they are increasingly being commercialised.

The latest version of the Verification Handbook is available for free on DataJournalism.com.

When it comes to fakery, what level of media manipulation are journalists most likely to come across?

At the moment, journalists are most likely to encounter shallow fakes -- lightly edited videos that are mis-contextualised. If they encounter a deepfake -- a manipulation made with artificial intelligence -- it is most likely to be in the form of gender-based violence directed towards them. The biggest category of deepfakes right now is those made primarily to attack women, whether ordinary people, journalists or celebrities.

How should newsrooms approach training journalists in detecting synthetic media?

I think there are good investments that journalists can make now. While it doesn't necessarily make sense to train everyone in your newsroom in how to spot fakes, thinking about ways to have access to shared detection tools or shared expertise does. Building a greater understanding of media forensics generally is becoming increasingly necessary for newsrooms, given that deepfakes are a growing form of media manipulation.

WITNESS works with human rights activists all over the world. What is the big difference between the media environments in the Global North versus the Global South when it comes to media manipulation?

I think one dimension is the context. For instance, from discussions with people based in South Africa, the perception is that the government might well be the source of myths and disinformation. Another aspect is the lack of resources. If large news organisations in the Global North think they're under-resourced, imagine what it is like for a community media organisation that is the only real source of information documenting a favela in Rio or São Paulo.

The skills gap is another issue. Citizen journalists in the Global South don't have access to OSINT training or an opportunity to build media forensic skills. Fakery also affects local communities differently. For instance, if a rumour spreads on WhatsApp, it is often the people in close proximity to the harm who are most affected and may face direct violence. That is less likely to be the case for journalists reporting in Europe or North America.

Sam Gregory is programme director at WITNESS, an international organisation that trains and supports people using video in their fight for human rights. In the new Verification Handbook, he authored a chapter on deepfakes and emerging manipulation technologies.

If a journalist suspects inauthenticity in a video what should their step-by-step verification approach be?

At the moment, there aren't yet any good tools to do this kind of detection. There is a three-step process worth trying. First, look for obvious signs that something has been manipulated: go frame by frame, look at the details, and you may see distortions that indicate it has. Second, run the video through a standard visual verification process. The final step involves reaching out to someone with expertise in deepfakes to review it. At the moment, this is a critical need. We've got to work out how to give journalists access to greater expertise so they know who to reach out to.

Craig Silverman is the editor of the latest Verification Handbook. As Buzzfeed's media editor, he is one of the world's leading experts in misinformation and disinformation.

As the ability to manipulate media becomes easier to do, there is a desire to create tools with built-in verification. How might that be problematic?

There's a great push within professional media to be able to assign an owner, a location and clear data in order to verify and publish authentic videos and photos. However, we've got to figure out how to balance protecting against manipulation with not exposing the people who are doing this kind of work.

WITNESS is part of a working group within the Content Authenticity Initiative. Founded by The New York Times, Adobe and Twitter, the aim is to set the industry standard for digital content attribution. In conversations with platforms, we ask questions like who might get excluded if you build this authenticity infrastructure. For example, WITNESS works with people who document police violence in volatile situations. For privacy and security reasons, they can't have their real identity linked to every piece of media or share very close location information on every video. They might want to do that on a single video or a photo, but they can't do it consistently. So we are concerned about those kinds of technical and privacy issues.

Is there a timescale for a deepfake having a large impact on the world?

I hope it is as far away from now as possible. I do not wish for us to have this problem. I think the time frame of a large scale deepfake is hard to predict. What is predictable is that the process of creating them is becoming easier and more accessible to more people. That's why it's so important to invest in the detection and the authenticity infrastructure, along with making these tools accessible and available to journalists.  

Latest on DataJournalism.com

From handling financial projections and estimates to navigating company reports, journalist Erik Sherman explains how to frame business stories with data on your side. Read our latest long read article here.

Are you an investigative journalist looking to conduct a cross-border investigation? Applications are now open for the Investigative Journalism for Europe Fund! Grants of up to €6,250 are available for teams of journalists based in the EU and EU candidate countries. Apply today!

Our next conversation

In the next episode of our Conversations with Data podcast, we'll be speaking with Professor Denise Lievesley from Green Templeton College, University of Oxford. The discussion will focus on what data journalists can learn from statisticians and the parallels between the two professions.

As always, don’t forget to let us know what you’d like us to feature in our future editions. You can also read all of our past editions here.

Onwards!

Tara from the EJC Data team,

bringing you DataJournalism.com, supported by Google News Initiative.

P.S. Are you interested in supporting this newsletter as well? Get in touch to discuss sponsorship opportunities.

Debunking disinformation https://datajournalism.com/read/newsletters/disinformation-in-the-age-of-media-manipulation Wed, 29 Apr 2020 13:39:00 +0200 Tara Kelly https://datajournalism.com/read/newsletters/disinformation-in-the-age-of-media-manipulation Today's information environment has never been more chaotic or easier to manipulate. To help journalists navigate this uncertainty, the European Journalism Centre supported by the Craig Newmark Philanthropies is releasing the Verification Handbook For Disinformation and Media Manipulation.

This week's podcast features a conversation between Buzzfeed's Craig Silverman, the editor of the handbook, and Dr Claire Wardle, the executive director of First Draft. The pair spoke about how the verification landscape has changed since 2014 along with the tools and techniques journalists need to spot, monitor and debunk disinformation.

You can listen to our entire 30-minute podcast on Spotify, SoundCloud or Apple Podcasts. Alternatively, read the edited Q&A below between Craig and Claire.

What we asked

You contributed to this Verification Handbook and an earlier edition. What was happening in 2014 when the last version came out?

Back then, we weren't really talking about misinformation or even disinformation. I remember writing that original Verification Handbook because lots of newsrooms were struggling with breaking news. They needed the tools to understand how to verify a video or image that emerged on Twitter or Facebook during a breaking news event.

With Hurricane Sandy and the Arab Spring, there was more of an awareness that online content existed that journalists could use in stories. Content could be old or a hoax, but it was nothing like the world that we live in today. While some newsrooms like the BBC and NPR understood this, many relied on other organisations like Reuters or Storyful to verify content. Most newsrooms did not understand how to do this, nor view it as a required skill set that they needed to have.

The latest version of the Verification Handbook is available for free on DataJournalism.com.

In 2014 newsrooms avoided debunking inaccurate news. How is that different from today's news landscape?

In many ways, if you have a headline with the words "bots", "hacking", "Russia", it's a very human response to want to click on those kinds of stories. Unfortunately, there is a kind of business model now in debunking misinformation. And it does get a lot of page views. It's important to have a conversation about the long-term consequences of this type of reporting. Does it drive down trust in organisations, democracy or the news media itself?

What should journalists be aware of when understanding how humans process and share information?

It doesn't matter who we are, how educated we are, where we are on the political spectrum -- all of us are vulnerable to sharing misinformation if it taps into our existing world views. When we do these kinds of investigations, we need to be rigorous. We need to use these tools. We need to be fact-based. But I think when we're understanding how this information spreads, we can't just assume that by pushing more facts into the ecosystem, we're going to slow this down. We need to understand why people are sharing it. And sometimes people are sharing it for malicious intent.

So why might people be sharing this?

Let's talk about three motivations for why people do this. The first is financial: people want you to click on scams, or on a website that earns advertising revenue. The second is political: it might be foreign interference, or a domestic attempt to shape the way people think about politicians, issues or policies. The last one is social and psychological: some people do this simply to see if they can get away with it. That motivation is sometimes the hardest to understand.

Dr Claire Wardle is executive director of First Draft, a not for profit fighting misinformation globally. In the new handbook, she contributed to the introduction and authored a chapter on messaging apps.

At First Draft, you train a lot of newsrooms in verification. Tell us about the principles for investigating disinformation you try to instil in people.

The biggest lesson for everybody who is working in this space is if you go searching for this, you will always find an ongoing disinformation or misinformation campaign. The challenge for those of us who are doing reporting is when do you do that reporting? And I think it's essential that we have journalists monitoring these spaces. It's vital that we understand which conspiracy theories are bubbling up, along with whether or not there's a coordinated attempt to impact trending hashtags. But the challenge is just because you can find it doesn't mean you necessarily have to report it.

What are some of the criteria you have for deciding whether to report on it?

We have five different criteria that we use to make that distinction. For instance, what kind of engagement is it getting? Is it moving across platforms? Has an influencer shared it? For all journalists, there's this instinctive response to shine a light on it. That's the central paradigm of journalism. But what does that mean when bad actors are deliberately targeting journalists, knowing that that's their inclination? And so when you know that bad actors are hoping you'll report on that conspiracy, how do you decide whether or not to report it? And if you do report it, how? What's the headline? What's the lead or the image?

Craig Silverman is the editor of the latest Verification Handbook. As Buzzfeed's media editor, he is one of the world's leading experts in misinformation and disinformation.

What are some of the principles and cautions for journalists investigating inauthentic activity and figuring out who is behind it?

That is the $64 million question because newsrooms want to be able to say in real time who is behind this. It's also partly because of the reporting about how Russia meddled in the 2016 U.S. election -- it's the go-to explanation. The problem is that people want immediate answers. But the truth is that because anybody can be anyone on the Internet, it is very difficult to say at any given time who is behind a campaign.

Even if you find out that the person behind the campaign is living in a basement in Ohio, you can't necessarily say that there isn't a connection to a state actor. It's very difficult to get to these answers even by talking to the social platforms. They either don't want to give up this information, or they don't have enough information.

What else should journalists be mindful of with these types of investigations?

The other thing journalists have to be careful about is that there is now an industry of new data-driven platforms whose reports claim to show evidence of Russian influence at a particular moment. As a journalist, it's very easy to write that up almost like a press release. But unless the journalist can ask questions about the underlying data, we have to be very careful about these claims. Just as we need to teach journalists how to read academic research, there is a need to teach them how to read this kind of data.

Finally, what other resources can help journalists to question the data?

First Draft recently worked with the Stanford Internet Observatory to create a new website called Attribution News. It takes you through the questions, tips and techniques for questioning the data, or questioning sources when they make these sorts of claims. This resource, combined with the latest Verification Handbook, can help journalists to find, monitor and investigate the data.

Latest on DataJournalism.com

As the COVID-19 crisis deepens, audiences around the world are seeking stories that show the impact of the virus on their daily lives. In our latest Long Read article, journalist and professor Paul Bradshaw explains how journalists can use data to generate relevant story ideas that resonate with their audience.

Our next conversation

In the next few episodes of our Conversations with Data podcast, we'll be speaking with other contributors to the latest edition of the Verification Handbook. The discussions will continue to centre around case studies and advice for spotting misinformation, disinformation and media manipulation.

As always, don’t forget to let us know what you’d like us to feature in our future editions. You can also read all of our past editions here.

Onwards!

Tara from the EJC Data team,

bringing you DataJournalism.com, supported by Google News Initiative.

P.S. Are you interested in supporting this newsletter as well? Get in touch to discuss sponsorship opportunities.

Data uncertainty & COVID-19 https://datajournalism.com/read/newsletters/data-uncertainty-covid-19 Wed, 15 Apr 2020 15:22:00 +0200 Tara Kelly https://datajournalism.com/read/newsletters/data-uncertainty-covid-19 As the COVID-19 crisis deepens, reporting and interpreting unclear data from clinical trials are essential tasks for journalists. In this week's Conversations with Data podcast, we spoke with Dr. Siouxsie Wiles, an Associate Professor from the University of Auckland, about the science and uncertain data behind the pandemic. As a microbiologist with a speciality in examining how infectious disease spreads, she debunks several conflicting health reports in the media and explains why antibody tests aren't a reliable option for mass screening.

You can listen to our entire 30-minute podcast with Dr. Siouxsie Wiles on Spotify, SoundCloud or Apple Podcasts. Alternatively, read the edited Q&A with her below.

What we asked

Tell us about your background in microbiology and infectious disease.

I'm an associate professor at the University of Auckland. I run a research lab where we mostly do antibiotic discovery. I'm really interested in the transmission of infectious diseases, and in how, when we study them in the lab, transmission is actually the part that is most often ignored.

I did my undergraduate degree in Microbiology, specialising in infectious diseases of humans. Then I did a PhD in microbiology and focused on making bacteria glow in the dark to use as sensors for pollution. After I got my first postdoctoral position, I turned my focus back to infectious diseases and worked with the bacterium that causes tuberculosis by making it glow in the dark.

You're also a science communicator. How did you come up with those COVID-19 explainer cartoons?

For the past 10 to 15 years, I've been really interested in communicating science. So when COVID-19 happened, I'd already built up a reputation as somebody who could explain complicated science to the general public. As the pandemic began to spread in China, I was invited on breakfast TV to explain what was happening. I know how important it is to keep people informed and calm, but also to get them to do the right thing.

I've ended up having this amazing collaboration with a cartoonist called Toby Morris. We've been trying to distil some of these really important messages down into graphics that people can understand. For instance, what is exponential growth? And why should we stay at home? It's been really great to see these cartoons being taken up by governments in Australia, Argentina, and elsewhere.
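For readers who want to see the arithmetic behind "exponential growth", here is a minimal Python sketch. The starting count and the three-day doubling time are invented for illustration only; they are not figures from Dr. Wiles or the Toby Morris graphics.

    # Illustrative numbers only: an outbreak that doubles every three days.
    cases = 100
    for day in range(0, 22, 3):
        print(f"Day {day}: roughly {cases:,} cases")
        cases *= 2

After three weeks the count has grown more than a hundredfold, which is why acting early, while the numbers still look small, matters so much.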

How uncertain is the data for COVID-19?

What people are watching is science in action on an extremely accelerated scale. It's astonishing how much we know now since January and the fact that clinical trials are happening for treatments. But what that means is we are dealing with huge amounts of uncertainty.

The best picture we have at the moment is what happened in China. So the World Health Organisation and China did this joint mission where they put a report together. The report covers all of the classic things about epidemiology, including the case numbers, what is known about the virus and who gets it. At the moment, we're watching this kind of incredible experiment where we still don't know whether everything that came out of the report from China is going to play out the same way in other places. Other countries are doing fairly different things to control it and have different populations. That includes differing socioeconomic factors and access to healthcare.

What is certain when it comes to this pandemic?

I think the only thing that is certain is that this virus infects humans. It transmits human to human. And for some people, the outcome is pretty catastrophic. Because this outbreak is happening so fast, it has the massive capacity to overwhelm health systems, and that means that even though people might have survived with good treatments, they aren't going to survive because there aren't the intensive care beds or the ventilators to look after everybody who is sick.

Antibody tests have been spoken about widely. How reliable are they?

These are blood tests where you take a blood sample and look for whether your body has made an immune response to the virus. So it is looking for antibodies called IgM and IgG. Depending on the test, I have seen some data where antibody tests only gave a correct result two out of 10 times. I've seen others where it gave you a correct result eight out of 10 times. And that really matters if you're going to deploy the test widely and you're only going to get the right answer two out of 10 times.
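To see why that accuracy figure matters so much at scale, here is a rough back-of-the-envelope sketch in Python. The population size, prevalence and accuracy values are our own illustrative assumptions, not numbers from Dr. Wiles, and the single accuracy figure is used loosely for both sensitivity and specificity.

    # Illustrative assumptions only -- not real COVID-19 figures.
    population = 100_000
    prevalence = 0.05          # assume 5% have actually been infected
    accuracy = 0.8             # test is "right" eight times out of ten

    infected = population * prevalence
    not_infected = population - infected

    true_positives = infected * accuracy               # infected people correctly flagged
    false_positives = not_infected * (1 - accuracy)    # uninfected people wrongly flagged

    share_genuine = true_positives / (true_positives + false_positives)
    print(f"Share of positive results that are genuine: {share_genuine:.0%}")  # about 17%

In other words, even a test that is right eight times out of ten can produce far more false positives than true ones when only a small share of the population has been infected, which is the concern with relying on such tests for mass screening.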

Dr. Siouxsie Wiles is a microbiologist and an associate professor at the University of Auckland.

Why do you think certain countries moved quickly with lockdown measures, while others responded slowly?

What's clear is that those countries that reacted really fast -- Taiwan, Singapore, Hong Kong -- are all countries that have dealt with things like SARS before. Those countries who responded fast know how serious these things can get and put in measures years ago for how you would respond in a pandemic situation. And that's really telling that those countries are the ones that are controlling it the best. Those countries that have taken a different approach have not had this experience before.

How is this going to play out? When will governments slowly lift lockdown measures?

While this isn't my area of expertise, the plan would be to either eliminate or vaccinate. And both of those plans are going to take time because every country is responding in different ways. What I imagine is going to happen is there may well be countries that keep it under control and they will be the ones who are able to open up to each other and not to others.

We might end up having little pockets around the world where life is kind of normal, but those pockets can only interact with other pockets like them. Perhaps people who want to travel to those places are going to have to show either they've had COVID-19 or they go into quarantine for two weeks before they are allowed in. It's just not clear how long this is going to take. If we're reliant on a vaccine, it's going to be 18 months or so.

Any advice for data journalists out there covering this?

Like many people, I'm watching the trackers showing the confirmed cases and deaths from this pandemic. It is important to remember that those numbers are becoming more and more unreliable every day. Many countries are not testing. For instance, at the moment, the U.K. is only testing those people who are in intensive care or healthcare workers. I think the other number to remember is that there will be lots of people who die who didn't die of COVID-19, but who died because of COVID-19. And I wonder whether those will ever be calculated. We have a massive undercount of confirmed cases and deaths. And it's going to be a long time before we really fully understand how this played out.

Long Reads on DataJournalism.com

We're looking for authors for our Long Read section. Whether you're a seasoned data journalist, a student, or a data visualisation expert, we're interested in sharing your expertise. Not sure what to pitch? Have a read of our guidelines. Get in touch with our data editor, Tara Kelly, at [email protected].

Our next conversation

In the next episode of our Conversations with Data podcast, we'll be talking to an expert about misinformation, data, and COVID-19.

As always, don’t forget to let us know what you’d like us to feature in our future editions. You can also read all of our past editions here.

Onwards!

Tara from the EJC Data team,

bringing you DataJournalism.com, supported by Google News Initiative.

P.S. Are you interested in supporting this newsletter as well? Get in touch to discuss sponsorship opportunities.

Visualising stories around COVID-19 https://datajournalism.com/read/newsletters/visualising-storytelling-around-covid-19 Wed, 01 Apr 2020 20:32:00 +0200 Tara Kelly https://datajournalism.com/read/newsletters/visualising-storytelling-around-covid-19 As the world confronts the outbreak of COVID-19, there's never been a more important time for journalists to tell visual stories with accuracy and clarity. In this week's Conversations with Data podcast, we spoke with Amanda Makulec about how to create understandable charts, graphs, and maps that better explain complexity in these uncertain times. As data lead at Excella and operations director at the Data Visualization Society, she explains why responsible design has never been more important for data storytellers and how not to mislead your audience.

You can listen to our entire 30-minute podcast with Amanda on Spotify, SoundCloud or Apple Podcasts. Alternatively, read the edited Q&A with her below.

What we asked

Tell us about your work in data visualisation.

I am currently the data visualisation lead at Excella, a Washington D.C.-based technology consulting firm that serves clients from federal agencies to nonprofits. I lead data visualisation teams and create data visualisations as part of that work. I also volunteer as the operations director for the Data Visualization Society. We are a nonprofit organisation that was founded in February 2019. In under a year, we reached almost 12,000 new members from over 130 countries around the world. Becoming a member is free.

What's happening with the Data Visualization Society? Anything new?

The Data Visualization Society is launching a new matchmaking initiative where we pair our members with health and civil society organisations doing vital work right around COVID-19. We’ve already seen numerous impactful visualisations shared by public health practitioners and media partners, but we’ve also seen some that might have benefited from some expert consultation.

We believe we can help as a community by collaborating with public health organisations, researchers, local communities, and others through data visualisation. We had more than 475 people sign up to volunteer in the first few days of launching the programme. We also had a group of initial organisations submit requests for support.

What responsibility do designers working with data have to the general public at this time?

That's a big question. There's just so much information coming at us in the form of articles, dashboards, and maps telling us how challenging it could be for our hospital system here in the US. And I think it's our responsibility as data visualisation designers to think about the emotional response that what we create will evoke in people.

That's why the data visualisation community talks a lot about the challenges of using bright, bold colours like red on maps and how they evoke a certain response of panic. How are we embedding a call to action in what we create? How are we helping people feel empowered to make local individual decisions that can really make a difference in the current environment? It's not just public health professionals and clinicians reading this. Instead, the entire world is looking at this. We have to be mindful of how our charts might be misinterpreted.

What good examples have you seen out there covering the pandemic?

John Burn-Murdoch's charts for the Financial Times are updated daily with input from experts. Those visualisations explain what the different curves look like for different countries. With small annotations, they point to what these countries have done to help flatten those curves or change the trajectory of that curve. Providing context that helps us understand the numbers and why they're going a certain direction is key.

The Washington Post's flatten the curve article by Harry Stevens belongs in a top-five hall of fame for being an illustration of data that helped inspire an entire country, and really the world, to make certain choices that are really hard to make around limiting your activity. We've seen the power of data visualisation help people understand complex concepts.

Why do case fatality rates for COVID-19 matter and how should we visualise them?

Case fatality rates are a curious measure when we're talking about a broad interest in understanding what's happening in the world with the disease. And we've seen a lot of people calculating a case fatality rate by doing the simple math of taking the number of deaths and dividing it by the number of cases. That sounds like delightfully simple algebra, but the challenge is that it's very hard to calculate an accurate, generalisable case fatality rate for this disease when the data we have is so uncertain.

We're better off making estimates for smaller groups or subpopulations. And I've been really proud to see some of the news outlets that have pivoted from reporting a single fixed case fatality rate to reporting ranges for a given country or demographic group.
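To illustrate why that "simple algebra" is so fragile, here is a minimal Python sketch. All figures are invented for illustration, and the recorded deaths are held fixed while the denominator changes, ignoring the fact that deaths are undercounted too.

    # Invented figures for illustration only.
    confirmed_cases = 10_000
    deaths = 200

    naive_cfr = deaths / confirmed_cases
    print(f"Naive case fatality rate: {naive_cfr:.1%}")        # 2.0%

    # If testing only detects a fraction of true infections, the same deaths
    # spread over more infections imply a much lower fatality rate.
    for detection_rate in (1.0, 0.5, 0.2):
        true_infections = confirmed_cases / detection_rate
        print(f"Detection rate {detection_rate:.0%}: implied rate {deaths / true_infections:.2%}")

The point is not that any of these numbers is right, but that a single headline figure hides how sensitive the calculation is to what the denominator actually captures, which is why ranges are the safer thing to report.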

What do you think COVID-19 has taught us?

We've demonstrated the value of engaging and collaborating with subject matter experts so that it's not just designers or journalists operating in isolation. The value of finding sources and validating what we're making shouldn't be underestimated. The best visualisations that people engage with are accurate and created as a collaboration, not just one person sitting behind a computer screen.

We still need to think about the ways in which we are packaging and communicating numbers for the general public that are easier to understand, but still represent data correctly, accurately, and effectively. My hope is that we all have helped the world become a bit more data literate and that we've had bigger conversations about the ethics and choices that we make in terms of what we plot and what we don't.

Long Reads on DataJournalism.com

The Washington Post's most-read article visualised how pandemics like COVID-19 can spread, and why social distancing matters. In our interview with author and graphics reporter Harry Stevens, he explains how he used data, design, and code to communicate the concept of social distancing to help flatten the curve.

We're looking for authors for our Long Read section. Whether you're a seasoned data journalist, a student, or a data visualisation expert, we're interested in sharing your expertise. Not sure what to pitch? Have a read of our guidelines. Get in touch with our data editor, Tara Kelly, at [email protected].

Our next conversation

With COVID-19 still spreading, and a third of the world in lockdown, many journalists are looking for ways to explain what is happening. To ensure we bring you the most useful information on visual storytelling during this time, we'll be talking with Associate Professor Dr Siouxsie Wiles from the University of Auckland in our next podcast. A microbiologist with a passion for infectious disease, she's also an experienced science communicator, podcaster, and blogger. She'll share with us her advice on how we can best cover this global pandemic with accuracy and clarity.

As always, don’t forget to let us know what you’d like us to feature in our future editions. You can also read all of our past editions here.

Onwards!

Tara from the EJC Data team,

bringing you DataJournalism.com, supported by Google News Initiative.

P.S. Are you interested in supporting this newsletter as well? Get in touch to discuss sponsorship opportunities.

Q&A with Simon Rogers https://datajournalism.com/read/newsletters/q-a-with-simon-rogers Wed, 18 Mar 2020 14:05:00 +0100 Tara Kelly https://datajournalism.com/read/newsletters/q-a-with-simon-rogers Simon Rogers is a leading voice in the world of data journalism. From creating the Guardian's datablog in 2009 to now serving as data editor at Google News Lab, we spoke with him about the 2020 Sigma Awards and how data journalism has evolved over the past decade. He also discussed the emergence of machine learning in the newsroom and why collaboration is the future of data journalism. You can listen to our entire 30-minute podcast with Simon on Spotify or SoundCloud. Alternatively, read the edited version of our Q&A with him.

What we asked

Tell us what's happening at Google. Anything new?

I work on the Google News Lab. Our team serves as an editorial bridge for the news industry. We're advocates for the news industry within Google, but also the place where people come to get data and to tell stories. We also produce content and do training. My focus is around data and newsroom innovation. We're very busy at the moment given we're in the middle of a global pandemic. Trends data is a very useful way to understand how people are thinking about this. Currently, we are looking at how we can help journalists around data, innovation and machine learning.

Let's talk about the Sigma Awards. How big was the turnout in the end?

We had 510 entries submitted from 287 organisations across 66 countries. We thought if we received 100 entries, we'd be happy. What I love about the Sigma Awards is that it comes from the community. We're lucky enough to be in a position where the organisation I work for could fund it for the first year, which was great for getting it off the ground. And the response from the industry was just brilliant. We had people entering from all over the world. It wasn't just the usual suspects from the big newsrooms. We had sole practitioners in developing countries as well as not-for-profits and designers.

Amongst the winning entries, there were quite a few collaborative projects. Do you think we're going to see more of that in the future?

I do, actually. I think part of that is because data journalists tend to be more collaborative than reporters in other fields. I genuinely believe that is because often we work on our own. There is an isolation to being a data journalist. When you can work with others you can make your work a lot better.

For instance, ProPublica's Electionland and Documenting Hate are great examples of mass collaborative projects. And then you're seeing projects like The Troika Laundromat, or Copy, Paste, Legislate, where people bring different specialisms or cover different geographic areas. Those mixed skillsets can help you produce a project together that's much better than anything you could do individually.

Many of the winning projects this year had a mix of traditional journalism with the latest technologies. How do you explain that phenomenon?

I think that the combination of traditional reporting and data reporting is really interesting. We're also seeing that data journalists are now working out how to use machine learning. That took a little while because it's not instantly apparent how to incorporate machine learning into your work. We've seen that with Peter Aldhous at BuzzFeed. Journalists are realising this is a really great reporting aid and it can help make our work better. Also, data journalism is pretty established now. It's a part of every newsroom. So I think that makes a difference.

Does Google News Lab have any training available on machine learning?

Yes, we took part in the MOOC with the Knight Center for Journalism of the Americas at the University of Texas at Austin last year. It's still online and there are sessions on machine learning for beginners by Dale Markowitz. So that's definitely worth checking out as a good starting point. We also did a session at NICAR with Anatoliy Bondarenko. He listed a number of resources in his tip sheet to help get people started.

Simon Rogers is Google News Lab's data editor based in San Francisco.

What does it take to make these collaborative projects work?

The best ones that I've seen give people the freedom to explore and innovate and then come together when it makes sense. We see a lot of parallel projects where you have three organisations working together. But actually, all they're doing is sharing the data and then they go off and do their own thing. But other things like Documenting Hate where everybody is working together with the public, to me, that's honestly more enticing and interesting. I love the idea of these giant collectives of reporters coming together to try to build something that's much stronger than they could do individually.

With the coronavirus (COVID-19) pandemic spreading, have data journalists shown their value in the newsroom?

I think so. It's obviously the great irony that most American data journalists were at NICAR and are now self-quarantining at home after being exposed to the virus. Data journalism has had these moments that have made it more important. For instance, the original WikiLeaks releases were an example of that, where data journalism was the only way to interpret the material. And I really feel it's so important and valuable at the moment. One thing I see looking at trends data is how much people want reliable information. They want to know what the facts are and to understand stuff better. One of the biggest conversations this week has been around flattening the curve. How many people are now looking at epidemiological curves and understanding them because of data journalists? So, it's super important and really powerful. And it's great to see.

Data journalism is a rapidly evolving field. What direction do you think it will take next?

Reliable information has never been as important as it is now. I think people are aware of that. When data journalism took off in the 2000s, lots of people were doing it. Then there was a period where it became the big people and large newsrooms focusing on it. And now I feel like it's turned around with one or two people doing really interesting things in small newsrooms around the world. That is a trend I'm very excited about. I can't wait to see how The Sigma Awards look next year.

Other happenings on DataJournalism.com

We're looking for authors for our Long Read section. Whether you're a seasoned data journalist, a student, or a data visualisation expert, we're interested in sharing your expertise. Not sure what to pitch? Have a read of our guidelines. Get in touch with our data editor, Tara Kelly, at [email protected].

Our next conversation

With the coronavirus (COVID-19) rapidly spreading, many journalists are looking for ways to visually explain what is happening and how it is affecting the world. That's why in our next podcast and newsletter, we'll be talking to an infectious disease expert about the do's and don'ts for covering this global pandemic. We'll also ask for their top tips and resources for improving your maps, charts, and graphs when communicating about this critical health issue.

As always, don’t forget to let us know what you’d like us to feature in our future editions. You can also read all of our past editions here.

Onwards!

Tara from the EJC Data team,

bringing you DataJournalism.com, supported by Google News Initiative.

P.S. Are you interested in supporting this newsletter as well? Get in touch to discuss sponsorship opportunities.

Closing the gender data gap with Data2X https://datajournalism.com/read/newsletters/closing-the-gender-data-gap-with-data2x Wed, 04 Mar 2020 16:19:00 +0100 Tara Kelly https://datajournalism.com/read/newsletters/closing-the-gender-data-gap-with-data2x We've all heard about the gender pay gap, but what about the gender data gap? To mark International Women's Day, we spoke with Data2X's executive director Emily Courey Pryor about how missing or incomplete data on girls and women stop journalists from telling the full story. You can listen to the entire 30-minute podcast with Emily, or read the edited version of our Q&A with her below.

What we asked

First of all, what is gender data?

The official UN definition of gender data is: data that is collected and presented by sex as a primary and overall classification; reflects gender issues; is based on concepts and definitions that adequately reflect the diversity of women and men and capture all aspects of their lives; and is developed through collection methods that take into account stereotypes and social and cultural factors that may induce gender bias in the data.

Simply put, this means that we are looking at data that has been sex-disaggregated and describes the lived experiences of a particular sub-set of the population. This doesn’t necessarily mean data about the lives of women and girls – gender data could also be data about men and boys; and of course, we recognise that gender doesn’t exist in a binary. But when Data2X talks about “gender data” we are specifically referring to data (or the lack of data) about women and girls.

Why are women and girls being left out of the picture?

For far too long, women and girls have been misrepresented or left out completely from data. And if data is being used to guide the creation of policies and programmes, and that data is missing half the population, those policies and programmes will not meet the needs of that missing population. This is a problem.

As for why women and girls are being left out, well, there are many reasons. Historically, the way data collection methods were developed and have evolved do not capture women’s and girls’ experiences. For example, if a survey enumerator asks questions of the head of household, and the head of household is usually a male, are women represented in his answers? Also, women’s and girls’ experiences haven’t always been prioritised by governments, so data hasn’t been collected about them. For example, unpaid care work is now getting so much attention in the media— but that wasn’t always the case, so governments didn’t prioritise collecting data about unpaid care work as part of their labour statistics programmes.

How is Data2X working to close these gender data gaps?

Data2X works to close gender data gaps in two ways. First, we build the case and mobilise action for gender data through our research, advocacy, and communications work, which aims to show the critical role gender data plays in efforts to achieve gender equality. 2020 is a critical year for our advocacy work as we celebrate the 25th anniversary of the Beijing Platform for Action, a landmark moment for gender equality. We are working with our partners to ensure that gender data is recognised as a key enabler for gender equality and that real commitments are made to its advancements by governments around the world.

Second, we work to augment the production and use of gender data by partnering with data producers to strengthen data collection, experimenting with new data sources to improve insights on women and girls, and building global gender data expertise by supporting organisations that are guiding gender data production and use. By shining a light on the issue and catalysing action, we have been modelling how good gender data can be produced— and inspiring others to take up the cause.

Emily Courey Pryor is the founder and executive director of Data2X.

What about the work Data2X has done on big data?

We have also been a leader in exploring the potential of new sources of data, like mobile phone, geospatial and app data, to fill gender data gaps. Last year we released a report called Big Data, Big Impact that summarised findings from 10 research projects that we supported, and we found that big data holds huge promise to help us understand aspects of girls and women’s lives that traditional data often struggle to see. One of the projects with a fantastic researcher, Jihad Zahir of Cadi Ayyad University in Morocco, used social media data to assess sentiments towards violence against women, breaking new ground in the analysis of non-English language data.

Tell us about your new report on mapping the gender data gaps.

We’re thrilled to be releasing Mapping Data Gender Gaps: An SDG Era Update of our mapping gaps work. This report looks across six key domains— health, education, economic opportunity, public participation, human security, and environment— to identify the major gender data gaps in our collective knowledge. We also wanted to highlight all of the fantastic work that has happened since we released the last iteration of this report in 2014.

So, we started by asking key researchers in each field about the major gaps in data that hamper their ability to answer the most pressing questions. We also collected information on major databases and research studies. What we found was that in some areas, such as maternal mortality, there has been really significant progress in increasing and improving gender data collection since 2014. But, of course, what constitutes a data gap does not remain static— the data we need depends on the questions we are trying to answer. In 2015, the world agreed the Sustainable Development Goals, and with these came a host of new and more nuanced questions than the international community had ever striven to answer before.

The result of this expanded horizon is the opening of new gender data gaps. This is why we added environment as a domain. We know anecdotally that women and girls are disproportionately impacted by climate change and natural disasters, but the effort to systematically collect data on these adverse effects is just getting underway. With the rise of the gig economy and increasing informalisation, we also highlighted the critical need to collect data on decent work for women. As the world changes, so do our data needs.

How can journalists tell stories differently so there is a bigger push for this kind of gender data being collected by governments and researchers?

We need to differentiate between stories with data and stories about data. Journalists should, of course, continue to tell stories with data and incorporate statistics and visualisations that help readers better understand and relate to their topics. But we also need journalists to tell stories about data – and about the lack of data. If this issue is exposed, if it could make headlines in national and international news, if world leaders and funders across the globe could be made to realise that the issue of missing data is as critical as the issue of health or education disparities, we might see some progress.

And to be clear, the picture is not all bleak. Over the past few years, a spotlight has been shined on the issue of gender data gaps, thanks to champions like Melinda Gates and Her Majesty Queen Maxima, and also thanks to authors like Caroline Criado Perez who have given the issue urgency and prominence in the collective psyche. But for the huge, system-wide changes we need to make a dent in this issue, we need more champions, more authors, and more journalists to keep the pressure up.

Other happenings on DataJournalism.com

Public health reporting has the potential to empower communities. Yet, medical research is easy to misreport. Investigative health reporter Aneri Pattani explains how to understand medical research data, challenge it, and, of course, report it accurately and ethically. Check it out here.

Our next conversation

As you've probably heard, the annual International Journalism Festival in Perugia has been cancelled. But we still want to celebrate the winners of The 2020 Sigma Awards. That's why we plan to feature the competition's winning projects in our next issue. We'll also ask the winning teams for their top tips and resources for improving your data, design and editorial skills in the newsroom.

As always, don’t forget to let us know what you’d like us to feature in our future editions. You can also read all of our past editions here.

Onwards!

Tara from the EJC Data team,

bringing you DataJournalism.com, supported by Google News Initiative.

P.S. Are you interested in supporting this newsletter as well? Get in touch to discuss sponsorship opportunities.

Verification for data https://datajournalism.com/read/newsletters/verification-for-data Wed, 19 Feb 2020 13:09:00 +0100 Tara Kelly https://datajournalism.com/read/newsletters/verification-for-data When using data as a source for your stories, verification is vital. Today's journalists face the challenge of sifting through enormous amounts of information from social media and open data portals. But how do you decipher what is true and accurate in the digital age?

Joining us for our first-ever Conversations with Data podcast is Craig Silverman, Buzzfeed's media editor and one of the world's leading experts on online misinformation and content verification. He talks to us about verifying the numbers along with his experience of working with Buzzfeed's data team. Listen to the full 30-minute interview with Craig, or read the edited version of our Q&A with him below.

What we asked

How did you get started in verification?

The journey began in 2004. At that time, I was a freelance journalist living in Montreal, and I started a media blog called Regret The Error. Initially, it focussed on finding the best of the worst mistakes and corrections made by journalists and media organisations. I was collecting hundreds, thousands and then tens of thousands of these corrections. I started to connect the dots and research the discipline of verification.

As journalists, it's our job to get things right. Verification is at the core of what we do. But how do you actually do it and is it ever taught? At that point, and even to a certain extent still today, there aren't usually verification courses in journalism schools. The blog evolved from corrections and transparency to accuracy. Later it focussed on the discipline of verification and how we can be accountable as journalists to the public, to our audiences, and to our sources.

The terms 'misinformation' and 'disinformation' seem confusing. What's the difference?

This is a by-product of fake news becoming a term that is widely used but is now kind of meaningless. When people say fake news, they can mean very different things. I try to avoid using it unless it is in the context of specifically talking about that term and what's happened to it.

I credit Claire Wardle and the folks at First Draft, who have done some work on trying to clarify the terms. Disinformation is best defined as false information that is created to deceive. Misinformation is about the accidental spread of false or misleading information. Misinformation is a big piece of the puzzle because you can have people who are absolutely well-meaning, who have no intention of spreading something false or misleading, but who pass it along because they think it's important information and that it's true. Or maybe they spread it because it came from a friend, and they trust their friends, so they believe it's true.

Tell us about your latest verification handbook.

We published the first verification handbook in 2014, and it was framed in the context of emergency and breaking news. Given the information environment has changed a lot since 2014, this latest handbook focusses on investigating disinformation and media manipulation.

I found people who are some of the best practitioners in this area to contribute to the handbook. We have folks from NBC News, Rappler in the Philippines and BBC Africa as well as research organisations like Citizen Lab. We also have examples coming from different parts of the world like Brazil, the Philippines, West Papua and other countries to show the global extent of this. The handbook comes out in April 2020 in Perugia at the International Journalism Festival. Verification Handbook 2: disinformation and media manipulation will be free and available online.

What one chapter is a must-read for data journalists?

We have a chapter and case studies looking at inauthentic accounts and activity around bots. I think that's an area data journalists will be really interested in, because you're making a determination based on data, typically whether an account is authentic or not.

We show some of the tools, approaches and patterns you want to look for in that data and in that activity. It's an opportunity for somebody who knows how to pull data from an API endpoint and gather it. Then you can think about how you want to sort and analyse that data.

The chapter also gives data journalists some non-technical approaches for how you think about inauthentic activity in our social media environment. I suspect that those case studies on Twitter and bots will be really interesting for the data journalism community.
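For readers who want to try that kind of analysis, here is a minimal sketch in Python. It is not taken from the handbook: it assumes you have already exported a CSV of posts with an account name and a timestamp per row (the file and column names are invented), and it flags accounts whose sustained posting volume looks automated -- one crude, commonly used bot signal.

    import pandas as pd

    # Hypothetical export: one row per post, with 'account' and 'created_at' columns.
    posts = pd.read_csv("posts.csv", parse_dates=["created_at"])

    # Count each account's posts per calendar day.
    posts["day"] = posts["created_at"].dt.date
    daily = posts.groupby(["account", "day"]).size().rename("posts_per_day").reset_index()

    # Average daily volume per account; the 72-posts-a-day cut-off is only an
    # illustrative threshold, not a definitive test of automation.
    average = daily.groupby("account")["posts_per_day"].mean().sort_values(ascending=False)
    print(average[average > 72].head(20))

Anything a script like this surfaces is a lead to verify manually, not a verdict.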

Craig Silverman is one of the world's leading experts on online misinformation and content verification.

You've broken countless stories on media manipulation. Do you ever use data skills for these investigations?

I know some basic HTML, and I took an R programming course at IRE a couple of years ago. However, I'm not proficient enough to be applying that in my work. For me, gathering and analysing data is really important. I'm often using pre-existing tools to help me do that. For example, I use CrowdTangle, which is a great tool and platform where you can query for Facebook and Instagram posts going back in time. You can pull historical data and get that as a downloadable CSV. I do a lot of my work in Excel or Google Sheets in terms of gathering, cleaning, sorting and filtering the data to get the insights that I'm looking for.
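The spreadsheet workflow Craig describes -- gather, clean, sort and filter -- maps directly onto a few lines of code. A minimal sketch, assuming a CrowdTangle-style CSV export; the column names here are invented for illustration rather than CrowdTangle's exact schema:

    import pandas as pd

    # Hypothetical export of Facebook posts; column names are illustrative.
    df = pd.read_csv("crowdtangle_export.csv", parse_dates=["post_created"])

    # Clean: drop rows without a page name and remove duplicate posts.
    df = df.dropna(subset=["page_name"]).drop_duplicates(subset=["post_url"])

    # Filter: keep only the last 90 days of posts in the file.
    cutoff = df["post_created"].max() - pd.Timedelta(days=90)
    recent = df[df["post_created"] >= cutoff]

    # Sort: which pages drew the most interactions in that window?
    top_pages = (recent.groupby("page_name")["total_interactions"]
                 .sum()
                 .sort_values(ascending=False))
    print(top_pages.head(10))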

I'm fortunate that at BuzzFeed News we have a data journalism team led by Jeremy Singer-Vine, who is fantastic and who I've done a whole bunch of stories with. When there is a data component to it or the manual approach is going to be insanely time-intensive, and there's clearly a way to produce a script to gather it, I team up with Jeremy. We've done a lot of stories together, and it's a really great partnership. We're very fortunate to have folks in our newsroom with those skills who really love teaming up on stories. They are 100 percent full collaborators, and they just bring a whole other skill set and mindset to it.

Do you think the media industry needs a redesign?

Realistically, I don't think there's going to be a massive slowing down. I do think newsrooms have to be conscious of the decisions they're making and fight this drive and desire to be first or to jump on something right away. That is a permanent problem because it is a permanent tension we have in our newsrooms.

If you develop good verification skills for the digital environment, as well as good traditional reporting and verification skills, over time you will be faster because you're practising and you know what to look for. That's why these verification handbooks are so essential. They're free and written by experts in a way that gives people actionable, useful, hands-on advice.

I think we shouldn't hesitate to rethink our media environment and how we do our jobs better. If more journalists had better verification skills, we would avoid some of the pitfalls that you see in those urgent, early moments where mistakes often get made.

Other happenings on DataJournalism.com

Data journalism isn't just for in-depth projects or investigative reporting. MaryJo Webster explains how to tell quick and powerful stories with data when the clock is ticking. Check it out here.

Our next conversation

We've all heard about the gender pay gap, but what about the gender data gap? To mark International Women's Day on 8 March, we'll be talking with Data2X's executive director Emily Courey Pryor about how missing or incomplete data on girls and women stop journalists from telling the full story. We'll also discuss what steps data journalists can take to ensure the data they use is inclusive.

As always, don’t forget to let us know what you’d like us to feature in our future editions. You can also read all of our past editions here.

Onwards!

Tara from the EJC Data team,

bringing you DataJournalism.com, supported by Google News Initiative.

P.S. Are you interested in supporting this newsletter as well? Get in touch to discuss sponsorship opportunities.

Neurodiversity and design https://datajournalism.com/read/newsletters/neurodiversity-and-design Wed, 05 Feb 2020 12:53:00 +0100 Tara Kelly https://datajournalism.com/read/newsletters/neurodiversity-and-design The call for more inclusive design is everywhere. Countless studies show the indisputable value when we create experiences that reach everyone. But for data journalists, a dearth of information exists on how we design visual experiences for accessibility. That's especially true for neurodivergent audiences.

In this 45th edition, we'll hear from Sean Gilroy, the BBC's cognitive design head and neurodiversity lead, and Leena Haque, senior UX designer and BBC neurodiversity lead. Based on your questions, they talked to us about designing for neurodivergent audiences and why it matters for data storytelling. Wait, 'what's neurodiversity?' you ask. Read on to find out!

What you asked

What is neurodiversity?

The term Neurodiversity was coined by Judy Singer in the 1990s and it references the diversity of human cognition. It is important because it created a paradigm that recognised conditions such as Autism, ADHD, Dyslexia and Dyspraxia as part of a naturally occurring variation in cognitive function, providing valuable and positive cognitive skillsets.

How much of the world's population is neurodivergent?

Limited information and statistics exist relating specifically to Neurodiversity as an umbrella reference, but we have seen estimates range from 15% to 30% of the population being Neurodivergent. As the conversation about Neurodiversity grows, we believe a more positive attitude toward different cognitive styles will develop, and that the skills and abilities of Neurodivergent people will become increasingly desirable as part of the future of diverse and inclusive organisations and employers.

Tell us about the BBC’s CAPE Project. How did it come about?

We both work in the BBC’s User Experience & Design department, researching a new design framework based around Neurodiversity and Neuroscience called Cognitive Design. Alongside our day jobs however, we created the BBC’s Neurodiversity initiative, BBC CAPE, which stands for Creating a Positive Environment.

We recognised there was a lack of understanding and information relating to Neurodiversity in the workplace, so we decided to do something about it. There is an abundance of talent across the Neurodivergent community that is being missed, because traditional methods of identifying and measuring ability, recruitment processes and so on create barriers rather than opportunities. And because of the lack of understanding of conditions like Autism or ADHD, people often struggle to access the right support. Through CAPE, we wanted to find solutions that would help remove these barriers to employment for, as we see it, the benefit of both neurodivergent people and the BBC.

Leena Haque and Sean Gilroy are the BBC's neurodiversity leads.

How can data journalists apply principles of inclusive design for a neurodivergent audience?

Inclusive design is increasingly being recognised for its importance in delivering inclusive experiences for consumers and audiences. It is difficult to identify any one specific example, as experiences of consuming content and information are often specific to an individual.

However, the best way to find out preferences and solutions is to ask and involve those individuals who have personal experiences and stories, in any process to develop more inclusive experiences.

When we design for visualisation, the focus can sometimes move toward colour, for example. However, we should remember that visual cues also extend to things like size, shapes and patterns. The layout of data can also support the way some neurodivergent people consume information, for example by leveraging white space.

Is it worth creating a version of a visual story for neurodivergent people?

We believe it is important for designers to appreciate an increasingly diverse audience and to move toward offering choices to people, which in other words might be described as personalisation. A one-size-fits-all approach is arguably a position we are increasingly moving away from as we realise there isn’t really such a thing.

If design caters for different use cases and needs, the results often lend themselves to a much wider audience, over and above the original intended user group. Take Voice UX, for example, which can be utilised as an inclusive tool for people who are blind or visually impaired, but which might also offer a more inclusive experience for someone who is Dyslexic.

Is there any element of Intersectionality when it comes to designing for neurodiversity?

Intersectionality is important in regard to Neurodiversity, but more from the perspective of interconnected social categorisations of race, gender, sexuality and so on, as coined by Kimberlé Crenshaw. The appreciation of shared traits across different conditions is separate from Intersectionality, but nevertheless important when designing inclusive and accessible content.

We contributed to an All-Party Parliamentary Group research initiative, Westminster AchieveAbility, whose research a few years ago highlighted that co-occurrence of conditions was typical, with individuals often having a diagnosis of two or more conditions. We had also recognised this in our work for BBC CAPE, as our research had indicated shared traits and experiences of people across the neurodivergent spectrum, which is why our initiative focuses on the umbrella reference of Neurodiversity rather than any specific, individual condition.

When it comes to making stories or content inclusive, then, it comes down very much to appreciating that people consume information in different ways and have different preferences. It's important to focus on the individual and to understand and appreciate as many different perspectives as possible, rather than trying to cater to any one condition.

How important is it for data journalism teams to be hiring talent with neurodivergent conditions?

It is important to have neurodiverse teams, as it is important to have diverse teams if you want to be sure that your output will be reflective and representative of your audience.

Perhaps more appropriately, it is valuable to have Neurodiverse teams because they add depth and perspective, offering different insights and approaches that have the potential to give your content a creative and competitive advantage.

What one thing do you wish non-neurodivergent designers could understand about you and your condition?

Leena: I would like to think that any Neurotypical (non-neurodivergent) person, designer or otherwise, would take the time to understand and respect who I am, as I would offer them the same courtesy of understanding and respecting who they are. Someone’s condition isn’t really relevant to who they are, any more so than being left or right-handed makes a difference to someone’s identity.

As part of our work, we talk about challenging homogeneity as opposed to promoting diversity. If we turn our focus away from highlighting what makes us different and instead focus on making sure everyone isn’t the same, we can begin to move away from seeing people for their differences and instead appreciate everyone for what they can offer.

ICYMI: other happenings on DataJournalism.com

Designers often follow a set of strict conventions when creating visualisations. Kaiser Fung, the founder of Junk Charts, examines the fundamental rules of data visualisation, why they are important, and when it is okay to break them.

Our next conversation

It’s time for a Q&A! Joining us in the next edition we have Craig Silverman, Buzzfeed's media editor. As a fake news expert, he's authored and edited a number of books on misinformation, including the European Journalism Centre's latest Verification Handbook. You might’ve also seen Craig in our video course Verification: The Basics. We'll discuss the importance of verification and what it can do for your data journalism stories.

As always, don’t forget to let us know what you’d like us to feature in our future editions. You can also read all of our past editions here.

Onwards!

Tara from the EJC Data team,

bringing you DataJournalism.com, supported by Google News Initiative.

P.S. Are you interested in supporting this newsletter as well? Get in touch to discuss sponsorship opportunities.

Decoding the dynamics of a data team https://datajournalism.com/read/newsletters/decoding-the-dynamics-data-team Wed, 22 Jan 2020 17:02:00 +0100 Tara Kelly https://datajournalism.com/read/newsletters/decoding-the-dynamics-data-team Behind every award-winning data visualisation, there's a hardworking team merging the best of design, code and journalism. But orchestrating such a data team is no small feat.

In this 44th edition, we'll be looking at all of the advice you have on creating and managing data units in the newsroom. We'll also hear from investigative and non-profit teams that lead projects remotely.

What you said

What’s the best mix of skills for a data team? Let's begin with The Telegraph's Ashley Kirk, one of the paper's first data journalists: "Data journalism teams need a breadth of skills, but the most important one is simply journalism. The ability to dig out a story in data and be able to communicate it to an audience. Every other related skill - using scraping or freedom of information requests to source information, using R or Microsoft Excel to analyse data, or using ggplot, QGIS or other tools to visualise your findings - is simply a means to an end."

At ICIJ, research editor Emilia Diaz-Struck explained, "Our team is a multidisciplinary team with a combination of skillsets that allows us to bring different approaches to the data work we do. We have data reporters, researchers, developers, fact-checkers and an editor, working together on the data in coordination with our team of reporters. A combination of data analysis, research, coding, fact-checking and reporting can be very powerful for a data team."

At DATA4CHANGE, a non-profit working with civil society organisations to create data-driven advocacy projects, co-founder Stina Backer shared their experience: "Most people we select are T-shaped, meaning that they are capable in many fields and expert in at least one. Once in a blue moon, we come across a unicorn -- a person with expert skills in many disciplines -- but they are rare. Some have years of experience, and others are rising stars. Diversity is the teams’ real superpower."

Building and leading a data team

Pete Sherlock, the BBC's Assistant Editor from the Shared Data Unit, gave a few tips for those starting out: "The key for me when setting up a data unit is not to fret too much about being able to do everything at first. Just tell great stories. Content is key, and about 90% of content can be created using Excel and sound data principles. When you face a specific problem, work out how to solve it. That might involve code, it might not. Build up your skillset as you move forward...Don’t worry about not knowing everything at once. Here’s a secret: nobody does."

When you have a multidisciplinary team of data journalists, designers and coders in one space, it can be tricky to pinpoint the best person to steer the team.

One journalist who has worked across a number of UK national newspapers on data-led investigations is Leila Haddou. The former data editor of The Times and The Sunday Times advised, "Whoever leads the data team should be focussed on stories and have a broad mix of skills. Even if they cannot code themselves, they must at least have an understanding of what's possible and in what timeframe. In teams with a mix of skills, projects should be approached in a truly collaborative fashion."

Lost in translation

But how do data teams speak the same language when hailing from such different worlds?

James Tozer, a data journalist at The Economist, said the choice of tools for collaboration is key: "Because we're largely using R, it is possible for writers and designers to collaborate more closely, and inspect each other's work."

But he also highlighted how important building a culture of shared learning is for the team: "I think we've now reached quite a nice multi-disciplinary mix, where several people have particular skillsets (such as Elliot Morris in statistical modelling, or Evan Hensleigh in interactive design), which means that if you need help with a particular task, you can usually find someone on the team with more knowledge of it than you."

Moving beyond your speciality

And what about specialising in design, code or storytelling? Is it better to hone in on one skill or attempt to learn them all?

Marie Segger, a data journalist at The Economist, believes there's merit in developing a varied skillset: "I find that especially when teams are small, you need to be an all-rounder who can gather, analyse, visualise and write about the data."

The FT's data reporter David Blood values keeping an open mind about who can learn what: "I sometimes hear people say things like: “You can teach coding to a reporter, but you can't teach reporting to a coder.” The implication there is that journalistic instincts are the sole preserve of traditional reporters. I've found that to be a mistaken view: I've worked with developers who have great news sense. I think the real issue is that reporting and news writing are like any craft in that they require regular practice and take time to master."

When the newsroom doesn't 'get' data

Communication isn't just an area of concern between members of a data team. Misunderstandings can exist in the newsroom, too.

Niko Kommenda, who's both a journalist and developer at The Guardian, gave us his take: “Working on a team that can do both data analysis and visualisation is exciting because you can potentially get involved in all steps of the story process. The challenge, however, is in communicating with other teams in the newsroom when to approach us and what to expect.”

Investigative journalist Leila Haddou noted a similar experience: "Often the tendency is to treat highly skilled individuals in these newer roles as some sort of service desk. It is better that we make the best of everyone's talent and the results will speak for themselves!"

ICYMI: other happenings on DataJournalism.com

Often referred to as the fourth estate, journalism is key to a democratic society. But sometimes just reporting on an issue isn’t enough. To promote accountability, Civio founder Eva Belmonte explores how data is blurring the lines between advocacy and journalism. Check it out here.

Our next conversation

It’s time for another AMA! Joining us in the next edition, we have Sean Gilroy, the BBC's Cognitive Design Head and Neurodiversity Lead. We'll hear all about inclusive design and why it matters for your data storytelling. Wait, 'what's neurodiversity?' you ask. It's a relatively new term that relates to spectrum conditions such as dyslexia, autism, ADHD, dyspraxia and many other neurological conditions. Comment to submit your questions.

As always, don’t forget to let us know what you’d like us to feature in our future editions. You can also read all of our past editions here.

Onwards!

Tara from the EJC Data team

Q&A with The Sigma Awards team https://datajournalism.com/read/newsletters/q-a-sigma-awards Thu, 09 Jan 2020 21:58:00 +0100 Marianne Bouchart Tara Kelly Aron Pilhofer https://datajournalism.com/read/newsletters/q-a-sigma-awards Welcome to our first edition of 2020, where we kick off the new year by announcing the launch of a brand new data journalism competition, The Sigma Awards 2020!

To mark the launch, we caught up with the Sigma Awards' co-chair Aron Pilhofer and competition manager Marianne Bouchart -- two names synonymous with data journalism.

The pair talked to us about how the new competition came to be, why it differs from the previous awards, and the backstory behind the name.

The Sigma Awards are the new successor to the Data Journalism Awards.

What we asked

Tell us about the Sigma Awards. What’s with the name?

Aron: "The name is important for us on a couple of different levels. First, it is meant to make clear that this is something completely new, not just a continuation of the Data Journalism Awards (DJAs) under a new name.

When the DJAs went away late last year, Reginald Chua and I had a number of conversations about whether we should try to do something to keep the awards going. But the more we talked, the more we realised we were talking about something quite different. So a new name made sense. Thus, the Sigma Awards were born."

And what does "Sigma" actually stand for?

"The name itself was suggested by one of our jurors, Steve Doig, and it’s perfect. In mathematics, the Sigma means to sum up.

But for those of us (like Steve) who have lived much of their professional life in Microsoft Excel, the Sigma symbol is like an old friend sitting right there in the menu bar staring you in the face. It has a special place in every data journalist’s heart. So, again, it’s perfect."

Aron Pilhofer is the James B. Steele Chair in Journalism Innovation at Temple University and one of the co-creators of the Sigma Awards.

What happened to the previous Data Journalism Awards?

Aron: "As for what happened to the Data Journalism Awards, I don’t think any of us really knows. No one involved with the Sigmas had an official role with the Global Editors Network, so we don’t have any special insight into its finances. So I don’t think we have any better answers than what has already been reported."

How do the new Sigma Awards differ?

"As for how they will differ, that’s going to evolve over time. The mission of The Sigma Awards is:

  • To highlight the very best data journalism being done around the world;

  • To build programmes and resources around the awards that enable people in and out of the data journalism community to learn from this work;

  • To use the awards as a way to unite, galvanise and expand data journalism communities around the world.

What that means to us is that this awards programme has to be more than just a rubber-chicken dinner, oddly shaped plastic statue and line on a resume. We want them to be a centre of gravity for the global data journalism community -- an annual celebration of great work, but also (and more importantly) an opportunity to learn from that work.

How that manifests over time we aren’t quite sure yet, but you’re starting to see some glimpses of what we think the Sigmas should be about: There won’t be a big gala event. Instead, the winners will be invited to come to the International Journalism Festival in Perugia to participate in a series of panels, demos and hands-on sessions. They are coming to teach and to learn, in other words.

For next year and beyond, everything is on the table. We know we want this award to be “owned” by the growing community of data journalists around the globe. We know we want to be more directly connected to these communities, and we want to help empower them.

That’s our aspiration, anyway. It’s exciting."

Who is behind the Sigma Awards?

Aron: "So many people! And frankly, I am still in a state of disbelief at how quickly this came together. Late last year when it became clear the DJAs were going away, Reginald and I started talking about whether we might want to try to keep something going. We both felt strongly that there needed to be something celebrating all the incredible work happening around the world, but we had a very small window of time to figure out what we wanted to do for 2020.

We sent a note to our fellow jurors on November 27th describing what we wanted to do and asking if they wanted to do it with us. To our absolute shock and delight, we heard back instantly and emphatically that they were. We had a jury at least!

Simon Rogers at Google signed on as director. Paul Steiger came on as an advisor, and the supernaturally talented Marianne Bouchart agreed to manage the awards with help from Kuek Ser Kuang Keng.

Adam Thomas at the EJC signed on, as did Arianna Ciccone and Christopher Potter at the International Journalism Festival in Perugia."

Marianne Bouchart is founder of HEI-DA and the competition manager for the Sigma Awards.

Who is eligible to enter?

Marianne: "All organisations, regardless of their size, but also individual journalists, freelancers, news developers, or students can enter the competition. The projects submitted have to be pieces of journalism though, as data-driven projects that were clearly produced for commercial purposes won’t be considered by the jury. If you have any doubt on whether or not your project is eligible for this competition, get in touch with the Sigmas team."

What are the categories for the various awards?

"We’re giving a total of nine prizes among 6 categories this year: best data-driven reporting (small and large newsrooms), best visualisation (small and large newsrooms), innovation (small and large newsrooms)*, young journalist, open data, and best news application.

*Two prizes will be given for these categories, one to a small newsroom, one to a large newsroom."

When is the deadline to enter and where can I submit my entry?

"Entries to the competition are now open and data journalists from around the world have until 3 February 2020 at 11:59 pm ET to enter.

Go to the Datajournalism.com website and check out all the info about the competition. We’re using a platform called Judgify to gather entries. You can get directly to the application form via this link."

When will the Sigma Award finalists be announced?

"As soon as the deadline is reached on 3 February 2020, our pre-jury will go through all the entries and come up with a shortlist of the best projects in each category, for our jury to pick winners from. Winners of the first Sigma Awards for data journalism will be announced in the second half of February 2020."

What about the final event and prizes?

"Our goal is to announce the winners online in the second half of February 2020 and to organise a session at the International Journalism Festival 2020 in Perugia, Italy, where all winners will present their projects and be given a trophy. #IJF20 will also host a series of sessions on data journalism in which our winners will take part. And that’s the big prize for all the winners: an all-expenses-paid trip to the International Journalism Festival 2020, taking place on 1-5 April 2020 for up to two people from their team."

ICYMI: other happenings on DataJournalism.com

In a world where journalists are also ‘numbers’ people, teachers need to find innovative ways to overcome their students' math anxieties. Using research from other disciplines, Kayt Davies outlines fun exercises that can be used in any classroom. Check it out here.

Our next conversation

Behind every award-winning data visualisation, there's a hardworking team merging the best of design, code and journalism. But orchestrating such a data team is no small feat. In our next edition, we're looking for your advice on the dynamics of creating data units in the newsroom. Who is best equipped to lead a data unit? What do the best teams have in common? Can you really teach curiosity to a coder? What about coding to a journalist? We want to hear how you build a team with the right mix of skills to produce compelling data stories. Share your experiences here.

As always, don’t forget to let us know what you’d like us to feature in our future editions. You can also reread all of our past editions here.

Onwards!

Tara from the EJC Data team

AMA with La Nación https://datajournalism.com/read/newsletters/ama-la-nacion Wed, 18 Dec 2019 12:05:00 +0100 Momi Peralta Ramos https://datajournalism.com/read/newsletters/ama-la-nacion Buenos días! Welcome to our final edition of 2019, where we’ll be heading down to Argentina to find out more about how data journalism is practiced in Latin America.

It’s hard to think about Argentinian journalism without thinking about La Nación. The daily is considered one of the region’s powerhouse outlets and, since 2010, it’s also boasted a cutting-edge data department.

Whether it’s exposing Argentina’s biggest corruption scandal, or putting a data lens to breaking reports of a missing submarine, La Nación’s Data Team doesn’t shy away from challenging stories. Founding team member, Momi Peralta, joins us in this edition.

The La Nación Data Team.

What you asked

How does data journalism in Argentina compare to the rest of Latin America?

Momi: “I think there is good data journalism in Latin America, where many of the initiatives are led by independent media and also by NGOs. In Argentina, we have a strong relationship with transparency NGOs, hacktivism, and universities since 2011, when there was no transparency law or open data. Together with the open data community, we’ve built some datasets from scratch, many of them in events, hackathons, and meetups, and now we continue demanding better and more frequent open data formats and documents.”

On the topic of open data, how important is it in your reporting?

“Open data is very important for our reporting, and our reporting is very important to the open data movement and the health of transparency in Argentina.

Using open data is how we can bring it to life, and closer to our users, the citizens who can see and participate through applications or visualisations. For example, we build datasets and then open them for others to reuse, like our open statements of assets, or the dozens of daily and monthly indicators that we update from PDFs and open in CSV.

Our belief is that we are not only telling stories, we are also offering others the opportunity to add value after our initial effort of processing this data, saving them time and effort.”

An example of how the team digitises paper assets into interactive and reusable data.

Which of your investigations has had the biggest social or political impact in Argentina and what can other journalists learn from this work?

“Our collaborative investigation into the death of the prosecutor Nisman. In this investigation, we listened to more than 40,000 audio files of phone interceptions, using a platform we developed for crowdsourcing, and worked together with universities and volunteers. Collaborating allowed us to classify the audio data and transcribe some extracts, which helped us to build a dataset, discover new people involved and new stories, and develop an application that runs like a playlist.

After that, the Judge in the case asked for our La Nación investigation and our findings from the dataset, and this was included as proof in the case.”

The playlist-like application used by La Nación in their Nisman story.

What are the most valued skills in your team?

“Teamwork, perseverance, and a love for learning and creating.”

What are some of your go-to datasets?

“The census, the official statistics bureau, central bank data, the national and city of Buenos Aires open data portals, the national budget and national purchases and contracting site, and data from the Congress and Senate.”

What can audiences expect from your team in 2020?

“More interactive and data driven investigations, more technology and data applied to produce better services for our users, and more innovative ways of telling these stories on every platform available. We’re also looking for more open collaboration with universities, who are applying data science and AI to produce knowledge and content.

In addition, this year we are helping position an urgent agenda in La Nación, also using data, in our Nature Project (Proyecto Naturaleza). It is about changing the way we tell stories on the climate and extinction crises. In this project, which is being run across many of La Nación’s sections and magazines, we want to use data visualisation, public information monitoring, and dashboards.

Some of the stories that have already been produced through Proyecto Naturaleza -- with many more to come.

Supporting climate and nature stories with data and evidence-based journalism will make a difference to our readers, so that they can increase their participation as we facilitate their access to information and other ways to act on this urgent topic. La Nación has already participated in the Covering Climate Now initiative from the Columbia Journalism Review and The Nation, and was a media partner in the #6D it Now! global initiative.”

Follow La Nación’s work here.

ICYMI: other happenings on DataJournalism.com

Geographical information systems, or GIS, provide one of the most efficient means of uncovering patterns in geographic data. But what is GIS? And how can it be used in journalism? Jacques Marcoux answers these questions and more, in our entry-level introduction to GIS and spatial analysis for reporters. Check it out here.

Our next conversation

...okay, so we’re not ready to reveal the topic of our next conversation just yet -- for good reason. In 2020, DataJournalism.com will be launching an exciting new initiative and our first edition of the year will be dedicated to informing you all about it. Keep your eyes peeled for an announcement in your inbox shortly and our call for contributions!

On a personal note, I’ve decided that this edition will be my last conversation with you. Starting as a risky idea that we trialled in May 2018, Conversations with Data has now grown to 42 editions, reaching 7,000 of you on a bi-weekly basis. Thank you for all of the support so far! I’ll be leaving you in the very capable hands of our new Data Editor, Tara Kelly, who’ll be continuing our conversations in the new year.

Onwards!

Madolyn from the EJC Data team

AMA with Steve Doig https://datajournalism.com/read/newsletters/ama-steve-doig Wed, 04 Dec 2019 12:05:00 +0100 Steve Doig https://datajournalism.com/read/newsletters/ama-steve-doig Hi there! Can you believe it’s December already? With only one edition left before the end of the year, we thought we’d give you some lessons to ponder over the holiday season from one of the field’s greats.

That’s right, in this 41st edition of Conversations with Data, we let you loose to question renowned data journalist, now professor, Steve Doig.

With over 20 years of experience teaching budding journalism students at Arizona State University, and another 20 years pioneering data work at the Miami Herald before that, he’s got plenty of tips and tricks up his sleeve.

What you asked

What was your first data-based story and what did you learn from it?

“Hmm, I guess it might be from when I was covering the Florida Legislature in the Herald's state capital bureau. One task when doing a story that involved a roll call vote was to write out a ‘How They Voted’ sidebar list of names of who voted for and against some measure. Often in the story there also would be mention of some simple metric like how the parties had split in the vote. I realised my then-new IBM PC could help me do that better.

I had started teaching myself to write programs in BASIC, so I conceived and built, over the course of a week or so, a clunky program that let me quickly mark in a table how each lawmaker had voted, and then with a press of a button would generate the sidebar. But even better, I had included what I'd call political demographics about each lawmaker, including such categorical data as party, gender, race, rural vs urban, state region, leadership vs. rank-and-file, and so on. So my program also would generate cross-tabs on each of those categories, which often would reveal more interesting explanatory patterns than simple party breakouts.

What I learned is that the computer can be a great tool for handling repetitive tasks, like typing out that required sidebar or looking for interesting patterns. I also learned that new tools come along quickly, including the simple off-the-shelf database program I discovered a few months later that made my roll call analysis program obsolete!”
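The same sidebar-plus-cross-tabs idea translates almost directly into today's tools. A minimal sketch in Python of what Doig describes, assuming a hypothetical roll-call file with one row per lawmaker (this is an illustration, not his original BASIC program):

    import pandas as pd

    # Hypothetical file: columns such as name, vote, party, gender, region.
    votes = pd.read_csv("roll_call.csv")

    # The 'How They Voted' sidebar: names listed under each side of the vote.
    for side, group in votes.groupby("vote"):
        print(f"{side}: " + ", ".join(sorted(group["name"])))

    # Cross-tabs on each political demographic, which often reveal
    # more interesting patterns than a simple party breakout.
    for column in ["party", "gender", "region"]:
        print(pd.crosstab(votes[column], votes["vote"], margins=True))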

Since that start to your data journalism career, what is the best advice you’ve received about data storytelling?

“I would say it was from an editor who made me do more reporting for an early numbers-heavy story about poverty. "This story needs fewer numbers and more voices," she said. "Find more people who are affected by those numbers, and get them into the story." I quickly learned that sometimes a good data story might hang on just one number pulled out of a large data analysis; the people, not just a table of numbers, usually are the real story.”

On the topic of editors: How can you convince your editor that you -- a ‘normal’ journalist with a personal interest in data -- need time to clean the data, even though it can turn out to be useless? The transition from ‘personally interested’ to ‘data journalist’ can be challenging in these times of overwork.

“I feel your pain. Good for you for wanting to add data skills to your reporting toolbox, but the reality these days is that you almost certainly will have to invest your own time and own money into developing a skill level high enough to impress the boss. That in fact was my own career path at the Miami Herald, a very competitive newsroom where you needed a superpower to stand out. In about 1981, I bought an Atari 800 computer to play with at home, but began to realise it could help me at work. Before long, I bought my first PC and my first spreadsheet (Lotus 1-2-3) and began adding some data analysis to daily stories. My bosses began to notice, and encouraged me to do more. Data work became my superpower.

So I suggest you start with small stories that you can present as ready-to-go, even if you have to spend time at night and on days off working with the data. It would be hard to persuade editors who may not have data skills themselves to give you, with no real track record of doing data projects, time to clean up a large and messy dataset, do a major analysis, and perhaps discover that there's no real story there. I have my students write pitch memos to a hypothetical editor describing the story, including what has been found in a preliminary analysis, a sample lede, and perhaps a few bullet points. But happily, there are lots fewer editors these days who think that data journalism means you just hit a few keys and stories magically appear.”

Great answer! This is something Maud Beelman and Jennifer LaFleur touch on in our guidance for editors as well. Now to another challenge faced by one of our readers: I tell stories through data visualisation and infographics with less text because of my graphics design background. Can I still call myself a data journalist?

“Yes, of course you are a data journalist. Good data viz is just another way of telling a story, no worse or better in the hands of a good journalist than text or videos or podcasts or other media. For some (many?) data-heavy stories, good infographics in fact can be the most effective way to tell such stories. If you are trained in graphics design, you probably know to avoid ‘chart junk’ and other data graphics sins deplored by Edward Tufte.”

What have you learnt during your transition from a practicing data journalist to a data journalism professor?

“The most important thing I learned is that expertise is necessary, but not sufficient, for being a good teacher. When I became a professor, I was accustomed to teaching newsroom pros how to do simple things with a spreadsheet, usually because someone would come to me and say, "I have this data, and I want to use it to answer these questions, but I don't know how to get those answers." With young students, though, I learned I first had to show them how to think about data as a source of stories. I would have them do exercises of taking a dataset, looking at the variables in it, and then coming up with lists of interesting questions the data could -- and couldn't -- answer.”

So what do you think are the most important skills for data journalism students today?

“You probably want me to tell you which specific tools you should master; okay, Excel is the gateway drug into data journalism. But I think the most important basic skill is the mental agility to learn new tools and techniques, and to realise there is no single correct tool as long as whatever you use produces a correct answer. Consider programming languages: These days there are camps of journalists who do data cleanup and analysis with Python or SQL or R or SAS, all of which can do the job. But many of those journalists, at least the older ones, may have started with Visual BASIC or Perl or using now-extinct database programs like dBase or Paradox or Reflex or FileMaker. And five years from now, you may be among a contingent using tools that haven't even been invented yet. Furthermore, the skills and tools you wind up using will depend on what branch of data journalism you want to pursue, whether it is analysis of data for investigative projects, or design of front-facing interactive web graphics, or development of back-end newsroom systems. New ways of doing all those things are steadily emerging, and you need to be ready to adopt those that offer value. I'll add that college taught me NONE of the tools I use today, but college did teach me how to learn new things.”

For more from Steve, check out our video course, Doing Journalism with Data: First Steps, Skills and Tools, or our interview with him in Data journalism in disaster zones.

ICYMI: other happenings on DataJournalism.com

Even good predictions are hard to communicate to readers. But bad predictions, especially in high-stakes situations (such as elections or financial recessions), can be more than confusing and misleading -- they can be dangerous. From The Economist’s G. Elliott Morris, our recent Long Read, The dos and don'ts of predictive journalism, draws on examples from political journalism to describe some guidelines for good predictive journalism.

Our next conversation

This year, we’ve aimed to make our conversations a little more global, travelling across Africa and Asia to explore how data journalism is practiced in different contexts. Following on with this journey, for our final edition of 2019, we’ll be heading down to Latin America as well. Joining us all the way from Argentina, we’re excited to have the brilliant data team from La Nación with us for our next AMA. Comment to submit a question!

As always, don’t forget to comment with what (or who!) you’d like us to feature in our future editions.

Until next time,

Madolyn from the EJC Data team

Sensor-based journalism https://datajournalism.com/read/newsletters/sensor-based-journalism Wed, 20 Nov 2019 12:05:00 +0100 Thomas Hallet Jakob Vicari Jan Georg Plavec Frederick Mbuya, Bertram Weiß, Frank Feulner, John Mills, Amy Schmitz Weiss, Travis Hartman, Moritz Metz, Marcus lindemann https://datajournalism.com/read/newsletters/sensor-based-journalism Data is all around us -- be it the sounds of birds tweeting, or the levels of water pollution in a community -- it’s just a matter of capturing it. And while many of us rely on our trusty sources to analyse existing data, there’s an emerging opportunity to use sensors for the generation of new datasets on all types of physical phenomena.

In this 40th edition of Conversation with Data we’ll be looking at the weird and wonderful ways that journalists have used sensors, with some technical tips and tricks along the way.

What you said

We’ll start with some foundational advice from africanDRONE’s Frederick Mbuya: “It's the data, not the sensor that is of importance and, along the same line, it’s the story not the sensor.”

With that in mind, let’s take a look at some cool examples of story-based sensor work from our network. Over at the German public broadcaster, WDR, Thomas Hallet and his team created two web projects that played with the possibilities of sensor driven journalism: Superkühe and bienenlive.

“In Superkühe we followed three ordinary cows from three different farms (family farm, organic farm, factory farm) through the course of four weeks. We used different sensors and data streams to let the users observe how the cows were doing in a 24/7 mode: We tracked activity level, eating behaviour, health (body temperature and rumen pH) and -- of course -- milk production. The data were processed to feed a dashboard on the project website. Significant figures, on the other hand, were used to drive a chatbot on facebook messenger, where users could interact with simulations of each of the three cows,” Thomas told us.

Superkühe by WDR.

Jakob Vicari and Bertram Weiß, from Sensors & Reporters, worked alongside the WDR team on bienenlive, and gave us a behind the scenes look at that project as well:

“A beehive is a black box. We installed sensors at three hives. A simple text engine transformed the sensor data into live diaries by the queen. We observed especially hard-working bees. But researchers have found that most bees are not so active at all. We wanted to take a closer look -- and went on the trail of the chill bees. We found a paper by the researcher Paul Tenczar, who equipped bees with small transmitters. We got a bunch of transmitters and tried to glue some transmitter backpacks onto our bees.”

In the end, their sensors allowed WDR to report live from the lives of three bee colonies.
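A sensor-to-diary 'text engine' like the one Jakob and Bertram describe can be surprisingly small. A minimal sketch in Python, with thresholds and phrasing invented for illustration rather than taken from the bienenlive system:

    from datetime import date

    def diary_entry(day, hive_weight_kg, weight_change_kg, temp_c):
        """Turn one day's hive readings into a short first-person diary line."""
        if weight_change_kg > 0.5:
            mood = "My workers brought in a lot of nectar today."
        elif weight_change_kg < -0.5:
            mood = "We had to eat into our stores today."
        else:
            mood = "A quiet day in the hive."
        return (f"{day.isoformat()}: {mood} "
                f"The hive weighs {hive_weight_kg:.1f} kg and it is {temp_c:.0f} degrees inside.")

    print(diary_entry(date(2019, 6, 14), 38.2, 0.8, 35.0))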

bienenlive by WDR.

Frank Feulner, of AX Semantics, shared another example of air pollution reporting by Stuttgarter Zeitung. Using democratised sensors, the team was able to report on fine particle and nitrous oxide pollution in the German city of Stuttgart.

“We aggregated data over time so we could assess how bad the situation actually was in a certain area, and when relief was to be expected. We mingled sensor data with official data about air pollution alerts and other external factors to create a unique database,” he explained.
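As a rough illustration of that kind of aggregation (not Stuttgarter Zeitung's actual pipeline), the sketch below takes a hypothetical log of PM10 readings, computes daily means per sensor location and counts the days above the EU daily limit value of 50 µg/m³. The file and column names are invented.

    import pandas as pd

    # Hypothetical sensor log: one particulate reading per row; column names are invented.
    readings = pd.read_csv("pm10_sensor_log.csv", parse_dates=["measured_at"])
    readings = readings.set_index("measured_at").sort_index()

    # Aggregate over time: daily mean concentration per sensor location.
    daily = (readings.groupby("location")["pm10_ug_m3"]
             .resample("D")
             .mean()
             .reset_index())

    # How bad was it? Count days above the EU daily PM10 limit value of 50 µg/m³.
    exceedances = (daily[daily["pm10_ug_m3"] > 50]
                   .groupby("location")
                   .size()
                   .sort_values(ascending=False))
    print(exceedances)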

Sensors have also been a focus at Media Innovation Studio at the University of Central Lancashire, where John Mills has been leading research into different types, shapes, sizes, and use cases:

“Projects range from sensing biometric data to create a data dashboard of Ranulph Fiennes’ attempt to run the Marathon des Sables a few years ago, to enabling school children to map pollution levels on their way to school in our DataMakers / UKKO project, to turning paper into an interactive surface that detects touch and creates media experiences. Our most recent project -- SenseMaker -- is working in Manchester in the UK. It asks: if journalists and communities could create their own sensors, what would they be? We’ve been working with data and editorial teams at the Manchester Evening News and with the local community to build a range of ‘sensing’ devices. Our early concepts -- which are currently being built or in the early stages of deployment -- span pollution in people’s homes, stress levels during the daily commute and image recognition to find out what colour people wear the most!”

Despite these cool projects, getting started with new technology is never easy. Amy Schmitz Weiss shared some learnings from her work on the What’s in the Air project with inewsource:

“For those getting started with sensor-based journalism, make sure to give yourself plenty of time to test out different sensors and their capabilities. There are a variety of manufacturers now that all have different capabilities and sensor data outputs, some are more accurate than others depending on cost. So having plenty of time for R&D before launching the sensor into the community is important. Doing a small pilot for a few weeks is highly recommended to get through any issues or problems before full launch. For our project, we gave ourselves six months to test out the sensors and determine the best ones and the overall plan,” she said.

Similarly, whilst at university, Reuters journalist Travis Hartman led a team to build a sensor-equipped tool that would collect data on noise pollution in Columbia, Missouri. His advice: “just dive in”.

“I started by learning to solder and building simple circuits with sensors that I could easily manipulate in the physical world. Thermometers, light sensors, etc, and learning how to write the code in the IDE to log and parse the data that was being logged. There are lots and lots of kits that come with the sensors and the code and wires all in the same package.”

And be sure to look for local initiatives and experts who might already be working on something, Jan Georg Plavec recommended. “You may then use your audience to further disseminate the sensors and work together with your readers.”

Now to the technical part: What sensors to use?

In Moritz Metz’s bi-weekly radio show, Netzbastel, he’s been lucky to experiment with a variety of different sensors. Here’s a snapshot of some options he’s tried and tested for you:

“The great Melexis MLX90640 makes a cheap but impressive 32x24 thermal camera with an ESP32 microcontroller. With a Seeed 4-Mic Array we built a DIY voice assistant. An ordinary piezo speaker serves as a great knock sensor at my workshop's front door: just knock the right rhythm and the door opens! My favourite sensor today is the cheap doppler radar sensor CDM324 (which is quite similar to the HB100 or RSM2650). We needed to amplify its output signal quite a bit -- but with the proper preamp and an Arduino it is able to measure the speed of a passing vehicle -- so you can build a real radar gun for less than 20 euros / $25. I also want to play more with ESP32-WHO, which is already able to do machine learning-based face recognition and identification on a sub-$20 device.”

Marcus Lindemann added a few more tips:

“To get started, I suggest buying a sensor that is easy to understand -- e.g. we are all familiar with GPS, since it has been built into our phones for a decade now. So buy a GPS tracker for 100 euros, plus a SIM for the tracker (look for machine-to-machine SIMs, M2M for short, and buy a prepaid one). Then carry the tracker around in your office bag or in your car. Have a look at the data and try to make sense of it. Experiment with the settings -- how often should the GPS sensor send its position? The one we use accepts any value from five seconds onward; the longer the interval, the longer the battery will last. What speed of movement do you expect from your target?”
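
Once the tracker’s positions are exported, a few lines of Python are enough to start making sense of them -- for instance, by computing the distance and implied speed between two fixes with the haversine formula. The coordinates and interval below are made up for illustration:

    import math

    def haversine_km(lat1, lon1, lat2, lon2):
        # Great-circle distance between two lat/lon points, in kilometres.
        r = 6371.0  # mean Earth radius in km
        dphi = math.radians(lat2 - lat1)
        dlmb = math.radians(lon2 - lon1)
        a = (math.sin(dphi / 2) ** 2
             + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
             * math.sin(dlmb / 2) ** 2)
        return 2 * r * math.asin(math.sqrt(a))

    # Two made-up GPS fixes, logged 60 seconds apart
    fix_a = (52.5200, 13.4050)   # latitude, longitude
    fix_b = (52.5210, 13.4100)
    interval_s = 60

    dist_km = haversine_km(*fix_a, *fix_b)
    print(f"{dist_km:.3f} km between fixes, roughly {dist_km / (interval_s / 3600):.1f} km/h")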

Alternatively, why not build your own sensors? Jakob Vicari and Bertram Weiß often use the particle.io ecosystem for theirs. “Particle offers a combined solution of hardware and cloud. It is easy out of the box and well documented. Find some quick start instructions in my blog.”

For more on sensors in journalism, be sure to follow future iterations of the Journalism of Things conference.

ICYMI: other happenings on DataJournalism.com

Editing investigations is challenging enough, but how can you succeed if the story involves data and you’re a numbers neophyte? In one of our latest Long Reads, Maud Beelman and Jennifer LaFleur share guidance for project editors on ways they can help their reporters shine, including best practices for workflows, bulletproofing, and writing techniques. Check it out here.

Our next conversation

It’s time for another AMA! Joining us in our next edition we have Steve Doig, renowned data journalist turned professor at the Walter Cronkite School of Journalism & Mass Communication of Arizona State University. You might’ve also seen his work in our Data journalism in disaster zones Long Read and our video course Doing Journalism with Data: First Steps, Skills and Tools. Comment to submit your questions.

As always, don’t forget to comment with what (or who!) you’d like us to feature in our future editions.

Until next time,

Madolyn from the EJC Data team

AMA with Vincent Ryan https://datajournalism.com/read/newsletters/ama-vincent-ryan Wed, 06 Nov 2019 12:05:00 +0100 Vincent Ryan https://datajournalism.com/read/newsletters/ama-vincent-ryan Research. It’s at the core of all good journalism, but that doesn’t mean it’s always easy...or is it?

Our recently launched course, Fundamental search for journalists, has been especially designed to help journalists advance their research skills. Using Google’s tools, the course showcases efficient ways to search and verify online information -- all from the comfort of your own desktop. And, like all of our courses, it’s free!

The course is taught by Vincent Ryan, who’s joined us for an AMA in this 39th edition of Conversations with Data. Formerly of the Sunday Times, Vincent now works with the Google News Initiative to help journalists find, verify, and tell news stories around the world.

What you asked

First of all, let’s talk about your course. If there’s just one lesson you hope journalists take away from it, what would it be?

Vincent: “The power of search modifiers. The most basic tip in finding information but still one of the most powerful.”

What other kind of tool do you wish existed for data journalists (but doesn’t yet)?

“An intuition checker. A tool that would tell you if your hunch is worth pursuing before you put three days of effort into sorting the results to see if there is anything newsworthy.”

Your course also touches on verification. In your opinion, what are the most important considerations when using online tools to verify data accuracy?

“Treble checking that the data does indeed refer to what you believe it to be in your hypothesis.”

To get ahead in data journalism, what do you think are the most important skills or techniques that reporters should focus on developing?

“I don’t think that data journalism is radically different from traditional journalism. At the end of the day it is the value of the story that matters. Being able to understand what constitutes a good yarn is the most crucial skill for all journalists.”

Similarly, what are the biggest challenges facing data journalists today and what can be done to overcome them?

“Time is always the biggest challenge facing all journalists. Having an understanding editor is the best way to overcome it.”

Take Fundamental search for journalists here.

ICYMI: other happenings on DataJournalism.com

Because we know you’re busy -- and perhaps a lil too busy to constantly refresh DataJournalism.com for new content -- we’re launching a new section to highlight the site’s latest happenings. This week, we’re showcasing our recent Long Read, De-identification for data journalists, by Vojtech Sedlak.

Just like confidential human sources, journalists also need to evaluate what data to publish without revealing unnecessary personal details. In this Long Read, Vojtech provides an introduction to privacy in the context of data journalism and practical tips on how de-identification techniques can be introduced into journalistic workflows. Read it here.

Our next conversation

Data isn’t just found by scraping online sources or obtaining government records; increasingly, journalists are turning to sensors to glean data from the physical world. In our next edition, we want to hear about your experiences using sensors to collect data or tell stories. Comment to submit!

As always, don’t forget to comment with what (or who!) you’d like us to feature in our future editions.

Until next time,

Madolyn from the EJC Data team

Following the money https://datajournalism.com/read/newsletters/following-the-money Wed, 23 Oct 2019 12:05:00 +0200 Jerry Vermanen Emmanuel Freudenthal Sylke Gruhnwald Mat Hope Jennifer LaFleur Chris Taggart, Edwin Bender, Laura Ranca https://datajournalism.com/read/newsletters/following-the-money They say that ‘money makes the world go round’. If so, that means there’s a whole lot of stories to be found by looking into how people and organisations spend their dimes.

In this edition of Conversations with Data, we’ll be showcasing some of the ways you’ve investigated money trails, and resources that you can use to dig into the data behind euros and yen, dollars and pounds.

How you’ve followed the money

For inspiration, let’s start with an example of powerful reporting from the Netherlands.

Pointer, together with Reporter Radio and Follow The Money, decided to take a closer look at the financial statements of Dutch home care companies. After a broad analysis, their Care Cowboys project revealed that 97 care providers together made more than 50 million Euro in profit, paying more than 20 million Euro in dividends. In short: a lot of people getting rich in a publicly funded sector.

So, what was the secret to their project’s success? In addition to collaboration and opening up data, Pointer’s Jerry Vermanen emphasised the importance of finding relatable stories in data. “It's easy to report on a number,” he said, “but it's a good story when you can find victims of this fraud. Invest in your sources, talk to employees and clients.”

Just two of the stories that Care Cowboys unearthed.

Moving along to the Canadian Dollar, Kenya-based Emmanuel Freudenthal joined forces with Canadian journalist Hugo Joncas to examine money laundering in Montreal’s real estate market. But with nearly two million private dwellings to get through, this was no easy feat.

“To find purchases by politically-exposed persons, we matched two sets of data: one database of real estate owners in Montreal and a list of African politicians collected from news articles. This came up with a couple dozen hits. Then came the tedious work of looking through each of these politicians to find those whose salary didn't match the purchasing of million-dollar properties without a loan,” Emmanuel told us.
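
At its core, the matching step Emmanuel describes is a join between two tables. A rough Python sketch of the idea might look like the following -- the file and column names are hypothetical, and real-world name matching needs far more cleaning (accents, initials, transliterations) than a simple exact join:

    import pandas as pd

    # Hypothetical files and columns standing in for the two source datasets.
    owners = pd.read_csv("montreal_property_owners.csv")    # owner_name, address, price
    politicians = pd.read_csv("african_politicians.csv")    # name, country, position

    # Normalise names so an exact match has a chance of hitting.
    owners["key"] = owners["owner_name"].str.strip().str.lower()
    politicians["key"] = politicians["name"].str.strip().str.lower()

    hits = owners.merge(politicians, on="key", how="inner")
    print(hits[["owner_name", "country", "position", "address", "price"]])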

Ultimately, their project (articles here and here) found CAD 25 million worth of real estate purchased by politicians from seven countries, whose legitimate activities couldn't explain the source of the funds.

Can you guess which countries the politicians were from?

Likewise, Sylke Gruhnwald from Republik, shared her experiences from reporting on VAT fraud in Grand Theft Europe and an expensive tax loophole in The Cum-Ex Files.

“Following the money means following the people and the document trail. As a reporter covering money laundering, financial fraud, and corporate corruption, I have to connect the dots, using human sources, records, and a network of trusted experts in the field,” she said.

“Once I have financial records of a company on the table, I check the balance sheet and the notes section. I plug in the numbers in a spreadsheet to compare business years sourced from annual reports. In a next step, I check business registries as well as land registries, consult information from fiscal authorities and stock exchanges. This forms the foundation for any further investigation in corporate misconduct, offshore activity, and financial crime.”

Following the money can also lead to important stories on more niche issues. Over at DeSmog, the team spends a large portion of their time looking at the money behind a network of quasi-academic 'think tanks' that push an anti-environmental message.

“When it comes to following money trails, particularly on politically sensitive issues like lobbying against climate action, it's important not to stop at the first association. Fossil fuel companies and high-profile individuals have got wise that direct donations to climate science denial campaigns could have a significant negative impact on them or their companies should the funding be uncovered. So instead they fund conduits or mediators to do the dirty work for them,” explained editor Mat Hope.

But despite the breadth of stories that can come from these threads, as Jennifer LaFleur pointed out, “searching across public datasets can be arduous, particularly on deadline”.

That’s why the Investigative Reporting Workshop created The Accountability Project.

“Seeing a need to streamline public datasets, we built The Accountability Project to put much of that data in one place so journalists, researchers and others could search across otherwise siloed data,” she said.

“We have a small team that has gathered, standardised, and mapped more than 300 databases with more than 530 million records. We continue to add data in new categories and are planning for some enhancements in the coming year. We also are taking suggestions for other data we should add to the site. We just accepted our first story pitch, but are seeking more. We're excited to see what newsrooms can do with the data.”

And that’s not the only tool on the market. At OpenCorporates, they’ve built the largest open database of companies in the world, with over 180 million companies from 135+ jurisdictions. Their CEO, Chris Taggart, shared how journalists have used their tool:

“Our data was used by the ICIJ in the Panama Papers, and is routinely used by investigative journalists, law enforcement, tax authorities, and corporate investigators. A great starting point is OpenCorporates Guide For Investigators.

A search for all company officers with ‘Smith’ in their name.

The Financial Times’ Gillian Tett spoke about the importance of OpenCorporates and other civic tech organisations who are fighting for greater transparency in her FT article and you can read more about how an AML investigator uses OpenCorporates as part of his workflow when investigating companies.”

For our US-based readers, there’s also followthemoney.org by the National Institute on Money in Politics. Their website provides a number of tools to help journalists query data on candidates, political donors, lobbyists, and legislatures. According to Executive Director, Edwin Bender, their data was instrumental in shaping this Pulitzer Prize winning story, from Eric Lipton of The New York Times, on how donors influence state attorneys general.

There are tools available for non-traditional story angles too. Tactical Tech is one of the leaders in this space.

“With our Exposing the Invisible project at Tactical Tech we’ve been working with all these investigators to test, apply, and share innovative ways in which evidence can be traced in various contexts, and most of this work consists of alternative approaches to the good old ‘follow the money’ mindset. We are collecting this knowledge, experience and cases in a resource for investigators called the Exposing the Invisible Kit. Here, among others, we look at how creative digital investigation techniques can help uncover who owns and runs websites and why some may want to hide their connections to certain platforms; how hidden financial (and other kinds of) information can be found online with simple techniques like dorking; how users of a social app end up often unknowingly sharing too much data about their financial habits to the broader public, or how our personal data has turned into a gold mine and a financial asset for marketing firms and political parties in electoral campaigns,” explained Laura Ranca, Project Lead at Tactical Tech.

A snapshot of tools from Tactical Tech’s Exposing the Invisible Kit.

Our next conversation

In some exciting news -- hot off the press! -- this week, DataJournalism.com is launching a new video course, Fundamental search for journalists, by Vincent Ryan. Using Google’s tools, the course will cover techniques for making your research faster and more accurate. To celebrate the launch, we’ll have Vincent with us for an AMA in our next edition. Comment to submit your questions.

As always, don’t forget to comment with what (or who!) you’d like us to feature in our future editions.

Until next time,

Madolyn from the EJC Data team

AMA with Rachel Glickhouse https://datajournalism.com/read/newsletters/ama-with-rachel-glickhouse-collaborative-data-journalism Wed, 09 Oct 2019 12:05:00 +0200 Rachel Glickhouse https://datajournalism.com/read/newsletters/ama-with-rachel-glickhouse-collaborative-data-journalism What happens when two or more newsrooms join forces on a beat? A journalistic collaboration, of course!

Welcome to the 37th edition of Conversations with Data, featuring Rachel Glickhouse, the driving force behind ProPublica’s recently launched Guide to Collaborative Data Journalism.

ProPublica is one of the field’s leaders in data-based collaborations, after bringing together over 1000 journalists to work on Electionland and more than 170 newsrooms for Documenting Hate. But, as Rachel tells us, there’s a lot that goes into making large-scale data collaborations work, and countless lessons to learn from.

What you asked

What types of projects are best suited to collaborations? Are there any types that aren’t well-suited?

Rachel: “You can collaborate on anything -- it just depends on how many newsrooms you want to involve. A collaboration can be as small as just two newsrooms, working together to co-report and publish a story. You can have a group of digital, print, radio, and broadcast newsrooms in the same city or state work together to tell stories about the same issue but on different platforms. A large group of newsrooms can work to report stories independently using the same shared dataset. The one thing you'll need regardless of the size of the collaboration is trust, which can be a challenge given the nature of our industry. For that, you'll need good communication, a clear set of expectations and guidelines, and someone managing the project.

If you have a very narrow, specific story to tell, it's more likely going to be suited to a small number of partners. Those newsrooms should be the ones best equipped to tell the story and also to deliver it to the right audience, whether that's determined by geography or the beat. If you want to focus on a specific issue but work independently, you can still band together and co-publish each other's work, or centralise your work on a landing page. If you have a large dataset spread across cities or countries, there's likely opportunity for a larger number of newsrooms to work together on it. (Think Panama Papers, for example.) Ultimately, it comes down to how you want to share resources and divide the labour. We have more tips on this in our collaborative data journalism guide.”

What are some of the biggest mistakes you’ve seen in collaborative projects and what can be learnt from these?

“One thing I often hear is that collaborations require at least one person to coordinate and manage the project. I can certainly attest to that!

I can also tell you about some of the challenges I've encountered. The first is assuming that potential or new participants understand the project and how to use your resources. It's important to have a structured onboarding process, trainings, and written materials to make sure partner newsrooms are clear on what the project entails and how to use the data. It's critical to tell people what the data can be used for, and what it shouldn't be used for. Set realistic expectations about requirements for participating in the partnership. If the requirements are technical in nature, make sure you communicate them clearly. If you're going to do a large partnership, it's best to keep barriers to entry low. Finally, the big challenge of any collaboration is ensuring that work will get done. In a big partnership, don't assume that every single partner will be productive or be able to find a story.”

You also launched a new tool with the guide, called Collaborate. Can you tell us what the tool does and some examples of how it can be used by journalists?

“This tool is meant to make it easier for journalists to work together on a shared dataset, whether that's within their own newsroom or between newsrooms. It's especially useful for crowdsourced projects, and can also be used for data projects.

An overview of the Collaborate tool.

Collaborate helps you track the status of each individual data point in a dataset. It lets you divide up a dataset in order to see which journalist is working on each data point or tip. It allows you to create filters and labels to parse, search, and sort the data. You can keep track of which tips were verified, and which data points have been reviewed. You can add notes to each data point, and export the entire dataset with all of the information you've added. You can create a contact log to track each time a tipster or source has been contacted, and by which journalist. You can also redact sensitive data. You can also limit access to each project by creating users with specific permissions.

Now that Collaborate is up and running, ProPublica is going to use it as our main tool for crowdsourced investigations, allowing us to coordinate between ProPublica staff and partner reporters. We're also planning to use it to open up some crowdsourced data sets to local newsrooms, since we sometimes share tips when we don't have time to tackle them all.

I think Collaborate can be really helpful for newsrooms tackling crowdsourced projects, where it can track each tip or submission. It can also be useful for parsing through large datasets to find qualitative patterns, or for selecting specific data points that would make for good stories or sources.

And if you're a developer, you can tweak Collaborate to meet your newsroom's needs; the code is available on Github.”

What are some of your favourite lesser known collaborative projects and why?

“Some of my favorite collaborations are:

  • Resolve Philadelphia, which works with basically the entire local media market in Philadelphia to report on prisoner reentry (a past project) and poverty (their current project). Some of their funding is shared among partners and their whole model is really cool. The fact that they convinced so many competitors in a local market to work together for so long and have been able to produce so much reporting is really remarkable.
  • Six newsrooms in Florida are working together to report and co-publish stories on climate change. The initiative was announced this summer. It's smart, timely, and a no-brainer.
  • The Bureau Local in the UK does absolutely incredible work carrying out national investigations and getting local journalists to do their own reporting. They are the gold standard for local-national data collaborations.
  • Comprova is a project in Brazil that grew out of an election fact-checking collaboration into an ongoing collaboration to fact-check and combat misinformation. There have been a number of these election projects around the world, but given the spread of misinformation beyond elections, I was really heartened to see this project was continued.”

Comprova brings together 24 Brazilian media outlets to investigate misinformation about federal government public policies.

What makes a collaborative project successful?

“There are many ways to evaluate the success of a collaboration, but a few I'd point to are output, reach, and impact. The larger the collaboration, the more difficult to manage the production of stories. If at least one story comes out of a collaboration, that's a success, and even better if it's a well-reported, well-told story. I'm partial to quality over quantity.

Next, it's useful to track the size of the collaboration and the audiences you've reached, which you can count by the number of newsrooms involved and their geographic coverage, metrics on stories produced from the project, and social media reach of posts related to the project. That can help you determine if your reporting made it to the audiences you were hoping to reach.

Finally, you can look at the impact of the reporting. I consider output and reach part of the impact of a collaboration, but if you can also determine changes that were a result of the reporting -- anything from a meeting held to a person arrested to a law changed -- that's success.”

Read ProPublica’s Guide to Collaborative Data Journalism here.

Our next conversation

From corporate watchdogging to political corruption, there’s plenty of stories to be found by digging into money trails. In our next edition, we’re looking for your advice on nailing ‘follow the money’ investigations. Comment with your tips!

As always, don’t forget to comment with what (or who!) you’d like us to feature in our future editions.

Until next time,

Madolyn from the EJC Data team

APIs for journalism https://datajournalism.com/read/newsletters/apis-for-journalism Wed, 25 Sep 2019 12:05:00 +0200 Walid Al-Saqaf Wonyoung So Alice Corona David Blood Aleszu Bajak Juan Pablo Marin Diaz Hamdan Azhar, Ilia Blinderman, Jan Diehm https://datajournalism.com/read/newsletters/apis-for-journalism There’s a three letter acronym you’ve probably seen referenced in newsroom methodology documents, or listed as a data source option on various websites: API.

But what does API mean? And why should journalists take notice?

The acronym stands for ‘Application Programming Interface’ and, while they take many forms, APIs are most commonly used by journalists for querying and pulling data from a website’s internal database -- perhaps to power a data visualisation, or to populate a news app.

Despite these use cases, our research for this edition revealed that many of you still haven’t experimented with APIs in your reporting...for now at least. To get you inspired, here are six ways to use APIs for journalism.

How you’ve used APIs

1. To analyse social media data

Twitter’s APIs were by far the most commonly used by the journalists we spoke to. Whether it’s to perform an issue-based sentiment analysis, or to mine comments by politicians, these APIs offer a rich pool of data to query.

Take this example from Aleszu Bajak, Editor of Storybench:

“Together with journalism + data science graduate student Floris Wu, I performed an analysis and wrote a piece for Roll Call that used Twitter data to disprove the mantra ‘When they go low, we go high’ uttered by Democrats in the lead-up to the 2018 midterm elections. I used a sentiment dictionary and R while Floris used a Fast.ai sentiment model and Python to arrive at our results -- which they let us publish as a scatterplot built almost entirely using R's ggplot2 package! We used Twitter's API for this project – accessed through Mike Kearney's easy-to-use rtweet package.”

Or these projects, from Hamdan Azhar of PRISMOJI:

“...we’ve used the Twitter API to study popular sentiment in response to current events. Specifically, we’ve found that the most commonly used emojis in tweets about a given topic (e.g. the 2016 election, or the Taylor Swift - Kanye West dispute) often provide a visually appealing and intuitive roadmap to understanding broad trends in society.”

To get started, PRISMOJI offers a tutorial for exploring emoji data using R and the Twitter API here.
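
If you work in Python rather than R, a comparable starting point might look something like the sketch below. It assumes a Twitter API bearer token, the tweepy library, and a deliberately crude emoji character range -- a real analysis would use a proper emoji library:

    from collections import Counter
    import re

    import tweepy  # pip install tweepy

    client = tweepy.Client(bearer_token="YOUR_BEARER_TOKEN")  # placeholder token
    response = client.search_recent_tweets(query="climate -is:retweet", max_results=100)

    # Rough character ranges that catch most, but not all, emoji.
    emoji_pattern = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
    counts = Counter()
    for tweet in response.data or []:
        counts.update(emoji_pattern.findall(tweet.text))

    print(counts.most_common(10))  # the ten most-used emoji in the sample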

And if you don’t have coding skills? While J-school lecturer Walid Al-Saqaf stressed that these skills are important for getting the most out of APIs, there are tools available that let anyone extract data from APIs, regardless of skill level. Walid himself has worked on an open source tool called Mecodify, which relies heavily on Twitter's APIs to fetch details of tweets and Twitter users, and can be used by anyone interested in uncovering trends in social media data.

Don’t forget: Twitter data is also likely to be personal data. Before you get started with their APIs, be sure to read up on the ethics of publishing Twitter data in your stories.

2. To build journalistic tools

Like Mecodify, there are plenty of other journalistic tools that help reporters access the power of APIs. We were lucky to speak with the team over at Datasketch, who use the Google Sheets API in their data apps.

“Data comes in different flavours: flat files, JSON, SQL. But the one data source that we have consistently seen in many newsrooms is Google Sheets. Journalists need to manually curate datasets, and a collaborative tool like Google Sheets is appropriate. Unfortunately, it is not so easy for non-advanced users to use Google Sheets for data analysis and visualisations. This is why we use the Google Sheets API to allow journalists who do not have advanced data visualisation skills to connect a spreadsheet to our data visualisation tool -- an open source alternative to Tableau -- and generate many different charts and maps with a few clicks,” Juan Pablo Marin Diaz explained.

This GIF illustrates how Datasketch’s data visualisation tool integrates with the Google Sheets API. Help fund the tool’s development here.
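
Datasketch’s integration is its own, but reading a sheet through the Google Sheets API is also straightforward from Python. Here is a minimal sketch using the gspread library, assuming a Google service account whose JSON key sits in creds.json and whose email address has been given access to the (placeholder) spreadsheet:

    import gspread  # pip install gspread

    gc = gspread.service_account(filename="creds.json")   # service-account key file
    sheet = gc.open_by_key("YOUR_SPREADSHEET_ID")          # placeholder spreadsheet ID
    worksheet = sheet.sheet1

    rows = worksheet.get_all_records()   # list of dicts, one per data row
    print(f"{len(rows)} rows fetched")
    print(rows[:3])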

3. To pull from media reports

While newsrooms are consumers of APIs, many also offer APIs for others to consume. Over at The Pudding, Ilia Blinderman and Jan Diehm found a fun way to use APIs from The New York Times (NYT):

“We used the NYT Archive API for our A Brief History of the Past 100 Years project, which provided us with an unparalleled look at the issues that were most important to the journalists in decades past. While the archive doesn't allow developers to pull the full text of all articles, it does allow us to search the article contents and retrieve the metadata, as well as some of the text, from the NYT's 150+ year archive. It's a telling look at the issues of the day, and provides a helpful glimpse of the concerns that dominated the national conversation,” Ilia told us.

The API shows that the word ‘terrorist’ peaked in headlines during the 2000s, but it has featured in headlines throughout the decades.
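
As an illustration -- not The Pudding’s actual code -- here is roughly what a query against the Archive API looks like in Python. The endpoint and field names follow the NYT developer documentation at the time of writing, and the API key is a placeholder:

    import requests

    API_KEY = "YOUR_NYT_API_KEY"   # placeholder
    year, month = 2001, 9
    url = f"https://api.nytimes.com/svc/archive/v1/{year}/{month}.json"

    resp = requests.get(url, params={"api-key": API_KEY}, timeout=30)
    resp.raise_for_status()
    docs = resp.json()["response"]["docs"]   # one metadata entry per article that month

    count = sum(
        1 for d in docs
        if "terrorist" in ((d.get("headline") or {}).get("main") or "").lower()
    )
    print(f"{count} headlines mentioning 'terrorist' in {year}-{month:02d}")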

4. To make maps

When it comes to making maps, APIs can help by pulling useful geographic data. In one of our favourite projects, Wonyoung So used APIs to map the distribution of citizen cartographers in North Korea.

“North Korea is one of the most closed countries in the world -- if not the most closed -- from diplomatic, touristic, and economic standpoints. Cartographers of North Korea aims to discuss how collaborative mapping strategies are used to map uncharted territories, using North Korea as a case study. OpenStreetMap (OSM) enables ‘armchair mappers’ to map opaque territories in which local governments control residents’ internet access. The project tackles the questions of who is mapping North Korea, which tools and methods the contributors use to access and represent the country, and what the motivations behind such a mapping endeavour are,” he said.

“This project relies heavily on the APIs provided by the OSM communities. The OSM data for North Korea was downloaded in October 2018 using Geofabrik’s OpenStreetMap Data Extracts, a service that breaks down OSM Planet data to country level and updates it daily. Contributors’ activity can also be estimated by means of OSM changesets, which are a history of each user’s past contributions and can be retrieved via the OSM API. Using these changesets, one can see which regions other than North Korea contributors have also worked on.”

The OSM API revealed the locations of interesting sites throughout North Korea, including this nuclear test site.
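
To give a flavour of what retrieving changesets involves -- this is a generic illustration, not the project’s own code -- here is a small Python sketch against the OSM API. The username is hypothetical, and the API returns only a recent batch of changesets per call, so a full analysis would need to page through the history:

    import requests
    import xml.etree.ElementTree as ET

    USER = "some_mapper"   # hypothetical OSM username
    resp = requests.get("https://api.openstreetmap.org/api/0.6/changesets",
                        params={"display_name": USER}, timeout=30)
    resp.raise_for_status()

    root = ET.fromstring(resp.content)
    for cs in root.findall("changeset"):
        # Each changeset carries a timestamp and a bounding box of the edits.
        print(cs.get("created_at"),
              cs.get("min_lat"), cs.get("min_lon"),
              cs.get("max_lat"), cs.get("max_lon"))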

5. To determine the sex of a name

Back in 2013, Alice Corona, co-founder of Batjo, started using APIs to examine the ratio of male/female Oscar nominees and winners.

“I would scrape the list of Oscar nominations and winners, and then use the Genderize.io API to help me determine the sex of each name. For each name, the API would return the most likely sex of the person associated with the provided name, together with a probability estimate (number of occurrences of the name in the database + % of such occurrences associated with the most recurring sex).”

But APIs aren’t foolproof, she warned: “While it sounds very clean and quick, a lot of manual work and fuzziness was still involved. For example, the API worked very well with English names and people, but was pretty clueless with foreign names...‘Andrea’ had a high chance of being a female name (in English), while it is mostly a male name in other languages (Italian). So the API provided a first classification, but then a lot of manual work was involved in the verification.”
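
The call itself is simple. Here is a minimal Python sketch of the kind of lookup Alice describes -- the free tier is rate-limited and the exact fields may change, so treat it as illustrative; the country_id parameter is what lets you localise a guess for names like ‘Andrea’:

    import requests

    resp = requests.get("https://api.genderize.io",
                        params={"name": "andrea", "country_id": "IT"},  # localise the guess
                        timeout=10)
    resp.raise_for_status()
    guess = resp.json()   # typically includes name, gender, probability and count
    print(guess)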

6. To follow the money

The Financial Times’ recent investigation, Extensive ties found between Sanjeev Gupta bank and business empire, raised questions about the independence of a bank owned by the metals magnate Sanjeev Gupta. David Blood talked us through how they used APIs to follow the money:

“The core of the story was our discovery that, of the bank’s security interests registered with Companies House in the form of ‘charges’, almost two-thirds were held against Gupta-linked companies.

A charge gives a creditor security over the assets of a debtor. Banks often use charges to secure collateral for loans or other credit facilities. In the UK, most types of security interest must be registered with Companies House, the corporate register. However, Companies House doesn’t provide functionality for searching for charges. In order to find all the charges held by the bank, we had to scrape the Companies House API for all charges held by all 4.4m active UK companies.

I wrote a series of Python scripts for scraping data and retrieving PDF documents from the API. I used pandas in a Jupyter notebook to load and filter the data and identify the charges held by the bank. The Companies House API is quite well documented, which is not always the case, unfortunately, but was certainly helpful in reporting this story.”
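
As a hedged illustration of one building block of that approach -- not the FT’s actual scripts -- here is how pulling the registered charges for a single company might look in Python, following the Companies House developer documentation (HTTP basic auth, with the API key as the username and a blank password):

    import requests

    API_KEY = "YOUR_COMPANIES_HOUSE_KEY"   # placeholder
    company_number = "01234567"            # placeholder company number
    url = f"https://api.company-information.service.gov.uk/company/{company_number}/charges"

    resp = requests.get(url, auth=(API_KEY, ""), timeout=30)
    resp.raise_for_status()
    charges = resp.json().get("items", [])
    print(f"{len(charges)} charges registered against company {company_number}")

Scaling this to millions of companies is mostly a matter of respecting the API’s rate limits and caching responses -- which is where the real engineering effort in a story like this goes.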

Our next conversation

Last month, ProPublica launched their Guide to Collaborative Data Journalism, revealing their secrets from successful partner-based projects like Electionland and Documenting Hate. Joining to answer your questions about the guide and building data journalism coalitions, we’ll have Rachel Glickhouse with us in our next edition. Comment with your questions!

As always, don’t forget to comment with what (or who!) you’d like us to feature in our future editions.

Until next time,

Madolyn from the EJC Data team

AMA with Code for Africa https://datajournalism.com/read/newsletters/code-for-africa-ama Wed, 11 Sep 2019 12:05:00 +0200 Emma. L N. Kisa Soila Kenya Jacopo Ottaviani Tricia Govindasamy Justin Arenstein https://datajournalism.com/read/newsletters/code-for-africa-ama After our trip to Southeast Asia last month, we started thinking about what data journalism looks like in other regions as well...like, say, Africa?

And what better way to find out, than by letting you quiz the data team from Code for Africa!

Code for Africa, or CfA, runs the continent’s largest network of civic tech and open data labs, with teams across Cameroon, Ghana, Kenya, Nigeria, Senegal, Sierra Leone, South Africa, Tanzania, and Uganda. Their projects focus on empowering citizen engagement, including by uplifting data literacy in the media and opening up data for better use by the region’s journalists.

Here’s what they had to say.

What you asked

Africa is a data-scarce region, with little really granular local data. How do you do data journalism when you're often working in a data vacuum?

Justin Arenstein, founder and CEO: “CfA has projects like sensors.AFRICA and africanDRONE where we work with newsrooms and with partner communities to create hyperlocal data for everything from air and water pollution, to mapping data to better understand flooding, climate change, etc. Additionally, we have Sea Sensors which is pioneering the use of hydrophones (underwater microphones) to track illegal dynamite fishing.”

africanDRONE supports the use of drones to highlight civic issues. Credit: Johnny Miller, africanDRONE.

“We do have other data-creation initiatives, such as gazeti.AFRICA where we turn 'deadwood' government records into digital data and sourceAFRICA where we help investigative newsrooms digitise their source documents.”

Likewise, what does data journalism look like on a continent where many people don't have access to broadband internet or smart devices: is it data visualisation, or something else?

“Code for Africa builds data driven news tools, like Ezolwaluko and HealthTools, which underpin our data driven reportage and aim to give audiences actionable information so that they can turn their insights from the stories into real-world decisions. These tools are also accessible to people on SMS. CfA, therefore, does data journalism for text audiences too.”

How do you deal with conflicting data sources?

Emma Kisa, data journalist and fact-checker: “For conflicting data, we consider the source. Official government sources for census data, demographic health surveys and ministerial reports take precedence, as well as trusted data collectors such as the World Bank and the UN. However, if both sources are official and trusted, then we consider the latest data released.”

With your team split between various countries, across multiple time zones, how does that work from a production point of view?

“Our team uses multi-platform messaging apps and visual collaboration tools like Slack, Trello and Github to coordinate our work, with daily stand-ups, agile sprints, and the occasional audio or video calls. We avoid emails because of overflowing inboxes which have a mixture of internal and external messages. The apps gather messages in channels, encourage [real-time] brainstorming, enable rapid communication, reduce email and allow third-party integrations hence multiplying their use.”

Based on your training work, what’s the best way for newbies to get started with data reporting?

Soila Kenya, data journalist, fact-checker and trainer: “Microsoft Excel or Google Sheets might be a good place to start stretching your data analysis muscles. It can be intimidating to see all the coding languages on offer for data journalists such as Python and R. Spreadsheets can be a stepping stone to using those advanced programs and you can still analyse and visualise data to add into your reporting. Check out this simple course that Code for Africa built where you can learn how to use spreadsheets as a journalist.”

Jacopo Ottaviani, Chief Data Officer: “There are plenty of resources available online to get your hands dirty with data reporting, even if you have no familiarity with coding or computing. For example, the Data Journalism Handbook is a good start. Many data journalism tools such as Datawrapper, Flourish or Workbench do not require coding skills.”

How can data journalists outside of Africa better report on the region?

Soila Kenya: “Collaboration is key. A lot of context is lost in datasets and maps and numbers. Journalists on the continent can help provide the nuances surrounding the topic you are tackling. Data journalism is still journalism and journalism is storytelling. Your story needs to be multidimensional by including the voices of the affected.”

Tricia Govindasamy, Product Manager - Data: “For data journalism projects, you need to use user-centric design. Identify and understand the problem that people in Africa are experiencing, and find a suitable solution for them. Examples of this are a health information tool in Kenya and Bornperfect, a female genital mutilation information site. You need to empathise with the Africans that you are reporting on, and for this there are various user-centric design approaches available online.”

HealthTools is a suite of data driven web and SMS-based tools that help citizens check everything from medicine prices and hospital services, to whether their doctor is a quack or not.

Jacopo Ottaviani: “Also read as much as possible about Africa from a variety of sources. For example, we recommend subscribing to the Quartz Africa weekly brief newsletter and maybe read this evergreen piece”.

For more by Code for Africa, follow them on Twitter or explore their projects on GitHub.

Our next conversation

Building on the coding advice from our last edition, we’re curious about what happens when code turns into a powerful story. Or, to be more specific, when an API does. That’s right, for our next conversation, we’ll be highlighting cool examples of journalism based on APIs. Submit your work, or nominate someone else’s, using this form.

As always, don’t forget to comment with what (or who!) you’d like us to feature in our future editions.

Until next time,

Madolyn from the EJC Data team

Learn to code like a journalist https://datajournalism.com/read/newsletters/learn-to-code-like-a-journalist Wed, 28 Aug 2019 12:05:00 +0200 Paul Bradshaw Ændrew Rininsland Rui Barros MaryJo Webster Geoff Hing Laurence Dierickx Ed Borasky Kira Schacht, Ben Hancock https://datajournalism.com/read/newsletters/learn-to-code-like-a-journalist We’ve all heard it before: Coding has become an essential skill for newsroom journalists. Yet, despite the demand for these skills, many of us are hesitant to get started, either because of an aversion to numbers or the seemingly huge task of grasping a new language.

We think it’s time to strike down that fear.

In this edition of Conversations with Data, we’ve collected practical advice from a group of global experts, so that you can begin coding...like a journalist!

What you said about getting started

Almost all of our contributors agreed that the best learning strategy is to pick a project, rather than teaching yourself coding for the sake of it.

Why? “Because it will quickly seem a bit pointless and abstract,” said Paul Bradshaw, course leader of the MA in Data Journalism and the MA in Multiplatform and Mobile Journalism at Birmingham City University.

“Try to pick a project which is relatively simple, skills-wise, and then move on to new projects which add increasing complexity in the areas you're interested in.”

Ændrew Rininsland, FT, reiterated this advice: “Depending on your learning style, you'll likely get much further looking up specific questions that lead towards building a discrete project than you will by trying to memorise an entire Python textbook”.

You’ll still “need to learn the basics”, said Rádio Renascença’s Rui Barros. Try, for example, setting up a ‘Hello, World!’ program and then jumping into a small project that you can publish.

Often used to illustrate the basic syntax of a programming language, ‘Hello, World!’ requires the novice learner to create a simple program that outputs that text.
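
In Python, for instance, the whole exercise fits in a single line:

    # The classic first program: print a fixed greeting to the screen.
    print("Hello, World!")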

However, taking on a project doesn’t mean you will immediately accomplish your goals, Rui warned.

“You will not. You will dream big, but the lack of knowledge will always be a roadblock. Should that stop you from trying again? Of course not. Ira Glass said it about storytelling, but I think it fits perfectly with my life as a journocoder: ‘for the first couple years you make stuff, it’s just not that good. It’s trying to be good, it has potential, but it’s not. But your taste, the thing that got you into the game, is still killer. And your taste is why your work disappoints you’. Learn how to balance the dissatisfaction of something not being as great as you wish with the pride of accomplishing something.”

Likewise, Kira Schacht, from Journocode and Deutsche Welle, told us to embrace mistakes and use them to refocus on your project’s goals.

“Don’t be afraid of messy code the first -- or the fiftieth -- time you try. There are many ways to solve a problem, so what if it takes you a few detours to find yours? The results of dead-end approaches often turn out to be useful for future projects anyway. Especially as a beginner, though, it can be easy to lose your goal in the woods of tutorials, functions, and data structures. Once an hour, take a deep breath and ask yourself: What exactly was I trying to do again?”

After you’ve built up some skills, then start teaching what you’ve learnt to others. Even if you don’t have someone to teach, Rui Barros suggested using Rubber Duck Debugging tactics, or you could create your own tipsheet, as recommended by MaryJo Webster from the Star Tribune.

“Writing down how to do something or telling it to someone else helps you realise the gaps in your knowledge and also helps it stick in your brain. If nothing else, make a repository of code snippets and put it in a place you can easily find. Mine is just a text document (not exactly ideal, but it gets the job done) where I put in things like ‘How to import an Excel file’ followed by a line or two of code that is either an example from a previous project or is a generic piece of code that will help me remember the syntax for doing a particular task,” MaryJo explained.

Geoffrey Hing from APM Reports seconded this advice, adding that “one of the biggest advantages to this is that it acts as a sort of Rosetta Stone for the language you use to describe what you're trying to do and the language used by other practitioners. Often when I'm searching the web for resources on how to do something, or asking others for help, the biggest impasse is that I'm using different language. In my snippet library, I try to reference both the language that feels natural to me (so I can search for it easily in the future) and the language used in the resources I eventually found.”
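
A snippet library entry of the kind MaryJo and Geoffrey describe can be as plain as a searchable note plus a couple of lines of code -- for example, in Python with pandas (the file name is just a placeholder):

    import pandas as pd

    # How to import an Excel file (reading .xlsx also needs openpyxl installed)
    df = pd.read_excel("school_test_scores.xlsx", sheet_name="2019")
    print(df.head())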

Similarly, you can also blog about the things that you’ve learnt.

“People generally get nervous about the idea of writing about a topic that they're not experts on. But hey, we journalists do it all the time! Writing about a new skill you learned has two positive effects: it forces you to think through all the angles of something you've learned and make sure you really understand it, and it documents that knowledge for the future. If you get something wrong, someone will (hopefully gently) point it out to you ... and you'll learn more!” explained computational journalist Ben Hancock, with reference to Sacha Chua's No Excuses Guide to Blogging.

What you said about choosing the right programming language

Now that we’ve covered the foundations of getting started, let’s look at picking the right language.

First up, the data team at Code for Africa highlighted the idea of ‘focussed learning’. Remember that you don’t need to learn everything, just what’s relevant to you. So, think about the tasks at hand, such as scraping and crawling or plotting and visualisation, and look for the best tools to get those jobs done. Here are some of their suggestions:

“If you want to use code for scraping or data analysis, learn that and focus on just the libraries you need: for data analysis, NumPy and pandas are great, and for scraping, Scrapy (all from Python). Our recommended coding languages would be Python and R, and learning resources would be edX.org and DataCamp, where a number of courses are free.”
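
To make the Scrapy suggestion concrete, here is what a minimal spider can look like -- the URL and CSS selectors are hypothetical, and you would run it from the command line with scrapy runspider spider.py -o output.json:

    import scrapy

    class MinutesSpider(scrapy.Spider):
        name = "minutes"
        start_urls = ["https://example.org/council/minutes"]  # placeholder URL

        def parse(self, response):
            # Yield one item per link found in the page's main list.
            for link in response.css("ul.minutes li a"):
                yield {
                    "title": link.css("::text").get(),
                    "url": response.urljoin(link.attrib["href"]),
                }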

Journo-dev Laurence Dierickx backed up Code for Africa's advice: “Before diving into the code, ask yourself which programming language will best suit what you would like to do. If it is to deal with data, maybe you should first learn R or Python, then SQL. If you would like to develop interactive content, PHP and MySQL would probably be more relevant.”

For journalists specifically, Hack Oregon’s Ed Borasky said that there are two languages that you'll probably need to read: “Python and JavaScript/HTML/CSS. Other languages are either specialised (R, for example) or too low-level (Java, C/C++, Go, Rust) to be of much use in a busy newsroom”.

Paul Bradshaw also recommended JavaScript for visualisation or interactivity, Python’s lxml or Beautiful Soup libraries for scraping, and either R or Python's pandas library if you're interested in data analysis.

That said, he also reminded us that “all the main languages (JavaScript, Python, R) can do the same things, it's just that some tend to be used more for some things than others. And once you've learned one language it is a lot easier to pick up another. If you don't get on with one language it may be worth trying another until you find one that you get on with best.”

By now, you should have an idea of some languages to explore. To get experimenting with them, try these tips from MaryJo Webster, which she used to first learn R:

“...pick an analysis you've already done using another tool and replicate it in whatever programming language you are trying to learn. Under this scenario you don't have to learn a new dataset and/or a new topic, freeing up your brain to focus on learning the new programming language. And you don't have to go through the process of figuring out what to do with the data -- you already know the steps. I did this when I was first learning R. I had several analyses that I repeat every year -- like school test scores and an annual look at our housing market -- and had previously done in SQL or Excel, depending on the difficulty. I had notes outlining the steps in my analyses and I just had to figure out how to do the same thing in R. It made the learning process so much easier.”

For more standalone resources, check out Paul Bradshaw’s primer on programming concepts and computational thinking, along with these tipsheets (here and here) from Caitlin Ostroff of the Wall Street Journal. And don’t forget our video course, Python for Journalists, taught by NOS’ Winny de Jong.

Our next conversation

When Code for Africa submitted their coding advice, it got us thinking about how data journalism is practised across their region. The team operates Africa’s largest network of civic technology and open data labs, and we’re excited to have them with us for an AMA in our next edition. Be sure to submit your questions by commenting below.

As always, don’t forget to comment with what (or who!) you’d like us to feature in our future editions.

Until next time,

Madolyn from the EJC Data team

AMA with Kuek Ser Kuang Keng https://datajournalism.com/read/newsletters/ama-with-kuek-ser-kuang-keng Wed, 14 Aug 2019 12:05:00 +0200 Kuang Keng Kuek Ser https://datajournalism.com/read/newsletters/ama-with-kuek-ser-kuang-keng Helo! Apa khabar? Welcome to our 33rd edition of Conversations with Data, where we’ll be taking a trip over to Malaysia with Kuek Ser Kuang Keng.

Previously an alumnus of Malaysia’s leading data journalism outlet, Malaysiakini, and now founder of DataN and co-organiser of Hacks/Hackers Kuala Lumpur, Keng’s got plenty to share from both the region’s traditional outlets and grassroots communities.

Kuek Ser Kuang Keng.

What you asked

What is the data journalism scene like in Malaysia, and how does it compare to other countries in Southeast Asia?

“Data journalism has caught the attention of many journalists and newsrooms in Malaysia but it is still in a nascent stage in terms of practice. Besides news website Malaysiakini, which set up its news lab team last year to experiment with data and visual storytelling, other media organisations are yet to allocate resources for data journalism.

There's still a huge gap between Malaysia and other countries in this region including Indonesia, the Philippines, Singapore, Hong Kong, and Taiwan, where the practice of data journalism has been going on for several years. This also means there are ample opportunities for Malaysian journalists.”

How do community driven organisations, like Hacks/Hackers, benefit the practice of data journalism?

“Through Hacks/Hackers Kuala Lumpur, we were able to bring together talents from different disciplines, including data scientists and programmers, to discuss the future of journalism with journalists. This is something local journalists have never experienced before. We brought in data journalists from both within and outside of Malaysia to share their experiences and projects. We also held an event to connect senior government officials in charge of Malaysian open data policy with journalists and other data users to talk about how they can build the open data ecosystem together. For Malaysian journalists, this is a unique platform for introducing them to technologies, tools, resources, ideas, and talents that can improve their reporting and journalistic products. We have also successfully built a multidisciplinary community that has a shared passion for enhancing journalism through innovation, technology, and a culture of knowledge sharing.”

Discussing Malaysia’s open data journey at Hacks/Hackers Kuala Lumpur.

What are some regional topics and data sources that you recommend for journalists to integrate simple data analysis into their work?

“Regional topics that can be better reported with data include labor migration, human trafficking, deforestation, and environmental issues like the annual haze and illegal trading of endangered species. However, the accessibility of data is still a huge challenge for journalists in this region as many governments here are also struggling with their own digital infrastructures and a lack of meaningful open data policies.

Cross-border collaboration among journalists and newsrooms in the region, such as this one, is one of the ways to overcome this challenge. International data portals like the World Bank Open Data Portal, Migration Data Portal, and Global Forest Watch are good data sources when governments don't share data. For more regional specific data, you might want to check out the Asian Development Bank Data Library.”

What are some of your favourite examples of local data journalism and why?

“In Southeast Asia, media organisations that have used data extensively in their reporting include Malaysiakini from Malaysia, Katadata from Indonesia, Straits Times from Singapore, and Rappler from the Philippines. I would like to highlight two projects from them.

The first is Rappler's #SaferRoadsPH campaign that won the Data Journalism Website of the Year in the 2018 Data Journalism Awards. It is not just a data journalism project. It combined both online data driven reporting and an on-the-ground campaign to drive real impacts in local communities.

Secondly, there’s the May 13, Never Again multimedia project by Malaysiakini. It revisits the racial riots in 1969 by weaving together geolocation data, historical images, archival documents, and never-before published testimonies in an engaging multimedia package.”

Are there any up and coming journalists from within your network that we should be following?

“Lee Long Hui, who leads Malaysiakini's Kini News Lab, Mayuri Mei Lin from the BBC World Service's East Asia visual journalism team in Jakarta, Gurman Bhatia from the Reuters Graphics team in Singapore, and Rebecca Pazos from Singapore's Straits Times. Yan Naung Oak, a Singapore-based Burmese dataviz developer, who works with media in Myanmar, and Thiti Luang, who founded a data startup in Bangkok, are also worth watching.”

And, finally, we had one reader from Johor asking how they can make money in journalism?

“I've spent almost 15 years of my youth in journalism and I can say that a career in journalism rewards you with much more satisfaction than money can give you.”

Our next conversation

Like learning a new language, getting started with coding can be daunting. But that’s no reason not to try! Do you have any tips from your first programming experience? Or insights to share on dabbling with code for the first time? Help out your colleagues by commenting with your advice.

As always, don’t forget to comment with what (or who!) you’d like us to feature in our future editions.

Until next time,

Madolyn from the EJC Data team

AMA with The Economist's data team https://datajournalism.com/read/newsletters/ama-with-the-economists-data-team Wed, 31 Jul 2019 12:05:00 +0200 Marie Segger Matt McLean, Evan Hensleigh, James Fransham, G. Elliott Morris https://datajournalism.com/read/newsletters/ama-with-the-economists-data-team In 1843, The Economist’s inaugural edition went to print with a table front and centre. Clearly ahead of his time, the editor of the day recognised the power of data journalism over 100 years before the field’s modern emergence.

Almost 176 years later, the outlet’s appetite for data driven stories is still going strong. In 2015, they brought in a specialised data team and, this year, they launched a dedicated data section in print.

To find out more about The Economist’s affinity with data, we let you pose your burning questions to the data team themselves. Here’s what they had to say!

What you asked

Let’s start with your data visualisations. How do you decide what to use in your publication?

Matt McLean, Visual Data Journalist: “Mainly by looking at the data. The data structure and type will suggest certain visualisation devices (bars for stock, lines for flow; scatter to show a relationship between two variables etc), but it is always worth experimenting to see if you can find new and engaging ways to visualise the data that go beyond standard chart types but still preserve the integrity of the numbers. We also have to consider what will work in the space available and make sure there is enough variation to keep things interesting -- no-one wants wall-to-wall bar charts.”

The team uses varied types of visualisations in their daily charts to engage readers with the data at hand.

You’ve been using GitHub to share data from your Graphic Detail section, and it is much appreciated! Will you eventually be sharing the data and/or code from your daily charts as well?

Evan Hensleigh, Visual Data Journalist: “We’re glad people are using the data we’ve published on GitHub -- it’s been really great to see our data reused in new and interesting ways. Unlike our Graphic Detail pages, our daily charts usually use data that is already generally available, and don’t involve a lot of original analysis, so it doesn’t make sense for us to re-publish the data. But in those cases where our daily charts do involve us creating an interesting dataset, we may start publishing those datasets for other people to do more work on.”

And what other open data steps will you be taking this year?

“We’re looking for ways to make our data easier to access and use. We’re currently working on creating an R package with our publicly available data that will make it easier for people to access our data, mimic our styles, and use things like the Big Mac Index in their own projects.”

What is your dream dataset (that you don't already possess)?

Marie Segger, Data Journalist: “Generally we try to be imaginative and if we don’t have the data we need, we try to get it by either asking for it or by scraping the web as one of our team members has described in an article on Medium. Personally, I think that the relatively new Missing Numbers project that highlights data that the UK government should be collecting but isn’t, is a great place to look for interesting ideas.”

James Fransham, Data Correspondent: “I think Raj Chetty, the Harvard economist, has done incredible things with microdata, constructing longitudinal studies using data and methods that no other social scientist thought possible. His website has a wealth of information, and I would love to have the keys to public data that would allow us to mimic his research outside of America.”

G. Elliott Morris, Data Journalist: “I would love to get my hands on Barack Obama’s political data operation, as unveiled in Sasha Issenberg’s The Victory Lab. As described, the data contained voting records for hundreds of millions of Americans and was compiled by on-the-ground staffers who went door-to-door asking them about anything from their political preferences to dreams for society. It is the biggest of political ‘big data’, and we could use it to improve our already-stellar modelling interactive.”

A screenshot from the team’s interactive statistical model, which predicts voting preferences based on different factors.

On the topic of modelling, how do you build predictive models for sustainable development using the SDGs framework?

James Fransham: “A great question. I was in Ghana recently with the UN Foundation exploring how the government there is measuring poverty for the SDGs. The scale and level of ambition of the government is impressive: it is attempting a fully-digitised census next year, has produced a digital postcode for every 5m x 5m piece of land, and has issued a biometric ID for each citizen. Nonetheless, data gathering remains a time-consuming, expensive and exhausting process. How would we build a model that could anticipate Ghana’s progress in meeting the SDGs? I think first we grab every bit of historic data we have about the country, and other similar countries at different levels of development, throw it into a big soup, and hope that something tasty comes out!”

You can keep up-to-date with The Economist’s data team and their latest work here.

Our next conversation

From an established outlet to something more grassroots: for our next conversation, we’ll be heading to Southeast Asia. The region is booming with data journalism communities, emerging through meetups and local learning groups. To hear all about it, we’ll be hosting an AMA with Kuek Ser Kuang Keng, founder of DataN and co-organiser of Hacks/Hackers Kuala Lumpur. Comment with your questions, or to let us know what you'd like to see in future editions!

Until next time,

Madolyn from the EJC Data team

Bad charts https://datajournalism.com/read/newsletters/bad-charts Thu, 18 Jul 2019 07:01:00 +0200 Alberto Cairo Jon Schwabish Mustapha Mekhatria Severino Ribecca Kaiser Fung Klaudia Gal, Birger Morgenstjerne, Cole Nussbaumer Knaflic https://datajournalism.com/read/newsletters/bad-charts Data visualisation can be dangerous -- people inherently trust a map or chart. But, as we know, all maps lie and charts can be just as tricky. While that doesn’t mean that the truthful journalist should avoid visualisation, there are some common mistakes to watch out for.

Welcome to our 31st edition of Conversations with Data, featuring visual faux pas that other data journalists made, so you won’t have to.

What you said

Perhaps it’s fitting to start with some sound advice from Alberto Cairo, whose new book How Charts Lie deals directly with this issue: “...paying attention to charts is critical to understanding them. Many readers -- and this includes many journalists -- believe that charts can and should be understood easily and quickly, as if they were mere drawings. This is a dangerous assumption. Charts need to be read carefully, and this includes the descriptions and documentation their creators provide -- or should provide.”

Alberto provided us with the above graphic, by Our World In Data's Hannah Ritchie, as an example of what he means. It's well designed, but it can easily be misinterpreted if we don't read the story it appears in and the numerous caveats that story contains.

Klaudia Gal and Birger Morgenstjerne, from the infographic agency ferdio, also sent through a couple of best practices to prevent visual blunders:

  • Avoid junk charts: “Chart junk refers to visual elements in charts that are not necessary to comprehend the information, or that even distract the viewer, skew the depiction, and make it difficult to understand.”
  • Remember the rules of data visualisation: “It is great to come up with unique and creative solutions, but in most cases it is best if you go with the established rules. The audience is used to certain ways of reading data visualisations; it is your job to decide whether there is space for you to challenge these ways.”

Now that we’ve got the basics down, let’s turn to some more specific advice on particular chart types. Like many of us, Cole Nussbaumer Knaflic is a fan of bar charts: “they are familiar and easy to read. However, they must have a zero-baseline! Because we read the length of the bars relative to each other and the axis, we need the full bar to be present in order to make an accurate visual comparison.”

Jon Schwabish, from PolicyViz, agrees. He sent us the below example of what-not-to-do.

“This bar chart appeared on Fox News in 2013 and violates a data visualisation rule that nearly everyone agrees on: Bar charts must start at zero! Here, the top marginal tax rate in the United States was potentially slated to increase from 35% in 2012 to 39.6% in 2013. By starting the chart at something other than zero (here at 34%), the difference between the two numbers is overemphasised.”
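If you want to see the effect for yourself, here is a minimal Python sketch using matplotlib. It simply redraws the two figures from Jon's example, once with a truncated axis and once with a zero baseline -- the two rates are the only thing taken from the chart above; everything else is our own illustration.

```python
import matplotlib.pyplot as plt

labels = ["2012", "2013"]
rates = [35.0, 39.6]   # top marginal tax rates from the example above

fig, (ax_bad, ax_good) = plt.subplots(1, 2, figsize=(8, 3))

# Misleading: starting the axis at 34% makes the 4.6-point gap look enormous.
ax_bad.bar(labels, rates)
ax_bad.set_ylim(34, 42)
ax_bad.set_title("Truncated axis (misleading)")

# Honest: with a zero baseline, bar lengths stay proportional to the values.
ax_good.bar(labels, rates)
ax_good.set_ylim(0, 42)
ax_good.set_title("Zero baseline")

plt.tight_layout()
plt.show()
```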

When it comes to maps, journalists need to be just as wary. Let’s take a look at the following maps provided by Highsoft’s Mustapha Mekhatria.

“The first map displays Canadian provinces where members of the First Nations make up the majority of the population (in green). Given the large geographical area of these provinces, one might infer that a sizable part of the Canadian population is First Nations. However, the fact is that First Nations members make up only 5% of the total population. Unless the focus of the data story you are trying to tell is the geographical size of these provinces, you should complement your chart with an explanation, ideally also in chart form (see the second map), or don’t use the map,” he explained.

Likewise, Kaiser Fung, of Junk Charts fame, critiqued a map of China’s ‘digital silk road’, which plotted the wealth of each province (measured by GDP per capita) and the level of internet penetration in geographic form.

“Irrelevant text, such as names of surrounding countries and key cities, expands the processing load without adding insights,” he warned.

As an alternative, he suggests conveying the correlation between the two variables in a grid format, with limited text to avoid overcomplicating the visual’s message.

And, finally, beware the humble pictogram. Severino Ribecca, creator of The Data Visualisation Catalogue, reminded us that because pictograms are typically perceived as displaying quantities, other uses can tell a misleading story.

Take the following pictogram, for instance.

“This pictogram chart on colour blindness is confusing because it makes it look like more women have the condition, simply because more female icons are used than male ones. However, it's tricking you with statistics, as 1 in 12 is far more probable than 1 in 200. If we calculate the percentages we get: (1 ÷ 12) x 100 = 8.33% of men have colour blindness; (1 ÷ 200) x 100 = 0.5% of women have colour blindness,” he told us.

Want to learn more? Jonathon Berlin covers data visualisation and mistakes he’s made in our DataJournalism.com course here.

Our next conversation

It’s been a while since we last hosted an AMA! So, in our next edition, we’ll have The Economist’s data team with us to answer your burning questions. Ask them about building statistical models, how they’re trying to make their data more open, or anything at all!

As always, comment to let us know what you’d like us to feature in our future editions.

Until next time,

Madolyn from the EJC Data team

Under-reported news https://datajournalism.com/read/newsletters/under-reported-news Wed, 03 Jul 2019 08:30:00 +0200 Christina Elmer Anastasia Valeeva Raúl Sánchez Stan Putman Eva Belmonte Marta Martinez, Keila Guimarães, Anuradha Nagaraj, Fabio Teixeira https://datajournalism.com/read/newsletters/under-reported-news If a story isn’t reported, did it even happen? Too often, what’s reported dictates what matters, leaving affected communities and individuals without a voice.

But these stories don’t need to remain invisible. Data reporting offers a unique opportunity to dig below the status quo and shed new light on a situation.

In this 30th edition of Conversations with Data, we’ll be highlighting examples where journalists in our community have done just that.

Before we get started though, SPIEGEL ONLINE’s Christina Elmer sent through some general tips on bringing untold stories to light with data:

“In my experience, a big challenge is to attract attention and tell stories that truly affect people. Data journalism can raise awareness or present context in detail, but it also has to connect to the reality of our readers' lives in a concrete way. With under-reported topics, that is especially challenging. From my point of view, a holistic approach is advisable in such cases, linking data-based investigations with conventional reporting, especially with multimedia content.”

Under-reported issues you reported on

1. Women’s land rights

In many African countries, if women become widows and don’t have any male descendants, they risk losing everything they own in favour of another man from their husband’s family, even if they’ve never met before. But a practice in northern Tanzania, called ‘Nyumba ntobhu’, offers some women a solution. It allows older widows without male descendants to marry a younger woman who does. Marta Martinez, also an EJC-grantee, told us about digging deeper into this under-reported story.

“We used paralegals on the ground to tell us about how many female marriages there were in 10 different villages and it turned out that the number of female marriages now represented over 20% of households,” she explained.

“I'd say that, even if it's at a very small scale, like we did in Tarime district in Tanzania, relying on local NGOs, it is worth trying to gather some data that will make your story factually stronger, and will actually allow you to monitor the situation over time. For example, if we wanted to continue our reporting in the future on this issue, or if someone else wants to expand this to another country to compare trends, having the data would allow journalists, academics, researchers, anyone interested, really, to build on that knowledge and expand it -- so everyone benefits!”

2. Access to drugs and unethical practices

Access to essential drugs and shortages of cheap medicines is still an under-reported issue in many localities. Keila Guimarães, from the Penicillin Project, told us more.

The project, which investigates a long-running worldwide shortage of penicillin, presented challenges because of both the global scale of the medicine trade and supply chain and the scarcity of open data sources on the drug industry.

In the end, “it required thorough examination of documents from various medicines' regulatory agencies -- from the US to Brazil to China -- interviews with dozens of experts throughout the world, analysis of databases from the World Health Organization, the Global Burden of Disease Study, the UN Trade, and various other data sources. The feeling we had at the time was that relevant information to bring light to this issue was scattered all over and our work was not just to collect it, but also merge, combine and create meaning out of it.”

Looking back, her advice for other journalists reporting on similarly hidden stories is to “look for the data, but connect with people”.

“Every expert I spoke with opened doors to another expert and that chain was crucial in helping me to better understand the documents and the data I found during my research phase.”

Eva Belmonte is another data journalist working on the medicine beat for Civio -- this time, focussing on the pharmaceutical industry’s financial relationships with Spanish doctors. Here’s what she had to say:

“The issue has been discussed in the media, but in a very superficial way, just repeating what the pharmaceutical companies spun: news articles were based entirely on press releases, mentioning the overall amount paid, without context and without further breakdown. We analysed the data thoroughly to explore, first, how transparent each company was in their disclosure (spoiler: very little), denounce that doctors who receive more money are less transparent and compare the money spent on Spanish doctors vs those in other countries. A year later, we published for the first time the names of those doctors receiving the largest amounts (sometimes bigger than their annual salaries) and we investigated whether or not they declared their conflicts of interest. In both cases, six people had to work for a few months to convert very cumbersome PDF documents from 145 pharmaceutical companies, which involved both custom conversion scripts and a lot of manual cleaning work.”
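Civio hasn't published the exact scripts it used, but for a sense of what that PDF-to-data step can look like, here is a rough Python sketch. The pdfplumber library, the file name and the table layout are all our own assumptions, not Civio's actual pipeline -- and, as Eva notes, the output would still need a lot of manual cleaning.

```python
# A rough sketch of converting a disclosure PDF into tabular data.
# pdfplumber, the file name, and the table layout are illustrative assumptions.
import pdfplumber
import pandas as pd

tables = []
with pdfplumber.open("company_disclosure_2017.pdf") as pdf:   # hypothetical file
    for page in pdf.pages:
        for table in page.extract_tables():
            # Treat the first row as the header and the rest as data.
            tables.append(pd.DataFrame(table[1:], columns=table[0]))

payments = pd.concat(tables, ignore_index=True)
payments.to_csv("payments_raw.csv", index=False)   # still needs manual cleaning
```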

Turning to another sinister drug-related story, Anuradha Nagaraj, the Thomson Reuters Foundation (TRF)’s South India Correspondent, uncovered a dangerous practice where garment workers were given unnamed drugs for period pains. The aim was to keep production lines running, but in the end more than half of these women experienced serious health issues.

It was a difficult story to establish, Anuradha told us: “it took more than a year to generate data through detailed interviews of 100 women in Tamil Nadu to establish a trend of illegal pills being given to young workers to keep production lines running.”

Her tip: “These stories take time and persistence and endless digging -- but they do pay off if they work out.”

3. Gun violence and police killings

Back in 2016, Fabio Teixeira, TRF’s Brazil Correspondent, stumbled upon an important lead, which would eventually help shed light on a trend of police killings in Rio de Janeiro.

It was a list of all killings between 2010 and 2015 and, after requesting the spreadsheet it was based on, he found it included the names of all the police involved.

“Eleven months of data work and investigation later, I reported for O Globo that 20 policemen were involved in the deaths of 356 people -- 10% of all people killed by police during that period. Data for under-reported issues is hard to come by, especially in third world countries. If someone hands you a list, try to access the database behind it. You never know what you may find!”

Anastasia Valeeva, co-founder of School of Data Kyrgyzstan, also shared a great example of data reporting in this area: The True Cost of Gun Violence in America by Mother Jones.

“This is a story based on the absence of data, rather than on its analysis. The attempts to estimate this cost and the evidence of the truth being hidden, under-reported for a purpose, is what makes it so powerful,” she explained.

“I believe telling under-reported news is one of the magic features of data journalism since it is only with the help of data that you can reveal systematic wrongdoings of a system that go unnoticed, sometimes for years, and with these findings, we can tell invisible stories.”

4. Feeding the world

To date, very few journalistic investigations have dug into what drives the agricultural trade model that feeds the world. That is, until The Enslaved Land project came along.

“In 2016, El Diario and El Faro joined forces to investigate five crops (sugar, coffee, palm oil, bananas, cocoa) consumed widely in Europe and the US and we revealed how land property, corruption, organised crime, local conflicts and supply chains of certain products are still part of a system of colonialism,” said Raúl Sánchez, one of the EJC-grantees behind the story.

Their main learning: “Use the story of a specific crop to explain the global phenomenon and also let the data take you to the focus of the story.”

Also looking at food security, Mirjam Leunissen and Stan Putman from de Volkskrant took an innovative approach to the question: ‘How to feed 10 billion people in 2050?’

Their project, De Voedselzaak, responded to reader suggestions that forced birth control could solve population growth and ensuing food insecurity crises.

How? “We gave them an interactive calculator (zo groeit de wereldbevolking verder, or ‘this is how the world population keeps growing’) to let them explore what changes in birth rates would actually be needed. Using a variety of data and text, we also explain why it is better to invest in socio-economic improvement than in forced birth control,” Stan told us.

While this edition has provided a snippet of under-reported stories out there, there are plenty more waiting to be uncovered. To help, the EJC currently has an open call for grants worth an average of €15,000 for French and German-speaking journalists. Find out more here.

Our next conversation

After collecting your favourite charts and maps, we’ll now be turning our attention to those visuals you weren’t so fond of. You know, charts that got it wrong, or ever-so-misleading maps. Help us showcase what-not-to-do by submitting your favourite examples of inaccurate or confusing data visualisations.

As always, comment to let us know what you’d like us to feature in our future editions.

Until next time,

Madolyn from the EJC Data team

AMA with Eva Constantaras https://datajournalism.com/read/newsletters/ama-with-eva-constantaras Wed, 19 Jun 2019 08:00:00 +0200 Eva Constantaras https://datajournalism.com/read/newsletters/ama-with-eva-constantaras Despite what our name might lead you to believe, the European Journalism Centre is a supporter of data journalism all over the world -- not just in Europe!

Whether it's how data informs fact-checking in Indonesia, or cross-border investigations in West Africa, we want to be sure that our conversations continue to include voices from every corner of the globe.

So, don't be shy! If you're analysing data in Algeria, visualising it in Venezuela, or parsing it in the Philippines, please post here to share your work and advice on future topics.

In the meantime, we've brought in Eva Constantaras, who'll be helping us expand our data journalism horizons in this edition's AMA. Eva is an investigative trainer, with teams in Pakistan, Afghanistan, and Kenya, and author of Data Journalism By, About and For Marginalised Communities in the Data Journalism Handbook 2.

A training session, run by Eva, in Pakistan.

What you asked

What are some of your favourite examples of data journalism, across various continents, and why?

"My favourite projects are those that try to tackle some of the world's biggest challenges: inequality, injustice, and discrimination, through evergreen projects that serve as a cornerstone for beat reporting on the issue, whether it's for journalists within that newsroom or for others.

I'll give you some examples of projects on state and organised crime violence from large and small newsrooms all around the world, where data is used to document and expose violence despite the lack of guaranteed press freedom.

A dónde van los desaparecidos (Where Do the Missing Go?) documents the clandestine graves in Mexico and provides the only standardised nationwide dataset about mass graves, which is essential for accountability reporting in the country going forward.

From the project, a time-series map of the almost 2,000 clandestine graves discovered in Mexico between 2006 and 2016. Try it live here.

In Asia, there are a couple of projects: Malaysiakini's Death in Custody and Rappler's continued coverage of Duterte's drug war, despite the president's legal assault on the outlet, are extraordinary because of the scale of documentation that they have achieved during an ongoing crisis, within repressive media environments and despite limited access to police data.

Similarly, Al Jazeera's Yemen: Death from Above and Kenya's Nation Newsplex's Deadly Force Database open opportunities for accountability reporting that -- had they not been based on data -- would have probably been subject to outright censorship or self-censorship, never making it into the news cycle at all. These databases have allowed local media, and not just the journalists who produced them, to report more critically, consistently, and safely on the government and scrutinise their leaders in a way they had been hesitant to do before."

The Deadly Force Database lets audiences explore data to understand more about each person killed by the Kenyan police and the circumstances surrounding their deaths.

In some parts of the world, mis- and dis-information can present real security risks. What are some examples of best practice data reporting in these scenarios?

"One factor that makes people more easily fall prey to mis- and dis- information is a failure of journalists to inform citizens in the first place. Often, misinformation spreads faster because of a lack of credible public interest reporting that grounds citizens' understanding of an issue in evidence. Do citizens know how the economy works, or the education system? How about healthcare? It is not enough just to stop misinformation and fake news; there has to be a credible, data driven news alternative to fill the void.

A lot of data journalists are working to establish a common understanding of the true underlying structural challenges facing society. Issues such as inequality, climate change, labour rights, and gender discrimination are often best understood and solved through the lens of data. And the more journalists working on these issues, the more likely they are to recognise bad data, spot gaps in people's perception and reality, and steer the conversation back to reality."

How do different cultures influence the practice of data journalism?

"A huge difference I see in data journalism teams in the Global South is a willingness to be much more explicit in producing data driven beat reporting. Instead of producing expensive, high profile 'one-hit-wonder' interactives, they relentlessly cover the roots of inequality.

If you look at the landing pages of data driven media outlets such as IndiaSpend in India, The Nation Newsplex in Kenya, La Nacion Data in Argentina, or InfoTimes in Egypt, most of their data stories are about measuring policy progress, not politics, and how bad policies affect people's lives. Out of necessity, they tend to be more resourceful with the technology they use to get the story done, whether that means doing the analysis in Excel, working with a data scientist to scrape right-to-left text and non-Unicode data, or partnering closely with civic tech to dive into a new government monitoring database."

Front page stories at the InfoTimes.

So, what can western data journalists learn from these reporters?

“Western data journalists could embrace the idea that a lot of the challenges facing the West are just about the same as the ones facing the rest of the world. Everyone needs to do a better job of covering these issues in a global context, preferably through data driven reporting collaborations with data reporters around the world. Covering climate change, reproductive rights, or immigration by highlighting transnational efforts to roll back rights and regulations would serve audiences much more than the persistent “us” and “them” narrative that has enabled populist politics to thrive. For global coverage of these issues, check out (and add to) Outriders Network database of global interactive and data stories.”

And vice versa?

“Journalists from the rest of the world should keep a close eye on the collaborative data journalism model being pioneered in the US and Europe to get relevant data driven content out to the entire country. Local reporting projects led by the ProPublica Associated Press Shared Data Program, OCCRP, the BBC Shared Data Unit, and Bureau Local make sense in a lot of countries. They maximise resources through a centralised data team that can experiment with new technologies and make locally relevant datasets available to other journalists, working in news deserts where little quality news is available.”

Our next conversation

Like in many of Eva’s favourite projects, we’re always awed by journalists who are able to get difficult data on under-reported news. Because there are so many important and unreported stories out there, our next edition will be highlighting ways that data journalists can help bring them to light.

To get you started, we can think of a few areas where data and reporting are hard to come by: domestic violence, female sanitation issues, and elder abuse. Share your tips for data reporting on any of these issues, or tell us about another topic that needs focussing on here.

As always, don’t forget to let us know what you’d like us to feature in our future editions.

Until next time,

Madolyn from the EJC Data team

Parsing the European Parliament elections https://datajournalism.com/read/newsletters/parsing-the-european-parliament-elections Wed, 29 May 2019 00:00:00 +0200 Ashley Kirk Camille Borrett Moritz Laurer Robert England, Malte Steuber, Andre Tartar, Marcel Pauly, Jens Finnäs https://datajournalism.com/read/newsletters/parsing-the-european-parliament-elections Come election day, voters are all asked one crucial question: Who are you going to vote for? But behind every vote, there’s plenty more. What does this candidate stand for? Which party do I align with most? And, regrettably in some elections, will my vote even count?

Last weekend, up to 400 million Europeans took to the polls in the 2019 European Parliament elections to answer these questions for themselves. And, as we found out, there were many data journalists working to inform their vote in unique ways. As the results rolled in, we took a look at five ways that data journalists have parsed the European Parliament election.

1. What does the election look like for me?

This one is key for any voter. In the UK, BBC Online News helped Britons look beyond a (possibly) impending Brexit, putting together a region-by-region rundown on the country’s candidates. Robert England gave us a behind-the-scenes look:

“To gather the data we scraped publicly available lists of MEPs and candidates to find out which areas had candidates seeking re-election and which had candidates running for the first time -- potentially facing a very short political career in the EU. We did this by cross referencing the two groups, candidates and sitting MEPs, in Excel to show not only which areas should expect the most new faces, but also which regions had the widest/smallest choice of political parties to vote for.”

“The issue of MEP pay and benefits, a subject extensively covered nationally, was another area we wanted to personalise. Using the UK Office for National Statistics earnings estimates, we broke the question of pay down to a regional level by comparing the wage of MEPs against average wage estimates,” he explained.

BBC.
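For anyone who'd like to try the same cross-reference outside Excel, here is a minimal pandas sketch of the idea Robert describes. The file and column names ("name", "party", "region") are our own illustrative assumptions about the scraped lists, not the BBC's actual data.

```python
# A rough pandas equivalent of the Excel cross-reference described above.
import pandas as pd

candidates = pd.read_csv("candidates_2019.csv")   # hypothetical scraped list
sitting_meps = pd.read_csv("sitting_meps.csv")    # hypothetical scraped list

merged = candidates.merge(
    sitting_meps[["name"]], on="name", how="left", indicator=True
)
merged["status"] = merged["_merge"].map(
    {"both": "seeking re-election", "left_only": "first-time candidate"}
)

# New faces and breadth of party choice, region by region.
by_region = merged.groupby("region").agg(
    new_faces=("status", lambda s: (s == "first-time candidate").sum()),
    parties=("party", "nunique"),
)
print(by_region.sort_values("new_faces", ascending=False))
```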

2. ...So, what about Brexit?

Speaking of the UK, results are now showing that the Brexit Party has dominated the country’s vote. Who would’ve thought? Well, it turns out Ashley Kirk of The Telegraph did. They published a suite of analysis pieces, with stories including a comparative analysis of all pro-Remain parties versus all pro-Brexit parties in the UK, a revelation that Theresa May's Conservative Party could slump to their lowest vote share since they were formed in 1834, and a look into how establishment parties across the EU are forecast to lose seats.

“When it comes to presenting these, we've paid particular attention to how we can help our readers understand each EU-wide bloc in the Parliament and where they sit on the political spectrum. On a UK level, we have been analysing how the two establishment parties are struggling to hold their ground, with the advance of the Brexit Party,” he wrote.

The Telegraph’s polling analysis, indicating a high chance of Brexit Party votes.

3. What can we do to prevent filter bubbles?

From Brexit to filter bubbles (or is it the other way around?), the folks over at Talking Europe came up with an ingenious way to break through echo chambers and encourage dialogues between voters of different political persuasions. The platform matches people in Europe, each with different political opinions, for one-on-one chats about current issues.

Malte Steuber talked us through their artificial intelligence (AI) techniques, which leverage a matching algorithm based on answers to five yes/no political statements. “Users then get matched to somebody living in another European country and, in the best case, to somebody who provided five different answers,” he said.

But that’s not the only AI they use: “To make it possible for everybody to chat in his/her native language we use DeepL technology to translate the chats in real-time. Users can write and read in their chosen language and select whether they want to see the translation.”
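To make the matching idea a little more concrete, here is a toy Python sketch of one way such a pairing could work: prefer pairs of users from different countries who disagree on as many of the five statements as possible. Everything here -- the names, the data structure, the scoring -- is our own illustration, not Talking Europe's actual algorithm.

```python
# Toy sketch: pair users across countries, maximising disagreement on five yes/no answers.
from itertools import combinations

users = {
    "anna":  {"country": "DE", "answers": [True, False, True, True, False]},
    "marco": {"country": "IT", "answers": [False, True, False, False, True]},
    "zofia": {"country": "PL", "answers": [True, True, True, False, False]},
}

def disagreement(a, b):
    # Count how many of the five statements the two users answered differently.
    return sum(x != y for x, y in zip(a["answers"], b["answers"]))

pairs = [
    (u, v, disagreement(users[u], users[v]))
    for u, v in combinations(users, 2)
    if users[u]["country"] != users[v]["country"]   # only match across countries
]
best = max(pairs, key=lambda p: p[2])
print(best)   # ('anna', 'marco', 5) -- a perfect five-out-of-five disagreement
```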

4. How popular are the populists?

According to Bloomberg’s Andre Tartar, “the big question going into the EU elections was whether populists would win enough votes to make a difference”. And there was no shortage of data teams looking into it.

For Andre’s part, they charted the combined populist seat-share rising steadily to 21.7% in 2014 and forecast to reach 29% this time around -- but their analysis wasn’t all good news for populists.

“The flip side of the story is the historic disorganisation of populists and their difficulty holding on to members. To show this we anchored the piece around a scrolling graphic of the political chamber (called the hemicycle) produced using D3. We first experimented with different representations of this, including a 3D model, and zoom on scroll, and how we could use it to walk readers through the story,” he said.

Over at SPIEGEL ONLINE, Marcel Pauly also conducted an analysis of populism in Europe (in English here), showing how populist parties have performed in national parliamentary elections over the past two decades.

SPIEGEL ONLINE.

Here’s a little more about how they did it: “I got the election results from ParlGov, a nice database with data on parliaments and governments for ‘all EU and most OECD democracies’. For the parties’ classifications I used The PopuList, an academic ‘overview of populist, far right, far left and Eurosceptic parties in Europe’. It was initiated by The Guardian, whose data team kindly provided me with a lookup table to merge the two datasets on The PopuList’s party names and ParlGov’s party IDs.” You can also recreate or reuse Marcel’s preprocessing and data analysis, available on GitHub here.
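Marcel's actual preprocessing is in the GitHub repository linked above; as a simplified illustration of that merge step, a pandas version might look something like the sketch below. The file and column names are our assumptions, not the real schema.

```python
# Simplified sketch: join The PopuList's classifications onto ParlGov results
# via a party-name-to-party-ID lookup table. Names and columns are assumptions.
import pandas as pd

parlgov = pd.read_csv("parlgov_election_results.csv")    # has a party_id column
populist = pd.read_csv("populist_classifications.csv")   # has a party_name column
lookup = pd.read_csv("populist_parlgov_lookup.csv")      # party_name -> party_id

populist = populist.merge(lookup, on="party_name", how="left")
results = parlgov.merge(
    populist[["party_id", "populist", "far_right", "far_left"]],
    on="party_id",
    how="left",
)

# Vote share won by parties classified as populist, by election year.
print(results.groupby(["election_year", "populist"])["vote_share"].sum())
```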

But, of course, the past doesn’t equal the present. And polling data is in constant flux in the lead-up to an election. To capture gains by populists and other parties, Sweden’s Newsworthy put out a monthly, automated analysis of European polls.

It wasn’t easy though. As Jens Finnäs told us, “just categorising the most likely parliament group of each national party is a very challenging task that needs continuous updates”. Their report was only “made possible by the polling aggregation from Politico (formerly pollofpolls.eu)” and their “huge data cleaning and refining effort”, on top of which Newsworthy laid their analyses.

5. Who will win? (c’mon, we had to!)

When Camille Borrett and Moritz Laurer, Co-founders of European Elections Stats, first asked this question a year ago, they were faced with the sad reality that there was little interest or money for pan-European polls.

Their solution: “we wrote several scripts in R which (1) scrape national polls from open data sources for all 28 member states; (2) calculate four seat projections for the European Parliament based on these polls and different political scenarios; (3) upload all the raw data into an Open Data Hub, where anyone can explore and download our data; (4) visualise the data to allow for easy comparison of political scenarios.”

See European Elections Stats for the interactive R-Shiny application.
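Their code is in R, but to give a flavour of step (2), here is a minimal Python sketch of a D'Hondt (largest-averages) allocation -- one common way of turning poll shares into seats in European Parliament elections. It is an illustration only, not necessarily the projection method European Elections Stats used.

```python
# Minimal D'Hondt (largest-averages) seat allocation from poll shares.
def dhondt(votes: dict, seats: int) -> dict:
    allocation = {party: 0 for party in votes}
    for _ in range(seats):
        # Each seat goes to the party with the highest current quotient.
        winner = max(votes, key=lambda p: votes[p] / (allocation[p] + 1))
        allocation[winner] += 1
    return allocation

# Hypothetical national poll shares and a 20-seat delegation.
polls = {"Party A": 34.0, "Party B": 27.5, "Party C": 18.0, "Party D": 9.5}
print(dhondt(polls, 20))
```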

Now that the elections are over, they will continue to aggregate national polls and use them to calculate an aggregate index of the pan-European political mood. Watch this space for updates.

Our next conversation

While our eyes were focussed on the 400 million voters in Europe, Eva Constantaras reminded us of the important work being done to inform elections in other parts of the world. Earlier this month, for example, journalists in India worked hard to inform the nation’s 900 million (!) eligible voters. Using data, reporters revealed how many promises were kept since the last election, visualised all 8,049 candidates, and more. Inspired by their work, we’re keen to showcase the work of other data journalists from all over the world. To help, we’ll have Eva with us for an ‘ask me anything’ in our next edition.

Eva is an investigative trainer, with teams in Pakistan, Afghanistan, and Kenya, and author of Data Journalism By, About and For Marginalised Communities in the Data Journalism Handbook 2. Submit your questions here.

As always, don’t forget to let us know what you’d like us to feature in our future editions.

Until next time,

Madolyn from the EJC Data team

Data on the crime beat https://datajournalism.com/read/newsletters/data-on-the-crime-beat Wed, 22 May 2019 08:00:00 +0200 Ryan Martin Brittany Suszan Paul Bradshaw Clementine Jacoby Rachel Glickhouse Sylke Gruhnwald Albert Bowden, Rachel Glickhouse, Håkon Høydal, Jason Pohl, Brittany Suszan, Wouter van Loon, Alain Stephens, Lauryn Schroeder, Andrew Guthrie Ferguson https://datajournalism.com/read/newsletters/data-on-the-crime-beat If it bleeds, it leads…or so they say.

While shocking crimes may carry headlines and capture the public’s imagination, the crime beat encompasses so much more than murder mysteries and whodunnit stories.

In this edition of Conversations with Data, we’ll be looking at other important angles for crime coverage and the ways that data can inform them in each phase of the reporting cycle.

What you said about story angles

As Albert Bowden told us: “Criminal Justice data is inherently limited, your stories don't have to be”. After speaking to crime reporters across the globe, we have to agree. Here’s a snapshot of four story angles, and how to approach them.

1. Hate crimes

In the United States, there is no reliable national data on hate crimes or other prejudice-driven incidents, despite anecdotal evidence of their prevalence. Step in: ProPublica’s Documenting Hate project. Bringing together a coalition of news organisations, the project aims to build a comprehensive database on hate incidents. Rachel Glickhouse, Partner Manager for the project, told us more about reporting on this angle:

“...in many cases, it’s up to officers to check the correct box or pick the correct dropdown, and those internal numbers eventually make their way to state and federal agencies. That means there’s a lot of bad data -- a lot of undercounting, especially since more than half of US hate crime victims don’t report to police at all -- but also overcounting if they check the wrong boxes. Here’s an example of that problem. We also know that many police aren’t trained on how to investigate these crimes, and that some police aren’t necessarily invested in taking these crimes seriously.

So, when investigating hate crimes, ask questions like:

  • Does your department receive training on how to investigate and track hate crimes?
  • How do you track hate crimes internally? Is there a box to check on police reports?

She also suggests requesting data for the previous year: “Based on their response and how difficult it is for them to pull that information, you’ll get a sense of how they’re tracking these crimes”.

2. Cyber crime

In 2016, Norwegian outlet VG began investigating a sinister dark web website and its Nordic users. By monitoring traffic and obtaining information from police, they eventually uncovered an Australian-led international operation into the world’s largest online community of child sexual abusers. Håkon Høydal, one of the journalists behind the project, shared his learnings from Breaking the Dark Net:

  • “Closely collaborate with data scientists. While you as a journalist may know the target and the goal, they may know where to find and how to extract the data you need to get there.
  • Find 'live' data if possible. This allows you to follow trends and operations in real time. This was helpful in our contact with sources during the investigation.
  • Keep looking. The first couple of months of research led us to a case we couldn’t follow. But it gave us valuable knowledge that led us to the Australian operation.”

3. Witness intimidation

Sometimes murder isn’t the story. After noticing that cases were being dismissed because of witness no-shows, Ryan Martin, from the Indianapolis Star, started wondering: Could witness intimidation be to blame? To answer this question, Ryan and his reporting partner started collecting court and police records for every homicide investigation over a three-year period.

“We input the interesting and relevant data into two spreadsheets. Then we ran our own analysis. Our reporting led us to a four-part series called Code of Silence. After its publication, the city created a $300,000 witness protection program to finally address the problem of witness intimidation,” he explained.

His tip: “Finding your next great investigative story on the crime beat begins with collecting your own data.”

4. Treatment in prisons

In another angle from ProPublica, and their partner The Sacramento Bee, the Overcorrection project looks at resources, safety, and crowding in California’s jails. Although the project has only begun publishing this year, reporter Jason Pohl shared some insights from their progress so far:

“California collects death-in-custody data and details, and this information is maintained in a database online (a records request yielded a more complete and updated version). This was our starting point. Unfortunately, many states do not collect this information at all, so some reporters have had to create innovative ways to get records and build databases of their own. From here, we worked to clean, code, and analyse state-wide and county-by-county. Those findings will help inform our reporting going forward.”

What you said about crime data

No matter how good your angle is, your story’s success depends on obtaining, understanding, and accurately representing its underlying data.

Phase 1: Obtaining data

So, where’s the best place to start looking for data? Sylke Gruhnwald, from the Swiss magazine Republik, sent us an excellent checklist for following the paper trail in white-collar crime investigations:

  • “First things first: Check the international press archive for past coverage, and note the names of persons and organisations mentioned.
  • Check these names against local and national commercial registries.
  • If you have access to international business registries, check there. If not, you may get access through your university library. And don't forget to check ICIJ's databases, e.g. the Offshore Leaks Database here.
  • Gather all available public records on the persons and/or organisations that you are investigating.
  • Read and learn to understand financial statements, financial reports by auditors, and so on.
  • Build your own database. Helpful insights on how to build your own database are shared by the authors of Story-Based Inquiry here.”

More often than not, crime stories will rely on public records or access requests. If you come up against challenges, SpotCrime’s Brittany Suszan reminded us that “journalists (and the public) have the right to access data in a timely manner and machine readable format -- especially if a government vendor already has access”.

“Don’t let a vendor control your public crime data; don’t let your police agency give preferential access to a private vendor (more here),” she continued.

Phase 2: Analysis

As a first piece of advice, Paul Bradshaw highlighted the importance of understanding “the difference between the two key pieces of crime data: 1) data on recorded crime, and 2) data on experiences of crime. The second type of data is important because many crimes don't get reported, and particular types of crime are more under-reported than others.”

In differentiating these data-types, Wouter van Loon, from Dutch outlet NRC, provided some examples of crimes that are, and aren’t, as likely to be reported. Take the drug trade: “none of the concerned people, sellers or buyers, have a reason to go to the police. So we only have information about the drug trade when the police find an armload of drugs.”

And then there are crimes that are easier to measure: “Murders, for example: there is always someone missing, and most of the time there is a body. However, even then there are pitfalls. There is a jungle of terminology around crime statistics. It is easy to confuse the number of suspicions, convictions, convicted perpetrators, victims, registered crimes, attempted crimes, and crimes which are actually committed.”

Alain Stephens, West Coast Correspondent for The Trace, agreed. To ensure you’re understanding data correctly, he recommends learning the lingo or getting an insider to translate for you.

“Without knowing their vernacular, numbers can lie to you: Increased drug arrests can look like a spike in a dangerous new narcotic, when really you’re looking at a concentrated enforcement effort on the side of the police. High conviction rates may have little to do with a prosecutor’s skill, and more to do with them dropping cases they feel won’t tick up their win ratio,” he explained.

Another strategy, used by Clementine Jacoby of Recidiviz, is to physically map out how a particular flow through the justice system is modelled in the dataset that you’re working with. When they looked at technical revocations, which is where a person is sent back to prison for a technical rule violation, they “picked a single person in the dataset, looked at all of their historical records, and manually walked through their history. This simple exercise helped us get clear on how the revocation process worked in that state and how the data represented it”.
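As a concrete version of that exercise, here is a tiny pandas sketch: pull every historical record for one person and read through it in date order. The file and column names are our own assumptions, not Recidiviz's actual schema.

```python
# Tiny sketch: walk through one person's records chronologically.
import pandas as pd

records = pd.read_csv("supervision_records.csv", parse_dates=["event_date"])

person = records[records["person_id"] == "P-000123"].sort_values("event_date")
for _, row in person.iterrows():
    print(row["event_date"].date(), row["event_type"], row["outcome"])
# Reading the sequence end to end shows where a technical rule violation
# turns into a revocation in this particular dataset.
```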

Phase 3: Reporting

Being able to understand and explain your data is also important for defending your story’s findings -- as Lauryn Schroeder, the San Diego Union-Tribune’s Watchdog Reporter, knows all too well:

“Our Crime Counts project was based on records that came directly from local police agencies, but we analysed it in a much more granular way. This led to a first-of-its-kind series on violent crime at nearly the street level, but, as a result, we encountered a lot of push back from police and local officials who either couldn’t understand what we did or weren’t able to reproduce it -- or both. During our reporting phase, I spent most of my time defending our methodology and explaining how we got the results we did to convince sources to contribute or comment. There was a lot of ‘if your findings are right’ and ‘we can't comment on something we don't know is true’. Providing original files and the code isn’t always enough, and every data journalist should be prepared for that.”

And, finally, going back to the ‘bleeds/leads’ adage, journalists should be wary of sensationalising findings. Focus on “news not narrative”, said Andrew Guthrie Ferguson, author of The Rise of Big Data Policing.

“Scandal may sell but it does not tell the story readers deserve. Too many news stories about big data policing rely on fear-inducing tropes and not facts. By recycling the same metaphors (‘Minority Report’) and examples (the same few studies), the narrative of data-driven danger generates clicks and criticism, but not critical thinking. The real stories are more nuanced and harder to frame as good or bad, transformative or dystopian.”

For more tips, or to source the community’s help with an upcoming crime project, we’ll be continuing the discussion in our forums here.

Our next conversation

This week, more than 400 million voters across Europe will have the opportunity to select the next European Parliament. The election is the first since the rise of populism and, given the Parliament’s long term trends of voter disengagement, we thought we’d take a look at how data journalists are informing the public’s vote. Submit your favourite examples of data election reporting or visualisations, or tell us about a piece you’ve produced.

As always, don’t forget to let us know what you’d like us to feature in our future editions.

Until next time,

Madolyn from the EJC Data team

Ethical dilemmas in data journalism https://datajournalism.com/read/newsletters/ethical-dilemmas-in-data-journalism Wed, 08 May 2019 07:00:00 +0200 Stijn Debrouwere Brant Houston Rebekah McBride Subramaniam Vincent, Amy Mowle, Chiara Sighele, Colin Porlezza https://datajournalism.com/read/newsletters/ethical-dilemmas-in-data-journalism To publish or not to publish? It’s a question every data journalist will undoubtedly face at some point in their career.

In this edition of Conversations with Data, we’ve collated your anecdotes and advice into seven tips to help you work through common ethical dilemmas. Whether it’s concerns over data privacy, identifying bias, or providing appropriate context, we’ve got you sorted.

What you said

1. Watch out for noise in your data; it can bias conclusions -- Stijn Debrouwere, course instructor for Bulletproof Data Journalism and Going Viral Using Social Media Analytics

“For me, the big dilemma is always whether we’ve been able to collect enough information, conduct enough analyses and listen to enough viewpoints to warrant publication. It always seems as if there is one more check you could do, one more thing you could investigate.

I once did a story where I tried to gauge how too much tourism can spoil a city for its inhabitants, and which European cities suffer most. But guess what: Every city has a different definition of a city centre, for some cities the most recent tourism numbers are from five years ago and for others there is no official estimate of the population density in that area.

What to do? Personally, I don’t mind that much if data is a bit vague, imprecise, and perhaps not so expertly collected. That’s noise, and different sources of noise tend to cancel each other out. I do get worried when I notice that the noise in the data starts to bias conclusions one way or the other. That’s usually when I decide that I would rather publish nothing than risk misleading the audience.”

Stijn’s course Bulletproof Data Journalism helps reporters protect themselves against all kinds of errors that can arise when working with data.

2. Be just as skeptical with data, as you would be with human sources -- Subramaniam Vincent, Director of Journalism and Media Ethics at the Santa Clara University Markkula Center for Applied Ethics

“Journalists are forever on deadline. One of the ethical challenges in working with data about people is the temptation to take a large dataset at face value and make rich visual representations quickly. But gaps in surveying methods and labeling vocabulary can ignore crucial context from the lives of the people the data is supposed to ‘represent’.

Data journalists could get baseline training in data ethics and exploratory data analysis (EDA) to develop comfort with data-skepticism the same way they do with people as sources. This may help them interpret a dataset's usefulness carefully, target conventional methods (e.g. interviews) to corroborate the story the data may offer prima-facie, or even explore a different story altogether.”

3. When scraping online, think about whether data subjects would’ve consented -- Amy Mowle, PhD Candidate at Victoria University

“I had figured out how to use a web-scraping tool called TWINT to bypass Twitter's API and scrape historical Twitter data, exported into an Excel spreadsheet and ready for analysis.

I took this breakthrough to my ethics officer, to make sure there was nothing untoward about using this method for data collection. While I was aware that webscraping was forbidden by the Twitter Terms of Service (TOS), I had assumed the TOS didn't directly apply to me, since I didn't have a Twitter account and thus wasn't bound by their TOS.

My ethics officer pointed out that those individuals whose data I was scraping did agree to the Twitter TOS, and as a result, were (theoretically) under the impression that their data wouldn't be scraped. By going against the Twitter TOS, I was, in a way, breaching the trust of those whose data I was collecting. While this method of data collection had to ultimately be abandoned, the experience really opened my eyes to the different ways we need to think about consent and participation.”

4. Accept responsibility when things go wrong -- Chiara Sighele, EDJnet Project Manager based at OBC Transeuropa

“Europe One Degree Warmer was the first large data driven investigation we carried out as the European Data Journalism Network. The investigation provided a tangible example of global warming in Europe and was successful. However, particular challenges came with producing such a piece of work based on data: How to deal with mistakes when data journalism goes wrong? How to handle transparency and corrections while building trust with our audience? How to put in place more checks to ensure the integrity of the data, whilst respecting the independence of each partner?

After the first round of publications was out, we noticed that the data for some cities was erroneous. We acknowledged the mistake (also on Medium) and fixed it, but as a network, we had to critically reflect on our practices of doing data journalism in a heavily collaborative, transnational, and decentralised setting. We have faced up to the importance of reinventing research and fact-checking procedures when working in a network, and have opened up our database to allow for external scrutiny and reuse.”

This graph shows the difference between the correct and the erroneous data for the Swedish town of Kiruna and its surroundings.

5. Get trained in data security and privacy -- Colin Porlezza, Senior Lecturer at City, University of London and author of a chapter on data journalism and the ethics of open source in Good Data

“Journalism -- and data journalism in particular -- is becoming increasingly networked: Collaborative news work often goes beyond the boundaries of the news organisations and includes actors from outside the institutionalised field of journalism, such as computer scientists, hackers, visualisation experts, activists, or academics. However, such partnerships can entail specific ethical challenges related to the use of (personal) data: Who is in control of the data? With whom can the data be shared? How should the data be protected and secured?

News organisations should not only make sure that sensitive data is securely (electronically) stored, but also that journalists get appropriate training on how to avoid security risks and potential misuse of data due to increased sharing and partnerships. A good place to start could be the recently published guide on data protection and journalism by the UK’s Information Commissioner’s Office – which is, by the way, currently calling for views on a specific data protection and journalism code of practice.”

6. Question the ways that data has been classified -- Brant Houston, Professor and Knight Chair in Investigative Reporting University of Illinois

“A number of years ago, one of my graduate students at the University of Missouri was working on a story on guns and acquired a county database of those licensed to carry small firearms. The database contained both those who had been approved and those who had been denied, but had no specific column for that status. Instead, there was a column with a license number in it if the person had been approved.

If the person had been denied, the short reason for denial was put in place of a license number. In some cases, the phrase listed was ‘mental case’. We could have used that information in connection with the names of the persons, but the judgment on the person's state of mind was being made by an unqualified clerk. We decided to note in the story that ‘mental case’ was one of the reasons for denial but did not connect that reason with a specific person. And we also questioned the use of that phrase.”

7. Beware of the bell curve -- Rebekah McBride, author of Giving data soul: best practices for ethical data journalism

“When reporting on data that is representative of a population, context is key. Interviews often provide that context while strengthening data journalism stories, but interviews can also lead to the misrepresentation of communities and perpetuation of stereotypes if the sources represent only the tail end of the data in the curve. In other words, an interview with only one or two individuals from a community may result in a misrepresentation of the median or average population, and instead highlight only the extremes. Interviewing several members of a community who represent a range of different age groups, socioeconomic backgrounds, and so on, can help reporters to avoid this ethical dilemma. Additionally, becoming embedded in a community and building relationships will help reporters to diffuse an understanding of what it means to be a member of that community.”

Caught in an ethically sticky situation? We’ll be keeping our forums open for you to source extra advice from the community. The Ethical Journalism Network is also working on a course to help data journalists critically reflect on ethical challenges at each stage of the reporting process. Watch this space for more.

Our next conversation

Journalists working on crime beats are no stranger to ethical dilemmas. It seems fitting then that our next edition will give these journos a chance to share their experiences with criminal justice storytelling. Let us know your top sources of crime data, unexpected challenges on this beat, and any advice for crime newbies, by starting a discussion in our forums.

As always, don’t forget to let us know what you’d like us to feature in our future editions.

Until next time,

Madolyn from the EJC Data team

Python and newsrooms: AMA with Winny de Jong https://datajournalism.com/read/newsletters/winny-de-jong Thu, 25 Apr 2019 11:26:00 +0200 Winny de Jong https://datajournalism.com/read/newsletters/winny-de-jong Ever dabbled in Python, but somehow didn’t persevere? Or is your newsroom looking to invest in a data journalist? In our two latest courses on DataJournalism.com, renowned Dutch data journalist Winny de Jong answers these questions and more.

But we know that sometimes you’ll still have lingering questions after finishing a course. So, in this edition, we have Winny with us to get you some closure.

Winny teaching for the European Journalism Centre.

What you asked

How can non-coding data journalists first get started with programming?

“Simple: Pick a data story that isn’t too ambitious -- preferably one that you know how to make without programming and with no deadline stress -- figure out how to get it done with programming, then do it.

This way, you’ll be working on the edge of your comfort zone: You’re learning something new, but there is no deadline stress; you have a backup method when deadline stress begins to occur; and, if you’re not using it as a backup method, you can at least check on your own very first programming results. (And you’ll probably want to check on your first programming work -- I know I did.)”

How and why did you choose to go with Python, when there are so many other programming languages out there?

“When I just started learning data skills, I procrastinated by learning new things. Ever since I learned that about myself, I live by this rule: Only learn the things you have an immediate use for. So a couple years ago, I wanted to use more complex data analysis for my stories. I went with Python since that was already a popular programming language among data journalists, and compared to other languages its syntax is quite friendly. Also, I knew I wanted to do data analysis more than data visualisation. (Otherwise, Javascript might have been a better option.)”

What books can journalists use to learn Python?

“Personally I’ve always used the internet as my university. There are a lot of good Python resources out there, here’s a list of some favorites:

Over time I did use some books: Automate the Boring Stuff, which I liked for its immediate use cases; Python for Data Analysis (paid); and Learn Python (paid). There’s also this open e-book that goes with the Coursera specialisation, Python for Everybody, that I liked.

Winny's courses on DataJournalism.com.

What are the biggest challenges when newsrooms first hire a data journalist, and how can these be addressed?

“In my experience, culture and lack of knowledge are the two biggest challenges. The person who hired you might know what you do and how it adds value, but if no one else in the newsroom does you’ll have to work on that first. Organise data talks/lunches, create a data consultation hour, and collaborate. All. The. Time. With. Everybody.

When I started working at the Dutch national news broadcaster, there were a lot of conversations on what to expect when hiring a data journalist (me). Being a team of one, talking with editors, coordinators, and colleagues, managing expectations and creating open communication lines helped me to integrate into the newsroom pretty fast. Keep in mind that colleagues might not understand your data process exactly, but to collaborate it’s enough if they get the abstract version. For example, to discuss a scraper all parties need to know what it is, but not all parties have to know how to write one.”

What additional considerations are there for data journalists working at national and public broadcasters?

“Well, when working at broadcasters in general, if you’re doing data driven journalism it might be that your work is the foundation of a story -- but this foundation is not really visible. Some stories I’ve worked on for days, but no one would know. In the final product the anchorman says: ‘…our research shows that…’. But if you work on stories for the people affected by that story and not your ego, you should be fine. If a story is easier to grasp when you don’t show its data-foundation, you should be a-okay with it.”

Our next conversation

And now to a topic that all journalists will be familiar with: ethical dilemmas. Whether it’s concerns about the privacy of individuals in a dataset, biased data, or potentially misleading conclusions, we want to hear from you. Share the dilemmas you’ve faced, so that others won’t have to.

As always, don’t forget to let us know what you’d like us to feature in our future editions.

Until next time,

Madolyn from the EJC Data team

]]>
Data visualisation trends and challenges: AMA with Data Journalism Handbook 2 authors https://datajournalism.com/read/newsletters/data-visualisation Wed, 10 Apr 2019 07:00:00 +0200 Wibke Weber Martin Engebretsen Will Allen, Rosemary Lucy Hill, Helen Kennedy https://datajournalism.com/read/newsletters/data-visualisation Hi there,

Today, we’re sending a very special edition of Conversations with Data -- it’s our first officially connected with DataJournalism.com!

In case you’ve been too busy to fully explore it, be sure to check out our new video courses, long reads, newsletter archive, and community forums, where you can continue our conversations.

DataJournalism.com is also the new home to our much-loved Data Journalism Handbooks, so it’s fitting that we have our data visualisation authors with us for this edition’s AMA. Let’s see what they had to say!

What you asked

Our first question is from our data newbies. It can be confusing to start understanding and applying design concepts, so you asked:

How should I pick a visualisation design that fits my dataset and what it wants to say?

Will Allen says: “There are lots of different chart types and design choices available to people wanting to visualise data. Each dataset presents its own unique combination of features, which lend themselves to particular choices. Fortunately, there are several good resources that provide guidance to help you choose. One of these is Andy Kirk’s book Data Visualisation: A Handbook for Data-Driven Design. Another is this online guide that explains how to read some of the most common chart types. Finally, The Financial Times (a UK-based newspaper) developed a useful poster that explains why particular charts are better suited for expressing certain kinds of ‘stories’ within datasets.”

Seeing Data provides a useful online guide, which steps you through reading different chart types effectively.

So, to make a visualisation have impact, what is the most important thing that data journalists should know about audiences?

Rosemary Lucy Hill says: “Audiences are very diverse. Their educational and socio-cultural backgrounds, gender, age, ethnicity, and political views all mean that they are likely to view the same visualisation differently. There are simple things that data journalists can do to make visualisations more impactful across this diversity of users, all of which make them easier to understand. These include: give the visualisation a good, descriptive title; include keys, legends and labels; use high contrast colours that colour blind people can differentiate; put a ‘how to read this chart’ box next to the visualisation; and use text to draw out the key points.”

Moving to newsrooms, what is the most important data visualisation trend for them to be aware of and why?

Wibke Weber says: “The most important newsroom trend for data visualisation is mobile-first. It has become a design principle for data visualisation because more and more people access news and other information via smartphones. Designing for mobile devices means taking the affordances and the usability of small screens into account. This leads to data visualisations that are less interactive, less complex, and simpler in their visual forms. It leads to what we call ‘scrollytelling’: telling data stories that reveal themselves as users scroll down."

And what about data visualisation firms? With so many out there, how should newsrooms decide when to hire one?

Helen Kennedy says: “When to hire one is relatively straightforward – when you haven’t got the in-house resources to do what you want to do. Which one to hire is slightly trickier. You can assess the quality of the work that datavis firms are producing by interacting with it. This guide, produced by Andy Kirk for the Seeing Data project, is designed to help you carry out such an assessment.

Check out firms’ presence on social media too – are they widely followed? Do they seem to be esteemed by others working in the field? And ask yourselves if you like their visual style.

However, newsrooms are increasingly hiring in their own datavis expertise, and developing bespoke, in-house data visualisation tools that can be used by journalists who are not expert in such matters. This helps to ensure that all datavis that are produced conform to house style, and that they are widely embedded across news stories.”

Finally, what is the biggest challenge facing visual storytellers today and what can be done about it?

Martin Engebretsen says: “I believe there are two equally big challenges facing visual storytellers today. The first one is the lack of time and attention. People read in a hurry, and they tend to be easily disturbed. That means that visual storytellers need to connect meaning to visual forms in ways that are more or less intuitively understood, and to create a very clear storyline. The other challenge is about trust. In an era of fake news, visual storytellers need to be very explicit about their sources and methods.”

For more on trends and challenges, read the team’s full Data Journalism Handbook 2 chapter, Data Visualisations: Newsroom Trends and Everyday Engagements.

Our next conversation

To mark the launch of DataJournalism.com, we also released two highly anticipated video courses from Winny de Jong:

Winny de Jong is data journalist at the Dutch national broadcaster NOS, and we’re excited to have her join us for an AMA in our next edition. Submit a question in our forum.

As always, don’t forget to let us know what you’d like us to feature in our future editions.

Until next time,

Madolyn from the EJC Data team

]]>
Launch of DataJournalism.com https://datajournalism.com/read/newsletters/datajournalism-com Tue, 09 Apr 2019 19:59:00 +0200 Letizia Gambini Letizia Gambini https://datajournalism.com/read/newsletters/datajournalism-com Hi there!

This is Letizia from the EJC Data team. I’m hacking this special edition of Conversations with Data to introduce to you our brand-new hub for data journalism: DataJournalism.com!

Here are three tips to get you started:

  1. Learn from an expert in our new Long Reads section. From mapping mistakes to putting data into context, we’ve got you covered with insights from some of the most influential data journalism experts in the world.

  2. Master a new data skill. Enrol in one of our many video courses, including a brand new course on Python for journalists with Winny de Jong.

  3. Ask questions and meet with other like-minded data journalism enthusiasts in our discussion forums and expand your professional community.

Join us: it’s free! After signing up, you can enrol in any of our video courses for free and take part in our community discussion forums.

But wait, there’s more!

All DataJournalism.com members get a full set of promotions and discounts to top-level tools directly from their profile. Your welcome goodie bag includes security app 1Password, scraping made easier with DIFFBOT, email providers ProtonMail & FastMail, off-shore hosting on FLOKINET, mind mapping tool MindMeister, beautiful charts with Piktochart, surveys on Survs, web hosting with SiteGround, and writing with ProWritingAid.

If you have any questions, comments or suggestions, please don’t hesitate to get in touch.

I hope to see you on DataJournalism.com!

Best, Letizia

DataJournalism.com is created by the European Journalism Centre and supported by the Google News Initiative.

]]>
Why we love journalistic databases https://datajournalism.com/read/newsletters/why-we-love-journalistic-databases Wed, 27 Mar 2019 11:04:00 +0100 Leonard Wallentin Emmanuel Freudenthal Jonathan Stoneman Rui Barros Zara Rahman, Jared Rutecki, Lesley-Anne Kelly https://datajournalism.com/read/newsletters/why-we-love-journalistic-databases Databases can be so much more than just a place to find stories. Why not make a database the story?

Welcome to our 22nd edition of Conversations with Data, where we’ll be looking at the many ways journalistic databases can be used as an interactive end-product.

From localising stories for individual readers to summarising complex data in a more user-friendly format, searchable databases are a must in any data journalist’s toolbox.

We don’t need any more convincing, but in case you do, here are 7 of the community’s favourite journalistic databases.

What you liked

1. Dying Homeless, the Bureau of Investigative Journalism (TBIJ)

“I really appreciate the Dying Homeless database from TBIJ. It combines data from many sources -- local reporters, charities, grassroots groups, and readers -- and really makes the most of the audience engagement aspect of creating such a database. The database itself combines quantitative and qualitative data in a powerful way -- the headline figure that 800 people have died homeless since the database began in October 2017, combined with the stories of who those people were.”

-- Zara Rahman, co-author of Searchable Databases as a Journalistic Product in the Data Journalism Handbook 2

2. Europe One Degree Warmer, European Data Journalism Network

“In Europe One Degree Warmer we collected 117 years of local temperature data for 558 cities across Europe. This allowed us to do local reporting on global climate trends. Because our focus was on the local stories, we published our findings as separate, robot-generated texts in nine languages for each city, producing in total 5,022 articles, with over 60,000 maps and charts to accompany them. The reports, and the underlying data, are now being updated to include 2018 temperature data. Dive into the numbers behind the reports here, or read our notebook explaining the method used for the analysis.”

-- Leonard Wallentin, journalist and CEO at J++ Stockholm

3. Better Government Association projects

“We produce a number of different databases and tools with our investigations. For example, in Elevators in Chicago Public Housing, we used a database to show that residents of Chicago Housing Authority apartment buildings, mostly seniors, have faced elevator problems for years. The database allowed users to check the status of their elevators, including failed inspections and violations issued by city inspectors. Hover over the squares to get additional information, such as dates and notes from city inspectors.

Another example is the Suburban Police Shooting project. Since 2005, there have been at least 113 police shootings in suburban Cook County. Not a single officer involved in those shootings was disciplined, fired or charged criminally, a year-long investigation by the Better Government Association and WBEZ found. What’s more, almost none of those shootings were even reviewed for misconduct.

The searchable database, shedding light on police shootings in Chicago.

In the database, a summary of each shooting is listed, along with a map that details the location of each incident. To track each case, we compiled tens of thousands of pages of public records and conducted more than 100 interviews with police, law enforcement supervisors, elected officials, witnesses to the incidents, and many of the individuals who were shot. The documents behind the data, as well as the data itself, are available to download.”

-- Jared Rutecki, investigative reporter for the Better Government Association

4. The Press and Journal Scotland Sex Offender map

“We used Freedom of Information requests to obtain the number of Registered Sex Offenders in our area. After finding out the number had increased, we decided to show this on an interactive map and searchable database. We felt this would show the data in a more interesting manner than just listing it and would let readers engage with the story by searching for their postcode.

A screenshot of the project’s explorable map.

Both the map and the database were created using Flourish, which very kindly provides free pro accounts to newsrooms via the Google News Initiative. It was the most clicked-on story on our website that month and generated a lot of reader engagement.”

-- Lesley-Anne Kelly, data analyst, newspapers

5. Paul Biya’s Cameroon

“Paul Biya, president of Cameroon since 1982, spends much of his time abroad on private trips, in particular at the luxurious Intercontinental hotel in Switzerland. This was a well-known fact, but nobody had documented exactly how much time he was away from his office. Luckily for us, each time Biya comes and goes from Cameroon, the event makes the cover of the government-owned newspaper, the Cameroon Tribune. So, we collected as many of these covers as we could and found out that President Biya had spent over 4.5 years on ‘private trips’ abroad, costing Cameroon's citizens over $100 million. We wanted to make the detailed information public so people could find out when the president was away, so we uploaded all of the covers and the data to a website.”

-- Emmanuel Freudenthal, freelance journalist

Newspaper covers from the database.

6. USASpending.gov

"USASpending.gov is an endlessly fascinating site: it documents pretty much every cent that the US government spends, in the USA and around the world. I have spent many hours using the ‘place of performance’ filter to see what is being spent in individual countries outside the USA.

Whether you want to look at big patterns or small details, there are thousands of potential story ideas here: from the Israeli genomics firm receiving Pentagon money; to President Chavez’ brother’s firm, which was high on the league table of recipients of federal funding even as the White House was spurning dealings with Venezuela."

-- Jonathan Stoneman, data journalism trainer

7. Ranking das Escolas, Renascença

“We usually transform the Portuguese national exam results database into a searchable database. This allows the user to learn more about every school in the country and discover things like the average grade per school. Clicking on a school in the table gives the user further data, such as the average grade per subject, as well as more about the social context of that school -- for example, the percentage of students with financial aid from the state, the number of teachers, and the average number of years that the students’ parents were in school.”

-- Rui Barros, data journalist at Renascença

Our next conversation

What’s better than Conversations with Data in your inbox? A live version, of course! Next week, on 4 April 2019, we’ll be bringing the format to the International Journalism Festival in Perugia, featuring our Data Journalism Handbook 2 authors Christina Elmer, Paul Bradshaw, Jacopo Ottaviani and Lindsay Green-Barber.

For those of you who aren’t Perugia-bound, you don’t need to miss out on the fun! In our next edition, we’ll have the authors of our data visualisation chapter with us to answer your questions on newsroom trends and challenges.

As always, don’t forget to let us know what you’d like us to feature in our future editions.

Until next time,

Madolyn from the EJC Data team

]]>
Data journalism in the post-truth world: AMA with First Draft https://datajournalism.com/read/newsletters/data-journalism-in-the-post-truth-world Wed, 13 Mar 2019 21:41:00 +0100 Madeleine Gandhi https://datajournalism.com/read/newsletters/data-journalism-in-the-post-truth-world Almost three years on from the declaration of ‘Post-Truth’ as the 2016 Word of the Year, it looks like its close cousins, mis- and dis-information, are here to stay.

And, unfortunately for data journalists, it’s easy for our arsenal of numbers and statistics to aid various untruths. So, what’s a reporter to do?

In this edition of Conversations with Data, we gave you the opportunity to ask the team at First Draft. With a firm focus on tackling information disorder, First Draft is an international nonprofit that uses fieldwork, research, and education, to promote trust and truth in this digital age.

Image: From First Draft’s report, Information Disorder: An interdisciplinary framework.

What you asked

To make sure we’re all on the same page, let’s start with a question on the various terminologies used in the information disorder space:

What are the differences between fake news, disinformation, and misinformation?

'At First Draft, we avoid the phrase fake news wherever possible. This is because it’s actually a terrible phrase for understanding this space -- much of the most problematic content isn’t fake, it’s genuine but used out of context. Similarly, a lot of it could never be described as news -- it’s memes, social posts, and fabricated videos. More importantly, the term has been weaponised by politicians around the world to describe journalism they don’t like. As a result, research shows that when audiences are asked about fake news, they think it describes the professional media. So when journalists use the phrase, it’s harming journalism itself.

We prefer misinformation and disinformation. Misinformation is false or misleading information that is shared by people who don’t realise it’s false and who mean no harm.

Disinformation is false or misleading information that is shared by people who know that it’s false or misleading and hope to cause harm by creating or spreading it.'

Image: Types of Information Disorder. Credit: Claire Wardle and Hossein Derakshan, 2017 (CC BY-NC-ND 3.0).

Because audiences tend to inherently trust numbers, do you see data journalists as playing a distinct role in the fight against mis- and dis-information?

“The biggest challenge in this space is that audiences gravitate to information that supports their existing world view. There is some good research by Dan Kahan at Yale that showed highly intelligent people doing mathematical somersaults with data in order to make it support their pre-existing views on gun control. But it’s true that in order to convince people about the importance of evidence in society, data journalists play an important role. We need deeper investigations, both traditional investigative journalism as well as data journalism to build trust with audiences.”

What solutions are realistic for society to become better at understanding statistics?

“We need stronger education to teach people the skills they need to be able to read data and statistics. This should be in schools as well as part of life-long learning initiatives. Many journalism schools still don’t teach this enough, which means too many journalists can’t critically assess data, let alone their audiences. It’s definitely a required skill that too few people have.”

What is the most successful data driven initiative you’ve seen to date?

“We focus on misinformation, so we’ve been very impressed by journalists like Craig Silverman or Julia Angwin who have used publicly available data to put pressure on the technology companies. For example, Craig looked at the most shared viral misinformation stories on Facebook. Julia found that advertisers could target ads on Facebook to people based on their religion or ethnicity in ways that were highly problematic.”

If you could choose one thing for all data journalists to implement in their work, in order to curb mis- and dis-information, what would it be?

“We need more data journalists to investigate potential algorithmic bias. We can only see how unintended bias plays out when people search different queries and compare results. Most bias is not intentional, but with the power algorithms have over almost all aspects of our lives now, we need continuous research and oversight from independent journalists and researchers.”
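As a very rough sketch of that comparison step -- not First Draft's method, just an illustration -- suppose you have already collected the top results returned for different phrasings of a query, or for the same query issued from different accounts (the lists below are placeholders):

import itertools

# Placeholder result lists: the top URLs returned for two phrasings of the
# same query, or for the same query issued from two different accounts.
results = {
    "query_a": ["url1", "url2", "url3", "url4"],
    "query_b": ["url2", "url5", "url3", "url6"],
}

# Jaccard overlap between each pair of result sets: a low value means the
# algorithm served substantially different results, which is where the
# reporting questions about possible bias would begin.
for (name_a, a), (name_b, b) in itertools.combinations(results.items(), 2):
    overlap = len(set(a) & set(b)) / len(set(a) | set(b))
    print(name_a, name_b, round(overlap, 2))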

You can get started in algorithmic accountability reporting with our how-to chapters in the Data Journalism Handbook 2, or expand your verification skills with Craig Silverman’s LEARNO course.

Our next conversation

While most journalists produce stories by exploring databases, a new form of journalism is emerging where databases are the end product. Remember ProPublica’s Dollars for Docs project (or its German equivalent, Euros for Doctors by Correctiv)? These types of journalistic databases allow readers to dig deeper into the aspects of an issue that immediately interest them. In our next edition, we need your help to share other great examples of searchable databases as end-products.

As always, don’t forget to let us know who you’d like us to feature in our future editions.

Until next time,

Madolyn from the EJC Data team

]]>
Award-worthy data journalism https://datajournalism.com/read/newsletters/award-worthy-data-journalism Wed, 27 Feb 2019 21:25:00 +0100 Wendy Ruderman Nikita Rokotyan Marie-Louise Timcke Bahareh Heravi Alice Corona Wiebke Loosen Simon Rogers David McKie, Christopher Persaud https://datajournalism.com/read/newsletters/award-worthy-data-journalism With the journalism awards season well and truly underway, this week’s edition of Conversations with Data looks at the crucial question: ‘What makes data journalism award-worthy?’

Could it be the use of strikingly beautiful data viz? Or hard-hitting investigative insights? Or perhaps it’s neither of these. Read on for insights from past awardees, circuit observers, and competition insiders so you too can experience the honour, the prestige, and the glory!

What the awardees said

Taking home a Philip Meyer Award and also recognised as a finalist in the Online Journalism Awards, the Philadelphia Inquirer’s ongoing series Toxic City suggests that impact could be a factor. Wendy Ruderman, from the investigation, told us more:

“The series examines how environmental hazards in Philadelphia public schools make children sick and deprive them of healthy spaces to learn and thrive. We started the project in early fall 2017 by asking ourselves, ‘What data would be worth getting?’ and then, ‘How do we get that data?’ We wanted to empower parents with information about physical conditions in their child’s school,” she said.

The team spent seven months training teachers to take water samples in their schools to test for mould, asbestos, lead paint and lead. Using nearly 190 samples from 19 elementary schools, the tests exposed millions of cancer-causing asbestos fibres, and lead dust, at hazardous levels.

“The series harnessed data, science, human sources, medical records and the voices of children and parents impacted by health hazards in their neighbourhood schools. The result was a project that resonated with readers and spurred state and local leaders to take action.”

On the other end of the spectrum, the recently-announced winners of the World Data Visualisation Prize spoke about the importance of balancing a story’s elements. From the firm Interacta, Nikita Rokotyan shared their recipe for success:

“We think it's all about finding a balance between data, visual design, and story. Such a balance doesn't actually exist, because every reader perceives visualisations differently through their own prism of past experiences. But what you can do -- if you're working on an interactive piece -- is to let users go deeper into the data if they feel like exploring it and come to their own conclusions as well.”

Their winning visual is a perfect example of this approach. Using a machine learning technique called t-SNE, they created a new world map where countries with similar scores on various indicators are clustered together. The project is interactive, and users can explore the map through over 30 metrics.

Image: Interacta.
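For readers curious about the underlying technique, here is a minimal sketch of the general idea -- not Interacta's actual code -- using scikit-learn's t-SNE implementation on a hypothetical table of country indicator scores:

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Hypothetical table: one row per country, one column per indicator score.
indicators = pd.read_csv("country_indicators.csv", index_col="country")

# t-SNE projects the high-dimensional indicator space onto two dimensions,
# placing countries with similar score profiles close to one another.
embedding = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(indicators.values)

plt.scatter(embedding[:, 0], embedding[:, 1])
for name, (x, y) in zip(indicators.index, embedding):
    plt.annotate(name, (x, y), fontsize=6)
plt.title("Countries clustered by similarity of indicator scores")
plt.show()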

What the observers said

When we spoke to journalists and researchers who follow the circuit, there was one key award-worthy factor: telling important stories well.

Marie-Louise Timcke, who leads the Funke Interaktiv team, said that “the best data journalism projects are those that approach a relevant topic with a different, unexpected twist. It doesn't have to be anything big, complicated, or time-consuming to research and implement. It's the small but ingenious ideas.”

As an example of this approach, she pointed to the New York Times’ piece How much hotter is your hometown than when you were born. By breaking the complex issue of climate change down to the individual, it leaves “goose bumps on the user with just a very simple line chart”.

Image: New York Times.

And these thoughts are backed up by research from Bahareh Heravi and Adeboyega Ojo at University College Dublin.

“A majority (73%) of winning data journalism projects between 2013 and 2016 appear to aim to ‘inform’ the audience. This is followed by stories that aim to ‘persuade’ the reader (41%) and ‘explain’ something to the reader (39%). Stories with the aim of ‘entertainment’ were the least common among DJA-winning stories in this period, making up only 2% of the winning stories,” explained Bahareh.

Like the team from Interacta, their research also highlighted the popularity of interactivity amongst judges.

“...nearly 60% of the winning stories between 2013 and 2016 had interactive visualisations, and 27% had search, filter or selection options. Projects with only static elements made up only around 10% of the winning stories between 2013 and 2016. In 2017, however, we saw a surge in winning stories with only static elements, forming a third of all winning cases. This may be due to the increasing popularity of tools and libraries such as ggplot2.”

Another analysis, conducted by Alice Corona of Batjo, again underscored the importance of quality journalism and stories, above all other criteria. Her tip: "If you have a good story, don't chicken out just because you think you don't have a fancy enough interactive viz to go with it. It's, first of all, a journalism award”.

Image: Distribution of dataviz subjects of the 302 data visualisations that Alice analysed.

To sum up, Wiebke Loosen, who authored a Data Journalism Handbook 2 chapter on the subject, provided these top components of an award-worthy story:

  • it critically investigates socially relevant issues
  • it makes the data society understandable and criticisable by its own means
  • it actively tries to uncover data manipulation and data abuse
  • it keeps in mind, explains, and emphasises the character of data as ‘human artifacts’ that are by no means self-evident collections of facts, but often collected in relation to very particular conditions and objectives.

What the insiders said

In the end, it’s always up to what the judges think -- although it seems that they too value the features we’ve just discussed. According to David McKie, from the 2018 Philip Meyer Award panel, the strength of a project is driven by the quality of the data and the stories behind the numbers.

“The quality of the data -- obtained through freedom-of-information battles, scraping websites, the use of social research methods, or all of the above -- sets everything in motion. The findings become compelling, important, potentially game-changing. The people whose stories are told humanise the findings, making them relatable. These were the common characteristics of the finalists, and of many of the other entries in the running for the 2018 Philip Meyer Award. In short, award-worthy data journalism projects use data to tell stories that citizens and lawmakers need to know.”

Simon Rogers, Director of the Data Journalism Awards, put it simply: it’s about “tackling the huge issues of the news today and changing how we see the world”.

And even if you don’t win, Christopher Persaud of the Sunshine State Awards, reminded us that there’s still value in entering.

“...awards give data reporters something to work towards, plus show them by example how they can do better,” he said.

Our next conversation

If there’s one thing journalists are united in the fight against, it’s disinformation. At the European Journalism Centre, we’re proud to support the team at First Draft in combating mis- and disinformation through fieldwork, research, and education. To this end, they’ll be holding three invite-only summits in March, offering our readers (that’s you!) the exclusive opportunity to attend. Register here for free.

Can’t attend? No problem. We’ll have the team with us for an AMA on data and disinformation in our next edition.

As always, don’t forget to let us know who you’d like us to feature in our future editions.

Until next time,

Madolyn from the EJC Data team

]]>
Historical data journalism https://datajournalism.com/read/newsletters/historical-data-journalism Wed, 13 Feb 2019 21:34:00 +0100 C.W. Anderson Marie Segger RJ Andrews Robert Simmon https://datajournalism.com/read/newsletters/historical-data-journalism History buffs, this one’s for you.

Whether as far back as 1888, or nearly a century later in 1976, there have always been journalists hard at work using data -- but we very rarely get to see their work. This edition is about to change that.

To shed light on the work of early data pioneers, we’ve collected four of your favourite examples of data journalism from history.

Starting with our earliest piece, let’s begin our trip through time.

What you liked

1. The Great Storm, 1888

“Everett Hayden helped co-found the National Geographic Society and published The Great Storm... in the first issue of its magazine (1888). It contains four weather maps that show isobars, isotherms, and wind direction. The bucketed, diverging blue-red colour palette goes from below- to above-freezing. The last two charts show the tracks of data-recording vessels and a more abstract look at the variation of barometric pressure.

Image: National Geographic.

Hayden's prose elevates this exceptional example of a data story. He teaches readers what they need to know about meteorology to appreciate the Great Storm. He details the impressive data provenance (‘never have the data been so complete and reliable for such a discussion at such an early date’), admits to its limitations (‘be careful and cautious in generalizing from the data at hand’), and respects the reader's intelligence while easing them into technical terms (isobars and troughs). Hayden, a sea storm expert, makes the storm come alive with salty language: ‘the warfare of the elements so soon to rage with destructive violence’. Over 130 years later, this piece of data journalism is still a model worth studying, and aspiring to.”

-- RJ Andrews, author of Info We Trust: How to Inspire the world with data

2. The Men and Religion Forward Movement, 1913

“My favourite example of historical data journalism probably isn't what you were expecting. It isn't beautiful. It isn't even good data, or good journalism. But this is my point -- in my chapter in the Data Journalism Handbook, I argue that there are emotional and aesthetic meanings to ‘datafied journalism’, feelings that go beyond the provision of meaningful and accurate information. In other words, being immersed in a world of data journalism provides us, as news consumers, with a certain feeling of quantified objectivity that is meant to represent a particular mode of making news and a way of being in the world.”

-- C.W. Anderson, author of Genealogies of Data Journalism in the Data Journalism Handbook 2

The Men and Religion Forward Movement, found in Press Messages and Journalism of the Men and Religion Forward Movement (1913).

3. The Economist, 1967

“This map was published in The Economist on December 16, 1967. What I like about the map, despite its obvious shortcomings (pie charts, no scale for the bars, it's black and white), is that it seems the editors took a risk on it: It is big and crucially it comes with an explanation of how to read it next to the legend in the top right corner. This shows that readers at the time weren't expected to immediately ‘get’ the visualisation at first glance. I think we have come a long way in terms of both data visualisation and data literacy since then. The map is a great reminder that we need to be willing to take risks and experiment with new visualisations -- even if we sometimes go wrong, in the long run, that's how we make progress.”

-- Marie Segger, The Economist

The Economist.

4. Mission to Earth: Landsat Views the World, 1976

“Published in 1976, the book documents the first four years of data collected by Landsat. It dates from an era before the world wide web and easy download of digital data, when information was mainly distributed on paper. Over 460 pages long, the book features 400 plates; mostly full Landsat scenes but also mosaics, locator maps, and geological sketches.

Mission to Earth: Landsat Views the World.

At the time of publication, satellite imagery was novel, so there’s a thorough description of the data and a list of applications—ranging from monitoring agriculture and mapping urban sprawl to detecting floods and measuring snow cover. There’s even a little muckraking:

‘Although it had been known that clear-cutting techniques were being applied to the heavily forested regions of Oregon, its widespread use was not fully realised until Landsat 1 produced the dramatic scene shown in Figure 6. In this scene, with volcanic Mount Hood visible in the left-centre and the Columbia River at the top, clear-cut areas are vividly displayed as light patches within the dark red of the forested areas. Imagery such as this can be of considerable value to Federal and State agencies having the responsibility of monitoring and protecting our forest resources.'"

-- Robert Simmon, Planet

Our next conversation

There’s a question that haunts any data journalist submitting their work for an award: ‘What makes a story award-worthy?’

With different judging panels, and all of their subjectivities to account for, it can be quite a challenge to know which story to select. So, to help shed some light on the process, we’ll be sourcing your thoughts on this question in our next edition.

If you’ve ever enjoyed a winning entry, judged on a panel, or simply follow the data journalism award circuit, please share your wisdom.

And don’t forget to let us know who you’d like us to feature in our future editions.

Until next time,

Madolyn from the EJC Data team

]]>
SPIEGEL ONLINE: AMA with Christina Elmer, Marcel Pauly, and Patrick Stotz https://datajournalism.com/read/newsletters/spiegel-online Fri, 01 Feb 2019 21:19:00 +0100 Christina Elmer Marcel Pauly, Patrick Stotz https://datajournalism.com/read/newsletters/spiegel-online What do Google, Facebook, and a German credit bureau have in common? As it turns out, a lot. At least when it comes to operating seemingly harmless algorithms.

At SPIEGEL ONLINE, they’ve taken each of these companies on in a major drive to put algorithms in the spotlight and hold their developers responsible. But with 6 million Google search results, 600 Facebook political ads, and 16,000 credit reports to trawl through, these investigations are no easy feat.

So how do they do it? The answer, they’ve told us, is collaboration.

Welcome to our 18th edition of Conversations with Data, where we have Christina Elmer, Marcel Pauly, and Patrick Stotz with us from SPIEGEL ONLINE’s data team to answer your questions on algorithmic accountability reporting, and the teamwork behind it.

What you asked

As we’ve mentioned, investigating algorithms isn’t for the fainthearted -- something you were also keen to hear about when you asked:

What’s the most challenging aspect of reporting on algorithms?

Christina: “Many of the algorithms we want to investigate are like black boxes. In many cases it is impossible to understand exactly how they work, because as journalists we often cannot analyse them directly. This makes it difficult to substantiate hypotheses and verify stories. In these cases, any mechanism we can make transparent is very valuable. In any case, however, it seems advisable to concentrate on the impact of an algorithm and to ask what effect it has on society and individuals. The advantage of this: from the perspective of our readers, these are the more relevant questions anyway.”

You mentioned looking at impacts -- algorithmic accountability reporting often uncovers unintended negative impacts, but have you ever discovered any unexpected positive effects?

Marcel: “We haven't discovered any unexpected positive effects of algorithms yet. But generally speaking, from an investigative perspective, it can be considered positive that decisions made by algorithms are more comprehensible and reproducible, once you know the decision criteria. Human decisions are potentially even less transparent.”

So, what kinds of skills should newsrooms invest in to report on this beat?

Christina: “There should be a permanent team of data journalists or at least editorial data analysts. It needs this interface between journalism, analysis and visualisation to have the relevant skills for algorithmic accountability reporting on board and to permanently keep them up to date. In addition, these teams are perfect counterparts to external cooperation partners, who often play an important role in such projects.”

Let’s talk some more about the team’s collaborations. Does the team engage external specialists to understand the complexities of certain algorithms?

Marcel: “When we investigated the credit bureau Schufa's scoring algorithm we teamed up with our colleagues from Bayerischer Rundfunk. We were only a couple of (data) journalists and a data scientist, so of course we consulted practitioners like bank insiders who have worked with the algorithm's output. And we asked data and consumer protection advocates about their experiences with Schufa scoring. Only in this way do you get a deeper understanding of the matter you're reporting on.”

Image: SPIEGEL ONLINE.

What about internally? How does your team collaborate with other teams at SPIEGEL ONLINE?

Patrick: “In our workflow, the collaboration between data journalists and editors covering a certain beat is routine practice. However, algorithmic accountability projects come with an extra challenge for ‘ordinary reporters’. Most of the time, they won’t be able to actively participate in the technical part of the investigation and will rely on our work. As a result, we in our team have to develop a deeper understanding of the specific domain than in most other data journalistic projects. At the same time, constant exchange with colleagues who have domain-specific knowledge should never be neglected. Our strategy is to establish a constant back and forth between us investigating and them commenting on the results or pushing us in the right direction. We try to explain and get them involved as much as possible, but we do realise that it’s on us to take responsibility for the core of the analysis.”

Christina, you’ve written a chapter in the Data Journalism Handbook 2 on collaborative investigations. If there was only one tip you would like readers to take from your chapter, what would it be?

Christina: “Don't dream of completely deconstructing an algorithm. Instead, put yourself in the perspective of your audience and think about the effects that directly affect their lives. Follow these approaches and hypotheses – preferably in collaboration with other teams and experts. This leads to better results, a greater impact and much more fun.”

Our next conversation

From an emerging issue, to something a little bit older. Next time, we’ll be featuring your favourite examples of historical data journalism.

It’s a common misconception that data reporting has only evolved in tandem with computers. But reporting with data has existed long before new technologies. You need only look at the charts featured in the New York Times throughout the 1800s, or The Economist’s front-page table in its inaugural 1843 edition.

Until next time,

Madolyn from the EJC Data team

]]>
Reproducible journalism https://datajournalism.com/read/newsletters/reproducible-journalism Wed, 16 Jan 2019 21:14:00 +0100 Jeremy Singer-Vine Gerald Gartner Stephen Stirling, Timo Grossenbacher, Sam Leon, Brian Bowling, Jeff South, Serah Njambi Rono https://datajournalism.com/read/newsletters/reproducible-journalism We’re back!

It’s been a while since we last spoke -- but we haven’t forgotten you. Over the past few weeks, we’ve spent some time reflecting on our first year of Conversations with Data, which saw us feature advice from 75 individual community members and 17 expert AMAs. It’s been great talking with you, and we want to keep our conversations relevant. Be sure to let us know what you’d like us to cover, or who you’d like us to chat to.

As we gear up to reproduce another great year, we thought it would be fitting to look at journalism that does just that. So, for this conversation, we got talking with Jeremy Singer-Vine, Stephen Stirling, Timo Grossenbacher, and more, about how to set your stories up for repeatability.

For background, check out Sam Leon’s Data Journalism Handbook 2 chapter here.

What you said

1. Documentation is everything

If there was ever a common theme across your responses, it was this: document, document, document.

In the words of BuzzFeed’s Jeremy Singer-Vine, “reproducibility is about more than just code and software. It's also fundamentally about documentation and explanation. An illegible script that ‘reproduces’ your findings isn't much use to you or anyone else; it should be clear -- from a written methodology, code comments, or another context -- what your code does, and why”.

To keep good documentation, Stephen Stirling from NJ Advance Media suggested reminding yourself that others need to understand your process.

“Keeping regular documentation of steps, problems, etc. not only keeps your collaborators in the loop, but provides instruction for others hoping to do the same work. Documentation on the fly isn’t always going to be pristine, so make a point to go back and tidy it up before you make it available to others.”
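As a small illustration of what Jeremy and Stephen describe -- a sketch rather than any newsroom's actual script -- the written methodology and the step-by-step notes can live directly in the analysis code itself:

"""
Methodology: count reported incidents per borough and per year.

Data source: a hypothetical incidents.csv exported from a city open data
portal on 2019-01-10. Rows with no borough recorded are excluded.
"""
import pandas as pd

incidents = pd.read_csv("incidents.csv", parse_dates=["date"])

# Step 1: drop rows that cannot be attributed to a borough (see note above).
incidents = incidents.dropna(subset=["borough"])

# Step 2: count incidents per borough and per year -- the figures quoted in the story.
counts = (
    incidents.groupby(["borough", incidents["date"].dt.year])
    .size()
    .rename("incidents")
)

counts.to_csv("incidents_per_borough_year.csv")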

2. Take time to develop good documentation practices

While we’re agreed that documenting is crucial, what does good documentation look like in practice? As our Handbook chapter author, Sam Leon, explained there are many ways to make data reporting reproducible.

“Often just clearly explaining the methodology with an accompanying Excel spreadsheet can enable others to follow precisely the steps you took in composing your analysis. In my chapter in the Handbook, I discuss the benefits of using so-called ‘literate programming’ environments like Jupyter Notebooks for reproducibility,” he said.

There’s no set workflow, and often it may be a matter of tailoring your approach to your project’s needs. Brian Bowling shared some thoughts from his experience:

“A change log would be a good idea for a longer project, particularly when other people will be using or viewing the workbook. When I mainly worked in Excel, I didn't do that. As I branched out into R, Python and MySQL, I started keeping Word documents that included data importing and cleaning steps, changes, major queries used to extract information and to-do lists. Since I wasn't working exclusively in one piece of software, keeping separate documentation seemed like a better idea. It also makes it easier to pull the documentation up on one screen while working on another.”

For those of you working in Excel, have a go at using Jeff South’s solution for keeping track of his workflows:

“Every time I make a significant modification to data, I copy it to a new worksheet within the same file, and I give the new worksheet a logical name ("clean up headers" or "calculate crime rates"). Also, I insert comments on columns (and sometimes on rows) to document something I've done, like "Sorted by crime rates and filtered out null values".

This system means that all changes are clearly documented in one file and you don’t have multiple files with similar names.”
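One way to mirror that habit programmatically is sketched below, using the openpyxl library; the workbook, sheet names and cell references are hypothetical.

from openpyxl import load_workbook
from openpyxl.comments import Comment

# Hypothetical workbook with a "raw" sheet holding the untouched data.
wb = load_workbook("crime_data.xlsx")

# Copy the current state of the data to a new sheet with a logical name,
# so each significant modification stays documented in the same file.
snapshot = wb.copy_worksheet(wb["raw"])
snapshot.title = "calculate crime rates"

# Attach a note to a column header explaining what was done to it.
snapshot["D1"].comment = Comment("Sorted by crime rates and filtered out null values", "Reporter")

wb.save("crime_data.xlsx")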

3. Consider your software

Timo Grossenbacher, of SRF Data, warned: “Scripts may fail to execute only months after their initial creation -- or, even worse, they suddenly produce (slightly) different results without anybody noticing. This is due to the rapid development of software (updated packages, updated R / Python versions, etc.). Make sure to at least note down the exact software versions you used during initial script creation.”
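For Python users, even a few lines run at the end of a project can capture that information -- a sketch, with the package list adjusted to whatever your script actually imports:

import sys
from importlib.metadata import version

# Record the interpreter and key package versions alongside the analysis,
# so the script can later be re-run in a comparable environment.
packages = ["pandas", "numpy", "matplotlib"]

with open("versions.txt", "w") as log:
    log.write(f"Python {sys.version}\n")
    for name in packages:
        log.write(f"{name} {version(name)}\n")

An equivalent habit is keeping the output of pip freeze (or, in R, sessionInfo()) with the project files.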

If you’re using R, Timo’s created an automated solution for version control, which Gerald Gartner said has helped Addendum avoid problems with different versions of packages. They’ve moved all of their analysis to R and now, he said, “the 'pre-R-phase' feels like the dark ages.”

At BuzzFeed, they also use R, along with Python. Jeremy Singer-Vine suggested going for these types of ‘scriptable’ software rather than point-and-click tools. That said, “if you're wed to a point-and-click tool, learn about its reproducibility features, such as OpenRefine's 'Perform Operations'".

4. Make use of data packages

Serah Njambi Rono, from Open Knowledge International, reminded us that datasets are constantly evolving -- and this can limit successful reproducibility.

“Outdated versions of datasets are not always made available alongside the updated datasets - a common (and unfortunate) practice involves overwriting old information with new information, making it harder to fact check stories based on older datasets after a period of time.

Frictionless Data advocates for the use of data packages in answer to this issue. Data packages allow you to collate related datasets and their metadata in one place before sharing them. This means that you can package different versions of the same dataset, provide context for them, publish them alongside your reporting and update them as needed, while allowing for repeatability.”
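To give a flavour of the idea -- this is a hand-rolled sketch, not Frictionless Data's own tooling, and the file names are hypothetical -- a minimal datapackage.json descriptor can be written with nothing more than Python's standard library:

import json

# A minimal Data Package descriptor: it lists the dataset files ("resources")
# and enough metadata to keep different versions and their context together.
descriptor = {
    "name": "election-results",
    "version": "1.1.0",
    "resources": [
        {"name": "results-2018", "path": "data/results-2018.csv"},
        {"name": "results-2019", "path": "data/results-2019.csv"},
    ],
    "sources": [{"title": "Hypothetical electoral commission open data portal"}],
}

with open("datapackage.json", "w") as f:
    json.dump(descriptor, f, indent=2)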

Our next conversation

Next time, we’ll be putting algorithms in the spotlight with Christina Elmer and the data team at Spiegel Online. Have a read of Christina’s Data Journalism Handbook 2 chapter on algorithmic accountability here.

Until next time,

Madolyn from the EJC Data team

]]>
Data Journalism Handbook 2: AMA with Jonathan Gray and Liliana Bounegru https://datajournalism.com/read/newsletters/data-journalism-handbook-2 Wed, 12 Dec 2018 21:11:00 +0100 Jonathan Gray Liliana Bounegru https://datajournalism.com/read/newsletters/data-journalism-handbook-2 What. A. Week.

We know it’s only been a few days since our beta Data Journalism Handbook 2 was launched, but we’re excited to interrupt your reading with a special edition newsletter to celebrate its release.

From its humble beginnings at MozFest in 2011, the Data Journalism Handbook has come a long way to serve as both textbook and sourcebook for what was then an emerging field. It’s been downloaded over 150,000 times, translated into over 12 languages, and helped countless journalists embrace data in their reporting.

But data journalism is no longer emerging. So, in this new edition, our editors, Jonathan Gray and Liliana Bounegru, challenge us to critically reflect on what the field has -- and could -- become.

We gave you the opportunity to quiz them on the second Handbook and (potentially) suss out what’s in store for the full release next year. Let’s see what they had to say.

What you asked

Naturally, a lot of you were interested in the difference between Handbook 1 and Handbook 2:

Why another Data Journalism Handbook?

“We were really lucky with the previous edition, which seemed a good fit for the moment in 2012. It has been widely translated and adopted as a core text on data journalism courses and trainings around the world. We designed it to age well, providing not just practical guidance on how to work with data, but also a diverse snapshot of the hopes and practices of data journalists at that particular moment in time.

Translated versions of our first Handbook

The field of data journalism has travelled a long way since 2012. Not just because of more sophisticated technologies, but also because the social, cultural, political and economic settings of the field have changed. We’ve seen not only major initiatives like the Snowden leaks and the Panama Papers, but also debates and controversies around the role of data, platforms and digital technologies in society. Lots of tough questions have been raised about what data journalism is, who it is for and what it might do in digital societies.

Rather than just having a narrower focus on data practices, we take a broader look at these questions and consider what we might learn from them. In parallel to doing the first edition of the Handbook, we’ve also gone through our graduate studies and into universities where we’ve been researching these kinds of questions about the societal implications of digital data, methods and platforms. The new edition of the book is our attempt to make sense of the field of data journalism and its changing role in the contemporary world, thinking along with a diverse mix of practitioners and researchers.”

What has the evolution of the data journalism landscape meant for you as editors?

“To use a metaphor: there was a time when the number of readily available books in the world was small enough that it was not considered crazy for a librarian to try to build what they could consider a pretty comprehensive collection of books. Their acquisitions policy could just be: ‘whatever we can get our hands on’. In 2012, it might not have seemed so crazy to, for example, have a go at making a list of data journalists and their projects from around the world. One very practical consequence for us as editors is that we cannot kid ourselves about our partiality: we can only hope to cover a comparatively small number of the projects, practices and themes in this ever-growing field, and it was a difficult job to decide what to focus on in the book.

Jonathan and Liliana working hard on the second edition

Also, as we’ve mentioned above, there have been many debates and controversies about the role and status of data journalism, which we’ve sought to unpack and address in the book, as well as examining its relation to other developments. Since 2012 we have seen the rise of questions about the societal implications of technology precipitated by actors, events and issues as diverse as Snowden, Trump, Brexit, Bolsonaro, Xi Jinping, the Syrian civil war, Cambridge Analytica, Gamergate, #metoo, #IdleNoMore, Black Lives Matter, strikes and walkouts from tech workers, Uber riots, the European refugee crisis, Facebook’s News Feed algorithms, “fake news” and misinformation, the Gab platform and the rise of far-right populism and extremism. Several chapters in the book suggest how data journalists might attend to and position themselves in relation to such phenomena.”

On that note, how do you think data journalism can help respond to these questions, particularly around the use of algorithms and the spread of fake news?

“While this is a fast-evolving area, there are many ways that (data) journalists can respond. One of the areas that we’re particularly interested in is how to report on not just content, but also its social life and how people share, interact and make sense with it, including the platforms and infrastructures through which it circulates. We’ve provided several ‘recipes’ for digital methods investigations into ‘fake news’ in our recent Field Guide to ‘Fake News’, which has led to collaborations with BuzzFeed News and others. There’s a dedicated section in the Data Journalism Handbook on investigating data, platforms and algorithms.”

Nick Diakopoulos on his chapter, which helps journalists investigate algorithms

Will the Handbook provide information on tools to help journalists undertake this kind of reporting?

“The Handbook does not focus on tools as such. Specific tools enter the story insofar as they are relevant in accounting for different projects or practices. So this made it less difficult from an editorial perspective as we haven’t had to decide about tools, but instead have invited contributors to talk about different projects, practices, methods and settings - and tools have entered into these stories.

This is not, of course, to underestimate the importance of taking tools seriously. As our colleague Bruno Latour puts it: ‘Change the instruments, and you will change the entire social theory that goes with them’. The choice of whether to use a spreadsheet, a relational database, a network, a machine learning algorithm or a visualisation tool to work with data can shape what you notice and attend to, and the way you think about the phenomena you are investigating. We’re currently working on a research agenda at the Public Data Lab examining and comparing network practices in different fields -- including their narrative capacities in journalism. Several chapters in the book look at techniques, and some look at the use of particular tools. But tools as such have not been the editorial starting point.”

Alongside the changes that you’ve discussed so far, many people say that data journalists will eventually become regular journalists with good data literacy. What do you think of this view?

“As you say, many people have emphasised the point that data journalism is still a form of journalism, and that perhaps the ‘data’ part will become less important as data journalism practices become more widespread and more normalised. At the same time, we think it is worth considering when, why, how and for whom the distinction might matter, or not.”

Like the first Handbook, our second edition will also be translated into several languages. Watch this space in 2019, more info to follow soon!

Our next conversation

Just as our second Data Journalism Handbook asks you to reflect upon the field, we’d like to invite you to reflect on our first year of Conversations with Data.

Our first edition was released in May, and since then we’ve produced 16 distinct issues with advice on environmental reporting, scraping, fact-checking and more. As we start thinking about the new year, we want to know what you’d like to see Conversations with Data look like in 2019.

Until next time,

Madolyn from the EJC Data team

]]>
Lying charts? AMA with Alberto Cairo https://datajournalism.com/read/newsletters/issue-15-lying-charts Thu, 22 Nov 2018 21:01:00 +0100 Alberto Cairo https://datajournalism.com/read/newsletters/issue-15-lying-charts If a picture speaks a thousand words, then it can also speak a thousand lies. For this edition of Conversations with Data, we’re bringing you someone who knows all too well about the visual trumpery of charts.

You guessed it! Alberto Cairo stopped by. Alberto has two decades of experience in data visualisation and journalism, including several books under his belt.

In addition to talking misleading viz, he answered your questions on undervalued skills, workflows, and the infamous pie chart.

What you asked

Let’s start with a question that’s all about getting started...with data viz that is. You asked:

Could you describe your workflow when creating a data viz -- does the question or data come first? Or neither?

"It depends. Sometimes a good visualisation idea appears when you’re exploring data sets at leisure, just to see what you can discover, particularly with the help of experts who know much more about the data and the topic than you do. In other cases, you may begin with a preconceived story you may want to tell, then you try to test it with the data to see if it has any merit; if it does, then you tell it."

With that in mind, what is the most underrated or undervalued skill for data visualisation practitioners?

"The capacity to think as data professionals, storytellers, and visual designers, all at the same time. It’s a difficult balance to achieve, and very few visualisation practitioners are good at everything (I am certainly not!), but when someone is able to balance those up, the result is usually a top visual communicator."

What do you think is the most misleading visualisation type and why?

"All of them can be. One of the reasons why a visual may lie is that we tend to project onto them our existing beliefs and biases. If I already believe that, say, a particular policy is good for the job market, and I see a chart that shows that the job market improved after the policy took place, my brain will immediately infer a causal connection between the two phenomena, even if they may be completely unrelated, and their proximity in time may be just mere coincidence."

So then how do you feel about pie charts?

"Pie charts receive a lot of hatred for no good reason. A single pie chart with three or four segments is harmless. The problem is when we misuse pie charts, and we end up designing them with twelve or even twenty segments! As any other graphic form, the pie chart can be used —or misused. The key is to always think about the purpose of the visualisation you’re about to design, and then assess whether the graphic form you’ve chosen meets that purpose."

Not all pie charts are bad. Keep them simple. Credit: Lulu Hoeller (CC BY 2.0).

Since many journalists have no background in data science, how should the field address this challenge?

"I think that you don’t need to be a professional data scientist and statistician, you just need to educate yourself a bit in numerical reasoning and then try to make friends with a few true statisticians so they can help you make sure you don’t screw up!

Nowadays there are plenty of excellent popular science books that can help anyone educate themselves in numbers. I’d begin with Charles Wheelan’s Naked Statistics and Ben Goldacre’s Bad Science. Then, try to read a few intro to stats textbooks, and do the exercises. Some coding doesn’t hurt either, some R or Python."

Our next conversation

This December, the European Journalism Centre has something exciting in the works. We’re releasing an online beta version of the Data Journalism Handbook 2!

It’s going to be a preview launch, with only a first batch of chapters available, and we’re giving you the opportunity to submit questions for our editors, Jonathan Gray and Liliana Bounegru.

Until next time,

Madolyn from the EJC Data team

]]>
Favourite maps https://datajournalism.com/read/newsletters/favourite-maps Wed, 07 Nov 2018 20:57:00 +0100 Jon Schwabish Mustapha Mekhatria John Grimwade, Tejesh P, Marco A. Ferra https://datajournalism.com/read/newsletters/favourite-maps Did you know that there’s an infinite number of possible map projections of the Earth? While cartographers may struggle to find the most accurate representation, this variety means that there’s no shortage of beautiful and innovative ways to map in journalism.

But just because you can map, doesn’t mean you should. To share best practices, we asked our resident geography and cartography geeks for some of their favourites.

Maps you like

With the above warning in mind, we thought it’d be helpful to start with some words of wisdom from PolicyViz’s Jon Schwabish:

"Maps are a tricky thing. On the one hand, people love maps because they are familiar and they can find them easily themselves. On the other hand, geographic areas may not correspond to the importance of the data values (consider, for example, that Russia is about the size of the United States and Australia combined) and thus may result in certain distortions."

That said, distortions can be a beautiful thing -- as Jon’s favourite map demonstrates. The map, produced by Brandon Martin-Anderson, plots a point for every person in the United States.

Brandon Martin-Anderson on CityLab

Jon likes it because: "The only thing you see on the screen is the data -- there are no state boundaries or city labels, no rivers, no lakes. The only thing you see is the data. And the only reason you recognise it as the shape of the United States is because people tend to live on the borders and coasts. This isn't to say we should plot all the data all the time (and probably too many of us show too much data too much of the time), but this map does a great job of focusing your attention directly on the data."

For another great way of distorting with purpose, let’s take a step back in time to 1948 in Braunschweig, Germany. This marks the beginning of Bollmann maps, a brand of unique aerial views of cities around the world. John Grimwade, from the VisCom school at Ohio University, explains why they’re his favourite:

"Using a modified axonometric projection that Herman Bollmann developed, they show an informative view with diminished roofs and exaggerated sides to help us with on-the-ground navigation. Streets are widened so we can clearly see them. And, unlike a perspective map, the scale is constant."

A 1962 map of New York City by Bollmann maps.

Bollmann maps aren’t the only type that overcomes scaling issues. There are plenty of other options, like Mustapha Mekhatria’s favourite: the tilemap.

"A tilemap creates an idealised representation of geography by making every geographic area (e.g., country, state, city) uniform in size. Tilemaps allow you to remove the inherent bias in traditional maps, where some areas have more real-estate on the maps than others. Unless the relative size difference between geographical areas is essential, a tilemap helps bring focus on the data that is of most interest," explained Mustapha, from Highsoft.

A tilemap created by Mustapha, using Highcharts.

And now to a map on a mission. Shared and developed by Gramener’s Tejesh P, this map of public health facilities in India provides an excellent example of mapping for solutions journalism:

"The visual narrative can be used by decision makers to make new investments in improving existing health facilities to meet Indian Public Health Standards (IPHS) guidelines. It answers why (IPHS), where (Geo points highlighted) and what (the resources/facilities) questions. The visual narrative not only answers these, but also captures the inaccurate data nuances."

A screenshot of the map, focussing on Ernakulam district, live version here.

Finally, who said all maps have to be strictly cartographical? We were intrigued by Marco A. Ferra’s favourite map from National Geographic, pictured below. In explaining why he chose it, Marco said:

"It’s different from a conventional map a Mercator projection, for example) because it compares, in a very visual and straightforward way, the air pollution between different countries, continents, and GNI per capita. At the same time, it presents an additional dimension of information that is usually not available in infographics: intervals – in this case, the range between the cleanest and most air-polluted cities within the countries."

A partial screenshot of the map, full version here.

Our next conversation

In keeping with the visual theme, we’re excited to announce that we’ll have the renowned author of The Truthful Art and The Functional Art with us for an ‘ask me anything’ (AMA) in our next issue.

That’s right, Alberto Cairo is stopping by!

Until next time,

Madolyn from the EJC Data team

]]>
The Markup: AMA with Jeff Larson https://datajournalism.com/read/newsletters/the-markup Wed, 24 Oct 2018 14:17:00 +0200 Jeff Larson https://datajournalism.com/read/newsletters/the-markup Welcome to the 13th edition of Conversations with Data! This week, we’re shifting away from our usual focus of reporting with technology, to reporting on it.

That’s right, we’re talking the tech beat.

While there’s a growing number of dedicated journos investigating the social impacts of technology, you might’ve noticed that there are no dedicated newsrooms for this beat. That is, until now.

Step in: The Markup.

Due to launch in 2019, The Markup will leverage scientific and data-centred methods to produce journalism that closes this gap.

The newsroom will be led by a dream team of tech-savvy journos, including Jeff Larson as Managing Editor. To find out more, we got together with Jeff to answer your burning questions. Read on for a sneak peek into their plans.

What you asked

First up, let’s talk goals: What are you hoping to achieve with The Markup?

"In our investigations, The Markup will show the consequences of society’s reliance on new technologies for decisions that impact people’s lives.

We’ve already got a great team in place. Julia Angwin, a colleague of mine at ProPublica, is our Editor in Chief, Susan Gardner, formerly of Wikipedia, is our Executive Director and I’m the Managing Editor of The Markup. We also have fantastic people working on investigations: Lauren Kirchner, Surya Mattu and Madeleine Varner.

In the past, we teamed up on investigations that revealed that criminal risk scores were biased against African Americans, that technology companies were funding hate speech, and that minority neighbourhoods pay more for car insurance than similarly risky white neighbourhoods. None of those stories could have been done by any of us alone, and I’m excited to build a pipeline of collaborations between data journalists, like myself, and reporters, like Julia, at The Markup.

From ProPublica’s risk score investigation, which wrongly rated the chances of Prater and Borden reoffending. He did, she did not.

In each of those stories, we were inspired by the scientific method. We had a hypothesis, we wanted to test it, we looked at the data and it panned out. We also realised that investigative journalism is a team sport: you need a bunch of people with different talents to really hold power to account."

How much will The Markup focus on day to day news vs. longer investigative work?

"We’re aiming to publish an investigation a month, but we will publish smaller stories every day.

We won’t be doing a lot of breaking news. It just doesn’t make sense for a small newsroom to chase that when there are so many other news organisations that are better at that than we could hope to be.

Our daily stories will be more explanatory for ordinary folks. Technology is serious, it isn’t a Silicon Valley fad, an endless series of new gadgets, or a story of earnings calls and Wall Street deals. It is 2018, and technology is choosing how we raise our kids, who we vote for, who gets what job, who is able to get housing, healthcare and a comfortable life. We feel that people need a helpful and knowledgeable guide through these effects of technology on their daily lives.

The Markup will be that guide. We aim to help people navigate these issues through our investigations and our daily stories. If that seems awesome to you, please keep an eye on our jobs page, we’ll be hiring over the next few months!"

As you mentioned, you’re no stranger to the tech beat. What’s the most challenging technology-focussed story you’ve worked on and why?

"I’m not super into war stories because they drive the focus away from team based work -- what may have been difficult for me probably wasn’t for other folks.

I’ll say that I’m most proud of the car insurance story that we did at ProPublica (with Consumer Reports), because that story was a new way of looking at insurance and it showed that many national insurance companies were overcharging minority neighbourhoods when compared with white neighbourhoods."

Do you have any advice for other data journalists covering the social impacts of technology?

"In general, my advice for data journalists is to focus on human impact. Ask yourself, who is being harmed or affected here? It is an easy thing to focus on trends or novel quirks of data, but it is a much harder task to tell a human story through that data. It is weird, but this is a common practice when covering government policies, like education, criminal justice, or healthcare, but it is rare in data journalism about technology."

Our next conversation

It’s been a while since we last talked about our favourite charts -- we’ve even had a lil bit of data viz withdrawal. So, for our next edition, we figured it’d be fun to share your favourite maps.

Until next time,

Madolyn from the EJC Data team

]]>
Data scraping for stories https://datajournalism.com/read/newsletters/data-scraping-for-stories Wed, 10 Oct 2018 12:13:00 +0200 Paul Bradshaw Erika Panuccio Peter Aldhous Mikołaj Mierzejewski, Maggie Lee, Gianna-Carina Grün https://datajournalism.com/read/newsletters/data-scraping-for-stories We’ve all been there. Searching, searching, and searching to no avail for that one dataset that you want to investigate. In these situations, the keen data journalist might have to take matters into their own hands and get scraping.

For those of you new to the area, data scraping is a process that allows you to extract content from a webpage using a specialised tool or by writing a piece of code. While it can be great if you’ve found the data you want online, scraping isn’t without challenges. Badly formatted HTML websites with little or no structural information, authentication systems that prevent automated access, and changes to a site’s markup are just some of the limits on what can be scraped.

But that doesn’t mean it’s not worth a go! This edition of Conversations with Data brings together tips from scraping veterans Paul Bradshaw, Peter Aldhous, Mikołaj Mierzejewski, Maggie Lee, Gianna-Carina Grün and Erika Panuccio, to show you how it’s done.

Double check your code so you don’t miss any data

Peter Aldhous -- science reporter, BuzzFeed News

"I fairly regularly scrape the data I need using some fairly simple scripts to iterate across pages within a website and grab the elements I need.

A few years back I used Python Requests and Beautiful Soup, for example for these stories:

Nowadays, in keeping with the rest of my data analysis workflow, which makes extensive use of the R tidyverse, I use the rvest R package, for example for these stories:

I find rvest more straightforward and intuitive than the Python alternatives. There are several tutorials, for example here, here, and here.

Advice: You need to pay a lot of attention to checking that you're getting all of the data. Subtle variations or glitches in the way websites are coded can throw you, and may mean that some gaps need to be filled manually. Use your browser's web inspector and carefully study the pages' source code to work out how the scraper needs to be written. SelectorGadget is a useful Chrome browser extension that can highlight the CSS selectors needed to grab certain elements from a page."
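To make that pattern concrete, here is a minimal sketch of the iterate-and-grab approach Peter describes, using Python's Requests and Beautiful Soup. The URL and CSS selectors are hypothetical placeholders, and the completeness check at the end reflects his advice to watch for gaps that need filling manually.

```python
# A minimal sketch: iterate across paginated listings, grab the elements
# needed, and flag gaps so nothing is silently missed.
# The URL and CSS selectors below are hypothetical, not a real site.
import time

import requests
from bs4 import BeautifulSoup

rows = []
for page in range(1, 6):  # iterate across pages of the listing
    resp = requests.get(f"https://example.org/listings?page={page}", timeout=30)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")

    for item in soup.select("div.listing"):  # selector found via the web inspector
        title = item.select_one("h2.title")
        value = item.select_one("span.value")
        # Subtle glitches in the markup can leave fields empty -- record them
        # as None so the gaps can be checked or filled manually later.
        rows.append({
            "title": title.get_text(strip=True) if title else None,
            "value": value.get_text(strip=True) if value else None,
        })
    time.sleep(1)  # be polite to the server

missing = [r for r in rows if None in r.values()]
print(f"Scraped {len(rows)} rows, {len(missing)} with missing fields to check")
```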

Good quality scraping takes time, so communicate effectively and look for existing APIs

Mikołaj Mierzejewski -- data journalist at Gazeta Wyborcza, Poland’s biggest newspaper

"I think there are three different situations which everyone encounters in scraping data from websites:

  1. First one is the easiest - data is in plain HTML and you can use browser tools like Portia to scrape it.
  2. Second - data is trickier to obtain because it needs cookies/session preserving or data is loaded but it requires tinkering with developer tools in the browser to download.
  3. The third one is where you basically need a programmer on board - it's when data is dynamically loaded as you interact with the site. They will develop a small application which will act as a browser to download the data. Most sites will allow downloads at a pace of roughly one request every 0.75 seconds, but if you want to download loads of data, you will again need a programmer to develop a more effective scraper.

One of the hardest parts of scraping is communicating your work to your non-technical colleagues, especially non-technical managers. They need to know that good quality scraped data takes time because, as you develop scrapers, you learn the inner workings of someone's web service and, believe me, it can be a mess inside.

If you're curious about how we recently used data scraping here are links to articles showing data scraped from Instagram - we took posts that had '#wakacje' ('#vacation' in Polish) hashtag and put their geolocations on a map to see where Polish people spend their vacations. Articles are in Polish, but images are fascinating:

  • Vacations across different continents here.
  • Domestic vacations in Poland here.

I'd also add one more thing regarding data scraping -- always look for APIs first, before getting your hands dirty with scraping. APIs may have request limits but using them will save you a lot of time, especially if you're in a prototyping phase. Postman and Insomnia are good tools for playing with APIs."
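As a rough illustration of that "APIs first" advice, here is a hedged Python sketch that reuses one session (so cookies persist), pages through a JSON endpoint, and keeps to roughly the per-request pace Mikołaj mentions. The endpoint, parameters and response shape are made up for illustration.

```python
# A hedged sketch of "look for APIs first": one persistent session, modest
# request rate, and paging through a hypothetical JSON endpoint instead of
# parsing HTML.
import time

import requests

session = requests.Session()
session.headers.update({"User-Agent": "newsroom-research (contact@example.org)"})

records, page = [], 1
while True:
    resp = session.get(
        "https://api.example.org/v1/posts",        # hypothetical endpoint
        params={"page": page, "per_page": 100},
        timeout=30,
    )
    resp.raise_for_status()
    batch = resp.json().get("results", [])
    if not batch:
        break
    records.extend(batch)
    page += 1
    time.sleep(0.75)  # roughly the per-request pace mentioned above

print(f"Downloaded {len(records)} records via the API")
```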

9 things to remember about scraping

Paul Bradshaw -- Course Director of the MA in Data Journalism at Birmingham City University and author of Scraping for Journalists

"Some thoughts about stories I've worked on that involved scraping...

  1. It doesn't have to involve coding: my BBC colleague Dan Wainwright made a scraper using Google spreadsheets for this story on noise complaints. He wrote a how-to on it here.

  2. Think about T&Cs: we wanted data from a property website but the T&Cs prohibited scraping - we approached them for the same data and in the end, they just agreed to allow us to scrape it ourselves. Needless to say, there will be times when a public interest argument outweighs the T&Cs too, so consult your organisation's legal side if you come up against it.

  3. Use scraping as a second source: for this investigation into the scale of library cuts we used FOI requests to get information -- but we also used scraping to go through over 150 PDF reports to gather complementary data. It meant that we could compare the FOI requests to similar data supplied to an auditor.

  4. If it has a pattern or structure it's probably scrapable: as part of a series of stories on rape by the Bureau of Investigative Journalism, we scraped reports for every police force. Each report used the same format and so it was possible to use a scraper to extract key numbers from each one.

  5. Check the data isn't available without having to scrape it: the petitions website used for this story, for example, provides data as a JSON download, and in other cases, you may be able to identify the data being loaded from elsewhere by using Chrome's Inspector (as explained here, for example).

  6. Do a random quality check: pick a random sample of data collected by the scraper and check them against the sources to make sure it's working properly.

  7. Use sorting and pivot tables to surface unusual results: when scrapers make mistakes, they do so systematically, so you can usually find the mistakes by sorting each column of resulting data to find outliers. A pivot table will also show a summary which can help you do the same.

  8. Scrape as much as possible first, then filter and clean later: scraping and cleaning are two separate processes and it's often easier to have access to the full, 'dirty' data from your scraper and then clean it in a second stage, rather than cleaning 'at source' while you scrape and potentially cleaning out information that may have been useful. ​

  9. Scrape more than once -- and identify information that is being removed or added: this investigation into Olympic torchbearers started with a scrape of over 5,000 stories - but once the first stories went live we noticed names being removed from the website, which led to further stories about attempts to cover up details we'd reported. More interestingly, I noticed details that were added for one day and then removed. Searching for more details on the names involved threw up leads that I wouldn't have otherwise spotted."
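Points 6 and 7 are easy to automate. Below is a minimal pandas sketch, assuming the scraper wrote its output to a CSV with hypothetical 'name' and 'amount' columns; the random sample is for manual checks against the source pages, and the sorts and pivot-table summary surface the kind of systematic mistakes Paul describes.

```python
# A minimal sketch of a random quality check (point 6) and sorting/pivoting
# to surface outliers (point 7). File and column names are hypothetical.
import pandas as pd

df = pd.read_csv("scraped_output.csv")

# 6. Random quality check: verify this sample by hand against the source pages.
print(df.sample(n=10, random_state=1))

# 7. Scraper mistakes tend to be systematic -- sort each column to find outliers.
print(df.sort_values("amount").head())   # suspiciously small values
print(df.sort_values("amount").tail())   # suspiciously large values

# A quick pivot-table-style summary can reveal the same systematic errors.
print(df.pivot_table(index="name", values="amount", aggfunc=["count", "mean"]))
```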

Use scrapers as a monitoring tool

Maggie Lee -- freelance state and local government reporter in Atlanta

"I don't have a 'big' story, but I'd put in an endorsement for scrapers as monitors, as a thing to save beat reporters' time.

For example, I wrote a scraper for a newspaper that checks their county jail booking page every half-hour. That scraper emails the newsroom when there's a new booking for a big crime like murder. Or the reporters can set it to look for specific names. If "Maggie Lee" is a suspect on the run, they can tell the monitor to send an email if "Maggie Lee" is booked. It just saves them the time of checking the jail site day and night. The newsroom uses it every day in beat reporting.

For another example, I have a scraper that checks for the city of Atlanta audits that get posted online. It emails me when there's a new audit. Not every audit is worth a story, but as a city hall reporter, I need to read every audit anyway. So, with this scraper, I don't have to remember to check the city auditor's site every week."
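A monitoring scraper along these lines can be quite small. The sketch below is a hedged illustration, not Maggie's actual code: it fetches a hypothetical bookings page, compares it with what was seen on the previous run, and emails the newsroom when a watchlist term appears. A real version would run from cron every half-hour; the URL, selector, watchlist and addresses are placeholders.

```python
# A hedged sketch of a monitoring scraper: fetch, diff against last run, alert.
import smtplib
from email.message import EmailMessage
from pathlib import Path

import requests
from bs4 import BeautifulSoup

SEEN_FILE = Path("seen_entries.txt")
WATCHLIST = {"murder", "maggie lee"}  # crimes or names the newsroom cares about

seen = set(SEEN_FILE.read_text().splitlines()) if SEEN_FILE.exists() else set()

resp = requests.get("https://example.gov/jail-bookings", timeout=30)  # placeholder URL
resp.raise_for_status()
entries = [row.get_text(" ", strip=True)
           for row in BeautifulSoup(resp.text, "html.parser").select("tr.booking")]

new_hits = [e for e in entries
            if e not in seen and any(term in e.lower() for term in WATCHLIST)]

if new_hits:
    msg = EmailMessage()
    msg["Subject"] = f"{len(new_hits)} new booking(s) matching the watchlist"
    msg["From"], msg["To"] = "alerts@example.org", "newsdesk@example.org"
    msg.set_content("\n".join(new_hits))
    with smtplib.SMTP("localhost") as smtp:  # assumes a local mail relay is available
        smtp.send_message(msg)

# Remember everything seen so the next run only alerts on genuinely new entries.
SEEN_FILE.write_text("\n".join(seen | set(entries)))
```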

Make sure your scraper is resilient, and always have a backup plan

Gianna-Carina Grün -- head of data journalism at DW

"These two stories of ours relied on scraping:

Follow the money: What are the EU's migration policy priorities?

Here we scraped the country pages of the EU Trust Fund for Africa for information on projects in these countries.

If you have a scraper running regularly, you have to design it in a way that small changes by the data providers in wording on the page or within the data itself will not break your code. When you write your scraper, you should try to make it as resilient as possible.

The scraper code can be found here.

World Cup 2018: France won, but how did Ligue 1 fare?

Here we scraped multiple sources:

  1. to get player names and club affiliations out of the PDFs provided by FIFA
  2. to get information on in which league a club played
  3. to get information on which player played during the World Cup
  4. to get information on how each team scored during the World Cup

When we first did the test run with the World Cup 2014 data, FIFA provided information on 1, 3 and 4 - and we hoped that we'd get the data in the same formats. Other data teams tried to figure out in advance with FIFA what the 2018 data format would look like (which is a useful thing to try). We planned for the worst case - that FIFA would not provide the data in the needed format - and relied on our 'backup plan' of other data sources we could get the same information from.

Code for all scrapers can be found here."
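As a generic illustration of Gianna's resilience advice (not the DW scrapers themselves), the sketch below tolerates small markup changes with fallback selectors and fails loudly when the results look wrong, rather than silently writing broken data. The URL and all selectors are hypothetical.

```python
# A hedged sketch of a resilient scraper: fallback selectors plus sanity checks
# so small changes by the data provider do not silently break the output.
import requests
from bs4 import BeautifulSoup

def parse_project(card):
    """Extract one project, falling back across selectors that might change."""
    title = card.select_one("h3.project-title") or card.select_one("h3")
    budget = card.select_one("span.budget") or card.select_one("span.amount")
    return {
        "title": title.get_text(strip=True) if title else None,
        "budget": budget.get_text(strip=True) if budget else None,
    }

resp = requests.get("https://example.org/trust-fund/projects", timeout=30)  # placeholder
resp.raise_for_status()
cards = BeautifulSoup(resp.text, "html.parser").select("div.project-card")

projects = [parse_project(c) for c in cards]

# Fail loudly: an empty page or a flood of missing fields usually means the
# site changed, so stop and alert instead of publishing bad numbers.
assert projects, "No projects found -- has the page structure changed?"
incomplete = sum(1 for p in projects if None in p.values())
assert incomplete < len(projects) / 2, f"{incomplete} incomplete records -- check selectors"
```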

Don’t forget to save scraped data

Erika Panuccio -- communications assistant at ALTIS

"I faced issues related to data scraping when I was working on my Master's degree thesis. I had to collect data about pro-ISIS users on Twitter and I ended up with a database of about 30,000 tweets from about 100 users. I chose the accounts I wanted to analyse and then used an automated IFTTT platform to save the tweets whenever they were published, storing them in spreadsheet format. In this way, I could keep the data even if the account was suspended (which happened very frequently because of Twitter's policy on terrorist propaganda) or if the owner deleted the tweets."

Our next conversation

AMA with Jeff Larson, The Markup

From using technology in stories to investigating its impact. Next time, we’ll have Jeff Larson, Managing Editor of The Markup, with us for an ‘ask me anything’.

With a plan to launch in early 2019 as a nonprofit newsroom, The Markup will produce data-centred journalism on technologies, and how changing uses affect individuals and society.

Until next time,

Madolyn from the EJC Data team

]]>
Social data reporting: AMA with Lam Thuy Vo https://datajournalism.com/read/newsletters/social-data-reporting Wed, 26 Sep 2018 19:42:00 +0200 Lam Thuy Vo https://datajournalism.com/read/newsletters/social-data-reporting What do you remember most about 2012? Some of you might think back to the London Olympics, or perhaps Gangnam Style… but we’ll always remember it as the year that the European Journalism Centre launched the Data Journalism Handbook.

Flash forward six years and we’re producing a second edition for beta release. But we’re a wee bit too excited to wait until then. In anticipation, we’ve already published an exclusive preview chapter on algorithmic accountability.

And now, this edition of Conversations with Data brings you an 'ask me anything' with the author of the Handbook’s social media chapter, BuzzFeed’s Lam Thuy Vo. She answers your questions on getting started with user-generated content, privacy, combating fake news, and more.

What you asked

What makes social reporting different from other types of data journalism?

Lam: "There’s a tendency among some journalists to examine the social web as a one-to-one representation of society itself. It’s a natural inclination: the ways in which we see information come in on newsfeeds, timelines and comment strings, makes it seem continuous and like a representation of the people who compose our actual immediate environment.

But social media data is odd: while there’s a rigidity to its format that’s akin to that of other data sets (date times, content categorisation like text, video or photos, etc.), it comes with all kinds of irregularities related to who posted the content."

Can you give us an example?

"Take the data from a Facebook group, for instance. Even if a group has thousands of followers, only a fraction of them may actually actively react or comment on content, let alone post. The posts may come in spurts or fairly frequently without much of a regularity to them -- and this data may change at any moment as more people comment or react to the posts. All this makes for highly unwieldy data that needs to be interpreted with care and caution."

In addition to this risk of misinterpretation, social data is inherently personal, bringing with it privacy and ethical challenges. How can journalists address these?

"First, there’s an approach I often refer to as a quantified selfie -- an examination of a person’s data with their permission and also with their help of interpretation. Since social media data is by nature highly subjective, doing stories about an individual is likely best done with their help to interpret these stories."

From Lam’s quantified selfie project, which used most played songs to showcase the emotional impact of moving to New York.

"When working with content created by everyday people, journalists should also be very conscientious about people’s privacy and what the amplification of social media posts could mean to them. BuzzFeed News’ ethics guide contains very helpful language on the subject: 'We should be attentive to the intended audience for a social media post, and whether vastly increasing that audience reveals an important story -- or just shames or embarrasses a random person'."

What are some other challenges?

"What’s particularly difficult is that social data can be very ephemeral. For one story, for instance, Craig Silverman, Jane Lytvynenko, Jeremy Singer-Vine and I looked into how popular hyperpartisan news organisations were as compared to their more mainstream counterparts. The hardest part was that during the reporting, data parsing (more than four million Facebook posts) and analysis, Facebook pages included in the analysis kept being deleted. Not only did we have to deal with large amounts of data that would strain my computer, this data set was also a constantly moving target."

Following on from that example, how do you think journalists can use social media data to combat fake news?

"One very important part of combating the spread of misinformation and false made up stories is to report on the subject and find ways to hold companies accountable while also educating news consumers."

"...there’s a huge need for people to learn about the fallacies of how we see the world through social media. Explanatory reporting in that realm can be hugely impactful. From the distortion of information through filter bubbles to issues surrounding troll attacks, through examples of other individuals we can hopefully raise a level of scepticism in people to prevent them from sharing information in reactionary and emotional ways, and to encourage a more critical reading of the information they encounter online."

A screenshot from Lam’s trolling visualisation, illustrating to audiences what a Twitter attack feels like.

Finally, do you have any advice for journalists who are new to user-generated content?

"While each story is unique I’d say there is one good guideline I’ve picked up: be specific in your stories -- people can get lost in data stories that are too large in scope. For example, it’s more definitive and hard enough to do a story about one particular group of politicians who are Facebook users and are spreading hate speech, than to try and prove the spread of hate speech among an entire part of society."

Our next conversation

While social media data can usually be obtained via API, there are often times when you’ll have to scrape it yourself. For our next conversation, we want to feature your tips for these situations.

Until next time,

Madolyn from the EJC Data team

]]>
Environmental data journalism https://datajournalism.com/read/newsletters/environmental-data-journalism Wed, 12 Sep 2018 19:30:00 +0200 Fenja De Silva-Schmidt Federica Fragapane Fiona Macleod Aleszu Bajak Elisabetta Tola, Dianne M. Finch, Jamie Smith Hopkins, Naveena Sadasivam, Gianna Grün, Timo Franz https://datajournalism.com/read/newsletters/environmental-data-journalism The environment is full of data. Whether it’s emissions levels, pollution measurements, or satellite imagery, there is certainly no shortage of metrics to inform environmental journalism.

But despite the volume of data available, reporting on the environment can be complex and challenging. To help, we asked for your advice and collated these into a roundup of five must-read tips below.

What you said

1. Take time to understand your data

It’s tempting to want to dive into a new dataset straight away, but doing so risks producing a misleading or inaccurate story. First, advised scientific journalist and founder of formicablu, Elisabetta Tola, you need to understand the science behind the data, including the models, variables and measures used by scientists to collect the data.

To this end, Dianne M. Finch suggested that you interview your data as you would a human source: Who produced the dataset and why? How was the data collection project funded? What is the margin of error? Where are the outliers--and are they data entry errors or actual outliers?

Done correctly, this process will ensure that your journalism meets standards of fairness and accuracy. For example, Jamie Smith Hopkins from The Center for Public Integrity told us about a situation where their checks identified that a manufacturing plant was accidentally listed among America’s 100 biggest polluters due to an emissions reporting error.
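One hedged way to start that kind of data interview in practice is sketched below with pandas, assuming a facility-level emissions CSV with hypothetical 'facility' and 'tonnes' columns: summary statistics, missing values, and a shortlist of extreme values to check by hand, much like the polluter-ranking error caught above.

```python
# A minimal sketch of 'interviewing' a dataset. File and column names are
# hypothetical placeholders for a facility-level emissions table.
import pandas as pd

df = pd.read_csv("emissions.csv")

print(df.describe())     # ranges and means -- do the units and magnitudes look plausible?
print(df.isna().sum())   # where is data missing, and why might that be?

# Flag extreme values to verify by hand: data entry errors or genuine outliers?
threshold = df["tonnes"].quantile(0.99)
extremes = df[df["tonnes"] > threshold].sort_values("tonnes", ascending=False)
print(extremes[["facility", "tonnes"]])
```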

2. Reach out to scientists

As part of your workflow to understand the context and format of your data, don’t be afraid to reach out to scientists and researchers working in the field.

Scientific data, explained Elisabetta Tola, "are produced with highly specialised software, and the outputs are often in formats which are not the ones we use in data journalism".

For example, the Texas Observer’s Naveena Sadasivam told us about how hazard classifications, provided by the National Inventory of Dams, have misled reporters to assume that a 'high' hazard dam is in poor condition and is more likely to fail. But, in actuality, this classification is simply a measure of the level of damage if a dam fails.

Asking scientists and advocates for their input can also help to focus your reporting, said Aleszu Bajak from Northeastern. "If it’s going to be overwhelming, they’ll tell you. If it’s interesting and achievable, they may help."

But, Elisabetta Tola warned, do not treat a scientist’s insights as infallible. Look for conflicts of interests, biases, and the possibility of misinterpretation. Above all: "it is crucial not to base an entire story on a single source, albeit an expert one, and integrate it with others to test and verify."

3. Avoid scary data tactics

If you’re looking to convey a big picture message about an environmental issue, ironically the advice from Global Forest Watch is to narrow things down and make them relevant.

"Often people feel overwhelmed or even powerless when learning about complex, global environmental issues. Bring it back by personalising stories or even including a call to action so that your readers can relate."

This finding is echoed in media effect research. Fenja De Silva-Schmidt, from the University of Hamburg, said that stories which frame climate change in terms of ‘catastrophe’ and ‘guilt’ are often demotivating for audiences, rather than helping them understand and mitigate climate change in their daily lives.

To this end, Gianna-Carina Grün, Head of Data Journalism at DW, shared an example where they used travel costs to personalise CO2 emissions data.

Credit: DW.

Although it’s a great way to connect with readers, Gianna warned that hyper-personal projects, such as The New York Times hometown interactive, can present their own set of challenges. Many readily available datasets don’t hold data on a very granular level, and most teams lack the resources to research adequate datasets, one country at a time. So, her tip is to personalise by scaling down to a region or even just a few countries.

4. Consider the constraints of visualisation

The nature of geospatial data can also impose limitations on how it can be visualised. Timo Franz, from Dumpark, said that they often face technical constraints, such as loading times and computational limitations, when mapping large and detailed datasets.

When producing their dot map of plastic pollution in the world's oceans, for instance, this meant that they had to reduce the number of overall dots, ending up with each dot representing 20kg rather than more intuitive 1kg or 10kg representations.

The end product, featuring more than 13 million dots each representing 20kg of plastic pollution.

Similarly, Timo noted that mapping around the Earth’s poles can also be problematic, as many tools use projections that aren’t well suited to it.

As a more general visualisation tip, Federica Fragapane, creator of Carbon Dioxide Emissions, explained the importance of providing your audience with different levels of visual analysis. In her piece, she complemented the main informative layer, annual country-level carbon dioxide emissions between 1992 and 2012, with colour shading to illustrate how the emission situation has evolved throughout the years.

5. Sometimes you have to take matters into your own hands

Fiona Macleod, Editor at Oxpeckers Investigative Environmental Journalism, told us that many government bodies and civic organisations claim their data is open source, but negotiating access can prove tricky and time-consuming.

In many instances, they’ve had to use South Africa’s freedom of information framework, or try to find ways to scrape the data from the web.

"Sourcing, cleaning, sorting, analysing and applying the data in this way may be counter-intuitive and time-consuming, but in some instances it's the only way to get the job done," she said.

Our next conversation

We're excited for an ‘ask me anything' with the author of the Data Journalism Handbook 2's social media data chapter, Buzzfeed’s Lam Thuy Vo, next week.

Until next time,

Madolyn from the EJC Data team

]]>
Fact-checking: AMA with Africa Check, Tirto.id, RMIT ABC Fact Check, Correctiv/EchtJetzt, and Factchecker.in https://datajournalism.com/read/newsletters/fact-checking Wed, 29 Aug 2018 16:36:00 +0200 Anim van Wyk, Dinda Purnamasari, Matt Martino, Tania Roettger, Samar Halarnkar https://datajournalism.com/read/newsletters/fact-checking Ever wondered if a politician’s claims really add up? Or perhaps you read a news story which seemed a little fishy? Armed with data, fact-checking organisations across the globe work tirelessly to help separate these facts from fiction, and any misnomers in-between.

Welcome to the 9th edition of Conversations with data, where -- as I’m sure you’ve already guessed -- we’re talking fact-checking. To find out more about debunking with data, we’ve gathered a global group of fact-checkers for an exclusive ask me anything.

What you asked

We all know that data can be manipulated to say anything. With this in mind, you asked:

How do fact-checkers deal with situations where the same data has been used to support different sides of an argument?

Anim van Wyk, Chief Editor, Africa Check: "We’re fond of the quip that some people use statistics 'as a drunken man uses lamp posts - for support rather than illumination'. Depending on what you want to prove, you can cherry-pick data which supports your argument.

An example is different stances on racial transformation in South Africa, or the lack thereof. A member of a leftist political party said in 2015 that 'whites are only 10% of the economically active population but occupy more than 60% of the top management positions'. The head of the Free Market Foundation, a liberal think-tank, then wrote: 'Blacks in top management… doubled.'

Both were right -- but by presenting only a specific slice of the same data source to support their argument."

Dinda Purnamasari, Senior Researcher, Tirto.id: "In our experience, many use the right data, but the context is incorrect. Then, the data is no longer credible.

For example, reports that PT Telkom (state-owned telecommunication company in Indonesia) had provided Corporate Social Responsibility funds of around IDR 100 million to a Mosque and, in comparison, IDR 3.5 billion to a church.

We found that the numbers (IDR100 million and IDR3.5 billion) were right, but the purpose of the funding was incorrect. The 100 million was granted by PT Telkom in 2016 to pay the debt from a mosque renovation process. On the other hand, 3.5 billion was granted to renovate the old church, which also became a cultural heritage site in Nusa Tenggara Barat in 2017.

In this case, again, the context of data becomes an important thing in fact-checking. We must understand the methodology and how the data was gathered or estimated, even by double-checking on the ground, if needed."

This brings us to your next question:

How do you determine the reliability of a data source?

Matt Martino, Online Editor, RMIT ABC Fact Check: "When considering a source, it’s always pertinent to ask: 'what is their agenda?' If their motivations for providing data might influence the data in a partisan way, it’s best to leave it alone. As always, it’s a good idea to consult experts in the field on what is the best source to use in verifying a claim."

Tania Roettger, Head of Fact-Checking Team, Correctiv/EchtJetzt: "When we’re investigating a claim, one task is to understand what exactly a given piece of data is able to tell. We establish how and why it was collected, what it contains and what it excludes. Usually, we note the shortcomings of a statistic in the article. Whenever we are uncertain about the evidence we have gathered, we discuss the issue among our team."

What about situations where data on an issue isn’t available?

Samar Halarnkar, Editor, Factchecker.in: "If data are not available -- or independently verified data are not available -- there is only one substitute: Verification through old-fashioned, shoe-leather reporting.

For instance, India’s Prime Minister once claimed that his government had built 425,000 toilets within a year. With no independent verification, this claim was hard to dispute. Obviously, it was impossible to verify that 425,000 new toilets had indeed been built in all of India’s schools. But after sending reporters to conduct random verifications in eight Indian states, it quickly became apparent that the Prime Minister’s claim was -- to put it plainly -- a lie."

What are some good examples of data in fact-checking?

To end, you requested that our fact-checkers share good examples where data has been successfully used to check a claim. Here’s a few they came up with:

  • This debunked claim that refugees in Germany sent 4.2 Billion Euros to their home countries in 2016
  • Tirto.id’s investigation into a claim by the president of Indonesia that the country’s economic growth ranked third in the world
  • A look at Australia’s historical financial accounts by RMIT ABC Fact Check
  • A new database, created by Factchecker.in, to provide a missing picture of cow-related violence in India

For more detail on these fact-checks, and the rest of your questions, check out the full interview here.

Our next conversation

And now to a subject that is itself often the focus of contentious reportage: the environment.

How have you used data to report on climate change, pollution, or other environmental issues? What are some common challenges on this beat? Got any advice for overcoming them? Share your wisdom.

Until next time,

Madolyn from the EJC Data team

]]>
Favourite chart types https://datajournalism.com/read/newsletters/favourite-chart-types Thu, 16 Aug 2018 16:32:00 +0200 Alvaro Valiño RJ Andrews Brian Suda Nadieh Bremer Mustapha Mekhatria Severino Ribecca Birger Morgenstjerne, Mona Chalabi, Shirley Wu, Mikko Järvenpää https://datajournalism.com/read/newsletters/favourite-chart-types Hi there! Can you believe it’s our 8th edition of Conversations with Data? To celebrate, we’re bringing you eight of your favourite chart types.

But, before we begin, it’s important to bear in mind that the best chart type depends on your dataset.

As Birger Morgenstjerne reminded us, "charts are all about making your data/information accessible, effective and powerful to your audience. A common mistake is not paying attention to what you’re trying to communicate. Visualising data is the discipline of what you want to communicate (data/story) and how to tell it (the visualisation). Always ask, does the chart support and make your story shine through?"

If you’re in doubt about which chart to choose, Mona Chalabi, Data Editor at the Guardian US, suggests: "visualise your data in two different charts and ask a friend (or better yet, your mum) which is the least confusing to them."

Okay, now, let’s chat charts!

Charts you like

1. Timelines

"I love enriched timelines because they help to provide valuable context to the data shown. In these days of avalanches of data, providing context and connecting pieces of information can help to transform them into accessible knowledge. As with most of charts, labelling is a critical part of the process, many times neglected by chart creators.

For example, these timelines summarising the lives of two well-known artists: Michael Jackson and Paul Newman. They provide a landscape vision into their existence just after they died. They also facilitate insights into their artistic careers thanks to the connections established between professional-personal dimensions. The reader can get a sense of their richest creative periods because their achievements are shown in a temporal context." - Álvaro Valiño, Publico

Álvaro's Paul Newman timeline for Publico.

2. Grid chart

"When plotting multiple data series with very similar or very different values, a single chart may obscure more than it reveals. One alternative way to visualise multiple series that overcomes this is called the 'small multiple' technique, or 'grid chart'. This technique splits each series across individual charts in a grid and makes it easy to read and compare each series." - Mustapha Mekhatria, Highsoft

3. Pictorial small multiple

"My favourite chart type is the 'pictorial small multiple'. Each of its marks is a miniature canvas. At a glance, illustrated marks anchor the viewer to the chart's topic. In focus, each mark is an opportunity for detailed inspection, and comparison between marks. The arrangement of the marks provides an order for higher meaning to emerge. Together, many levels of understanding are possible." - RJ Andrews, author, Info We Trust: How to Entertain, Improve, and Inspire the World with Data

A pictorial small multiple devised by Bashford Dean and drawn by Stanley Rowland.

4. Stacked bar chart

"Line graphs and bar charts are really the first go-to charts I use. Almost all of the time, you can quickly get your data organised and visualised using one of these two charts, but my favourite is probably the stacked bar chart. Any time someone tries to use a pie or doughnut chart, I convert it over to a stacked bar. Horizontal or vertical, it doesn't matter. Representing a portion of a whole should be done using rectangular boxes, not radial degrees like a pie chart." - Brian Suda

5. Flow map

"One type of chart I particularly like is the flow map. The arrows easily communicate that movement is taking place in this chart. Although they’re not great for accurately displaying values, flow maps are much better at giving a generating view of the amount of movement of good or people taking place over a geographical region." - Severino Ribecca, The Data Visualisation Catalogue

6. Beeswarm chart

"Although I don't truly have a favourite chart type, I do find myself using a beeswarm technique to position my data quite occasionally. I like how this technique still allows you to show the data at a very detailed level (each datapoint separately), while simultaneously showing how all of the data is distributed, typically across one axis. And you can play with the 'datapoints' as well, you can colour them, resize them or give them some other visual mark based on a variable that shows even more insights, and therefore context, about the data.

Furthermore, instead of one datapoint, you can also apply a beeswarm to 'small multiples'. Where each 'mini chart' is positioned on the screen by using some aggregate value of the mini chart (an average for example). So I love the combination of versatility, the level of detail it can show, and the general visual appeal it gives to a data visualisation." - Nadieh Bremer

Using the technique in The Top 2000 loves the 70s & 80s.

7. d3-force

"I don't necessarily have a favourite or even a go-to chart type, but I do have a favourite D3.js module. d3-force is an implementation of the force-directed graph drawing algorithm, an algorithm for calculating the positions of nodes in a network graph. I love it for its flexibility: I can calculate node-link graphs with it, but I can also calculate beeswarm plots with it. I can cluster nodes into groups with it, and I can make the nodes fly around the screen like sprinkles. I've thought up a visual for my data and gone to d3-force to make that idea into reality. And that's why I don't have any particular chart type I love because there isn't any one chart that meets even most of my needs, but there are tools that I love because they help me translate what's in my brain to visuals on the screen." - Shirley Wu

8. Column chart

"My favourite chart type is the humble column chart, here annotated for added effect. It triggers the viewer’s most intuitive comparative understanding – comparing heights of items – and is therefore extremely simple to process cognitively. Simplicity is power, especially in communicating crucial facts such as those about our rapidly changing planet and the impact we’re having on all of its inhabitants." - Mikko Järvenpää, previously CEO, Infogram

Mikko's Column chart, illustrating the CO2 impact of protein choices.

Our next conversation

Like all journalism, charts have the potential to mislead if they aren’t checked correctly. So, for our next conversation, we thought it’d be fitting to turn our minds to the work of fact-checkers.

Ever wondered how data is used in fact-checking? We’ve got a diverse and global team of fact-checking organisations ready to answer your questions.

Happy summer!

Madolyn from the EJC Data team

]]>
West Africa Leaks: AMA with Will Fitzgibbon and Daniela Lepiz https://datajournalism.com/read/newsletters/west-africa-leaks Wed, 01 Aug 2018 16:25:00 +0200 Daniela Q. Lepiz Will Fitzgibbon https://datajournalism.com/read/newsletters/west-africa-leaks Companies in West Africa make billions every year, yet most of the region’s citizens live on less than $2 a day. Why? Well, as West Africa Leaks revealed, the answer often lies in two words: tax evasion.

Welcome to the 7th edition of Conversations with Data, featuring your questions on the investigation. We’re joined by Will Fitzgibbon, from the International Consortium of Investigative Journalists (ICIJ) and Daniela Lepiz, from the Norbert Zongo Cell for Investigative Reporting in West Africa (CENOZO).

Credit: ICIJ/Rocco Fazzari

Over several months, Will, Daniela, and the team painstakingly pored over 27.5 million documents, previously leaked through the Panama Papers, Paradise Papers, and more, to reveal how the region’s elites hide billions offshore. Here’s what they had to say.

What you asked

With such a large volume of data to work through, you asked: how did you determine a starting point for the analysis?

Will: "Data is never the be all and end all. One misconception some people -- even reporters -- still have is that leaked documents tell a complete story. That’s rarely true and certain isn’t true with any of these offshore leaks. So the starting point of West Africa Leaks was explaining and understanding the offshore system. We had to ask questions like:

  • 'Why would someone want to use a Panama shell company?'
  • 'What does it mean if the person uses a nominee shareholder?'
  • 'What laws, if any, in Mali, Senegal, Niger or Liberia require citizens or politicians to declare financial assets?'

We needed to ask the big questions first before diving in. Otherwise, you’re going in blind and likely to get the wrong end of the stick."

Daniela: "Companies or individuals will often set up offshore structures following the same patterns. With Will in the room, we managed to analyse such patterns and those that had led to successful investigations before. For example, he instructed reporters to “check if the person you are investigating had to declare financial assets according to his/her position in X organisation”. With such advice, the journalist could benefit from the knowledge built up already."

Right, so then how did you corroborate and verify the stories that were published?

Will: "After years of reporting on these documents, we have strong confidence in them. But still, every leaked document must be corroborated and verified. So, for example, if the address of a politician was given at a certain street and city in a country, for example, we would use open records (if available) or existing media reporting to confirm that address. We also used online government registries, where available, to confirm the incorporation date of certain offshore companies.

The real takeaway is this: leaked documents are your friends, but make sure you know who your friends are before trusting them completely."

From a technical point of view, what was the biggest challenge?

Daniela: "We tried to tackle all possible complications at the moment we met in Senegal. From encryption and security of communication to searching the documents (some even in Spanish language!) everything was covered when we met. However, how can you search a database that needs three encryption methods if you have three power cuts within two hours? You can’t download the files, you have to reload every time, re-login, re-start the search and so on. It is not impossible but it is hard. That is the reality many of our journalists faced. And yet they managed to succeed."

And the tools you used?

Will: "We’re talking about millions of PDFs, emails, image files, invoices, bank statements and spreadsheets. Some go back to the 1980s and even 1970s! ICIJ has a crack team of data experts who used technologies like Apache Tika (to extract metadata and text), Apache Solr (to build search engines) and others (you can read more about it here -- I’m not one of these data geniuses!).

ICIJ’s data team built open source software called Extract that helps make documents machine-readable and searchable. That’s key -- this was how we could search for words like “Monrovia” or “Ouagadougou” and get results. We then use user-friendly web portals, such as Blacklight for the Panama Papers, to which all participating journalists receive secure logins via encrypted email.

We also use -- and provide to partners -- programs such as the Linkurious database and Neo4j, a technology that allows data to be converted into much more human-friendly graph form."

What’s next? Are there plans to extend the investigation to the rest of Africa?

Will: "Yes! ICIJ and CENOZO have plans to make sure that all of West Africa is covered. If there is anything that we have learned from major data leaks it is that the more, the merrier. Sometimes it takes that one journalist with knowledge about a certain piece of a puzzle to be able to find what hundreds of other reporters missed. ICIJ is also very keen to develop more and deeper partnerships with journalists in Eastern Africa."

Is your question missing? To read Will and Daniela’s full Q&A, click here.

Our next conversation

For our next edition, we thought we’d get visual -- we're going to hear all about your favourite type of chart and why.

Until next time,

Madolyn from the EJC Data team

]]>
World Cup data journalism https://datajournalism.com/read/newsletters/world-cup-data-journalism Mon, 16 Jul 2018 16:18:00 +0200 Ashley Kirk James Le Riley Champine David Bauer, Patrick Stotz, Omar Chaudhuri, Andrew Garcia Phillips, Felipe Hoffa https://datajournalism.com/read/newsletters/world-cup-data-journalism ...And the winner is France! Congrats les bleus!

Still running high on football fever, we’re bringing you a special edition of Conversations with data, to look back at a tournament full of inspiring data journalism.

What you made

Throughout the Cup, journalists have used data to answer questions about odds during penalty shoot-outs, how well teams have been playing, and, of course, who will win. At The Economist, they even looked at why these predictions will probably be wrong.

And your projects were no different. As you’ll soon see, there’s hardly a question about the World Cup that data journalists can’t answer.

1. Which country has the most expensive squad? (link)

"To source the data, we used OutWit Hub to scrape Transfermarkt. We then crunched these figures to rank the teams, writing a couple of paragraphs one each and then producing one small visualisation for each of them. These were beeswarm plots (made in RStudio with the ggplot library) that plotted each player on the same axis. This meant that even for a reader scrolling through the page quickly, they could not only see which team is the most valuable but they could also see how each players' individual valued contributed to this." - Ashley Kirk, The Telegraph

2. How many club teammates opposed each other? (link)

"One part of the World Cup that always interests me is that players transition from teammates on professional clubs to opponents on national squads. I pitched this idea to Reuters and they were interested in supporting it. There may be simpler ways to show these overlaps, but using a Voronoi treemap creates a beautiful display and one that evokes the shapes on a football. It is produced using D3.js and the data came from a number of sources (It's trickier than might you think to find the league of every professional club in the world)." - Andrew Garcia Phillips for Reuters

Credit: Reuters

3. What’s the ideal team lineup? (link)

"As someone who loves data science and has grown up playing FIFA, it came to my realisation that I can use the data from EA Sport’s extremely popular FIFA18 video game released last year to do my analysis.

Here is the step-by-step approach I used:

  1. Get the FIFA 18 dataset from Kaggle.
  2. Do some exploratory data analysis and data visualisation of important player attributes.
  3. Write functions to get the best squad according to the players' overall rating.
  4. Apply the functions to derive results for the 10 national teams.
  5. Compare the results and make predictions on the potential winner." - James Le for Towards Data Science
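
As a rough illustration of step 3, here is a minimal Python sketch of picking a "best XI" by overall rating. The file name, the column names and the 4-3-3 position list are assumptions about the Kaggle FIFA 18 dataset, not a copy of the author's notebook.

```python
# A minimal sketch of step 3 above: pick a "best XI" by overall rating.
# The file name, column names ("Name", "Overall", "Preferred Positions",
# "Nationality") and the 4-3-3 position list are assumptions about the
# Kaggle FIFA 18 dataset, not a copy of the author's notebook.
import pandas as pd

FORMATION_433 = ["GK", "LB", "CB", "CB", "RB", "CM", "CM", "CM", "LW", "ST", "RW"]

def best_squad(df: pd.DataFrame, country: str, formation: list[str]) -> pd.DataFrame:
    pool = df[df["Nationality"] == country].sort_values("Overall", ascending=False)
    used, squad = set(), []
    for position in formation:
        # highest-rated player who can fill this position and isn't already picked
        candidates = pool[
            pool["Preferred Positions"].str.contains(position, na=False)
            & ~pool["Name"].isin(used)
        ]
        if not candidates.empty:
            pick = candidates.iloc[0]
            used.add(pick["Name"])
            squad.append({"position": position, "name": pick["Name"], "overall": pick["Overall"]})
    return pd.DataFrame(squad)

players = pd.read_csv("fifa18_players.csv")  # hypothetical path to the Kaggle export
print(best_squad(players, "France", FORMATION_433))
```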

4. Which teams have the most foreign-born players? (link)

"There is a rather shallow notion the surfaces in the United States (my home country) that the World Cup features teams made just of national stereotypes. People joke about 11 stern Germans taking on 11 flashy Spaniards, or a speedy Senegalese team up against a group of tall Dutch players. But when in you dig into the data on the origins of international soccer players, you find a lot of diversity in many countries, which I wanted to show to subvert that tired narrative." - Riley D. Champine, National Geographic

Credit: National Geographic

5. Which teams should’ve made the second round? (link)

"Expected Goals provides a data point that allows us to measure how well each team should have fared, based on the chances they created and allowed.

Here’s how we turned raw data into a DDJ piece: we used an R script to process the raw data from Opta Sports, make the calculations we needed, and create an output file for each graphic in the article. Most of the simple graphics in the article, i.e. mostly tables, are created with our own storytelling toolbox Q; the other graphics are either handcrafted in Sketch or done in R and then polished in Sketch." - David Bauer, NZZ
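
NZZ's pipeline ran as an R script over Opta data; the aggregation idea translates easily enough to Python. The shots table below is invented, and this is a sketch of the general calculation rather than the NZZ code.

```python
# A minimal sketch of the Expected Goals aggregation: sum xG for and against each
# team and compare. The shots below are invented; NZZ's pipeline used R and Opta data.
import pandas as pd

shots = pd.DataFrame([
    {"team": "Germany", "opponent": "Mexico",  "xg": 0.45, "goal": 0},
    {"team": "Mexico",  "opponent": "Germany", "xg": 0.30, "goal": 1},
    {"team": "Germany", "opponent": "Mexico",  "xg": 0.10, "goal": 0},
])

xg_for = shots.groupby("team").agg(xg_for=("xg", "sum"), goals_for=("goal", "sum"))
xg_against = shots.groupby("opponent").agg(xg_against=("xg", "sum"), goals_against=("goal", "sum"))

table = xg_for.join(xg_against)
table["xg_diff"] = table["xg_for"] - table["xg_against"]
print(table.sort_values("xg_diff", ascending=False))
```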

But wait, there’s more. This is just a snippet of the projects you made. For a full rundown, including in-depth details from behind the scenes, click here.

Our next conversation

From football to West Africa. Next time, Will Fitzgibbon, from the International Consortium of Investigative Journalists, will be answering your questions about West Africa Leaks.

Will led a team of more than a dozen reporters from 11 countries, who painstakingly pored over 27.5 million documents from previous data leaks to reveal how the region’s elites hide billions offshore.

Until next time,

Madolyn from the EJC Data team

]]>
Open data https://datajournalism.com/read/newsletters/open-data Wed, 11 Jul 2018 16:06:00 +0200 Jonathan Stoneman Paul Bradshaw Nikesh Balami Giuseppe Sollazzo Anastasia Valeeva Becky Band Jain pinar dag Eva Constantaras, Eric Karstens, Mollie Hanley, Gianfranco Cecconi, Natalia Carfi https://datajournalism.com/read/newsletters/open-data Hello again! Welcome to the 5th edition of Conversations with Data.

Each conversation, we’ll be featuring exclusive data journalism advice, sourced directly from you or an expert in the field.

Last time, we asked Journalism 360 about immersive storytelling. Today, we’re talking open data.

What you said

The term ‘open data’ might imply that it’s readily available, but it was clear from your submissions that accessing relevant data isn’t without its challenges.

Fabrizio Scrollini shared some examples from Ojo Publico, a Peruvian news agency, which created a website to monitor the country’s health service delivery. Although they were able to access data through freedom of information (FOI) laws, it took a lot of effort to gather the data through paper trails before getting lucky with information hidden in a few PDFs. (Eek!)
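
One common way to dig data out of PDFs like those is a table-extraction library such as pdfplumber. The sketch below is generic -- it is not Ojo Publico's actual workflow -- and the file name and layout assumptions are hypothetical.

```python
# A minimal sketch of pulling tabular data out of a PDF with pdfplumber.
# The file name is hypothetical; treating the first row of each table as a
# header is an assumption about how the document is laid out.
import pdfplumber
import pandas as pd

tables = []
with pdfplumber.open("health_service_report.pdf") as pdf:
    for page in pdf.pages:
        for table in page.extract_tables():
            tables.append(pd.DataFrame(table[1:], columns=table[0]))

data = pd.concat(tables, ignore_index=True)
print(data.head())
```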

When reporting on child disappearances, Pinar Dağ also faced issues getting data through FOI requests. Her tip: "it is always important to create an alternative open data source".

But, how to do this?

Before you start, said Jonathan Stoneman, ask "who would have this data?", "where would they put it?", and "what might the dataset be called?".

Eva Constantaras also provided suggestions: "use global databases, find publications from local universities, check out the research being produced by local think tanks. If government data is unreliable, find alternative data sources to get the most accurate data available, just like you would when interviewing different sources."

In short, she said, "start with the data you have, not the data you would like to have".

Becky Band Jain, from the Centre for Humanitarian Data, added: "it’s helpful to first have a hypothesis of what you’re looking for in the data, i.e. the story you want the data to tell. For journalists working with the datasets available on the Humanitarian Data Exchange (HDX), we’ve tried to make it easy for anyone to search for topical datasets or find datasets specific to each country".
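
For journalists who prefer to query HDX programmatically, the platform is built on CKAN, so its search API can be hit directly. Treat the exact endpoint and response fields in this sketch as assumptions to verify against the HDX documentation.

```python
# A minimal sketch of searching HDX for topical datasets via the CKAN
# package_search endpoint; the query string is just an example, and the URL
# and response structure should be checked against the current HDX docs.
import requests

resp = requests.get(
    "https://data.humdata.org/api/3/action/package_search",
    params={"q": "displacement Yemen", "rows": 5},
    timeout=30,
)
resp.raise_for_status()
for dataset in resp.json()["result"]["results"]:
    print(dataset["title"])
```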

And then there’s the trouble with data quality. The European Journalism Centre’s own Eric Karstens knows about this all too well from his work on the YourDataStories project. He explained that "hardly any dataset is entirely error-free", but "even the tiniest inaccuracy may undermine trust in open data wholesale".

With this in mind, "don't assume the data is correct, and make follow up calls," said Paul Bradshaw.

"When we worked on the unsolved crime data a couple police forces said that the data was wrong, and resubmitted it. When The Bureau Local worked on a story on councils dipping into reserves, they found a bunch of them were 'inaccurate and unchecked'. Here's another example on ambulance waiting times."

But, he warned, "if someone says it's wrong, don't take their word for it. One of the forces who said the data was wrong couldn't really back that up."

Engaging with other journalists and the open data community can also help you overcome these challenges.

Nikesh Balami (CEO, Open Knowledge Nepal) and Giuseppe Sollazzo (founder, Open Data Camp) both recommend collaboration as a means to overcome common difficulties in accessing and working with data. Some good places to network and find potential collaborators include chapters of the Open Knowledge Foundation and open government networks.

You should never be afraid to reach out! Mollie Hanley, of OpenCorporates, sees it as the responsibility of open data projects to connect with wider audiences and share skills. This, in turn, can help them measure impact.

While we’re talking about collaboration, a lot of you mentioned the need for journalists to open up data from their own projects.

From Anastasia Valeeva, co-founder of School of Data Kyrgyzstan: "I consider it crucially important to publish our own datasets under the open data license together with the story. Not only does it make your story bulletproof, it also shows that you hold onto the principles you require from the others".

Gianfranco Cecconi went further to say that "journalists should be concerned when they are not using open data". When journalism is based on non-open data, "the reader won’t be enabled to 'check the calculations' for themselves and perhaps get to a different conclusion".

In this way, Natalia Carfi, Deputy Director at Open Data Charter, pointed out that open data can help in the fight against fake news.

Our next conversation

I don’t know about you, but we’ve been eagerly following the World Cup through all the brilliant data journalism that it’s inspired.

With the final coming up on Sunday, we’ll be putting together a special early edition of Conversations with Data next week.

Until next time,

Madolyn from the EJC Data team

]]>
Immersive storytelling: AMA with Journalism 360 ambassadors https://datajournalism.com/read/newsletters/immersive-storytelling Tue, 03 Jul 2018 15:58:00 +0200 Thomas Seymat Robert Hernandez Retha Hill, Óscar Marín https://datajournalism.com/read/newsletters/immersive-storytelling Welcome back to our data conversation!

As you know, each edition, we’ll be crowdsourcing your advice in an area related to data journalism, or inviting you to ask an expert for their tips.

Last time, we thought it would be fitting to gather your thoughts on using the crowd to source data. For this edition, we gave you the opportunity to ask a selection of Journalism 360 ambassadors questions about immersive storytelling.

Journalistic uses of augmented reality (AR), virtual reality (VR), 360 video, and other techniques, have gained momentum over the past few years. Some cool data examples include The Wall Street Journal’s VR roller coaster, which follows the ups and downs of Nasdaq, Google News Lab’s Brexit project, and Lookout360°, a 6-month climate change immersive story accelerator.

While these examples highlight the potential of immersive data journalism, we know that a lot of you still have questions about using these techniques yourself. So, without further ado, let’s get immersed.

What you asked

First off, you asked: what is the most common mistake journalists make when starting out with immersive storytelling?

Thomas Seymat, Euronews: "Assuming it’s too difficult or too different from what you are used to is a common misconception before jumping in. It’s not that different, and the role of the journalist is still key with this medium. And when it comes to immersive content, be very mindful of movements -- avoid pitch, roll and yaw at all cost".

Robert Hernandez, USC/JOVRNALISM: "Waiting. You don’t need expensive equipment and you don’t need to wait for this technology to go mainstream before you start exploring this emerging field. Start now! Fail and learn from those experiences to be better prepared for when this tech does go mainstream, whether through phones, headsets or desktops".

Retha Hill, New Media Innovation and Entrepreneurship Lab: "Trying to direct the action using 2D techniques. Viewers might not look where you want them to, so it is important to onboard them -- use audio and other cues to get them to look around and follow the story".

On Retha’s last point, you also asked about the difference between creating immersive data visualisations and 2D ones.

Óscar Marín, Outliers Collective, provided the following notes:

  • "There are no UX/UI guidelines for immersive data visualisation. We’re still discovering them, while classic DataViz UX is very well-established."
  • "There are several layers of information based on the distance from the user viewpoint (intimate distance, ‘menu’ distance, background), each one with its own rules regarding usability and data visualisation. In 2D, we only have the ‘rollover’ paradigm."
  • "There are 3D equivalents to well-established 2D metaphors such as treemaps and networks, but there’s still some work trying to figure out what to do with the extra dimension."
  • "In my opinion, what works really well is ‘old’ 2D visuals -- for example, a map -- that ‘launch’ content and data on the extra dimension, thus transforming the data visualisation into a ‘platform’ that helps dig deeper into data and content."

Following on, there’s a lot of talk about whether immersive storytelling is better at fostering empathy than 2D formats. You asked if the ambassadors agree.

Robert Hernandez: "What is powerful about this technology is that you get immersed into an experience. You can put yourself in other people’s shoes and that certainly offers you a unique perspective compared to other mediums. A searchable spreadsheet can only take you so far. Even a 2D data visualisation, while effective, has its limits. The ability to immerse yourself into the data or bring data into the real work has so much potential".

Does this mean you have to be a technical person to produce immersive journalism?

"No, especially if you use self-stitching cameras. Plus more and more tools are being released that are making AR and VR more accessible to non-developers."

So then, what technology is best for journalism 360 projects?

Thomas Seymat: "I think one of the most interesting technologies for 360 journalism projects is WebXR because it helps build web-based VR or AR experiences. It’s not the easiest to master, but being readily available via browsers is better in terms of UX because the users don't need to download an app".

Óscar Marín: "On a low-level, WebVR tools like AFrame, which are similar to what HTML+CSS is for the ‘current’ web. On a high-level, tools like Fader that help you build stories without coding. My understanding is that at the very moment a good, clean, easy interface for building stories launches, 360 journalism adoption will sky-rocket".

Robert Hernandez: "Go with an affordable, self-stitching camera like the Insta360 ONE to start. Use free platforms like Google Tour or Story Spheres to make your 360 photos come alive with interactivity and sound. When you are ready for videos, I recommend higher-end cameras stitched via Mistika VR and edited with Adobe Premiere and plugs like Skybox. Then use Google Poly and Snapchat Lens Studio to start easily exploring AR. Then dive into photogrammetry with RealityCapture and Unity".

Our next conversation

Thanks to everyone for contributing questions! If you’ve been inspired to start experimenting, don’t forget that Journalism 360 is currently offering grants of up to $20,000 each to test, refine and build out an immersive project.

Next time, we’re sourcing your advice on telling stories with open data. Got a surprising success story? Or a go-to resource? Let us know.

Until next time,

Madolyn from the EJC Data team

]]>
Crowdsourcing https://datajournalism.com/read/newsletters/crowdsourcing Sat, 30 Jun 2018 14:06:00 +0200 Stuart Shulman Vojtech Sedlak Eliot Higgins, Andrew Losowsky, Flor Coelho, Myf Nixon, Clare Blumer, and Roberto Rocha https://datajournalism.com/read/newsletters/crowdsourcing Hi there!

It’s time for another data conversation.

In case you missed our first conversation, we’ll be inviting you to share your thoughts, or ask an expert questions, about anything and everything data journalism.

Each edition is essentially a mini crowdsourcing project. So, we thought we’d help ourselves out a little and crowdsource some advice, about crowdsourcing, from you.

What you said

It seems that success is all in how you ask the crowd for help.

Start by giving simple and straightforward tasks, suggests Eliot Higgins, from Bellingcat: "for example, Europol's Trace an Object Stop Child Abuse campaign asked people to simply identify objects, a clear task with an obvious ending point. The biggest failures have been when large groups of people approach a complicated task with no oversight, the Reddit Boston Marathon Bombing investigation being the prime example of this".

Andrew Losowsky, Project Lead at The Coral Project, had similar thoughts.

He reminded us that "most crowdsourcing fails, and that's ok. If you want to create the most likely conditions for success, make sure it's well publicised, easy to understand, is clearly connected to a compelling result, has an easy way for people to be notified when that result appears, and includes ways for you to address abuse by bad actors. It might still fail. Keep trying".

Timing is another factor to consider.

From Mozilla’s Vojtech Sedlak: "when soliciting ideas or opinions from our community, we found it important to identify an inflection point that attracts attention to the issue we want to learn about. Instead of just asking people out of the blue, we waited for a media story or an event that garnered public attention and only then launched our surveys".

Some other suggestions for enticing the crowd to contribute include partnering with civil society (Flor Coelho, LA NACION Data) and appealing to the crowd’s competitive instincts by presenting the data collection process as a game (Myf Nixon, mySociety’s Gender Balance project).

But be wary, says Stu Shulman, "one challenge is knowing when more is not helping".

It’s also important to remember that the way you ask for data can have implications for your analysis further down the line.

One of the frustrations with crowdsourcing, says ProPublica’s Peter Gosselin, is that "while it provides loads of sources, you can't use the group for quantitative purposes because of self-selection and non-randomness". If he could change anything about their IBM project, he’d restructure their questionnaires to cope with these problems.

Likewise, if Clare Blumer, from the ABC’s investigation into Australian aged care, could go back in time, she’d ‘fork’ their survey questions so that people who were satisfied and dissatisfied with their aged care experience would be asked a different set of questions.

Without separating their data, the ABC struggled with the data wrangling process. In the end, data was duplicated into smaller spreadsheets so that their journalists could read all the material that was submitted.

Looking for more tips? Check out this report from the Tow Center for Digital Journalism. It’s Roberto Rocha’s go-to whenever his newsroom at CBC Montreal wants to do a crowdsourced project.

Our next conversation

For our next conversation, we’re giving you the opportunity to ask a selection of Journalism 360 ambassadors questions about immersive storytelling. It’s a great time to think about experimenting in the area, with plenty of grants up for grabs.

Until next time,

Madolyn from the EJC Data team

]]>
Algorithmic reporting https://datajournalism.com/read/newsletters/algorithmic-reporting Wed, 30 May 2018 13:59:00 +0200 Laurence Dierickx Nick Diakopoulous, Jens Finnäs https://datajournalism.com/read/newsletters/algorithmic-reporting Hello, data enthusiasts!

Welcome to the second edition of the European Journalism Centre's data newsletter.

Each edition, we’ll be opening up a conversation about data with you. We know that the best way to master data journalism is by doing, which means you’re in the best position to help others grow from your mistakes and learn from your triumphs.

To kick off the conversation, we asked you to submit your experiences reporting with algorithms. We also gave you the opportunity to ask Nick Diakopoulos, one of our Data Journalism Handbook authors, questions on the subject.

What you said

For those of you using automation, your advice focused on the importance of defining purpose. While automation projects have the potential to find and produce stories on underreported data, these can be overshadowed by gimmicky uses of technology.

Jens Finnäs, from Newsworthy, submitted: "automation is not -- or should not be -- only about technology. With an engineer mindset it is tempting to develop automated solutions because you can, rather than because someone asked you to. Going forward, we are taking influence from design thinking and asking ourselves what real problems we can solve."

Laurence Dierickx agreed that, before undertaking an automation project, journalists should ask themselves: "what is the added value in terms of journalistic purpose?" Once this is defined, you should consider whether a project is suitable by assessing your data’s quality, reliability, and relevance to your story.

Even if your data doesn’t meet these standards, you might still have a chance at finding a story. "If data are a total mess, this could also tell you a lot about the management of the data producer (and a good story could be hidden behind)," she said.

What you asked

Now to reporting on algorithms.

In his Data Journalism Handbook chapter, The Algorithms Beat, Nick looks at the emerging practice of algorithmic accountability reporting. For this beat, the traditional watchdog role of journalists is focused on holding powerful algorithms to account. Think ProPublica’s Machine Bias series or Gizmodo’s coverage of Facebook’s PYMK algorithm.

But, you asked, how does this compare to traditional watchdog reporting?

Nick: "Watchdogging algorithmic power is conceptually quite similar to watchdogging other powerful actors in society. With algorithms you’re looking out for wrongdoings like discriminatory decisions, individually or societally impactful errors, violations of laws or social norms, or in some cases misuse by the people operating the algorithmic system."

So, what’s the difference?

Nick: "What’s different is that algorithms can be highly technical and involve machine learning methods that themselves can be difficult to explain, they can be difficult to gain access to because they’re behind organisational boundaries and often can’t be compelled through public records requests, and they’re capricious and can change on a dime."

Does this mean that non-data journalists can’t investigate algorithms?

Nick: "Non-data journalists can absolutely participate in algorithmic accountability reporting! Daniel Trielli and I wrote a piece last year for Columbia Journalism Review that outlines some ways, like looking at who owns an algorithm and how it’s procured, as well as reporting on the scale and magnitude of potential impact."

Read Nick's full answers here.

Our next conversation

Well, that’s it for algorithms.

We learnt a lot crowdsourcing your advice and questions for our first conversation. This got us thinking about the challenges and opportunities of using crowdsourced data in more complex reporting projects—which brings us to the topic of our next conversation.

That’s right, we’re going to be crowdsourcing your thoughts on crowdsourcing.

If you’ve ever asked the crowd for data, worked with crowdsourced data, or even submitted data as part of the crowd, we want to hear from you!

Until next time,

Madolyn from the EJC Data team

]]>
Our concept https://datajournalism.com/read/newsletters/our-concept Thu, 17 May 2018 07:00:00 +0200 Madeleine Gandhi https://datajournalism.com/read/newsletters/our-concept Welcome to the European Journalism Centre’s new data newsletter!

Last month, the EJC held a special session at the International Journalism Festival called Conversations with data. We wanted to find out how data journalists are using data to ask and answer questions in their reportage.

What became clear is that data, like human sources, needs to be prodded, queried, and interrogated to unearth important stories. But it can be challenging for journalists to develop the technical know-how and skills to find, and then interview, data.

So, we thought we’d continue the conversation with you.

We want to share your top tips and tricks so that others can learn from your mistakes and your triumphs. We’ll also be giving you the opportunity to ask questions of practitioners at the top of the field.

Let’s get started!

In our first edition, we’re going to talk algorithmic reporting. From automated news writing to reporting on the role of algorithms in society; if you’ve tried it, we want to hear about it.

We’ll also be giving you the opportunity to ask Nick Diakopoulos questions on the subject. Nick is the author of the early release Data Journalism Handbook chapter on The Algorithms Beat, and Assistant Professor at the Northwestern University School of Communication where he directs the Computational Journalism Lab.

]]>