Mastering the art of data collaboration

Conversations with Data: #99


Welcome to the latest Conversations with Data newsletter.

Before we dive into this issue, we wanted to remind you to take The State of Data Journalism 2022 Survey! Don't miss out on some cool prizes! Available in four languages, the survey will close at the end of December.

Now on to the podcast.

Mastering the art of data collaboration is no small feat. And this is even more true when conducting document-heavy investigations. But the reward for finding untold stories can make it worth your time. That's what we learned from MuckRock's data journalists Betsy Ladyzhets and Dillon Bergin in a live Discord chat a couple of weeks ago.

The pair talk about their latest long read on how data and collaboration can power public health investigative stories. We also hear how MuckRock's tools can help journalists use document-based reporting to bring about transparency and hold the government to account.

Listen to the podcast on Spotify, SoundCloud, Apple Podcasts or Google Podcasts. Alternatively, read the edited Q&A with Betsy Ladyzhets and Dillon Bergin.

Tell us about MuckRock's tools and its mission.

Dillon: MuckRock is a non-profit, collaborative news site that helps give you the tools to hold the government accountable. Our tools aren't just for journalists but also for citizens, researchers or activists. First and foremost, we are FOIA nerds. The rest of what we do largely stems from that. The MuckRock website helps people file FOIA requests, while DocumentCloud helps people analyse and share government documents they get from those requests. And more recently, the editorial team now helps newsrooms use public documents and data in their reporting. We are based in the United States, and Betsy and I have collaborated with reporters in newsrooms from Utah, Missouri, Michigan, Mississippi and Louisiana.

What other resources does MuckRock provide to help people understand document-based reporting?

Betsy: The value that MuckRock provides is showing people what can be done with FOIA requests and document-based reporting. We often will publish reporting recipes taking people behind the scenes of our projects. We also publish explainers that show how to use a certain type of document or how to think about requesting a certain type of information. I believe these resources can be useful no matter where you are.

How does MuckRock's editorial team collaborate with other newsrooms?

Betsy: MuckRock's editorial team is relatively new and came out of the Documenting COVID-19 project, which Dillon and I both worked on. We're still working out the best processes for starting collaborations and ensuring that editorial expectations are clear. These collaborative stories tend to come from datasets, or specific topics that we think can be better uncovered or explained through data or records. We will typically approach local partners that seem like a good fit for a particular region or topic.

One example is the Uncounted Project about excess deaths. Another long collaboration that I've worked on has been looking at the pandemic response in the state of Missouri with the Missouri Independent, a non-profit newsroom that covers statewide politics and policy. Dillon has been very involved with a project in Chicago looking at air quality data from a new network of air quality sensors set up in the city. We've also worked with other non-profit newsrooms, like the Idaho Capital Sun and the Salt Lake Tribune.

Tell us more about the Uncounted Project.

Betsy: This project started with an article that the Documenting COVID-19 Project published in the summer of 2021 with the Kansas City Star, a Missouri news outlet. It featured a coroner based there who told the project that he went against CDC guidance: if the family didn't want COVID-19 on the death certificate, he would write down a cause of death that did not include COVID-19, even if the person had probably died from it. This story is indicative of larger issues in the death system in the United States, where we have a very decentralised healthcare system, as the pandemic highlighted.

The United States also has a decentralised death system for investigating and recording how people die. Every state has its own process, and even within states, you can have different counties with different procedures and levels of resources. Sometimes the people doing this work for a particular county may even be elected and face political pressure. Or they might have no training at all, or very limited training, in how to do autopsies or understand how somebody passed away. All of this came to light in that story, which went viral and caught the interest of Andrew Stokes, a demography professor at Boston University's School of Public Health.


How did your collaboration with the team of demographers at Boston University happen?

Betsy: Andrew Stokes reached out to us saying he had also been looking at this problem and was trying to figure out where COVID-19 deaths might have been undercounted across the country. He found that individual story about this one coroner in Missouri to be an example of this bigger problem. Our team at MuckRock started collaborating with Andrew Stokes, and we have been working with him and his team over the last year and a half. Dillon was the lead reporter on stories that explored this undercounting problem and highlighted a couple of specific places: one county in Missouri, one in Louisiana and one in Mississippi.

This past year, I've also been working on a follow-up story that looks more at demographic patterns with excess deaths. This involves trying to understand if there are certain groups of people -- looking at race and ethnicity -- that are more likely to be undercounted. That story will come out in the next month or two.

How did this mix of collaboration with experts and local reporters help bring the story home?

Dillon: I think one of the really important parts of the story was the different systems and contexts of death investigation across the country. And that's also what made this project so important for us in terms of collaboration because while we were working with demographers at Boston University to see the high-level view of the metric of excess deaths, we were also wondering what was causing this in these different places across the country.

To be able to do that, we needed to work with local reporters in each of these places and understand how death investigation works in each area of the country. Is it a coroner system? Is it a medical examiner system? Is it something else? What are the trends in mortality in this area and nationwide? Do national trends trickle down into these different places across the country? We wanted to find out how that was happening and why.

Dillon, you note that your reporting often begins and ends with two questions: Who does the data serve? What does the data conceal? Tell us more about this approach.

Dillon: In many instances, investigative reporting is exactly that. I think it's about the information that some people, usually in positions of authority or power, want to remain undisclosed. So as a reporter, I often ask myself: who would it serve to have this information, and who should know about this data but doesn't? This approach to data reporting became clear to me while working on a story about evictions in New Mexico.

Finally, what coding languages are you most comfortable working with?

Dillon: I started coding with Python, but the programming language I now use regularly is R. The R Tidyverse packages are handy for data analysis. I find it easy to do basic math and common operations such as slicing and dicing, pivoting, and aggregating data. Increasingly, I'm using it for mapping data and geospatial analysis. Many great packages are maintained by different academics who study the climate, environment and geography.

Betsy: Dillon is more of a coder than I am. I'm still a novice at R and Python, although I have some familiarity with both. I mostly use Excel and Google Sheets to do basic data analysis. And then I'll work with folks like Dillon, who have more of a coding background. I'm a big fan of Flourish, Datawrapper and Tableau for data visualisation.

Latest from

The State of Data Journalism Survey 2022 is back for the second year in a row! Help us understand how the field is evolving by sharing your insights. This edition also includes a special section on the Russia-Ukraine conflict. Available in Arabic, English, Italian and Spanish, respondents can also win some prizes. The survey will close on 31 December 2022, and we will share the results in early 2023. Take the survey today and spread the word!

Since the start of the COVID-19 pandemic, data has been more critical for journalists than ever. But collaboration is also crucial for data teams to find untold stories when covering public health. In our latest long read article, MuckRock's editorial team explains how working with a mix of experts and local reporters helped them investigate excess deaths in the United States.

Newsrooms worldwide are stepping up their climate coverage by investing in resources to grow and support their reporters. In this long read article, journalist Sherry Ricchiardi examines some leading examples of data-led climate reporting.

Want to be on the podcast or have an idea for an episode? We want to hear from you. Don't forget to join our Discord data journalism server for the latest in the field. You can also read all of our past newsletter editions or subscribe here.


Tara from the EJC data team,

bringing you supported by Google News Initiative and powered by the European Journalism Centre.

PS. Are you interested in supporting this newsletter or podcast? Get in touch to discuss sponsorship opportunities.
