Humanising climate data for solutions reporting
Conversations with Data: #82
Welcome back to the Conversations with Data newsletter!
We hope you had a relaxing and restful summer and are ready to dive back in! At DataJournalism.com, we are excited to bring back our monthly live Discord chats. Our next live conversation will take place next Wednesday at 3 pm CEST with Code for Africa. Add the Discord chat to your calendar.
Now on to the podcast!
Climate data journalism offers a vital opportunity for organisations and people looking to spur action, find solutions and hold power to account. But what do data journalists need to be mindful of when covering this pressing issue?
To better understand this, we spoke with data journalist Clayton Aldern from Grist, a nonprofit, independent media organisation dedicated to telling stories of climate solutions and a just future. In this episode, he explains how climate investigations can rely on data analysis and machine learning for solutions reporting. We also hear about his go-to coding languages and digital tools for wrangling climate data.
What we asked
Tell us about Grist and its approach to solutions journalism when covering climate justice reporting.
There is often this tension in media between solutions journalism and perhaps the more traditional vein of journalism known as accountability reporting. I don't think there needs to be that tension there. At Grist, we're interested in illustrating what a better future looks like. By extension, we're interested in people reaching out toward that future, realising that it's possibly within grasp if one indeed puts a little effort in. So, I think where accountability reporting or investigative journalism and solutions journalism meet is indeed at that point of desire.
Talk to us about the data-led investigative piece you worked on covering abandoned wells that is now a finalist in the ONA Awards.
This piece was a collaboration with my Grist colleague Naveena Sadasivam and The Texas Observer's Christopher Collins. During the middle of the investigation, I joined them to provide some data support, which became an excellent analytical and visual aid to the piece. But the two of them were responsible for the deep investigative dives here looking at abandoned oil and gas wells in the Permian Basin region of the United States.
As oil and gas companies weathered volatile oil prices last year, many halted production. More than 100,000 oil and gas wells in Texas and New Mexico are idle. Of these, there are about 7,000 "orphaned" wells that the states are now responsible for cleaning up. But statistical modelling by Grist and The Texas Observer suggests another 13,000 wells are likely to be abandoned in the coming years. A conservative estimate of the cleanup cost? Almost $1 billion. And that doesn't consider the environmental fallout.
How did you use machine learning in this story?
This is a perfect problem for machine learning classification. There are two classes of inactive wells: those that are simply not producing at a given time, and proper orphaned wells that have been inactive for a number of years or even up to a decade. There is a clear distinction between these two as categorised by the states. We looked at the data to see if we could build a model that distinguished between orphaned and inactive wells.
It's important to remember there are a whole lot of characteristics that describe wells. There's the depth, the type and the region. You can imagine building a logistic regression model that seeks to distinguish between the two classes, inactive wells and orphaned wells as categorised by the state, and then asking how well it does on a chunk of the data it has never seen before. We found that there's a subpopulation of inactive wells that are statistically indistinguishable from orphaned wells.
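To make the idea concrete, here is a minimal sketch of that kind of classification. It is not the Grist and Texas Observer model: the data is synthetic, the two features (depth and years idle) and their values are invented for illustration, and the logistic regression is fitted with plain gradient descent so the example needs nothing beyond the Python standard library. It trains on one chunk of wells, checks accuracy on a held-out chunk, and then lists the inactive wells that the model scores as looking orphaned.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible

def make_well(orphaned):
    """Return invented features [depth_km, years_idle] for one synthetic well."""
    if orphaned:
        return [random.gauss(1.5, 0.5), random.gauss(8.0, 2.0)]
    return [random.gauss(2.5, 0.5), random.gauss(3.0, 2.0)]

# Label 1 = orphaned, 0 = inactive (hypothetical stand-ins for the state categories).
wells = [(make_well(label), label) for label in [0, 1] * 200]
random.shuffle(wells)
train, held_out = wells[:300], wells[300:]

def predict_proba(weights, bias, features):
    """Probability that a well is orphaned under the logistic model."""
    z = sum(w * f for w, f in zip(weights, features)) + bias
    z = max(-60.0, min(60.0, z))  # clamp to keep math.exp from overflowing
    return 1.0 / (1.0 + math.exp(-z))

def fit(rows, learning_rate=0.01, epochs=200):
    """Fit logistic regression with stochastic gradient descent on log loss."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for features, label in rows:
            error = predict_proba(weights, bias, features) - label
            weights = [w - learning_rate * error * f
                       for w, f in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias

weights, bias = fit(train)

# How well does the model do on wells it has never seen?
correct = sum((predict_proba(weights, bias, f) >= 0.5) == (label == 1)
              for f, label in held_out)
accuracy = correct / len(held_out)

# Inactive wells that the model cannot tell apart from orphaned ones.
at_risk = [f for f, label in held_out
           if label == 0 and predict_proba(weights, bias, f) >= 0.5]

print(f"held-out accuracy: {accuracy:.2f}")
print(f"inactive wells that look orphaned: {len(at_risk)}")
```

In a real analysis the features would come from state well records, and the 0.5 cut-off used here is only an assumed threshold that would need to be justified.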
What has been the impact of this investigation?
Well abandonment and well plugging are a hot-button issue in the United States right now. If you think about the infrastructure negotiations, for example, it's come up again and again. Why is that? There are many wells in the country, and there aren't great rules on the books. Some of these plugging cost and bonding formulas were written decades ago. Since then, drilling technologies have really improved, allowing for more of this activity. This isn't just an issue for Texas and New Mexico. It's also an issue for California, Colorado, and the Dakotas, where there's a lot of shale drilling.
As soon as we published this investigation, many advocates, environmentalists, environmental lawyers and indeed oil and gas industry folks came out of the woodwork and said, "Hey, we're working on this issue in this other state. Can you apply your model to that state? We would like to know what the coming wave of abandonment looks like in our environment." This is an open-source model that anybody can use. It takes a little bit of technical know-how. And obviously, you need a data set. But this analysis is replicable in other contexts.
Tell us about your coding skills. What tools do you rely on regularly?
Finally, what advice do you have for journalists looking to specialise in climate data reporting?
My advice for folks looking at the climate crisis is not to forget where you've come from. If you come from a traditional journalism background where you wrote about people, you ought to be doing the same thing when you're writing about the climate crisis. I think there's this tendency to imagine data journalism as a kind of island that sits off the coast of journalism. Some see it as this special thing that the programmers do, and then you end up with these nice, immersive visual features. That's all well and good, but you still want to use your traditional investigative reporting toolkit.
You should still be doing archival research, filing public records requests, and most importantly, you should still be talking to people. Remember to examine how the dataset impacts communities. The climate crisis is a story of environmental injustice and climate injustice. If you are missing those angles, you are either missing the lead or burying it.
Join our live Discord chat on 29 September at 3 pm CEST.
Our next conversation will be a live Discord chat on Wednesday, 29 September at 3 pm CEST with Tricia Govindasamy and Jacopo Ottaviani from Code for Africa. Tricia is an award-winning South African scientist specialising in Geographic Information Systems and data science. Jacopo is a computer scientist and senior strategist who works as Code for Africa’s Chief Data Officer. We will discuss what the data journalism and media landscape looks like in Africa. We will also hear about some regional data projects related to climate change and gender data. Add it to your calendar here.
Latest from DataJournalism.com
Our latest long read article features case studies examining how a team of researchers, journalists and students used digital recipes to delve into COVID-19 conspiracy content sold on Amazon, the world's largest online retailer. Written by Jonathan Gray, Marc Tuters, Liliana Bounegru and Thais Lobo, this walk-through piece serves as a useful guide for researchers and journalists seeking to replicate a similar collaborative investigation using digital methods. Read the full article here.
News Impact Summit is happening tomorrow! Register today!
Want to make your journalism more diverse and inclusive? Join us tomorrow, Thursday, 23 September, for the European Journalism Centre's News Impact Summit. Learn from industry experts about innovative approaches to bringing about equality in the newsroom. Register here.
Tara from the EJC data team,
PS. Are you interested in supporting this newsletter or podcast? Get in touch to discuss sponsorship opportunities.