How to save data journalism

Conversations with Data: #98

Do you want to receive Conversations with Data? Subscribe

Welcome to the latest Conversations with Data newsletter, brought to you by our sponsor FlokiNET, a web hosting company that was established in Iceland to provide safe harbour for freedom of speech, free press and whistle-blower projects. You can get 15% off FlokiNET's products and servers by using the promotion code DATAJOURNALISM.

Don't forget that all registered members have FREE WEB SPACE AND DOMAIN with FlokiNET. Just log in to your profile and visit the goodies page.

The State of Data Journalism 2022 Survey is back! Help us understand how the field is evolving by sharing your insights. Take the survey and win some cool prizes!

Now on to the podcast.

Print and broadcasting news outlets have long archived stories. But for the past decade, data visualisations and interactive content have not been preserved. So what can be done to save data journalism?

We spoke with Bahareh Heravi from The University of Surrey and Simon Rogers from Google to explore some of the challenges and solutions to this problem. The pair discuss the importance of archiving and the necessary next steps to be taken by the news industry.

Listen to the podcast on Spotify, SoundCloud, Apple Podcasts or Google Podcasts.

Alternatively, read the edited Q&A with Bahareh Heravi and Simon Rogers.

What we asked

Bahareh, you co-wrote a research paper on preserving data journalism. What is stopping news organisations from archiving data visualisations and other dynamic content?

Bahareh: The main issue is how complex interactive data visualisations are. News organisations don't have the mechanisms for preserving this content. They are very experienced in preserving and archiving text-based, image-based or video-based content. First, archiving was analogue, and then it moved to digital. Interactive data visualisations are complex objects with many interdependencies and dependencies. As we move forward, the programming languages behind them change. For example, interactives used to run on Flash, but now they run on JavaScript. The interdependencies between them mean the software, the host, and the connections between the host and the news organisation must work for preservation to happen.

All the bits and pieces of the article need to be properly connected, and if one fails, that link falls apart, and the data visualisation doesn't load properly. Preserving data visualisations is similar to preserving software or digital games. It is not easy. Nevertheless, it should still happen. What is worrying is that a lot of content has already been lost in the past ten years or so.

Simon, as data editor at The Guardian, you worked on many interactive projects. What's your perspective on archiving data journalism?

Simon: It's funny, isn't it? Almost everything I've worked on probably doesn't work now at The Guardian. I can't think of many projects that do in terms of the bigger interactive projects. And the weird thing is, I have a poster at home of the very first issue of The Guardian. I can read the stories from the very first page of the very first Guardian, but I can't see an interactive that was the biggest thing on the site for a few days seven or eight years ago. This is partly because journalists do what we want them to do, which is "live in the now". There's a scrappiness to journalism that enables us to be innovative and get stuff done on a deadline. A different approach is applied to interactive and data content than written content.

How does the newsroom cycle lend itself to archiving content?

Simon: When I was at The Guardian, one of the rooms I enjoyed was the archive room. You could get all these old issues of the paper or have digitised versions. News organisations have a well-worn archive system for generating stories. You create a follow-up and then many other follow-up stories after that. There is almost a circle of life in a news story which can carry on for years and years. Journalists think of interactives differently. They see it as something they've got to get out tomorrow, and that's it. Libraries change, code changes. It starts to be expensive to keep things going. Sometimes a programming language, tool or key code library that a project relies on is discontinued. That means you have to rebuild the project from scratch if you're going to keep it, for minimal return.

For instance, one of the projects we worked on at The Guardian was the MPs' expenses story. It was the first big crowdsourced news organisation project. It was switched off just before I left because otherwise, we would have had to rebuild the whole thing. It costs money to keep these things going. It's expensive, so it dies. It's not that we necessarily want a project to live on, but we want to know that it was there. It should have had an imprint, and that's important for institutional memory. There's even an inspirational thing about that for later generations.

How does archiving affect the way you teach data journalism?

Bahareh: For me, showing those early versions of data journalism is important. It has historical importance. When you want to start teaching a topic to students, you want to be able to give them a bit of history and tell them how it all started. Those data visualisation pieces are part of the history of data journalism. This can be a bit tricky because when you refer to them, they don't exist. A 404 page appears. It is useful for my session on archiving, but there's always this worry that if I open any piece in the class, it may not work.


What recommendations do you have for preserving data journalism that you mention in your research paper?

Bahareh: In the research paper, we came up with two sets of recommendations. One of them involved long-term infrastructural recommendations that are not very easy to do. They are expensive and need time and resources. News organisations could work on these in collaboration with tech organisations to reach a point where they ensure data is not lost. The other set of recommendations is easy to implement and could be done by journalists individually. These are not too far off from what Simon mentioned about not having the interactive work exactly as it does but capturing it somehow for future readers.

In our recommendations, we draw on a concept called significant properties. The idea is that the journalists could develop a set of significant properties for their story by capturing some of the properties or characteristics of a visual or interactive piece. In the case of our research, it is about the work's behaviour, its interactions with the user, and the intent and intended purpose of that specific piece. What did it want to convey to the audience?

Let's take the MPs' expenses story Simon worked on for The Guardian. The journalists could develop a set of significant properties from this specific visualisation by asking what is important for the future reader in 15 years. It could be the overall shape of the project and what it looked like, but also the types of content that were there and what the interactions were. By identifying these kinds of significant properties, they can start capturing what is important.
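
One way to picture a set of significant properties is as a small structured record stored alongside the article. The sketch below is only an illustration: the class name, the field names (behaviour, interactions, intent, captures) and all example values are assumptions based on the categories described above, not a schema from the research paper.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class SignificantProperties:
    """Hypothetical record of what a future reader should know about an interactive."""

    article_url: str          # the story the interactive belongs to
    behaviour: str            # what the piece does
    interactions: list[str]   # how users could interact with it
    intent: str               # what it wanted to convey to the audience
    captures: list[str] = field(default_factory=list)  # screengrabs, GIFs, videos

    def to_json(self) -> str:
        # Plain JSON keeps the record readable without special software.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Made-up example values, for illustration only.
record = SignificantProperties(
    article_url="https://example.org/interactive-story",  # placeholder URL
    behaviour="Readers explore a searchable set of documents.",
    interactions=["search", "filter", "open a document"],
    intent="Show the scale of the material and let readers explore it.",
    captures=["overview.png", "document-view.png"],
)
print(record.to_json())
```

A record like this could be saved next to the screengrabs so the description and the captured files stay together.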

What practical steps would the journalists take to capture these significant properties?

Bahareh: One of the recommendations is that journalists should try to capture several screengrabs of the interactive, which is very easy to do. Based on the significant properties you identify, it could be done by one or even five screengrabs. If screengrabs aren't enough, you could create a GIF animation which encapsulates more screengrabs. Some news organisations are already creating GIF animations to promote their content on social media. However, it would be even better if these could be preserved in the same way other image pieces are preserved in their existing archives.

If a GIF animation is not enough, record a video to give the reader an idea of what your visualisation is. What is most important is to make sure that whatever content you have captured from the interactive visualisation is also linked to the piece. It shouldn't be an afterthought. As you publish your article, make sure you have recorded it, created it, and put it on whatever platform you use to automatically or semi-automatically archive your work.

What steps would you like to see for dynamic content to be archived in the medium term?

Bahareh: Our research suggests that in the near future, news organisations could implement some simple "if-then" code in their CMS. For instance, the code could say, "If this visualisation did not load, load this JPEG, GIF or video instead in this piece of content". If that's too complicated to maintain, there could be a caption underneath any interactive data visualisation saying, "If this did not load, access this version as a JPEG, GIF or video through this link". This would allow you to show your reader what was here. It is not perfect. You can't see everything, but you have something that could give an idea of what was there.
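
The "if-then" rule described above could look something like the following sketch. It is not code from any real CMS: the function render_embed, its parameters and the injected is_available check are all hypothetical, and the URLs are placeholders. It handles an image fallback (JPEG or GIF) only; a video fallback would need different markup.

```python
from html import escape
from typing import Callable


def render_embed(
    interactive_url: str,
    fallback_url: str,
    description: str,
    is_available: Callable[[str], bool],
) -> str:
    """Return HTML for the interactive, or for its archived capture if it fails to load."""
    if is_available(interactive_url):
        return f'<iframe src="{escape(interactive_url)}" title="{escape(description)}"></iframe>'
    # The interactive did not load: show the preserved JPEG or GIF instead.
    return (
        f'<figure><img src="{escape(fallback_url)}" alt="{escape(description)}">'
        "<figcaption>Archived capture of an interactive that is no longer available."
        "</figcaption></figure>"
    )


html = render_embed(
    "https://example.org/interactive",              # placeholder URL
    "https://example.org/archive/interactive.gif",  # placeholder URL
    "Map of results",
    is_available=lambda url: False,  # simulate a broken interactive
)
print(html)
```

How availability is checked is left open here; in practice it might be an HTTP request on the server or an error handler in the browser.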

How can we build on what is already archived?

Simon: One option would be to have a YouTube channel or something simple that will still be around. Whatever we build, we want to build out redundancy as much as possible. We already have The Sigma Data Journalism Awards, which has three years of data journalism entries. That will build up over time. Before that, we had the Data Journalism Awards, where screenshots were submitted with each entry. The content is there; we need to bring it together.

Bahareh: If this is done at the time of publication, it would only take 30 minutes. But this could ensure the content remains for 10 to 50 years.

How do we convince journalists preservation of their work is important?

Simon: It is a serious issue. People don't realise it's a serious issue, which is part of the problem. The answer is not necessarily that something has to work. The answer has to be that we know that it was there, what it did, and how it worked.

One thing that journalists are all interested in is immortality. That's why people love seeing their byline. You do need some incentive for this to exist. It has to be low enough work that it is manageable for people. It would be best to have people responsible for it to get things started. Librarians exist for a reason. Their job is to do that. Almost every news organisation has a set of archivists or librarians who think about keeping stuff. But this is too difficult or is outside of their work.

What else could be done to encourage the industry to put data journalism archiving front and centre?

Simon: Data journalism needs something similar to a Rock and Roll Hall of Fame. We need to create something with enough prestige for people to want to be inducted into it. Congratulations! Your piece has been chosen to be archived in the hall of fame.

Bahareh: We could also have an exhibit where people could walk through and experience the dead or lost content. It might be hard to compile the old content so it can be put on display.

Finally, what are your parting thoughts on archiving and data journalism?

Simon: This conversation has made me think we need to discuss this again this year. I'm definitely going to apply myself to this. Archiving is only becoming more and more important. Legacy is important, and data journalism has this legacy of really talented people. And to see this stuff vanish -- some of the most innovative and exciting work in journalism right now -- it can't happen. We have to make sure it doesn't.

Bahareh: We need to act quickly and collaborate. There are a lot of people who are knowledgeable and interested in this topic. If we could bring these stakeholders together as soon as possible, we could do something about this. We could involve data journalists, news organisations, tech companies, third-party data visualisation companies, digital preservation organisations, and the academic and scientific community.

Latest from

The State of Data Journalism Survey 2022 is back for the second year in a row! Help us understand how the field is evolving by sharing your insights. This edition also includes a special section on the Russia-Ukraine conflict. The survey is available in Arabic, English, Italian and Spanish, and respondents can also win some prizes. It will close on 31 December 2022, and we will share the results in early 2023. Take the survey today and spread the word!

How do you inject data storytelling into radio packages? Radio broadcast journalists can find this challenging given they only have a few minutes to tell the story. In our latest long read, Robert Benincasa, a computer-assisted reporting producer in NPR's Investigations Unit, provides detailed examples and treatments for merging audio and data journalism. Read the long read here.

Want to be on the podcast or have an idea for an episode? We want to hear from you. Don't forget to join our Discord data journalism server for the latest in the field. You can also read all of our past newsletter editions or subscribe here.


Tara from the EJC data team,

bringing you supported by Google News Initiative and powered by the European Journalism Centre.

PS. Are you interested in supporting this newsletter or podcast? Get in touch to discuss sponsorship opportunities.
