Name: DataJournalism.com

Reproducible journalism

Conversations with Data: #17

We’re back!

It’s been a while since we last spoke -- but we haven’t forgotten you. Over the past few weeks, we’ve spent some time reflecting on our first year of Conversations with Data, which saw us feature advice from 75 individual community members and 17 expert AMAs. It’s been great talking with you, and we want to keep our conversations relevant. Be sure to let us know what you’d like us to cover, or who you’d like us to chat to.

As we gear up to reproduce another great year, we thought it would be fitting to look at journalism that does just that. So, for this conversation, we got talking with Jeremy Singer-Vine, Stephen Stirling, Timo Grossenbacher, and more, about how to set your stories up for repeatability.

For background, check out Sam Leon’s Data Journalism Handbook 2 chapter here.

What you said

1. Documentation is everything

If there was ever a common theme across your responses, it was this: document, document, document.

In the words of BuzzFeed’s Jeremy Singer-Vine, “reproducibility is about more than just code and software. It's also fundamentally about documentation and explanation. An illegible script that ‘reproduces’ your findings isn't much use to you or anyone else; it should be clear -- from a written methodology, code comments, or another context -- what your code does, and why”.

To keep good documentation, Stephen Stirling from NJ Advance Media suggested reminding yourself that others need to understand your process.

“Keeping regular documentation of steps, problems, etc. not only keeps your collaborators in the loop, but provides instruction for others hoping to do the same work. Documentation on the fly isn’t always going to be pristine, so make a point to go back and tidy it up before you make it available to others.”

2. Take time to develop good documentation practices

While we’re agreed that documenting is crucial, what does good documentation look like in practice? As our Handbook chapter author, Sam Leon, explained, there are many ways to make data reporting reproducible.

“Often just clearly explaining the methodology with an accompanying Excel spreadsheet can enable others to follow precisely the steps you took in composing your analysis. In my chapter in the Handbook, I discuss the benefits of using so-called ‘literate programming’ environments like Jupyter Notebooks for reproducibility,” he said.

There’s no set workflow, and often it may be a matter of tailoring your approach to your project’s needs. Brian Bowling shared some thoughts from his experience:

“A change log would be a good idea for a longer project, particularly when other people will be using or viewing the workbook. When I mainly worked in Excel, I didn't do that. As I branched out into R, Python and MySQL, I started keeping Word documents that included data importing and cleaning steps, changes, major queries used to extract information and to-do lists. Since I wasn't working exclusively in one piece of software, keeping separate documentation seemed like a better idea. It also makes it easier to pull the documentation up on one screen while working on another.”
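
For those working in scripts rather than Word documents, a change log can also be kept from the code itself. What follows is only a minimal sketch, not Brian's method: the function name `log_change`, the file name `CHANGELOG.txt`, and the example steps are all our own assumptions.

```python
from datetime import datetime, timezone
from pathlib import Path


def log_change(log_path, step, note):
    """Append one timestamped entry to a plain-text change log."""
    timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    entry = f"{timestamp} | {step} | {note}\n"
    # Open in append mode so earlier entries are never overwritten.
    with Path(log_path).open("a", encoding="utf-8") as log_file:
        log_file.write(entry)
    return entry


if __name__ == "__main__":
    # Hypothetical file name and steps, for illustration only.
    log_change("CHANGELOG.txt", "import", "Loaded the raw crime data CSV")
    log_change("CHANGELOG.txt", "clean", "Standardised column headers")
```

Because the log is a plain text file that sits outside any one tool, it can stay open on a second screen, much as Brian describes.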

For those of you working in Excel, have a go at using Jeff South’s solution for keeping track of his workflows:

“Every time I make a significant modification to data, I copy it to a new worksheet within the same file, and I give the new worksheet a logical name ("clean up headers" or "calculate crime rates"). Also, I insert comments on columns (and sometimes on rows) to document something I've done, like "Sorted by crime rates and filtered out null values".”

This system means that all changes are clearly documented in one file and you don’t have multiple files with similar names.
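
If you work in Python instead of Excel, the same habit can be approximated by saving each stage of the data under a numbered, logical name with a short note beside it. This is a rough sketch under our own assumptions, not Jeff's workflow: the helpers `save_step` and `add_crime_rate`, the column names, and the per-1,000 rate are invented for illustration.

```python
import csv
from pathlib import Path


def save_step(rows, out_dir, order, name, note):
    """Write one stage of the data to its own CSV, plus a note on what changed."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stem = f"{order:02d}_{name}"  # e.g. "02_calculate_crime_rates"
    csv_path = out_dir / f"{stem}.csv"
    with csv_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    # The note plays the role of a worksheet comment.
    (out_dir / f"{stem}.txt").write_text(note + "\n", encoding="utf-8")
    return csv_path


def add_crime_rate(rows):
    """Return new rows with crimes per 1,000 residents; the input is left untouched."""
    return [
        {**row, "rate_per_1000": round(1000 * row["crimes"] / row["population"], 1)}
        for row in rows
    ]


if __name__ == "__main__":
    # Made-up figures, for illustration only.
    raw = [{"town": "Exampleton", "crimes": 50, "population": 10000}]
    save_step(raw, "steps", 1, "clean_up_headers", "Renamed columns to lower case")
    save_step(add_crime_rate(raw), "steps", 2, "calculate_crime_rates",
              "Added crimes per 1,000 residents")
```

Each transformation returns new rows rather than editing the old ones, so every earlier stage stays on disk exactly as it was.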

3. Consider your software

Timo Grossenbacher, SRF Data, warned: “Scripts may fail to execute only months after initial creation -- even worse, they suddenly produce (slightly) different results without somebody noticing. This is due to rapid development of software (updated packages, updated R / Python version, etc.). Make sure to at least note down the exact software versions you used during initial script creation.”
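
As one way of following that advice in Python, the sketch below records the interpreter, operating system, and installed package versions using only the standard library. The function name `snapshot_versions` and the package names in the usage example are our own assumptions, so swap in whatever your analysis actually imports.

```python
import json
import platform
from importlib import metadata


def snapshot_versions(packages):
    """Return the interpreter, OS, and installed version of each named package."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the gap rather than failing, so the snapshot is always written.
            versions[name] = "not installed"
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
    }


if __name__ == "__main__":
    # Example package names only; list the ones your script depends on.
    print(json.dumps(snapshot_versions(["pandas", "requests"]), indent=2))
```

Saving this output next to the published analysis gives a later reader a starting point when a script fails or quietly produces different numbers.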

If you’re using R, Timo’s created an automated solution for version control, which Gerald Gartner said has helped Addendum avoid problems with different versions of packages. They’ve moved all of their analysis to R and now, he said, “the 'pre-R-phase' feels like the dark ages.”

At BuzzFeed, they also use R, along with Python. Jeremy Singer-Vine suggested going for these types of ‘scriptable’ software rather than point-and-click tools. That said, “if you're wed to a point-and-click tool, learn about its reproducibility features, such as OpenRefine's 'Perform Operations'”.

4. Make use of data packages

Serah Njambi Rono, from Open Knowledge International, reminded us that datasets are constantly evolving -- and this can limit successful reproducibility.

“Outdated versions of datasets are not always made available alongside the updated datasets - a common (and unfortunate) practice involves overwriting old information with new information, making it harder to fact check stories based on older datasets after a period of time.

Frictionless Data advocates for the use of data packages in answer to the latter issue. Data packages allow you to collate related datasets and their metadata in one place before sharing them. This means that you can package different versions of the same dataset, provide context for them, publish them alongside your reporting and update them as needed, while allowing for repeatability.”
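
To make the idea concrete, here is a small sketch that builds a `datapackage.json`-style descriptor by hand with Python's standard library, without using any Frictionless tooling. The field names (`name`, `path`, `title`, `format`, `bytes`, `hash`, `resources`) follow our reading of the Data Package specification and should be checked against the current version. The helper names and the file names are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def describe_resource(path, name, title):
    """Describe one data file, with its size and checksum, for the descriptor."""
    data = Path(path).read_bytes()
    return {
        "name": name,
        "path": Path(path).name,  # assumes the file sits next to the descriptor
        "title": title,
        "format": "csv",
        "bytes": len(data),
        # The checksum lets a fact-checker confirm this is the exact version used.
        "hash": "sha256:" + hashlib.sha256(data).hexdigest(),
    }


def build_descriptor(package_name, resources):
    """Collect the resource descriptions into one package-level descriptor."""
    return {"name": package_name, "resources": resources}


if __name__ == "__main__":
    # Hypothetical file names: two versions of the same dataset, kept side by side.
    resources = [
        describe_resource("crime-2018-01.csv", "crime-2018-01", "Crime data, January 2018 release"),
        describe_resource("crime-2018-06.csv", "crime-2018-06", "Crime data, June 2018 release"),
    ]
    descriptor = build_descriptor("crime-data", resources)
    Path("datapackage.json").write_text(json.dumps(descriptor, indent=2), encoding="utf-8")
```

Listing each release as its own resource means the older version stays available alongside the newer one instead of being overwritten.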

Our next conversation

Next time, we’ll be putting algorithms in the spotlight with Christina Elmer and the data team at Spiegel Online. Have a read of Christina’s Data Journalism Handbook 2 chapter on algorithmic accountability here.

Until next time,

Madolyn from the EJC Data team
