Editorial transparency in computational journalism

How to build trust by enabling readers to check your work

03 September 2017

By Jennifer Stark

This article was migrated from datadrivenjournalism.net. It has been edited for DataJournalism.com, but some information may still be outdated.

As journalists, we are committed to being the watchdogs who hold governments and companies to account. A newer target for that scrutiny is the use of algorithms. These are employed in many facets of our lives -- often without our knowledge -- to inform prison sentencing, hiring, loan approvals, and other decisions with societal consequences.

As computational methods also become more prevalent in the newsroom, such as data journalism, curated news feeds, automated writing, social media analytics, and news recommender systems, we must hold ourselves to the same standards of accountability and transparency.

To begin this process, I presented a paper at the Computation+Journalism 2016 conference that examines how to develop standards and expectations for transparency.

Benefits of editorial transparency

Releasing documented code and data does sound like extra work, and it is. But by investing this extra time, we benefit ourselves, our field, and our readers.

Writing code that others will see, and potentially criticise or question, encourages us to write cleaner code, with appropriate commenting, logical organisation, and visualisations.

Adopting this process helps ensure that what we report is based on evidence, and that this evidence is present and correct. And when our code and data are open, the field can develop rapidly, as most open source fields do.

The field of journalism also benefits educationally, because journalists can learn from clear, well-documented code. They can access the data and create something new from it: an angle the original authors had not considered or had no time for, or a local version of the story.

Editorial transparency also benefits our readers. Particularly in these times, establishing and maintaining trust with readers is essential to the continued success of journalism. Part of building that trust is enabling readers to check your work or see the steps that led you to your story, facts, and conclusions.

Just as we cite our sources, we should provide the evidence behind our data journalism. Providing the code and the data lets readers engage more deeply with our work, and opens the door to new stories, or even corrections if errors are found in our code.

Case studies

Below are two case studies from my own work, both using transparency tools that are free and open source.

Case study #1

Uber: A data journalism project investigating uberX wait times across demographics in Washington, DC.

To open up our process:

data was shared via a lab-account Google Drive (Google Drive is great if the data set is too large to upload to GitHub)
data (cleaned and processed raw data) was shared as a .csv file within the GitHub repository, enabling others to pick up the data analysis at later stages if desired
data analysis code was shared in commented Jupyter notebooks and Python scripts in the GitHub repository
project and code documentation, the data dictionary, and other experimental particulars were described in the readme
everything that could be achieved programmatically was done programmatically, rather than manually, to enable replicability and facilitate reproducibility
the Google Drive and the GitHub repository were linked in the news article
the news article was linked in the GitHub repository.
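
As a minimal sketch of the "programmatically, rather than manually" point above, the Python below turns a small raw table into per-tract average wait times in a single step that can be rerun. The column names (`tract_id`, `wait_seconds`) and the sample values are invented for illustration and are not taken from the project's repository.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical raw data: the column names and values are invented for this sketch.
RAW_CSV = """tract_id,wait_seconds
A,300
A,360
B,
B,240
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one per data row."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_rows(rows):
    """Drop rows with a missing wait time and convert the rest to floats."""
    cleaned = []
    for row in rows:
        value = (row["wait_seconds"] or "").strip()
        if value:
            cleaned.append({"tract_id": row["tract_id"], "wait_seconds": float(value)})
    return cleaned


def mean_wait_by_tract(rows):
    """Average the wait time within each tract."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["tract_id"]].append(row["wait_seconds"])
    return {tract: mean(waits) for tract, waits in sorted(grouped.items())}


if __name__ == "__main__":
    summary = mean_wait_by_tract(clean_rows(load_rows(RAW_CSV)))
    for tract, wait in summary.items():
        print(tract, wait)
```

Because every cleaning decision (here, dropping rows with no recorded wait time) lives in code, a reader can see it, question it, and rerun it against the shared data.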

Image: A map showing the average wait time, in seconds, for an uberX car in each census tract in Washington, DC.

In doing this we:

were accountable, in that others could inspect our code, data, and assumptions -- we were even notified of a bug in our code via the ‘Issues’ feature in GitHub
facilitated several independent policy studies in other states and cities based on our code
enabled others to conduct novel studies and data visualisations, including this one by Kate Rabinowitz.

Case study #2

@AnecbotalNYT: a Twitter bot that tweets comments from news articles shared on Twitter. The role of this bot is to surface comments and stimulate broader engagement with articles, news, and social media users.

Here’s how we ran the project to promote transparency:

code is available on GitHub
usage and customisation instructions and documentation are available in the README.md file
the tool is independent of platform (Mac, PC, Linux) -- all that is necessary to create one’s own bot is to install the required Python libraries, edit the configuration file, and run it on a server such as AWS.

We aimed to make our tool as accessible to others as possible, so that any newsroom or individual could make a copy of the code, and customise the configuration settings to create their own bot. It was challenging to discern how much customisation to build into the tool, while ensuring that it wasn’t so flexible as to render it too complicated or to dilute the specific goal of the bot.
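
One way to keep customisation bounded is to expose a short, fixed set of settings and reject anything else. The sketch below illustrates that idea only; the setting names and default values are hypothetical and do not describe the bot's actual configuration file.

```python
import json

# Hypothetical settings and defaults; the real bot's configuration may differ.
DEFAULTS = {
    "search_term": "nytimes.com",
    "max_comment_length": 500,
    "dry_run": True,
}


def load_settings(config_text):
    """Merge user-supplied settings over the defaults, rejecting unknown keys."""
    user = json.loads(config_text)
    unknown = set(user) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"Unknown setting(s): {sorted(unknown)}")
    return {**DEFAULTS, **user}


if __name__ == "__main__":
    # A newsroom overrides only the settings it cares about.
    print(load_settings('{"max_comment_length": 280}'))
```

Rejecting unknown keys means a typo in the configuration file fails loudly instead of being silently ignored, which matters when the person editing it did not write the code.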

Image: Screenshot of @AnecbotalNYT with an example of a comment from the New York Times tweeted as an image.

Other news organisations’ GitHub examples:

BuzzFeed News often shares its data analysis, libraries and tools on its GitHub account, including its recent story about spy planes
ProPublica shares many tools and stories, including one on machine bias
Economist Data Team mostly shares their JavaScript-based visualisations and data

These are just some examples of newsrooms sharing, although not all of them come with documentation or links to the articles they were published in. Remember: simply sharing code is not equivalent to transparency.

Documentation

What does documentation entail?

Journalists should comment their code, explaining what each line, code block, or function does.
Writing code in Jupyter Notebooks can be helpful if the project is in Python, R, or Julia, as HTML and Markdown text can be added in between code blocks to provide context or explanation. Graphics can also be displayed inline with the code, and notebooks can be viewed online without installing specific software.
Write a README.md for your GitHub repository that provides context for the study, links to the article and to the data, a list of code dependencies (that is, which code libraries were used), a data dictionary, and any other information that may assist someone in following the code. Consider linking to these within the news article.
Try linking to reference material for any APIs or external software used.

This saves you from re-writing instructions on how to use these APIs. For example, the comment bot collects comments from Disqus forums and, rather than explaining how to set up a Disqus API account, I referred the reader to the Disqus documentation.
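
A data dictionary is easier to keep current if its skeleton is generated from the data itself. The following is a rough sketch, assuming the shared data is a CSV file with a header row; the function names and the `TODO` placeholder are my own, and the type guess is deliberately coarse. The descriptions still have to be written by a person.

```python
import csv
import io


def infer_type(values):
    """Guess a coarse type label from a column's non-empty string values."""
    present = [v for v in values if v.strip()]
    if not present:
        return "empty"
    for label, caster in (("integer", int), ("number", float)):
        try:
            for v in present:
                caster(v)
            return label
        except ValueError:
            continue
    return "text"


def data_dictionary(csv_text):
    """Return one 'column | type | description' line per column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    lines = ["column | type | description"]
    for name in reader.fieldnames or []:
        kind = infer_type([row[name] or "" for row in rows])
        lines.append(f"{name} | {kind} | TODO: describe this column")
    return lines


if __name__ == "__main__":
    # Hypothetical sample data, invented for this sketch.
    sample = "tract_id,wait_seconds,surge\nA,300,1.0\nB,240,1.5\n"
    print("\n".join(data_dictionary(sample)))
```

The output can be pasted into the README and the placeholder descriptions filled in by hand.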

Image: Project Jupyter develops open-source software, open standards, and services for interactive computing.

Considerations when sharing

Each project will have its own unique considerations. Sometimes sharing the data in its raw form will not be possible because of privacy issues. In these cases, it might be possible to share aggregated or cleaned data that no longer contains personally identifiable information. In other cases, the data used may be proprietary or have been provided to you with restrictions or under an agreement. Sharing this data will of course not be possible, and a statement can be made to that effect. If the data itself cannot be shared, still consider sharing the code used to clean and analyse it. If any graphics in the article were created using code, share that too.
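
As one illustration of sharing aggregated rather than raw data, the sketch below drops the identifying field and releases only group-level counts and averages. The records and field names are invented. The minimum group size is an extra assumption of mine, not something prescribed in this article, and it is no guarantee of anonymity on its own.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: names, areas and values are invented for this sketch.
RECORDS = [
    {"name": "Person 1", "area": "North", "wait_seconds": 300},
    {"name": "Person 2", "area": "North", "wait_seconds": 360},
    {"name": "Person 3", "area": "South", "wait_seconds": 240},
]


def aggregate_for_release(records, group_key, value_key, min_group_size=2):
    """Summarise records by group, keeping no individual-level fields."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record[group_key]].append(record[value_key])

    released = []
    for group, values in sorted(grouped.items()):
        # Skip groups so small that a summary could point to one person.
        if len(values) < min_group_size:
            continue
        released.append({group_key: group, "n": len(values), "mean": mean(values)})
    return released


if __name__ == "__main__":
    for row in aggregate_for_release(RECORDS, "area", "wait_seconds"):
        print(row)
```

Publishing this function alongside the aggregated file also documents exactly how the shared data differs from the raw data.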

Licensing

Sharing code is great, but for others to use your code, it has to come with a licence. There are many different ones to choose from depending on whether you’re sharing code, data, or a mixture of the two.

Check out these links to help you choose:

For more detail on building transparency and accountability in computational journalism, read the full research paper here.
