Data journalism on the blockchain

If researchers can use blockchain as an investigative tool, data journalists can too

When we talk about blockchain and journalism, the focus is often on trust and sustainability. Micro-cryptocurrency payments, powered by blockchain, are being explored as a potential solution to the journalism industry's declining revenue streams. At the same time, the technology's ability to protect and verify information has been touted as an antidote to censorship and fake news -- but is this really all blockchain offers journalists?

Although it hasn't received as much attention, the transaction data that blockchain applications leave behind provides a potential goldmine for investigative journalists. Uses of blockchain range far and wide, including by premium wine makers, the tuna industry, and as a means to combat counterfeit pharmaceuticals, opening up a whole new world of data for journalists to explore.

In academia, blockchain has already been used to identify the people behind bitcoin transactions, research illegal activities, and more. Surely, if researchers can use blockchain as an investigative tool, data journalists can too.

Walid Al-Saqaf agrees. A journalist turned academic, Walid brings these two worlds together through his work as a Senior Lecturer in Media Technology and Journalism at Södertörn University in Stockholm, Sweden. He is also a Co-Founder and Vice President of the Internet Society Blockchain Special Interest Group. We spoke to him to find out more about how data journalists can use blockchain in their reporting.

In your own words, can you provide us with an overview of what blockchain technologies do, as well as some prominent applications of them?

As the underlying technology used by the bitcoin cryptocurrency, blockchain technology is a decentralised and distributed database system. I personally think that knowing the internal under-the-hood mechanism of how a blockchain works is not necessary to understand its applications and possible benefits. It may, in fact, be sufficient to say that blockchain is a new type of database that has three characteristics:

  1. It is distributed, meaning that a copy of the data contained on the blockchain is cloned on thousands of nodes.
  2. It is transparent, allowing you to see all transactions that have occurred since the blockchain was created and making any transaction traceable over time.
  3. It is immutable, meaning that it is not possible to change what is written on the blockchain. This is because every transaction is chained together with strong cryptography (hence the name 'blockchain'), creating a permanent archive.
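To make the third characteristic concrete, here is a minimal sketch of hash chaining in Python. It is a toy model rather than how bitcoin or ethereum is actually implemented: the block layout and the `build_chain` and `is_valid` helpers are assumptions made for illustration. Each block stores the hash of the block before it, so changing an old record breaks every later link.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, transactions, previous_hash):
    """Return the SHA-256 digest of a block's contents, including the previous hash."""
    payload = json.dumps(
        {"index": index, "transactions": transactions, "previous_hash": previous_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(batches):
    """Chain batches of transactions; each block records its predecessor's hash."""
    chain = []
    previous_hash = GENESIS_HASH
    for index, transactions in enumerate(batches):
        digest = block_hash(index, transactions, previous_hash)
        chain.append({
            "index": index,
            "transactions": transactions,
            "previous_hash": previous_hash,
            "hash": digest,
        })
        previous_hash = digest
    return chain


def is_valid(chain):
    """Recompute every hash and check that each block points to the one before it."""
    previous_hash = GENESIS_HASH
    for block in chain:
        if block["previous_hash"] != previous_hash:
            return False
        recomputed = block_hash(block["index"], block["transactions"], block["previous_hash"])
        if recomputed != block["hash"]:
            return False
        previous_hash = block["hash"]
    return True


chain = build_chain([["alice pays bob 5"], ["bob pays carol 2"], ["carol pays dave 1"]])
print(is_valid(chain))  # True

chain[0]["transactions"][0] = "alice pays bob 500"  # tamper with an early record
print(is_valid(chain))  # False
```

On a real network, the thousands of nodes holding their own copies would reject the altered version, which is what makes the archive permanent in practice.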

Blockchains, such as ethereum, allow the creation and execution of advanced smart contracts that automate processes based on predefined rules, effectively ending the need for intermediaries to handle the execution manually. For example, you can use a smart contract to have an automatically executed crowdfunding campaign. In this instance, the blockchain would continuously monitor for incoming donations, automatically forwarding the total to the beneficiary once the goal is met. If the goal is not met, it would allow donors to reclaim their donations by calling the smart contract after the deadline.
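The crowdfunding rules described above can be sketched as plain Python. This is only a simulation of the logic, not real smart-contract code (on ethereum it would typically be written in a contract language and deployed to the network), and the class name, method names and integer clock are all hypothetical.

```python
class CrowdfundingCampaign:
    """Toy model of a crowdfunding contract: pay out at the goal, refund after a miss."""

    def __init__(self, beneficiary, goal, deadline):
        self.beneficiary = beneficiary
        self.goal = goal
        self.deadline = deadline
        self.donations = {}  # donor -> total amount donated
        self.paid_out = False

    def total(self):
        return sum(self.donations.values())

    def donate(self, donor, amount, now):
        """Accept a donation; forward everything to the beneficiary once the goal is met."""
        if self.paid_out or now > self.deadline:
            raise ValueError("campaign is closed")
        self.donations[donor] = self.donations.get(donor, 0) + amount
        if self.total() >= self.goal:
            return self._pay_out()
        return None

    def _pay_out(self):
        amount = self.total()
        self.paid_out = True
        self.donations.clear()
        return (self.beneficiary, amount)

    def reclaim(self, donor, now):
        """Let a donor take their money back once the deadline passes without a payout."""
        if self.paid_out or now <= self.deadline:
            raise ValueError("refunds are only available after a failed campaign")
        return self.donations.pop(donor, 0)


funded = CrowdfundingCampaign("newsroom", goal=100, deadline=10)
funded.donate("ana", 60, now=1)
print(funded.donate("ben", 50, now=2))  # ('newsroom', 110)

missed = CrowdfundingCampaign("newsroom", goal=100, deadline=10)
missed.donate("ana", 60, now=1)
print(missed.reclaim("ana", now=11))  # 60
```

The point of the example is that no intermediary decides anything: the payout and the refund follow mechanically from rules fixed when the campaign is created.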

But this is just one example. The applications of blockchain are many and range from use for exchange of digital assets to crowdfunding, storing and tracking real estate ownership information, electronic voting, securing healthcare records, preserving intellectual property rights, and much more.

Aside from funding, what are the various use cases for blockchain in data journalism?

Blockchain technology can be used to combat fake news, preserve the intellectual property rights of content providers, investigate and track transactions, limit bias and external influence, combat censorship, protect whistleblowers, and enhance citizen journalism.

To use blockchain technology effectively, it is necessary to have the technology diffused and adopted widely, which may take years. Nonetheless, I believe it is just a matter of time until we get to the point of having blockchain accepted in the mainstream.

On this last point, why are journalistic uses of blockchain contingent on it becoming more widespread?

To understand and work with blockchains, journalists have to invest time and energy in getting the training, education and expertise necessary to use and analyse blockchain data. However, since blockchains remain at a very early stage of development, they are not yet scalable and certainly not yet ready for mass adoption. To reach critical mass, the technology needs to go through some structural changes and enhancements. Until that happens, journalists can still benefit from exploring some blockchains (like bitcoin and ethereum), but they may have to wait until blockchains become more widespread to use them effectively in day-to-day work.

What kind of stories would most benefit from leveraging blockchain technologies?

Data contained in blockchains is well-structured and relatively easy to access through a number of APIs and open-source tools. This means that all kinds of stories could benefit from using blockchains as a data resource. Once journalists have extracted this data, they can use it to undertake investigative journalism in the traditional sense.

This means that it is possible for data journalists to extract and analyse blockchain data for many different stories that may be relevant to the public. For example, journalists could look at the rise in the number of cryptocurrency purchases in countries suffering from economic turmoil. Since cryptocurrencies are not linked to any particular government or central bank, they can theoretically survive a global financial crisis. So, an increase in purchases may indicate that people are purchasing bitcoin as a store of value or that people's trust in government is diminishing.

Another scenario worth considering is the anticipated wider application of smart contracts to facilitate a plethora of automated services. Such a development could theoretically lead to the replacement of centralised platforms, such as Facebook and Airbnb, with alternative blockchain-based systems that use smart contracts instead.

Data journalists can also write stories describing the overall activity of certain bitcoin wallet addresses by aggregating and analysing thousands of transactions recorded on public blockchains, which are open and accessible for free via services such as blockchain.info.

Can you list some examples of data journalism that have already used blockchain technologies as a reporting tool?

Instead of giving a list of examples, I'd rather give an example of the two main types of data journalism stories that one can do by using blockchain as a reporting tool: micro and macro.

In the micro case, a journalist would zoom in and follow the trail of transactions sent from or received by a particular wallet address. One example of this approach is the reporting done for Quartz by Keith Collins on the WannaCry ransomware attack.

Collins was able to know when any bitcoin address sent an amount to the three WannaCry addresses, which received more than $140,000 in bitcoin. He also identified when those hackers started cashing out by forwarding the received funds to other accounts. The fact that public blockchains are trackable makes it possible to trace back any payment to its original source address.
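A minimal sketch of this kind of 'micro' analysis is shown below, assuming the transactions have already been downloaded into simple records. The addresses, dates and amounts are invented sample data, and the field names (`time`, `sender`, `recipient`, `satoshis`) are hypothetical rather than the format of any particular blockchain API.

```python
from collections import defaultdict

# Hypothetical labels standing in for the wallet addresses under investigation.
WATCHED = {"ransom-address-1", "ransom-address-2"}

# Invented sample records; amounts are in satoshis (1 bitcoin = 100,000,000 satoshis).
TRANSACTIONS = [
    {"time": "2020-01-01T10:00", "sender": "victim-a", "recipient": "ransom-address-1", "satoshis": 15_000_000},
    {"time": "2020-01-02T08:30", "sender": "victim-b", "recipient": "ransom-address-2", "satoshis": 17_000_000},
    {"time": "2020-01-03T19:45", "sender": "victim-c", "recipient": "ransom-address-1", "satoshis": 16_000_000},
    {"time": "2020-03-01T03:10", "sender": "ransom-address-1", "recipient": "unknown-x", "satoshis": 31_000_000},
]


def received_by(transactions, watched):
    """Total amount received by each watched address."""
    totals = defaultdict(int)
    for tx in transactions:
        if tx["recipient"] in watched:
            totals[tx["recipient"]] += tx["satoshis"]
    return dict(totals)


def first_cash_out(transactions, watched):
    """Earliest transaction in which a watched address sends funds onward, if any."""
    outgoing = [tx for tx in transactions if tx["sender"] in watched]
    # ISO-style timestamps sort correctly as plain strings.
    return min(outgoing, key=lambda tx: tx["time"], default=None)


print(received_by(TRANSACTIONS, WATCHED))
# {'ransom-address-1': 31000000, 'ransom-address-2': 17000000}
print(first_cash_out(TRANSACTIONS, WATCHED)["time"])
# 2020-03-01T03:10
```

Real bitcoin transactions can have several inputs and outputs each, so a production version would need to flatten those into sender and recipient pairs first.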

A screenshot from Keith Collins’ blockchain-powered investigation into ransomware attacks. Live version here.

This is a feature that makes investigative journalism quite exciting, and also has the advantage of making it possible to prove the findings reached in the investigative story.

The other type of reporting is macro, which tries to provide readers with an overview of aggregated information, instead of doing a deep analysis focused on only a few addresses. An example of this type is a report by Camila Russo for Bloomberg, who wrote that criminal activity made up roughly 10 percent of bitcoin use in 2018, compared with 90 percent in 2013. The report noted that this drop may have been due to the realisation that criminal bitcoin transactions can be tracked (as was the case when US law enforcement units succeeded in tracking and taking down the dark web marketplace called 'Silk Road').

Going back to the micro use case, what else can transaction data be used to reveal? Is it limited to cryptocurrencies and 'follow the money' investigations, or can journalists use it as a source for a wider array of stories?

Depending on the blockchain and the type of data stored in the transaction, it can be both financial and non-financial information. Initially, it is logical to use bitcoin for tracking financial transactions and relationships between various nodes in the network since that is the most dominant use case for that blockchain.

There is also potential to explore smart contracts created on Ethereum to identify the level of activity and progress achieved for a particular crowdfunding campaign, for example. Looking at the utility of an Ethereum ERC20 token more broadly, it's possible to know how many smart contracts were created and how they were used over time simply by accessing the blockchain data. This can provide journalists with very useful insights on usage patterns regardless of the amounts being sent or received.

Furthermore, the bitcoin blockchain allows you to embed a small piece of text (maximum 40 bytes) using the OP_RETURN operator. This operator is used to add text related to a particular transaction that could point to a URL, for example, or provide some clues or meaningful information as a permanent note connected to the transaction. In some research that I am doing to understand the breadth and reach of the WannaCry malware attack, for example, I am investigating all the transactions that involved sending funds to the three WannaCry addresses. To this end, I looked for leads in the OP_RETURN values to see if there were any unusual messages. Sure enough, I discovered that one of the transactions had the OP_RETURN string "Caution! WannaCry Address!!", which indicated that this particular sender, who sent a very small amount, wanted to warn other potential victims not to send money to the wallet address, and to keep a permanent timestamped record of that warning on the blockchain. You can find that transaction directly on the blockchain here.
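As an illustration of reading such a note, the sketch below decodes the text from a simple OP_RETURN output script given as hexadecimal. It assumes the common layout of one OP_RETURN opcode followed by a single data push, and it ignores other script forms. The `encode_op_return` helper is hypothetical and exists only to build sample input; it does not create a real transaction.

```python
OP_RETURN = 0x6A      # opcode that marks an output as carrying data
OP_PUSHDATA1 = 0x4C   # opcode meaning "the next byte holds the data length"
MAX_DIRECT_PUSH = 0x4B  # lengths up to 75 bytes are encoded in the opcode itself


def decode_op_return(script_hex):
    """Return the text embedded in a simple OP_RETURN script, or None if it does not fit."""
    script = bytes.fromhex(script_hex)
    if len(script) < 2 or script[0] != OP_RETURN:
        return None
    length, start = script[1], 2
    if length == OP_PUSHDATA1:
        if len(script) < 3:
            return None
        length, start = script[2], 3
    elif length > MAX_DIRECT_PUSH:
        return None  # other push opcodes are out of scope for this sketch
    data = script[start:start + length]
    if len(data) != length:
        return None  # script is truncated
    return data.decode("utf-8", errors="replace")


def encode_op_return(text):
    """Build a sample script for short text (hypothetical helper, for test data only)."""
    data = text.encode("utf-8")
    if len(data) > MAX_DIRECT_PUSH:
        raise ValueError("text too long for a single-byte push")
    return (bytes([OP_RETURN, len(data)]) + data).hex()


script_hex = encode_op_return("Caution! WannaCry Address!!")
print(decode_op_return(script_hex))  # Caution! WannaCry Address!!
```

Running a decoder like this over every output sent to an address of interest is one way to surface messages that would otherwise stay buried in raw transaction data.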

In short, every blockchain is different and has unique ways of embedding text or calling functions of smart contracts, making it necessary to know exactly what a journalist needs to get before going about the data extraction and analysis.

Turning to the fake news use case, how can blockchain be used as a data verification tool? Do you have any examples where it has been used by data journalists in this way?

Yes, I have completed an extensive study about Civil, one of the most popular blockchain-based journalism projects. Civil aims to prevent disinformation by using cryptoeconomics to incentivise users so that they take action when they discover any form of fake news or other malpractices by a Civil newsroom. The dynamics of how this is possible are described in the Civil Constitution. In my research, I've taken a critical look at the project to see if the Civil newsroom provides a relative advantage over traditional newsrooms in this way. My conclusion is that it does, but on the condition that Civil users act rationally and predictably and that abuse of the platform cannot corrupt the whole system.

One of the other promising uses of blockchain to combat fake news and disinformation is in the ability to record original content with an immutable proof of creation. Aside from the usual doctoring of images, one area that is causing headaches nowadays is the ease of manipulating videos using 'deepfake' technology. However, blockchain's ability to timestamp original content immutably makes it a viable way to help address this problem as demonstrated by Truepic, which is an app that uses blockchain to notarise images and videos when they are taken. This is done by storing metadata to certify the authenticity of an original image/video taken using a particular mobile device. I think that such technologies can provide more confidence to readers, assuring them that what they are viewing is not fake. If someone attempts to manipulate the original copy, that could easily be detected as fake since it will either not have an entry on the blockchain or its entry will come at a later time because it is not possible to change the original entry.
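The detection logic in that last sentence can be sketched with a cryptographic fingerprint. This is not Truepic's actual method, which is not described here; it is a simplified illustration in which a Python dictionary stands in for the blockchain, and the `NotaryLedger` class and its methods are hypothetical.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """SHA-256 digest of the file's bytes; any alteration produces a different digest."""
    return hashlib.sha256(content).hexdigest()


class NotaryLedger:
    """Append-only stand-in for a blockchain: digest -> time it was first recorded."""

    def __init__(self):
        self._entries = {}

    def record(self, content: bytes, timestamp: int) -> str:
        digest = fingerprint(content)
        # setdefault keeps the earliest timestamp, mimicking an entry that cannot be changed.
        self._entries.setdefault(digest, timestamp)
        return digest

    def lookup(self, content: bytes):
        """Return the recorded timestamp for this exact content, or None if unknown."""
        return self._entries.get(fingerprint(content))


ledger = NotaryLedger()
original = b"raw bytes of the original video"
doctored = original + b" with one frame altered"

ledger.record(original, timestamp=1000)
print(ledger.lookup(original))  # 1000
print(ledger.lookup(doctored))  # None

ledger.record(doctored, timestamp=2000)  # a manipulated copy can only be registered later
print(ledger.lookup(doctored))  # 2000
```

A manipulated copy therefore either has no entry at all or has one that postdates the original, which are the two tell-tale signs described above.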

What level of technical skills do journalists need to start using blockchains in their reporting?

I believe it all depends on what level of sophistication journalists need to reach their objective. If the objective is to just identify trends in the cryptocurrency and initial coin offering (ICO) space for example, then they can utilise simple web-based API queries to extract the needed data that could thereafter be analysed using MS Excel or any other spreadsheet software.

Data journalists may also consider enhancing their programming skills in order to tap into blockchains that can give access to their content via APIs. One example of a blockchain that may be of interest in this regard is Steemit, a blockchain-based platform that allows users to monetise their own content through direct cryptocurrency donations.

As illustrated in this Steemit white paper, authors get paid when readers upvote their posts.

A study by Mike Thelwall investigated whether Steemit works as an effective social news platform by rewarding users for social content and curation. This research investigated 925,092 posts to understand how much they earned and what drives members of Steemit to send reward payments to certain posts.

Data journalists with the right programming skills might consider undertaking similar research. But if the intention is to analyse millions of connections between nodes in bitcoin and how they formed over time, a higher coding skill level (perhaps in Python or PHP) is needed to communicate with the blockchain and extract the data and store it in a database efficiently. To analyse nodes, journalists also need to have a deeper understanding of network analysis, and the use of Gephi or other network visualisation/analysis tools may be necessary.
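As a starting point for that kind of network analysis, the sketch below collapses individual payments into weighted connections between addresses and writes them out as an edge list. The sample payments are invented, and the `Source`, `Target` and `Weight` column names are an assumption about what a tool such as Gephi expects when importing a spreadsheet, so check them against the tool's documentation.

```python
import csv
import io
from collections import Counter

# Invented sample payments: (sender, recipient, amount).
PAYMENTS = [
    ("addr-a", "addr-b", 5),
    ("addr-a", "addr-b", 3),
    ("addr-b", "addr-c", 4),
    ("addr-a", "addr-c", 1),
]


def build_edges(payments):
    """Collapse individual payments into weighted sender -> recipient edges."""
    weights = Counter()
    for sender, recipient, amount in payments:
        weights[(sender, recipient)] += amount
    return weights


def degree(edges):
    """Count how many distinct counterparties each address is connected to."""
    neighbours = {}
    for sender, recipient in edges:
        neighbours.setdefault(sender, set()).add(recipient)
        neighbours.setdefault(recipient, set()).add(sender)
    return {address: len(others) for address, others in neighbours.items()}


def to_edge_csv(edges):
    """Serialise the edges as CSV text, sorted for reproducible output."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (sender, recipient), weight in sorted(edges.items()):
        writer.writerow([sender, recipient, weight])
    return buffer.getvalue()


edges = build_edges(PAYMENTS)
print(degree(edges))  # {'addr-a': 2, 'addr-b': 2, 'addr-c': 2}
print(to_edge_csv(edges))
```

With millions of connections, the same aggregation would be better done inside a database, as suggested above, rather than in memory.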

How can data journalists start using blockchains as a data source?

Because I believe public blockchains are a useful data source for data journalists, I think it is high time journalists acquired the basic skills that allow them to access, extract and analyse blockchain data. That's why I have personally started to work on an open-source library, called the Data Journalism GitHub repo, or DDJBlocks (still in development), that could reduce the time and learning curve for journalists starting to look into bitcoin blockchain data to identify potential stories.

For demonstration purposes, I have put together a Google map showing where donations to Wikileaks have come from over the years. This was done by using DDJBlocks to extract the transactions that include payments to one of the earliest known Wikileaks wallet addresses on the bitcoin blockchain. Those transactions that had relay IP addresses were then geolocated to a city. Then, I calculated the sum of all amounts coming from each city and stored these in a spreadsheet, which in turn was uploaded through the Google Maps web interface to overlay the data on the world map.
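The aggregation step in that workflow can be sketched in a few lines. This does not use DDJBlocks, whose interface is not described here; the donation records, field names and cities below are invented, and the geolocation is assumed to have been done already.

```python
import csv
import io
from collections import defaultdict

# Invented sample records; "city" is None when no relay IP could be geolocated.
DONATIONS = [
    {"city": "Stockholm", "satoshis": 500},
    {"city": "Berlin", "satoshis": 200},
    {"city": "Stockholm", "satoshis": 300},
    {"city": None, "satoshis": 900},
]


def sum_by_city(donations):
    """Add up the donated amounts per city, skipping records without a location."""
    totals = defaultdict(int)
    for donation in donations:
        city = donation.get("city")
        if city:
            totals[city] += donation["satoshis"]
    return dict(totals)


def to_csv(totals):
    """Serialise the totals as spreadsheet-ready CSV, largest total first."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["city", "satoshis"])
    for city, amount in sorted(totals.items(), key=lambda item: (-item[1], item[0])):
        writer.writerow([city, amount])
    return buffer.getvalue()


totals = sum_by_city(DONATIONS)
print(totals)  # {'Stockholm': 800, 'Berlin': 200}
print(to_csv(totals))
```

The resulting CSV is the kind of per-city table that can then be loaded into a mapping tool. Note that transactions without a usable location are dropped, which is worth disclosing in any story built on the map.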

Walid’s map of Wikileaks donations, powered by blockchain data. Darker blues represent larger donations.

I have also used the tool to do a network visualisation of the address that Laszlo Hanyecz used to pay 10,000 bitcoins in 2010 (worth around $41 at the time) for two pizzas from Papa John's. Hanyecz's 10,000 bitcoins have spread over the last eight years, ballooning to a worth of over $65 million today. Marked by bloomberg.com as a milestone in the decade-long history of bitcoin, that pizza purchase can arguably be considered the first ever proof that a cryptocurrency with no central authority behind it could be used as a peer-to-peer means of payment. It will be quite valuable for data journalists to detect and cover other milestones for bitcoin and other cryptocurrencies in the years to come.

Any final thoughts?

I predict that this form of data journalism will be increasingly relevant as bitcoin and other public blockchains become more accepted in the mainstream and as blockchain adoption reaches new heights. The main challenge is to allocate sufficient time and resources, particularly by journalism educational institutes, so that students and researchers look into this field. It’s important that J-schools get ahead of the curve to equip the next generation of data journalists around the world. During my lectures at Södertörn University, I regularly bring up blockchain as a technology that students need to be aware of alongside traditional centralised database systems.
