Inside the Pandora Papers

Conversations with Data: #84



Welcome to the latest Conversations with Data newsletter!

Last week we hosted a live Discord chat about the Pandora Papers with Pierre Romera, Chief Technology Officer of the International Consortium of Investigative Journalists (ICIJ).

In our hour-long conversation, he took us behind the scenes of the Pandora Papers investigation. He explained how hundreds of journalists trawled through almost 12 million leaked files revealing hidden wealth, tax avoidance and, in some cases, money laundering by some of the world's rich and powerful.

With questions from the audience, we discussed everything from data tools to digital security along with the processes of running this massive investigation.

If you missed the live chat, listen to the edited podcast on Spotify, SoundCloud, Apple Podcasts or Google Podcasts.

Alternatively, read the edited Q&A with ICIJ's Pierre Romera below.

What we asked

Tell us about the Pandora Papers. Why are they called that, and why do they matter?

The Pandora Papers is the biggest collaboration ever in the history of journalism, with hundreds of journalists working on publishing the best stories you can imagine. This is a very impactful investigation based on almost 12 million documents leaked from 14 offshore providers -- firms that offer to create wealth through offshore companies. The project is called the Pandora Papers because this collaboration builds upon the legacy of the Panama and Paradise Papers, and the ancient myth of Pandora's Box still evokes an outpouring of trouble and woe. Inside this huge amount of data, we found a lot of hidden elements that make Pandora the perfect analogy.

Talk to us about your role at ICIJ. What does a chief technology officer do in an investigative newsroom?

ICIJ is a very small organisation with only 40 people. What makes it stand out as a news outlet is that half of our organisation comprises data and tech people. We have about 20 data journalists, developers, system administrators and UX designers. That is a lot of people that use technology to do journalism. A big part of my work is to coordinate this effort and help produce a platform for the investigation, dig into the data, help the journalists find stories, and get in touch with the source. ICIJ is a very tech-centric organisation, and so that's why we need so many resources, people and lean technology. I believe that's also why we can produce these kinds of investigations unlike others before us. We have the team to do it.


Talk us through the process behind the Pandora Papers investigation.

It all started two years ago when we got in touch with the source. In this case, the source is anonymous and wants to stay anonymous. That was a source we knew about, and when we had the chance to talk with that person, we started to discuss the files and understand what they could send to ICIJ. Progressively, we began to meet with the source and get the files. That was the very first step of the investigation.

Once we had the files, that's when everything began. We had to make everything available to all the journalists. We started to work on our servers securely, and we started to index the files. We put them into a search engine to make it easy to search the huge amounts of text in these documents. Then, before starting to investigate, our team created what we call a country list. The country list is at the very beginning of every ICIJ investigation. This is the point where we look into the files for matches and occurrences related to each country in the world.
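To picture the country list step, here is a minimal Python sketch. It is not ICIJ's code: the function name `build_country_list`, the three-country sample and the sample documents are all hypothetical, and it assumes the text has already been extracted from the files. It simply counts how many documents mention each country.

```python
import re
from collections import Counter

# Hypothetical sample: a real country list would cover every country
# and include alternative spellings and local-language names.
COUNTRIES = ["France", "Kenya", "Brazil"]


def build_country_list(documents, countries=COUNTRIES):
    """Count how many documents mention each country at least once."""
    patterns = {
        name: re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        for name in countries
    }
    hits = Counter()
    for text in documents.values():
        for name, pattern in patterns.items():
            if pattern.search(text):
                hits[name] += 1
    return hits


# Made-up documents, keyed by file name, holding already-extracted text.
docs = {
    "doc-1.txt": "Beneficial owner resident in Kenya, company registered in 2009.",
    "doc-2.txt": "Invoice addressed to a law firm in FRANCE.",
    "doc-3.txt": "Director holds a passport issued by Kenya.",
}

country_hits = build_country_list(docs)
for country, count in country_hits.most_common():
    print(country, count)
```

A count like this only suggests where to look. Deciding which partner to contact would still depend on a journalist reading the matching documents.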

After we have a good list and enough information to share with our partners, we contact them and tell them, "Well, we have information related to that person, so maybe you could be a good partner for this project." That's when the partner joins the project. Once we have them on board, we train them to use our platform and use the security layers we have. We also try to point them to interesting stories inside the documents.

Then we start a very long process of sharing the findings in our digital newsroom, which is called the iHub. This is where we try to extract structured data from the documents. As you know, all those documents are unstructured: they are PDFs, Excel files and other formats. So we need to find a way to extract sense from these documents, and that's what we call structuring the documents. That's also a very long process that lasts until publication.

Then, right before publication, we fact check everything. We submit it for legal review, and we ensure that everything we publish is bulletproof and verified. That's a very important part of the process, and it's also our secret sauce to avoid being sued or publishing wrong information. Finally, one month before publication, we reach out for comments from the people involved in our story. Then we publish.


What are the essential tools you used to coordinate this investigation?

One of the most important platforms at the centre of everything is called Datashare. It is a tool to distribute text extraction from documents to many servers. So imagine that you have a leak that is as big as the Pandora Papers. With Datashare, you can ask dozens of servers to work on the document to put them into a search engine and extract the text from the document. It is also able to provide an interface to explore and search the documents.
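The idea of spreading text extraction over many workers and feeding a search index can be sketched in a few lines of Python. This is an illustration only and does not use Datashare or its API: a thread pool stands in for the servers, `extract_text` is a placeholder for real PDF parsing or OCR, and every name in the block is hypothetical.

```python
import re
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def extract_text(raw_bytes):
    """Placeholder for real text extraction (PDF parsing, OCR and so on)."""
    return raw_bytes.decode("utf-8", errors="ignore")


def tokenize(text):
    """Split text into a set of lowercase words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(files, workers=4):
    """Extract text in parallel, then map each word to the files containing it."""
    names = list(files)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        texts = list(pool.map(extract_text, (files[n] for n in names)))
    index = defaultdict(set)
    for name, text in zip(names, texts):
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index, query):
    """Return the files that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    return set.intersection(*(index.get(w, set()) for w in words))


# Made-up files: name -> raw content.
files = {
    "a.pdf": b"Offshore trust deed",
    "b.xlsx": b"Trust accounts ledger",
}
index = build_index(files)
print(sorted(search(index, "trust")))
print(sorted(search(index, "offshore trust")))
```

The extraction step is the part that benefits from being distributed, because each file can be processed independently of the others. The index is assembled afterwards from the extracted text.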

For the leaked documents, we also have another very important platform called the iHub. The iHub is our digital newsroom and a platform that allows journalists to share findings, leads, testimony, videos, whatever they produce related to the investigation. On the iHub, we develop the philosophy of radical sharing. This means we encourage all partners to share everything they find. Even if it's bad or there is not enough proof, we encourage them to share it. And that's also one of the reasons why ICIJ can publish stories in many countries simultaneously, even if there is some censorship. If a journalist cannot publish a particular story in their country because of political pressure, other journalists will be able to publish it somewhere else.

How important is digital security for ICIJ?

Digital security is a very important part of our work, and that's one of the only mandatory things to know before joining an investigation. You have to be able to encrypt your emails, and we use PGP to do that. Once a user can use PGP, they send us their public key, which we import into our system so that we can communicate with them securely.

Confidentiality is also very important to ICIJ, and it is a part of our security. Every media organisation signs a non-disclosure agreement, and we try to ensure that this privacy remains until publication. So far, we've been very good with that, and I think it's also a reason why our security stays strong throughout the investigations. Just one day before publication, we got a DDoS attack on our website. Luckily, our system team was able to block the attack.

How did ICIJ's work on the Pandora Papers differ from the Panama Papers investigation?

The big difference is the size of the project, but the methodology is almost the same. The steps were the same: we indexed the data, built the country list, and contacted all the partners. All of those processes already existed and were applied to the Pandora Papers, as they are to all the investigations we do. That's also one of the reasons why we were able to involve so many reporters: we didn't try to reinvent the wheel. Instead, we tried to capitalise on the methodology we had already implemented at ICIJ.

From a reporting point of view, one of the Pandora Papers' challenges was to keep the audience's attention on offshore finance. We needed to find groundbreaking stories to keep readers interested.


What did you learn from this investigation?

Even if ICIJ has a lot of experience with huge collaborations, we were not prepared to handle so many journalists. That was hard. It takes time and a lot of resources, and ICIJ is about the same size now as it was one or two years ago. We also have to improve our platforms to work with so many journalists, which was one of the lessons we learnt from this investigation. Even if the technology we used for the Pandora Papers was very similar to our other ICIJ investigations, improvements were still necessary. For instance, we still had to improve the servers to accept a lot of requests from users and handle a significant amount of data.

We also learned a lot about the fact-checking process during this investigation. There are a lot of stories that we did not publish because we realised that we didn't have enough proof. We may have more proof in the future to release more big stories in the coming weeks. We realised that this leak has a lot of potential that could last for many years after publication. It seems we will continue to produce stories for a while with this investigation.

Finally, what advice do you have for aspiring data journalists?

My number one piece of advice is always to learn to program. If you want to work on interactive visualisation or interactive storytelling, learn JavaScript, learn D3. But if you are more interested in data scraping and data analysis, you should learn Python or R. I guess that will make a massive difference between you and other journalists who don't have any programming skills. I also think that will help you to work with developers. If you can better understand their work, the collaboration will be much easier.

It is also essential to develop a data mindset in your own newsroom. For instance, when there is a story, are your colleagues interested in having some sort of data angle? If you want to develop your own skills in data journalism, you also have to educate your colleagues to ask you for help or think from a data perspective.

Latest from

Breaking into the data journalism field can often be challenging. In Duncan Anderson's blog post, learn from those who did it. Explore experiences from academia and the newsroom with interviews from Prof Bahareh Heravi, Michelle McGee, Florian Stalph, Roberto Rocha and Natalie Sablowski. Read the full blog here.


Do you want to broaden your horizons in data journalism while on the go? We’ve got a suggestion for you! Stay up to date with our list of the best data journalism podcasts you should be listening to right now. Some top picks are from podcast hosts Andy Kirk, Simon Rogers and Alberto Cairo. Read Andrea Abellan's full blog here.


Apply for EJC's latest journalism grant

The Global Health Security Call is supporting up to 20 journalistic projects to be delivered by freelancers or teams of freelance and/or staff journalists. The call will provide grants of up to $7,500 USD per project and is aimed at journalists publishing stories in opinion-forming media organisations across France, Germany, the UK, the Netherlands, Italy, Norway and Sweden. The grant is powered by the European Journalism Centre and funded by The Bill and Melinda Gates Foundation. Applications close on 5 November 2021 at 17:00 CET. Find out more here.


Photo credit: Mansi Thapliyal

As always, don't forget to let us know who you would like us to feature in our future editions. You can also read all of our past editions here or subscribe to the newsletter here.


Tara from the EJC data team,

bringing you, supported by Google News Initiative.

PS. Are you interested in supporting this newsletter or podcast? Get in touch to discuss sponsorship opportunities.
