Building data journalism in investigative newsrooms

An interview with Mar Cabra


Despite the relative maturity of data journalism and the growing application of data in editorial workflows, there is still a lot to learn about the systematic, seamless, and effective integration of data and computational tools in newsrooms. It is time for a holistic assessment of this emerging field by interrogating the ways that newsrooms across the world have adopted data in their day-to-day workflows, the formation of their data teams, their best practices for producing high quality data-based investigative work, their success and failure stories, and emerging training requirements.

The Global Data Journalism study aims to bridge this gap. It is a multi-faceted project, investigating newsroom practices in terms of their data workflows and requirements, as well as the educational and academic aspects of data journalism. This project is from Bahareh Heravi, at University College Dublin, and Mirko Lorenz, Innovation Manager of Deutsche Welle and co-founder of Datawrapper. As a part of this study, Bahareh conducted a series of interviews with industry experts in order to learn about current practices in data journalism and the discipline’s future directions.

For the second interview in our series, following our discussion with Megan Lucero, we spoke to Mar Cabra, Head of the Data and Research Unit at the International Consortium of Investigative Journalists (ICIJ) -- the organisation renowned for the Panama Papers, Swiss Leaks and Luxembourg Leaks, amongst other investigative work.

Full audio of Bahareh’s interview with Mar. The transcript below has been edited for clarity.

How do you describe your role at the ICIJ?

I am an editor or project manager. Basically, I always describe my role as being the air controller of this very powerful team that uses data and technology to help us do better investigations. My role is the role of a typical editor in a newsroom. I hear the stories that we are working on, I suggest things that we should do from the data perspective, I coordinate the work of the team when they start working on stories, and then there is an interactive process throughout the work of the story. I’m also the one that kicks people’s butts if they don’t do the work on time and reminds them to meet deadlines. I ensure everything happens on time. At the end of the process, I fact-check and review the team’s actions.

How would you describe data journalism to someone who has never heard the term?

Data journalism is about filtering the electronic reality around us -- the electronic records, especially the data around us -- to tell stories about this new world we live in. We live in a world that is electronic. Everything goes through a computer. In journalism, we survey our reality, analyse mounds of information or data, and then tell people stories about trends or unexpected patterns: condensed stories drawn from the best data we’ve discovered.

What do you see as the main benefits and added values of data journalism?

The main value is that it helps us understand the reality we live in today, in the era of big data. It is impossible to do good journalism today without examining electronic records, which contain data, figures and quantities that reveal detailed information about the actions of individuals and companies. This helps us tell better stories.

Also, I like that data journalism is a very broad field. Not only does it allow us to find unexpected stories by gathering, condensing and analysing data, it also helps us tell stories in much more depth. For example, one can publish interactive databases or use interactive graphics or applications, creating several layers of exploration inside the story and allowing us to have a much deeper connection with the reader.

What is the most important aspect of data journalism?

One of the best ways to convince editors to use data journalism in newsrooms is through clicks and interactive visualisations. In Spain, for example, I have seen that interactives -- such as stories with maps -- end up being the top story. At the ICIJ, our interactive applications are viewed and used more by our readers than our stories are. So, visualisation is the key to seizing the attention of readers and editors alike. Accordingly, many newsrooms are creating interactive graphics teams or building interactive apps. This is good news to an extent, but data journalism is not just about visualisation or interactivity.

We must not forget the analysis aspect of data journalism. At its crux, it is the analysis of complex realities hidden within thousands and millions of rows of data, which we can crunch collaboratively thanks to technology, to help us tell better stories.

Would you say that data analysis and visualisation are of equal importance?

Yes, they are both important. However, I believe many editors in many newsrooms forget the analysis part, and that’s why it needs greater emphasis.

What is the relevance of data, statistics, visualisations, and coding in newsrooms?

Let me tell you a story. Two and a half years ago the ICIJ did not have a data team. Two and a half years later we are 50% of the ICIJ staff. Every single investigation that we do has a data component. There’s no investigation that we do -- whether a leak or an investigation based on public records -- that does not have a data component, because there is no way for us to do investigations around systemic issues without data.

As the organisational structure of ICIJ demonstrates, data journalism is crucially important. These changes are occurring in many other newsrooms too. Journalists once viewed social media as the ‘new thing’ and, today, no-one disputes that community managers are essential in newsrooms; the same is happening with data. Data journalism is here to stay. Newsrooms that don’t employ data journalists are going to lag behind.

How does the distinction between data, statistics, visualisation, and coding impact newsrooms?

Data skills can be used in many ways in the newsroom. Obviously, they are important for studying analytics, monitoring traffic, implementing SEO techniques, and so forth. But the field I am well versed in is content production: using data skills to tell better stories. To me, your training background doesn’t really matter. The main goal of everyone in my team is to tell good stories and do great investigations. Someone in my team has a data analysis background, another is more of a web developer, and another is a data journalist, who knows how to use data and learnt how to code. I don’t mind where people come from, as long as they can add value to the stories. I do think, however, that there is a value in having a coordinated team that works across the newsroom and assists different desks, teams, and projects in different ways. I don’t know if I would separate coders and journalists; I believe we really need integration. A team hailing from different backgrounds and disciplines can meet a newsroom’s needs in a transversal way.

If you were the editor-in-chief of a national newspaper, what would be your first action in developing its data journalism?

Firstly, I would do a survey or a study of the people already in my newsroom to ascertain their skills, and whether any data skills are being used. Sometimes editors-in-chief do not recognise the value of their newsroom, or know whether it has any hidden talents. Has anyone done any data stories? How did they originate? Then I would try to organise a team around that person or foster the skills of that person.

After that, it depends on the budget. If no one has data skills, perhaps I would need to bring somebody from the outside. But one person can only get so much done -- I always believe in teams. Whenever it comes to setting up data teams, I always push for a team, even if it’s just two or three people. The initial team would likely consist of a data journalist, and maybe a programmer with a coding background, or maybe a web developer or a front-end developer. If I had money, and those skills did not exist within my newsroom, I would try to hire this small team. In the first year, they would spend a lot of time training others in the newsroom and advocating for more data journalism.

How should we teach data journalism to others -- whether younger or older, less experienced or more experienced?

Many ways. When I came back to Spain five years ago from the United States, where I first learned what Computer-Assisted Reporting was, I only knew Excel. I was trying to find other journalists in Spain who were using Excel to tell stories -- and I couldn’t find anybody. I did find programmers and activities that were very interesting. So, we started doing sessions in a place called Medialab-Prado in Spain, where we would demonstrate our skills. I knew Excel so I taught what I knew in Excel, although I was not an expert. Others knew how to extract information from PDFs and so forth. Excel and the basic analysis of spreadsheets should be in any journalist's backpack, even if you are not a full-time data journalist.

Our sessions were very informal -- people would come and learn for 30 minutes, and then the videos would go online. Teaching skills was an effective way to build a community. I also co-created a Master’s degree in Investigative Journalism, Data Reporting and Visualisation with two other journalists in Spain, which has been running for four to five years. Because we had the undivided attention of 15 journalists for a year, we could teach them about investigative reporting, data visualisation, coding, Excel, and databases. Of course, having a Master's degree is also important because it cultivates a higher level of skills than a one-afternoon session, and provides the opportunity to develop projects. The Master's degree included an internship component and, during their internships in newsrooms, many of the students planted a seed by showing why data journalism is important and -- now that most media organisations in Spain have a data team -- they are leading the charge. So, a Master’s education in data journalism is very important, and undergraduate training is helpful, too.

There are many ways to teach. Online tutorials like the European Journalism Centre’s courses on data skills are very helpful because they let people around the world get trained remotely. There is no single definitive way to be trained in data journalism. What is definite is that universities must begin adding it to their curricula, because otherwise they are failing their students.

Have you made any surprising observations in the last few years? Any unexpected areas of application or barriers to adoption?

One of the most exciting discoveries was how technology helps us to work collaboratively across borders. Around three years ago I was assigned at ICIJ to work on a leak that we had -- our first leak, of 260 gigabytes, which developed into the Offshore Leaks investigation. When I was assigned to work on that project, I was instructed, “Hey, here’s your hard drive, search through it, and send the documents to journalists around the world”. Within two weeks, I was already overwhelmed. I was thinking, I am not scalable -- we need to hire more people like me. We started thinking about how technology could help us do this work in a scalable way, such as cloud-based platforms that would allow us to share the data with reporters, so they could log in and make the searches themselves.

Three years later, we broke the Panama Papers -- the biggest leak in journalism history. We used very sophisticated platforms, whereby reporters could search through documents or even do social network analysis. Clearly, technology is moving very fast and becoming more and more accessible.

We had this platform called Linkurious, software by a French start-up, which enabled reporters to simply log in, enter a name, and see graphs of people connected to companies offshore -- who is connected to whom. In a way, they did data journalism. Of course, some other people could actually write queries to interview that data, but it is fascinating how these tools are making data accessible to people who are not tech-savvy.
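Editor's illustration: the kind of "enter a name, see who is connected to whom" lookup described above can be sketched as a breadth-first search over a graph of people and companies. This is only a minimal sketch in Python under stated assumptions. It is not how Linkurious or the ICIJ platform is implemented, and the names, the edge list and the functions `build_graph` and `connections` are all hypothetical.

```python
from collections import deque

# Hypothetical, made-up edges: each pair links a person to an offshore company.
EDGES = [
    ("Person A", "Company X"),
    ("Person B", "Company X"),
    ("Person B", "Company Y"),
    ("Person C", "Company Y"),
    ("Person D", "Company Z"),
]


def build_graph(edges):
    """Turn a list of (a, b) pairs into an undirected adjacency map."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def connections(graph, start, max_hops=2):
    """Return {node: hops} for every node within max_hops of start."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in sorted(graph.get(node, ())):
            if neighbour not in seen:
                seen[neighbour] = seen[node] + 1
                queue.append(neighbour)
    del seen[start]  # the starting name is not its own connection
    return seen
```

With the invented data above, `connections(build_graph(EDGES), "Person A")` reaches "Company X" in one hop and "Person B" in two, which is the sort of link a reporter would then go and verify in the documents.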

It makes me wonder how much more technology, algorithms, and data analysis techniques will assist in future stories. One of the dreams I have is to be able to apply recommendation techniques -- like those based on previous purchases on platforms like Amazon -- to advise journalists. For example, the algorithm would know what I have searched for in the Panama Papers data, and it would tell me “Hey, you found the Prime Minister of Ireland, maybe you are interested in this other name, who seems to be the president of Argentina”. Right? Algorithms would help us find stories that we didn't know were there. Although I don’t know exactly where we are going, I think there is a very exciting future ahead if we keep applying technology to journalism. To continue developing, we should apply to journalism the techniques that the corporate world uses for analysing big data.
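Editor's illustration: the recommendation idea Mar describes resembles the "people who looked at this also looked at that" approach used in online retail. Below is a minimal sketch in Python, assuming search logs are available as one set of searched names per reporter. The logs, the names and the function `recommend` are hypothetical, and this does not describe any existing ICIJ system.

```python
from collections import Counter

# Hypothetical search logs: one set of searched names per reporter.
SEARCH_LOGS = [
    {"Name A", "Name B", "Name C"},
    {"Name A", "Name B"},
    {"Name A", "Name D"},
    {"Name C", "Name D"},
]


def recommend(logs, my_searches, top_n=3):
    """Suggest names that co-occur with my searches in other reporters' logs."""
    scores = Counter()
    for log in logs:
        overlap = len(log & my_searches)
        if overlap == 0:
            continue  # this reporter shares no searches with me
        for name in log - my_searches:
            # Names searched alongside more of my own searches score higher.
            scores[name] += overlap
    # Sort by score (descending), then by name, so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:top_n]]
```

A real system would have to weigh source protection and the confidentiality of each reporter's searches before sharing any signal of this kind; the sketch ignores those concerns.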
