Inside the Uber Files
Conversations with Data: #97
Welcome to the latest Conversations with Data newsletter brought to you by our sponsor FlokiNET.is, a web hosting company that was established in Iceland to provide safe harbour for freedom of speech, free press and whistleblower projects. You can get 15% off FlokiNET's products and servers by using the promotion code DATAJOURNALISM.
Don't forget that all registered Datajournalism.com users have free web space and domains through FlokiNET.is.
Now on to the newsletter.
This episode focuses on the Uber Files, an investigation published in July 2022 that revealed how Uber flouted laws, duped police, exploited violence against drivers, and secretly lobbied governments during its global expansion.
First leaked to The Guardian, the Uber Files comprise more than 124,000 records from 2013 to 2017, which were shared with the International Consortium of Investigative Journalists (ICIJ) and 42 other media outlets.
To understand how ICIJ navigated this large database, we spoke with data journalist Karrie Kehoe and data and research editor Emilia Díaz-Struck.
What we asked
Tell us about the Uber Files.
Emilia: The Uber Files is an interesting leak covering more than 124,000 records from 2013 to 2017, a period when Uber was expanding around the world. The communications include 83,000 emails showing the tactics Uber used to gain access to markets around the world, to influence regulations, and to influence people. The files also reveal how Uber dealt with some law enforcement cases in a couple of countries. Regionally speaking, most of the investigation was connected to Europe, with some connections to Asia, Latin America, and other parts of the world. Instead of examining the offshore financial system, ICIJ looked at how Uber influenced and lobbied governments to gain access to new markets.
How did the investigation begin?
Emilia: Troves of files were first leaked to The Guardian. After an initial exploration, they discovered the files were tied not only to the UK but also to other countries. The Guardian then reached out to ICIJ. We thought this could be an interesting project for our network. We started the investigation by reviewing which countries appeared in this set of files and began involving partners in many of those countries. This is when the magic happens. The power of collaboration allowed us to mine these files and do the reporting.
There are so many angles for this story. How did you go about sifting through the information?
Karrie: First of all, I absolutely love this story because so many of ICIJ's investigations are told from the perspective of offshore service providers, where we see what services are being provided to companies. But this is the first time we got to the heart of the company and saw it behaving badly from the inside. The communications revealed how they spoke to each other and the culture around them.
Our technology team created this wonderful product called Datashare. The 124,000 files were all loaded into Datashare and then run through optical character recognition (OCR). Some were in English and others in French, so the team built a translation tool to help us. The technology team also created a file pack for the data so we could see what was in different folders and navigate it, like moving through rooms and seeing the contents of each one. When we first get leaks like this, we sit down and read just like everybody else. While reading, we start looking for patterns and names to figure out what's going on. For instance, I remember searching for Macron's name and getting two thousand hits.
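Datashare does its own searching, so the snippet below is not how Datashare works. It is only a rough illustration of the kind of keyword query Karrie describes, written as a minimal standard-library Python sketch (Python being one of the tools she mentions later). It assumes the OCR output has already been saved as plain-text files; the function names and folder layout are hypothetical.

```python
import re
from pathlib import Path


def load_texts(folder: str) -> dict[str, str]:
    """Read every .txt file under `folder` (recursively), e.g. saved OCR output.

    The folder layout is a hypothetical assumption for this sketch.
    """
    return {
        str(path): path.read_text(encoding="utf-8", errors="replace")
        for path in Path(folder).rglob("*.txt")
    }


def count_keyword_hits(texts: dict[str, str], keyword: str) -> dict[str, int]:
    """Return {document name: hit count} for documents that mention `keyword`.

    Matching is case-insensitive and whole-word, so "Macron's" counts as a hit
    for "Macron" but "Macronist" does not.
    """
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    hits = {}
    for name, text in texts.items():
        count = len(pattern.findall(text))
        if count:  # keep only documents with at least one match
            hits[name] = count
    return hits


if __name__ == "__main__":
    # Hypothetical sample documents standing in for OCR output.
    sample = {
        "email_001.txt": "Meeting with Macron's office confirmed.",
        "email_002.txt": "No mention here.",
        "memo_003.txt": "MACRON and Macron again.",
    }
    print(count_keyword_hits(sample, "Macron"))
```

A per-document count like this is only a starting point for the reading Karrie describes. It says where a name appears, not why.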
Who were the stakeholders, and how did you navigate those?
Emilia: In this investigation, we figured out who contacted whom and the purpose of that communication. Our programmer extracted the email addresses, the names of the people behind those email addresses, and the domains. This helped us see where these stakeholders came from: government, academia, politics, think tanks, and citizen groups. Combining that information, we found more than 1,850 stakeholders in about 29 countries. Different parts of our team were looking at different angles. Karrie focused on lobbying, and another colleague looked at academics. The next step was to figure out how to organise this data and connect it to public records.
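The interview does not say how ICIJ's programmer did the extraction. As a minimal sketch of the idea, assuming the raw To/From/Cc header strings are available, Python's standard `email.utils.getaddresses` can split them into names and addresses, and each domain can then be sorted into a rough stakeholder category. The suffix rules below are hypothetical examples only. A real project would need a curated, per-country list and manual review.

```python
from collections import Counter
from email.utils import getaddresses

# Hypothetical suffix rules for illustration; not ICIJ's classification.
CATEGORY_RULES = [
    (".gov.uk", "government"),
    (".gouv.fr", "government"),
    (".europa.eu", "government"),
    (".ac.uk", "academia"),
    (".edu", "academia"),
]


def categorise(domain: str) -> str:
    """Map an email domain to a stakeholder category using the suffix rules."""
    for suffix, category in CATEGORY_RULES:
        # Match the bare domain itself or any subdomain of it.
        if domain == suffix[1:] or domain.endswith(suffix):
            return category
    return "unclassified"


def extract_contacts(header_values: list[str]) -> list[tuple[str, str, str]]:
    """Parse header strings into (display name, lower-cased address, domain) tuples."""
    contacts = []
    for name, address in getaddresses(header_values):
        address = address.strip().lower()
        if "@" not in address:  # skip empty or malformed entries
            continue
        domain = address.rsplit("@", 1)[1]
        contacts.append((name.strip(), address, domain))
    return contacts


if __name__ == "__main__":
    # Hypothetical header values, not taken from the leak.
    headers = [
        'Jane Doe <jane.doe@example.gov.uk>, "Smith, John" <jsmith@example.ac.uk>',
        "Policy Team <policy@example-thinktank.org>",
    ]
    contacts = extract_contacts(headers)
    print(Counter(categorise(domain) for _, _, domain in contacts))
```

Anything left as "unclassified" is where the manual research starts. Think tanks and citizen groups rarely share a tidy domain suffix.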
Tell us about how you examined and investigated the lobbying aspect of Uber.
Karrie: Lobbying is opaque and can be such a black box. While lobbying can have a massive impact around the world, it's really difficult to see from the outside how it's actually done. The most interesting thing about this investigation was getting to see the heart of this huge lobbying machine. There were three different types of calendars in the files. After we extracted them all, we created a huge spreadsheet. Part of this relied on programming, and part of it involved just reading through the PDFs to make sure that we hadn't missed anything. We created a long list of meetings between Uber executives and public officials. If a meeting was scheduled with anyone from a mayor up to a European commissioner or a vice president, we pulled it out.
Because scheduled meetings don't always take place, we spent a lot of time verifying them and looking for evidence. We went through all the correspondence, the text messages, and the emails, and looked for proof that Uber executives were in a room with that person. So that was the first part: making sure the meetings actually happened. The second part was to see whether they were declared. The European Commission has a transparency register where you have to declare meetings. This is all publicly available data. After we verified which meetings had happened, we went through the public register and checked whether they had been declared. Overall, we confirmed that more than a hundred meetings took place between 2013 and 2017.
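The interview does not describe the format of the three calendars. The following minimal Python sketch assumes each has already been extracted into rows with hypothetical `date` (ISO format), `official` and `source` fields. It merges them into one list, keeping a single row per date and official while recording which calendars mentioned it. It then writes a spreadsheet-ready CSV in which every meeting starts out as "unverified", because a scheduled meeting still has to be confirmed.

```python
import csv
import io


def merge_calendars(*calendars: list[dict]) -> list[dict]:
    """Combine rows from several calendar extracts, one row per (date, official).

    Assumes hypothetical fields: "date" (ISO string), "official", "source".
    Names are compared case-insensitively with whitespace collapsed.
    """
    meetings = {}
    for calendar in calendars:
        for row in calendar:
            date = row["date"].strip()
            key = (date, " ".join(row["official"].split()).lower())
            if key in meetings:
                meetings[key]["sources"].add(row["source"])
            else:
                meetings[key] = {
                    "date": date,
                    "official": row["official"].strip(),
                    "sources": {row["source"]},
                }
    # ISO dates sort chronologically as plain strings.
    return sorted(meetings.values(), key=lambda m: (m["date"], m["official"]))


def to_csv(meetings: list[dict]) -> str:
    """Render merged meetings as CSV text, each marked unverified."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["date", "official", "sources", "status"])
    writer.writeheader()
    for m in meetings:
        writer.writerow({
            "date": m["date"],
            "official": m["official"],
            "sources": "; ".join(sorted(m["sources"])),
            "status": "unverified",  # to be confirmed against correspondence
        })
    return buffer.getvalue()
```

As Karrie notes, code only gets part of the way. Rows extracted from PDFs still need to be read against the originals.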
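The comparison with the transparency register can be pictured as a simple anti-join: keep every verified meeting that has no matching declaration. The minimal sketch below assumes both lists use hypothetical `date` and `official` fields. It also allows an optional tolerance in days, because declared dates may not match calendar dates exactly. It does not reflect the register's real export format, and every flagged meeting would still need to be checked by hand.

```python
from datetime import date


def find_undeclared(verified: list[dict], register: list[dict],
                    tolerance_days: int = 0) -> list[dict]:
    """Return verified meetings with no register entry for the same official
    within `tolerance_days` of the meeting date.

    Both inputs use hypothetical fields: "date" (ISO string) and "official".
    Official names are matched case-insensitively.
    """
    declared: dict[str, list[date]] = {}
    for entry in register:
        declared.setdefault(entry["official"].lower(), []).append(
            date.fromisoformat(entry["date"]))

    undeclared = []
    for meeting in verified:
        when = date.fromisoformat(meeting["date"])
        candidates = declared.get(meeting["official"].lower(), [])
        if not any(abs((when - d).days) <= tolerance_days for d in candidates):
            undeclared.append(meeting)
    return undeclared
```

With pandas, which Karrie also uses, the same check could be done with a left `merge` using `indicator=True`, keeping rows marked `left_only`.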
What other interesting things did you find? How did you approach it?
Karrie: With this type of investigation, you want to start at the top. You want to try to find the prime ministers and other top figures. We saw that Uber did a lot of planning around Davos, the 2016 World Economic Forum. We had their spreadsheets and planning documents, and we also went through all their correspondence and text messages from that period. That was fascinating. We saw that there was an undeclared meeting between Joe Biden, then US vice president, and Travis Kalanick, who was Uber's CEO at the time. There was another meeting with Enda Kenny, the former Irish prime minister. They also met with the president of Estonia and Benjamin Netanyahu, who was then the prime minister of Israel. They also had meetings with the prime ministers of Luxembourg, Norway and the Netherlands. The communications helped us see their relationships develop.
Emilia: In total, we had six world leaders in these communications with Uber. The files gave us a snapshot of how Uber planned to expand around the world. We also saw how Uber thwarted government regulators and law enforcement with a "kill switch" that cut off access to company data, so the authorities could not gather information on it.
What has been the impact of this investigation so far?
Emilia: We saw taxi drivers in Europe going out and demonstrating. There have also been demands for more investigations into lobbying efforts.
Karrie: It would be fantastic if lobbying regulations were harmonised across Europe so that we could see what these multinationals are doing in multiple countries and how their tactics change from country to country. I also hope that one of the impacts will be more people like Mark MacGann coming forward. He headed Uber's lobbying in Europe, and several years later he reflected that he may have done harm and wanted to put it right. He became a whistleblower, came forward, and made these compelling stories possible.
What advice do you have for journalists about to embark on a data-intensive investigation like the Uber Files?
Emilia: You need a lot of coffee, patience and a good playlist to help you through it. When you have 124,000 files, like the Uber Files, it is easy to feel overwhelmed by the data. The first thing is to put the mess in order and try to identify what it is about. Spend some time diving into the data. Use keywords to search and identify the key documents. Then identify the stories and the journalistic questions you want to answer. Based on those questions, ask yourself what relevant data you have to work with. It is also important to understand what the problems with this data are. Next, you can come up with a plan and get started. I also advise journalists to devote a lot of time to fact-checking their investigation.
Finally, as data journalists, what is in your toolbox?
Karrie: We regularly use Excel and Google Sheets. I also use a bit of Python, Jupyter Notebook and pandas. I often structure information in my head and write it down.
Emilia: What matters is that you develop a critical mindset for analysing the data. Balancing automation and manual work with different tools and skills is essential. One person may fact-check with different tools than another. The magic happens when you combine multiple skills in a data journalism team.
Latest from DataJournalism.com
How do you inject data storytelling into radio packages? Radio broadcast journalists can find this challenging given they only have a few minutes to tell the story. In our latest long read, Robert Benincasa, a computer-assisted reporting producer in NPR's Investigations Unit, provides detailed examples and treatments for merging audio and data journalism. Read the long read here.
How can data journalism combat the spread of disinformation covering the Russia-Ukraine war? In our latest long read, Sherry Ricchiardi explores the digital tools that help journalists fight back. Read the article here.
Movies are a great source of inspiration when it comes to seeing how data impacts daily life. And for data journalists, entertainment can be an excellent way to learn without the number crunching bogging us down. On the heels of NICAR's mailing list post about this very subject, Andrea Abellan came up with her own favourite films to share with you. Read the blog here.
Want to be on the podcast or have an idea for an episode? We want to hear from you. Don't forget to join our Discord data journalism server for the latest in the field. You can also read all of our past newsletter editions or subscribe here.
Tara from the EJC data team,
PS. Are you interested in supporting this newsletter or podcast? Get in touch to discuss sponsorship opportunities.