Reassembling Public Data in Cuba: Collaborations When Information Is Missing, Outdated or Scarce
Written by: Yudivián Almeida Cruz and Saimi Reyes Carmona
Abstract
How a small data journalism team in Cuba fights against the lack of data.
Keywords: Cuba, scarce data, data learning, artificial intelligence, data journalism, researcher–journalist collaborations
Postdata.club is a small team. We started as four journalists and a specialist in mathematics and computer science, who, in 2014, decided to venture together into data journalism in Cuba. Until that moment, there was no media outlet that was explicitly dedicated to data journalism in Cuba and we were interested in understanding what the practice entailed.
Today we are two journalists and a data scientist working in our free time on data stories for Postdata.club.
Data journalism does not feature in our daily jobs. Saimi Reyes is editor of a cultural website, Yudivián Almeida is a professor of the School of Math and Computer Science at the University of Havana, and Ernesto Guerra is a journalist at a popular science and technology magazine. Our purpose is to be not just a media organization, but an experimental space where it is possible to explore and learn about the nation we live in with and through data.
Postdata.club lives on GitHub. This is because we want to share not just stories but also the way we do research and investigations. Depending on the requirements of the story we want to tell, we decide on the resources we will use, be they graphics, images, videos or audio. We focus on journalism with social impact, sometimes long-form, sometimes short-form. We are interested in all the subjects that we can approach with data, but, above all, those related to Cuba or its people.
The way we approach our investigations depends on the data that we have access to. Sometimes we have access to public and open databases. With these, we undertake data analyses to see if there may be a story to tell. Sometimes we have questions and go to the data to find answers that could constitute a story. In other cases, we explore the data and, in the process, find interesting leads or questions which may be answered by data we do not yet hold.
Other times—and on more than a few occasions—to support our analysis and investigations, we have to create databases ourselves based on information that is public but not properly structured. For example, to report on the Cuban elections, we had to build databases by combining information from different sources. We started with data published on the site of the Cuban Parliament. This data, however, was not complete, so we complemented it with press reports and information from Communist Party of Cuba websites.
To report on the recently designated Council of Ministers, it was also necessary to build a database. In that case, the information provided by the National Assembly was not complete and we used press reports, the Official Gazette and other websites to get a more comprehensive picture. In both cases, we created databases in JSON format which were analyzed and used for most of the articles we wrote about the elections and the executive and legislative powers in Cuba.
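The general idea of complementing an incomplete official source with other sources can be sketched in a few lines of Python. This is only an illustrative sketch, not the code we used: the function name `merge_sources`, the field names and the sample records (`Deputy A`, `Deputy B`) are all hypothetical placeholders. It assumes that each source is a list of records and that a person's name is enough to match records across sources, which in practice requires careful checking.

```python
import json


def merge_sources(primary, *supplements, key="name"):
    """Merge records from several sources into one list.

    Values from the primary source are kept; supplementary sources
    only fill in fields that are missing or empty.
    """
    merged = {rec[key]: dict(rec) for rec in primary}
    for source in supplements:
        for rec in source:
            target = merged.setdefault(rec[key], {})
            for field, value in rec.items():
                # Only fill gaps; never overwrite an existing value.
                if target.get(field) in (None, ""):
                    target[field] = value
    return sorted(merged.values(), key=lambda r: r[key])


# Hypothetical placeholder records, not real data.
official_site = [
    {"name": "Deputy A", "province": "La Habana", "age": None},
]
press_reports = [
    {"name": "Deputy A", "age": 52, "profession": "engineer"},
    {"name": "Deputy B", "province": "Holguín"},
]

database = merge_sources(official_site, press_reports)
print(json.dumps(database, ensure_ascii=False, indent=2))
```

Keeping the official source as the primary one and letting the others only fill gaps makes it easier to explain, in the accompanying methods note, where each value came from.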
In most cases we share such databases on our website with an explanation of our methods. However, our work is sometimes complicated by the lack of data that should be public and accessible. Much of the information we use is provided by government entities, but in our country many institutions do not have an online presence or do not publicly report all the information that they should. In some cases we went directly to these institutions to request access to certain information, a procedure which is often cumbersome, but important.
For us, one of the biggest issues with the data that we can obtain in Cuba is that it is out of date. When we finally get access to the information we are looking for, it is often incomplete or very outdated. The data may be available for consultation and download on a website, but the most recent period it covers may be five years old. In these cases we identify other reliable websites which provide up-to-date information, or resort to printed documents, scans or human sources.
Collaborations with students and researchers are one of the ways we approach situations where information is missing, outdated or scarce. Since 2017, we have taught a data journalism course to journalism students at the University of Havana School of Communication. Through our exchanges with these future journalists and communication professionals we have learned new ways of working and discovered new ways to access information.
One of the things we do in these classes is to involve students in the construction of a database. For example, there was no single source in Cuba from which to obtain the names of the people who have received national awards for their life's work in different areas. Together with students and teachers, we collected and structured a database of the recipients of more than 27 awards, from the time each award was first granted to the present. This information revealed a gender gap in the awarding of prizes: women received these prizes only 25% of the time. With this discovery we were able, together, to write a story that encouraged reflection about gender issues in relation to the national recognition of different kinds of work.
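Once such a database is structured, a figure like the share of women among recipients takes only a few lines to compute. The following is a minimal sketch under stated assumptions: the function name `gender_share`, the `gender` field with values `"F"` and `"M"`, and the five sample records are invented for illustration and are not taken from our database.

```python
from collections import Counter


def gender_share(recipients, gender_field="gender"):
    """Return the fraction of award recipients per gender value."""
    counts = Counter(rec[gender_field] for rec in recipients)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gender: n / total for gender, n in counts.items()}


# Invented sample records, for illustration only.
sample = [
    {"award": "Award X", "year": 2001, "gender": "M"},
    {"award": "Award X", "year": 2002, "gender": "F"},
    {"award": "Award Y", "year": 2001, "gender": "M"},
    {"award": "Award Y", "year": 2002, "gender": "F"},
    {"award": "Award Y", "year": 2003, "gender": "M"},
]

shares = gender_share(sample)
for gender, share in sorted(shares.items()):
    print(f"{gender}: {share:.0%}")
```

The same function can be applied to the records of a single award or a single decade to see whether the gap varies by field or over time.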
In 2017 we had another revealing experience. It helped us to understand that, in many cases, we should not settle for existing published databases and should not make too many early assumptions about what is and is not possible. As part of their final coursework, we asked students to form small teams to carry out an investigation. Each team comprised one of the four members of the Postdata.club team, two journalism students and a student of computer science. One of the teams proposed tackling new self-employment initiatives in Cuba. Here, self-employed workers are called cuentapropistas. What was a very limited practice a few years ago is now growing rapidly, thanks to the gradual acceptance of this form of employment in society.
We wanted to investigate the self-employment phenomenon in Cuba. Although the issue had been frequently addressed, there was almost nothing about the specificities of self-employment by province, the number of licenses granted per area of activity or trends over time. Together with the students, we discussed which questions to address and came to the conclusion that we lacked good data sources. In places where this information should have been posted publicly, there was no trace. Other than some interviews and isolated figures, not much information on this topic was available in the national press.
We thought that the data would be difficult to obtain. Nevertheless, journalism students from our programme approached the Ministry of Labour and Social Security and asked for information about self-employment in Cuba. In a few days the students had a database in their hands. Suddenly, we had information that would be of interest to many Cubans, and we could share it alongside our stories. We had wrongly assumed that the data was not intended for the public, whereas the ministry simply did not have an up-to-date Internet portal.
Coincidentally, the information came into our hands at a particularly convenient moment. At that time, the Ministry of Labour and Social Security decided to limit license issuing for 28 of the activities authorized for non-state employment. We were thus able to quickly use the data we had obtained to analyze how these new measures would affect the economy of the country and the lives of self-employed workers.
Most of our readers were surprised that we had been able to obtain the data, and that doing so had been relatively easy. In the end, it was possible to access this data because our students had asked the ministry for it, and to this day Postdata.club is the online place where this information is publicly accessible.
Doing data journalism in Cuba continues to be a challenge. Amongst other things, the dynamics of creating and accessing data and the political and institutional cultures are different from those of countries where data is more readily available. Therefore, we must always be creative in looking for new ways of accessing information to tell stories that matter. This is only possible if we continue to try, and at Postdata.club we will always strive to be an example of how data journalism is possible even in regions where data can be harder to come by.