Reassembling Public Data in Cuba: How Journalists, Researchers and Students Collaborate When Information Is Missing, Outdated or Scarce

Written by: Yudivián Almeida Cruz and Saimi Reyes Carmona

Our outlet is a small team. At the beginning, we were four journalists and a specialist in mathematics and computer science who decided in 2014 to venture together into data journalism in Cuba. We also wanted to investigate the issues related to that practice. Until then, no media outlet in Cuba had been explicitly dedicated to data journalism; ours was the first.

Right now we are two journalists and a data scientist working on the project in our free time. None of us does data journalism in our day jobs: Saimi Reyes is editor of a culture-related website, Yudivián Almeida is a professor at the School of Math and Computer Science at the University of Havana, and Ernesto Guerra is a journalist at a magazine about popular science and technology. Our purpose is to be more than a media organization: an experimental space where we want to explore and learn about the nation we live in with and through data.

We set out to use open and public data and wanted to share both our research and the way we do it. That's why we started using GitHub, the platform where our site lives. Depending on the requirements of each story we want to tell, we decide on the length of our texts and the resources we will use, be they graphics, images, video or audio. We focus on journalism with social impact, sometimes long form, sometimes short form. We are interested in all the subjects that we can approach with data, but above all those related to Cuba and its people.

For our investigations we work in two ways, depending on the data. Sometimes we have access to public and open databases. With these, we undertake data analysis to see whether there may be a story to tell. Sometimes we have questions and go straight to the data to find the answers for a story. In other cases, we explore the data and, in the process, find elements that we believe may be interesting, or questions arise whose answers may be relevant and which that data source, or another one, may be able to answer.

If the information we get from these databases looks interesting, we complement it with other sources, such as interviews and comparisons with other information sources. Then we decide how to narrate the research, in one or more written texts accompanied by visualizations that present insights from the data.

Other times, and on more than a few occasions, we have to create databases ourselves from information that is public but not properly structured, and use them as the basis for our analysis and inquiry. For example, to address the topic of Cuban elections, we had to build databases from several sources. We started with data published on the site of the Cuban Parliament; however, these data were not complete, so we filled the gaps with press reports and information published on sites related to the Communist Party of Cuba. Later, to cover the recently designated Council of Ministers, we had to build another database. In that case, the information provided by the National Assembly was incomplete, and we used press reports, the Official Gazette and other informational sites to get a fuller picture. In both cases, we created databases in JSON format, which we processed and used for most of the articles we produced about the elections and the executive and legislative powers in Cuba.
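The idea of starting from one incomplete official source and filling its gaps from secondary sources can be sketched in a few lines. This is a minimal illustration, not the authors' actual code: the source labels, the people ("Person A" and so on) and the fields `province` and `birth_year` are all hypothetical. Sources are applied in priority order, a later source only fills fields that are still empty, and the sketch records which source supplied each value so the method can be explained alongside the published JSON.

```python
import json

# Hypothetical, illustrative records (not real data). Each source maps a
# person's name to the fields that source happens to provide; None = missing.
parliament = {
    "Person A": {"province": "Havana", "birth_year": None},
    "Person B": {"province": None, "birth_year": 1970},
}
press_reports = {
    "Person A": {"birth_year": 1965},
    "Person B": {"province": "Matanzas"},
    "Person C": {"province": "Holguín", "birth_year": 1980},
}


def merge_sources(sources):
    """Merge (label, records) pairs given in priority order.

    A field is taken from the first source that provides a non-None value;
    later sources only fill gaps. The label of the source that supplied
    each field is kept under "sources" so provenance can be documented.
    """
    merged = {}
    for label, records in sources:
        for name, fields in records.items():
            entry = merged.setdefault(name, {"name": name, "sources": {}})
            for key, value in fields.items():
                if value is not None and entry.get(key) is None:
                    entry[key] = value
                    entry["sources"][key] = label
    return merged


database = merge_sources([("parliament", parliament), ("press", press_reports)])

# Serialize as a JSON list of records, keeping non-ASCII names readable.
print(json.dumps(list(database.values()), ensure_ascii=False, indent=2))
```

Keeping a per-field provenance map is one possible design choice; it makes it easy to publish the database together with an explanation of where each value came from.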

In most cases we share such databases on our website with an explanation of our methods. However, our work in Cuba is sometimes complicated by the lack of some data that should be public and accessible. Much of the information we use is provided by government entities, but in our country many institutions are not properly represented on the Internet or do not publicly report all the information they should. In some cases we have gone directly to these institutions to request access to certain information, a procedure which is often cumbersome, but important.

For us, one of the biggest issues in obtaining data in Cuba is that it is out of date. When we finally have access to the information we are looking for, it is often incomplete or very outdated. For example, the data may be available for consultation and download on a website, but the most recent entry is from five years ago. In some cases, we must complete the information by consulting other reliable sites. In other cases, we must turn to printed documents, images or human sources that help us work with recent information. As a result, our way of working varies with each investigation and the data available. These are the particularities of our environment, and this is the starting point from which we set out to offer our readers good journalism that has a social impact. If the information we share is useful to at least one person, we feel it's worth it.

In addition to maintaining our website, where we publish the articles and stories that result from our research, we also want to extend this way of doing data journalism to other spaces. Thus, since 2017, we have taught a Data Journalism course to students of the journalism programme at the School of Communication of the University of Havana. The subject had barely been taught in our country, so teaching it requires ongoing learning and preparation, as well as feedback from students and colleagues.

Through our exchanges with these future journalists and communication professionals we have learned many new ways of working and, surprisingly, we have discovered new ways to access information. One of the things we do in these classes is to involve students in building a database. There was no single source in Cuba for the names of the people who have received national awards for their life's work in different fields and activities. Together, students and teachers collected and structured a database covering more than 27 awards, from the time each began to be granted to the present. This information revealed a gender gap in the awarding of prizes: women were the recipients only 25% of the time. With this discovery we were able, together, to write a story that encouraged reflection on gender issues in relation to the national recognition of different kinds of work.
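Once the award database is structured, measuring a gap like this comes down to counting recipients by gender and dividing by the total. The sketch below is only an illustration of that calculation, not the authors' code: the rows, the award names and the `gender` field (assumed to be coded by hand while building the database) are all hypothetical placeholders.

```python
from collections import Counter

# Hypothetical, illustrative rows: one per award granted. Not real data.
awards = [
    {"award": "Award X", "year": 2001, "recipient": "Person A", "gender": "F"},
    {"award": "Award X", "year": 2002, "recipient": "Person B", "gender": "M"},
    {"award": "Award Y", "year": 2001, "recipient": "Person C", "gender": "M"},
    {"award": "Award Y", "year": 2002, "recipient": "Person D", "gender": "M"},
]


def share_by_gender(rows):
    """Return the fraction of awards granted to each gender value."""
    counts = Counter(row["gender"] for row in rows)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gender: count / total for gender, count in counts.items()}


shares = share_by_gender(awards)
for gender, share in sorted(shares.items()):
    print(f"{gender}: {share:.0%}")
```

Counting one row per award granted, rather than one per person, matches the framing "the percentage of the time" a prize went to a woman; counting unique recipients instead would answer a slightly different question.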

Also in 2017, we had another revealing experience that taught us that, in many cases, we should not settle for existing published databases, and that we should not make too many assumptions about what is and isn't possible. For their final coursework, we asked students to form small teams. Each team was made up of one of the four members of our team, two journalism students and a computer science student, who had joined the course to create an interdisciplinary dynamic. One of the teams proposed to tackle new initiatives of self-employment in Cuba. Here, these people are called "cuentapropistas". What was a very limited practice a few years ago is now growing rapidly thanks to the gradual acceptance of this form of employment in society.

We wanted to investigate the self-employment phenomenon in Cuba. Although the issue had been frequently addressed, there was almost nothing about the specifics of self-employment by province, the number of licenses granted per area of activity, or trends over time. Together with the students, we discussed which questions to address and came to the conclusion that we lacked sources with usable data. Where this information should have been published, there was no trace of it. Nor was there anything in the national press that contained a significant amount of data. Beyond some interviews and isolated figures, nothing had been widely published.

We thought that the data would be difficult to obtain. Nevertheless, journalism students from our programme approached the Ministry of Labor and Social Security and asked for information about self-employment in Cuba. At the Ministry, they were told that the database could be given to them, and within a few days the students had it in hand. Suddenly, we had information that interested many Cubans, and we could also share it because, in fact, it was meant to be public. The Ministry did not have an up-to-date internet portal, and we had wrongly assumed that the data was not accessible.

The journalism students, along with the future computer scientist and the journalist from our team, prepared their story about self-employment in Cuba. Drawing on the data, they described in detail the situation of this kind of employment in the country. Coincidentally, the information came into our hands at a particularly active time for this subject. During those months, the Ministry of Labor and Social Security decided to limit the issuing of licenses for 28 of the activities authorized for non-state employment. We were thus able to quickly use the data we had to analyse how these new measures would affect the country's economy and the lives of self-employed workers.

Most of our readers were surprised that we had obtained the data, and that it had been relatively easy to do so. In the end, we were able to access this data simply because our students asked the Ministry, and to this day our site is the only place where this information is public, so that anyone can consult and analyze it.

Doing data journalism in Cuba continues to be a challenge. Among other things, the dynamics of creating and accessing data, and the political and institutional cultures, are different from those of countries where data is more readily available. Therefore we must always be creative in looking for new ways of accessing information and of using it to tell stories about issues that matter. This is only possible if we keep trying, and we will always strive to be an example of how data journalism is possible even in regions where data is harder to come by.
