
The best data journalism stories of 2022

Here are 9 excellent data journalism initiatives from across the globe

Back by popular demand, welcome to 2022’s Best Data Journalism Projects Roundup.

Over the past 12 months, we’ve done our best to keep track of some of the exceptional work that data journalists have put out from around the world. From Caracas to Berlin and Kyiv, there has been no shortage of impressive pieces. While it is impossible to pick just one article, here is a list of some stellar data journalism published in 2022.

This has been another year of transformative events, and data journalism has once again been at the forefront of the coverage. From the war in Ukraine to climate change and the cost of living crisis, data journalists have analysed, visualised and turned these disparate issues into informative and compelling stories.

This year, we also did something new: we asked our readers for their favourite projects. Thank you to those who participated and took the time to point us to some amazing work.

Reporting on the war in Ukraine, Novaya Gazeta


Our list begins with the most prominent and unprecedented event of 2022: the war in Ukraine. With the rise of disinformation amid conflict and all the dangers it entails, accurate and data-driven reporting has proven to be even more essential for journalists. From the very beginning of the conflict, the world has been exposed to a deluge of footage on social media platforms. As a result, journalists have been turning to traditional reporting and open source intelligence (OSINT) techniques to find out, verify and explain what is happening.

With so many news outlets covering this conflict, we felt it prudent to highlight the work of Novaya Gazeta, an independent Russian newspaper renowned for its accurate and critical coverage of Russian political and social affairs. As a consequence of holding the Russian government to account, Novaya Gazeta has become a target of censorship and prosecution, and several of its journalists have been killed. Its editor-in-chief, Dmitry Muratov, was awarded the Nobel Peace Prize in 2021.

While the news outlet has covered many angles, what really sets its work apart is its focus on the human impact of the Russia-Ukraine war. On a daily basis, its Important Stories' data department updates the figures of soldiers and civilians killed in Ukraine, the rising death toll in regions and the oppression of those who oppose the war. Given how much of an uphill battle it is to report under such censorship, this coverage is all the more impressive. Take a look at this article by Alesya Marokhovskaya on the number of military casualties, which showcases the effort made by those working without official government permission to build their own databases.

We would be remiss not to mention this special edition of the Data Vis Dispatch by Lisa Charlotte Muth of Datawrapper, which features the best war reporting projects from around the world.

Uninhabitable, Berliner Morgenpost


With the impact of climate change weighing heavily on our minds, the Berliner Morgenpost published ‘Uninhabitable’. This piece was born from a collaboration between data journalists Ida Flik, André Pätzold and Benja Zehr. It features a guided scrollytelling interactive, full of context and the most important information, accompanied by an interactive globe that lets readers jump across the planet and see the different effects and their severity region by region.

The story and visualisations rely solely on real facts and hard data collected from the Intergovernmental Panel on Climate Change (IPCC) report and a whole host of research by climate change scientists.

More impressive still, the article raises climate awareness and educates its audience with an extensive FAQ that clarifies the science behind its findings. This seamless blend of interactive and engaging design with relevant and sobering data from real research makes it an outstanding piece. It’s definitely not the first data journalism project to highlight the severity of the climate crisis, but it’s surely one of the most poignant we’ve seen this year. Another plus: it is available in German and English.

Corredor Furtivo, Armando.info and El País


“Corredor Furtivo” is a stunning crime investigation that used satellite imagery, artificial intelligence, databases, and on-the-ground reporting to create the most complete map of crime in the Venezuelan jungle.

This six-part series was published simultaneously by Armando.info and El País, with help from the Pulitzer Center's Tropical Forest Research Network and the Norwegian organisation, EarthRise Media.

The first step in this project was to develop, together with experts, an algorithm that identifies and links images resembling overhead shots of open-pit mines and secret airstrips, making it possible to find patterns across vast stretches of jungle. This data was then cross-checked with Planet.com, Google Earth and DigitalGlobe before the results were transferred to a MapBox basemap that made them understandable for the audience. Later on, this same algorithm, released as open source, served as the starting point for Amazon Mining Watch, a project launched by the Pulitzer Center that uses this methodology to study the Amazon as a whole.

When asked about the changes that this investigation has helped bring about, Joseph Poliszuk, a co-founder of Armando.info and one of the authors of the pieces, pointed to the Bolivarian National Armed Forces’ (FANB) unprecedented announcement on the destruction of illegal mining tracks in the Amazon.

The impact a project like this can have is what all reporting should aspire to. This is a great example of data journalism: it relies on an open source algorithm, and its methodology lends itself to further investigative, climate-focused work and to prompting governments to act.

When Women Make the Headlines, The Pudding


The Pudding never disappoints with their creative and incisive articles and this publication about the misrepresentation of women in the news is another prime example.

To show how the language of women-specific headlines has (or hasn’t) changed over time, Sahiti Sarva and Leonardo Nicoletti developed this impressive and informative article entitled "When Women Make The Headlines." Using keywords linked with the word “woman”, they compiled and analysed 382,139 headlines from the most prominent Indian, South African, British and American newspapers published in English between 2005 and 2021.

The pair manually curated two dictionaries to classify words used in headlines as gendered: first, those that are explicitly gendered in English, and second, those that denote social and behavioural stereotypes about women. We asked them about the process and they acknowledged that creating these dictionaries to match topics to headlines was one of the most time-consuming tasks. Of course, getting the interactive visualisations to work properly with such a large amount of data and juxtaposing it with a concise article that puts the findings into historical perspective was no small task either.
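To make the idea of dictionary-based classification concrete, here is a minimal sketch in Python. It is not the authors' code: the two word sets are tiny hypothetical stand-ins for their manually curated dictionaries (seeded with example words mentioned in this article), the headlines are made up, and the function names `tag_headline` and `count_tags` are our own.

```python
import re
from collections import Counter

# Hypothetical, tiny stand-ins for the two manually curated dictionaries.
EXPLICITLY_GENDERED = {"mother", "waitress", "policewoman"}
STEREOTYPED = {"sexy", "beautiful"}


def tag_headline(headline):
    """Return the dictionary words found in one headline, grouped by dictionary."""
    # Lowercase and split into simple word tokens before matching.
    words = set(re.findall(r"[a-z']+", headline.lower()))
    return {
        "explicit": sorted(words & EXPLICITLY_GENDERED),
        "stereotype": sorted(words & STEREOTYPED),
    }


def count_tags(headlines):
    """Count how many headlines contain at least one word from each dictionary."""
    counts = Counter()
    for headline in headlines:
        for name, hits in tag_headline(headline).items():
            if hits:
                counts[name] += 1
    return counts


# Made-up headlines, for illustration only.
sample = [
    "Beautiful waitress wins lottery",
    "Council approves budget",
    "Mother of three elected mayor",
]
print(count_tags(sample))
```

A real analysis would need far larger dictionaries and more careful tokenisation, which is presumably part of why the authors found this step so time-consuming.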

Despite all this, they rose to the challenge and wrote an engrossing article that presents a bevy of shocking findings without overwhelming the reader. We’re not the only ones to notice either, as the article has been shortlisted for the Information is Beautiful Awards 2022.

Just to scratch the surface, they found that crime and violence, gendered language, empowerment, people and places, and racial/ethnic/identity categories are the prevalent themes in headline language. Many headlines use words that are either explicitly gendered, such as "mother," "waitress," or "policewoman," or words that are associated with a particular gender because of stereotypes, such as "sexy" or "beautiful." They also found that sensationalism has not only increased over time, but is consistently higher in headlines with keywords associated with women than in headlines about other topics.

This GitHub repository contains all the data used in this essay, which was categorised and visualised using Python and d3.js.

Twitter, Tesla and Copious Emojis: What and When Elon Musk Tweets, The Wall Street Journal


There has been a lot of talk lately about the impact of Elon Musk's purchase of Twitter, especially in the journalism community. We have seen a lot of debate on the platform about what that means, and many journalists have moved on to other channels. So we wonder what will happen to a community that has been very active in data journalism and data visualisation.

But who is Elon Musk, the person causing all this trouble?

This article in The Wall Street Journal answers this question. Penned by James Benedict, Rebecca Elliott and Inti Pacheco, the piece provides a useful analysis of Musk's whopping 15,000 tweets made over ten years, broken down by topic, time of day and other factors. The article shows what motivates and excites this CEO.

What’s even more interesting is that this article was published in April of this year, six months before Musk’s splashy Twitter takeover. It gives an interesting perspective on how a rather volatile public profile has changed over these months and provides some context for what’s to come from a person who clearly doesn’t think twice before posting a tweet. Spoiler alert: electric cars and rockets were, of course, among his favourite topics at the time.

We spoke with one of the graphics reporters involved to learn how an analysis like this was created. Python was essential for this process: together with the pandas and Altair libraries, it was used to scrape data, create charts and then lay out the data. They also used some in-house tools to set up the scrolling text and design visualisations.
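As a rough illustration of one such breakdown, tweets by time of day, here is a minimal sketch. It is not the Journal's code: it uses only Python's standard library rather than pandas and Altair so that it stays self-contained, the timestamps are made up, and the function names `tweets_per_hour` and `text_bar_chart` are hypothetical.

```python
from collections import Counter
from datetime import datetime


def tweets_per_hour(timestamps):
    """Count tweets in each hour of the day (0-23) from ISO 8601 timestamp strings."""
    hours = Counter(datetime.fromisoformat(ts).hour for ts in timestamps)
    return [hours.get(hour, 0) for hour in range(24)]


def text_bar_chart(counts, width=20):
    """Render the hourly counts as a quick text bar chart, one line per hour."""
    peak = max(counts) or 1  # avoid dividing by zero when there are no tweets
    lines = []
    for hour, n in enumerate(counts):
        bar = "#" * round(n / peak * width)
        lines.append(f"{hour:02d} {bar} {n}")
    return "\n".join(lines)


# Made-up timestamps, for illustration only.
sample = [
    "2022-04-01T02:15:00",
    "2022-04-01T02:45:00",
    "2022-04-02T14:05:00",
]
print(text_bar_chart(tweets_per_hour(sample)))
```

Even a throwaway chart like this one gives a team something to look at while the data is still being cleaned.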

What were the challenges behind creating an article like this one? For Benedict, the hardest part was setting up a system that allowed for a quick and straightforward visualisation of the data. The raw data was very messy and it took a great deal of trial and error to get it right. Benedict’s biggest piece of advice was to incorporate the visualisation early in the analysis as a means to help out the entire reporting team.

The piece provides a sobering, if also entertaining, look into the mind of one of the biggest personalities of the year.

Mapping Iran’s Unrest: How Mahsa Amini’s Death Led to Nationwide Protests, The Guardian


Sparked by the death of Mahsa Amini, a 22-year-old Iranian woman of Kurdish descent who was detained and allegedly killed while under arrest, a massive wave of protests in Iran has continued with little to no respite this year. With internet shutdowns and online censorship throughout Iran, media outlets and journalists faced the challenge of covering a new wave of political unrest and conveying it to the outside world.

This article by Niels de Hoog and Elena Morresi, published by The Guardian, resonated with us. The piece alternates between maps, videos and real-world footage depicting the public outcry and retaliation across the country. The written text gives context to help readers understand what was happening on the ground while also revealing the evolution of the protests since they began in September.

Data from the Armed Conflict Location & Event Data Project (ACLED) were used to create the maps, which were built with MapLibre, a free and open-source alternative to Mapbox. The interactive elements of this article were built using Svelte.

When asked what is most demanding about creating such a story, one of the authors, de Hoog, pointed to the need to “get the story out as soon as possible while it's still relevant.” He also praised his fellow reporter Elena Morresi for sifting through the available footage to find the content most relevant to the narrative. This, along with his previous work on the scrollytelling component, allowed them to compile the piece quickly and with such stellar results.

If you want to learn more about the situation in Iran, listen to our podcast interview with Marketa Hulpachova from Tehran Bureau, an independent investigative outlet focusing on data journalism and how internet shutdowns and limited access to freedom of information impact the work of journalists inside the country.

Wordle, 15 Million Tweets Later


If there is one trend that defined 2022, we can all probably agree it was Wordle. Even if you were one of the few who never played it, you couldn’t deny it was the talk of the town, with everyone tweeting those cryptic yellow and green grids all over the world, and in many different languages no less. According to Google's annual report, it's even one of the most searched words. But how did it grow and what makes one Wordle harder than others?

This analysis by Robert Lesser provides some insights into those questions. Lesser started by analysing the mountains of tweeted results from the game using a semi-automated JavaScript programme. As the number of Wordle tweets grew exponentially, so did the time spent running the script. Eventually he got access to the Twitter API and could write a fully automated programme. The entirety of the data exploration, analysis, visualisation, and publication was done in Observable.
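To give a sense of how tweeted results can be turned into a difficulty measure, here is a minimal sketch in Python. It is not Lesser's code, which was written in JavaScript and Observable. It assumes the familiar share text of the form "Wordle 210 4/6", counts a failed game ("X/6") as seven guesses (a convention we chose for illustration), and the sample tweets and the function names `parse_share` and `average_guesses` are made up.

```python
import re
from collections import defaultdict

# Assumed share format: "Wordle <puzzle number> <guesses or X>/6".
SHARE_PATTERN = re.compile(r"Wordle (\d+) ([1-6X])/6")


def parse_share(text):
    """Return (puzzle_number, guesses) from a shared result, or None if absent."""
    match = SHARE_PATTERN.search(text)
    if match is None:
        return None
    puzzle = int(match.group(1))
    # Assumption: a failed game ("X/6") is scored as 7 guesses.
    guesses = 7 if match.group(2) == "X" else int(match.group(2))
    return puzzle, guesses


def average_guesses(tweets):
    """Average number of guesses per puzzle; a higher average suggests a harder word."""
    by_puzzle = defaultdict(list)
    for tweet in tweets:
        parsed = parse_share(tweet)
        if parsed is not None:
            by_puzzle[parsed[0]].append(parsed[1])
    return {puzzle: sum(g) / len(g) for puzzle, g in by_puzzle.items()}


# Made-up tweets, for illustration only.
sample = [
    "Wordle 210 4/6",
    "Wordle 210 X/6",
    "Wordle 211 2/6",
    "good morning everyone",
]
print(average_guesses(sample))
```

Tweets that do not contain a result are simply skipped, which matters when most of what a search returns is conversation about the game rather than scores.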

If anyone reading the piece is interested in working with the full dataset, collaborating on related data projects, or has entirely new project ideas they want advice or help on, you can contact Lesser here.

Violência eleitoral: noite da votação teve pico de assassinatos, Agencia Publica


Brazil dominated headlines when far-right president Jair Bolsonaro was defeated in the country's general election by left-leaning Luiz Inácio Lula da Silva. Lula won 50.9% of the vote in one of the country's most tense elections. The results were closely followed not only in Brazil, but around the world. Because Brazil is the biggest economy in the southern hemisphere and has more rainforest within its borders than any other country, people wondered how these political changes would affect the world.

The results of these polarising elections led to protests, riots and clashes between opposing supporters and an unexpected increase in political violence. "Violência eleitoral: noite da votação teve pico de assassinatos" gets to the bottom of these acts of violence. The team at Agencia Publica knew that political violence was a problem in this election, but there was no official database that recorded these incidents. So they set out to create one, identifying and mapping the crimes reported by local and national media. This involved setting up a Google alert and scraping the internet using the BuzzSumo app. Next, they created a spreadsheet with all the cases. The team, which included more than 13 reporters and editors, followed its own methods and standards for defining political violence in elections and identifying cases. But that was just the beginning of the project. Reporters then spoke with those involved to confirm the facts and ensure accurate reporting.

The organiser of the initiative, Giulia Afiune, told us about the difficulties and psychological strain of working on such a project and facing such violent incidents. "It was difficult for us to communicate with the people who were supposed to confirm the alarm. Many of these sources were still scared and it was terrifying to grasp their condition and the extent of violence in the country".

Over 300 cases of violence were investigated, with 40% of the perpetrators being Bolsonaro supporters. Meanwhile, 7% of the attacks were committed by Lula supporters.

This project shows what is possible when merging investigative journalism and databases. What's more, this article provides a public service to its audience and society thanks to the tireless work of Agencia Publica.

You can read the story in English here.

What Qatar Built for the Most Expensive World Cup Ever, Bloomberg


One big news event that captured the world’s attention was the commotion over the World Cup in Qatar. One of the most talked about controversies surrounding the event was the poor working conditions of the labourers who built the infrastructure for the tournament. FIFA’s decision to hold the competition in a country with a failing human rights record was also highlighted in the media by fans, critics and activists.

This Bloomberg article by Simone Foxman, Adveith Nair and Sam Dodge is one of the most thorough examinations of the sheer scale of renovations and logistics that Doha underwent to host the event. Amidst a changing background of aerial photographs chronicling the progress and eye-catching visuals backed by smart research and satellite data, this piece breaks down the $300 billion spent in preparation for the kickoff. As you scroll, you see the construction of the eight stadiums, seven of which are brand new and one of which has been renovated, and other massive efforts that have transformed the city to accommodate fans and staff necessary to host an event for over 3 million ticketholders.

Now that the competition is coming to an end, it is time to reflect on its influence and impact on the host city and its population. You'd be hard-pressed to find a piece of news that does it better than this.

We hope you’ve enjoyed this roundup. If we missed anything or you want to comment on any of the projects, come join us on Discord or let us know in this form and we'll consider it for the next roundup.

Happy holidays and enjoy the fantastic journalism to come in the new year.
