The 2018 World Cup might be over, but that doesn’t mean we can’t enjoy the data journalism that it inspired.
Throughout the tournament, journalists used data to answer questions about odds during penalty shoot-outs, how well teams have been playing, and, of course, who will win. At The Economist they even looked at why these predictions will probably be wrong.
To explore these questions further, we asked readers of our Conversations with Data newsletter to nominate their favourite projects. And they delivered! Based on their detailed responses, we collated a full roundup of data journalism that dissected, predicted, and queried the World Cup.
1. Which teams should have made the second round? (NZZ)
‘We wanted to give our users a fresh, data-driven perspective on the old drama each World Cup creates: Which teams go through to the knockout stage, which ones need to pack their bags and go home?
Expected Goals provides a data point that allows us to measure how well each team should have fared, based on the chances they created and allowed. Here’s how we turned raw data into a DDJ-piece: We used an R script to process the raw data from Opta Sports, make the calculations we needed, and create an output file for each graphic we needed for the article.
Most of the simple graphics in the article, i.e. mostly tables, are created with our own storytelling toolbox Q, the other graphics are either handcrafted in Sketch or done in R and then polished in Sketch. The process meant that most of the article and most of the graphics could be produced before the final data was in, which then made it easy to get such a large piece online shortly after the group stage had finished.’
-- David Bauer, NZZ
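To make the Expected Goals idea concrete, here is a minimal sketch in Python. It is not NZZ's R script, and it does not use Opta data: the shot list, the team names and the `xg_table` helper are all invented for illustration. It sums the chance quality each team created and allowed, then ranks teams by the difference.

```python
from collections import defaultdict

# Hypothetical shot-level data: (team, opponent, xg), where xg is the estimated
# probability that the shot becomes a goal. All numbers are invented.
shots = [
    ("A", "B", 0.30),
    ("A", "B", 0.10),
    ("B", "A", 0.05),
    ("A", "C", 0.45),
    ("C", "A", 0.20),
    ("B", "C", 0.60),
    ("C", "B", 0.15),
]


def xg_table(shots):
    """Sum the xG each team created and allowed, ranked by the difference."""
    created = defaultdict(float)
    allowed = defaultdict(float)
    for team, opponent, xg in shots:
        created[team] += xg      # chances the team created
        allowed[opponent] += xg  # chances the opponent had to defend
    teams = sorted(set(created) | set(allowed))
    rows = [
        (t, round(created[t], 2), round(allowed[t], 2),
         round(created[t] - allowed[t], 2))
        for t in teams
    ]
    # Best xG difference first: the teams that "should" have fared best.
    return sorted(rows, key=lambda row: row[3], reverse=True)


table = xg_table(shots)
for team, xg_for, xg_against, diff in table:
    print(f"{team}: created {xg_for}, allowed {xg_against}, difference {diff}")
```

Comparing this ranking with the actual group standings is one simple way to spot teams that went through, or went home, against the run of play.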
2. Which country has the most expensive squad? (The Telegraph)
‘As well as doing larger projects such as our World Cup forecasting game, we have produced a series of more specific data-led stories such as this one where we ranked each 23-man team by their total estimated transfer value.
To source the data, we used OutWit Hub to scrape Transfermarkt. We then crunched these figures to rank the teams, writing a couple of paragraphs on each and then producing one small visualisation for each of them. These were beeswarm plots (made in RStudio with the ggplot library) that plotted each player on the same axis. This meant that even a reader scrolling through the page quickly could see not only which team is the most valuable but also how each player's individual value contributed to this.’
-- Ashley Kirk, The Telegraph
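The ranking step described here can be sketched in a few lines of Python. This is an illustration only, not The Telegraph's code: the teams, players and values are invented, and `rank_squads` and `player_shares` are hypothetical helper names.

```python
# Hypothetical estimated transfer values in millions of euros; all invented.
players = [
    {"team": "Team X", "name": "Player 1", "value_m": 120.0},
    {"team": "Team X", "name": "Player 2", "value_m": 30.0},
    {"team": "Team Y", "name": "Player 3", "value_m": 80.0},
    {"team": "Team Y", "name": "Player 4", "value_m": 90.0},
    {"team": "Team Z", "name": "Player 5", "value_m": 10.0},
]


def rank_squads(players):
    """Return (team, total value) pairs, most valuable squad first."""
    totals = {}
    for p in players:
        totals[p["team"]] = totals.get(p["team"], 0.0) + p["value_m"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def player_shares(players, team):
    """Return each player's share of their squad's total value."""
    squad = [p for p in players if p["team"] == team]
    total = sum(p["value_m"] for p in squad)
    return {p["name"]: p["value_m"] / total for p in squad}


ranking = rank_squads(players)
for team, total in ranking:
    print(f"{team}: {total:.0f}m")
print(player_shares(players, "Team X"))
```

The per-player shares are the numbers a beeswarm plot makes visible at a glance: whether a squad's value is spread evenly or concentrated in one or two stars.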
3. What does World Cup history tell us about the tournament? (Spiegel Online)
‘Profound analyses of football matches often require data that’s hard to obtain and a lot of domain knowledge in order to come to really meaningful conclusions. But there are also World Cup-related stories that can be told with open data and without in-depth football knowledge. Our story on the history of football clubs and the World Cup squads is such an example. We analysed and visualised which clubs send the most players to the World Cup, who dominated in which decade, and how large the shares of players from the big five European leagues and of players playing abroad have been over time. All the data behind the story came from Wikipedia sites like this one.
The structure of these sites is pretty consistent, so scraping the information was no big deal. For presenting the story we’ve chosen to include a data table that enables the user to search for his/her favourite team as well as bar charts and a slope chart to show change over time. The latter proved really helpful to show how English clubs prospered in the last two decades, while Dutch and Belgian teams lost a lot of their reputation.’
-- Patrick Stotz, Spiegel Online
4. What is the ideal team lineup? (Towards Data Science)
‘With the World Cup 2018 happening this summer in Russia, every soccer fan around the world is eager to make their prediction on what team will win this year. Another looming question for the fans is how their favourite national teams should line up: What formation should be used? Which players should be chosen? Which ones should be left on the bench or eliminated from the tournament?
An enthusiastic soccer fan myself, I started thinking: Why shouldn’t I build my own dream formation for my favourite teams at the World Cup? As someone who loves data science and has grown up playing FIFA, I realised that I could use the data from EA Sports’ extremely popular FIFA 18 video game, released last year, to do my analysis.
Here is the step-by-step approach I used:
- Get the FIFA 18 dataset from Kaggle
- Do some exploratory data analysis and data visualisation on important player attributes
- Write functions to get the best squad according to the players' overall rating
- Apply the functions to derive results for the 10 national teams
- Compare the results and make predictions on the potential winner
I had great fun playing with Python's various utilities for this project. Definitely a must-know tool for any data journalist looking to bring the magical power of statistical thinking to the sports realm.’
-- James Le, Towards Data Science
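The "best squad by overall rating" step might look something like the sketch below. It is a simplified guess at the idea rather than James Le's actual functions: the players are invented, the field names `name`, `position` and `overall` are assumptions that may not match the Kaggle dataset's columns, and `best_lineup` and `average_rating` are hypothetical helpers.

```python
# Invented sample players; field names are assumptions, not the real columns.
players = [
    {"name": "Keeper One", "position": "GK", "overall": 85},
    {"name": "Keeper Two", "position": "GK", "overall": 80},
    {"name": "Back One", "position": "CB", "overall": 88},
    {"name": "Back Two", "position": "CB", "overall": 84},
    {"name": "Back Three", "position": "CB", "overall": 79},
    {"name": "Striker One", "position": "ST", "overall": 90},
    {"name": "Striker Two", "position": "ST", "overall": 86},
]


def best_lineup(players, formation):
    """For each slot, greedily pick the highest-rated unused player."""
    used = set()
    lineup = []
    for position in formation:
        candidates = [
            p for p in players
            if p["position"] == position and p["name"] not in used
        ]
        if not candidates:
            # Nobody left for this slot: record the gap instead of failing.
            lineup.append((position, None, None))
            continue
        best = max(candidates, key=lambda p: p["overall"])
        used.add(best["name"])
        lineup.append((position, best["name"], best["overall"]))
    return lineup


def average_rating(lineup):
    """Mean overall rating of the filled slots in a lineup."""
    ratings = [overall for _, _, overall in lineup if overall is not None]
    return sum(ratings) / len(ratings)


# A shortened four-slot "formation" keeps the example small.
lineup = best_lineup(players, ["GK", "CB", "CB", "ST"])
print(lineup)
print(average_rating(lineup))
```

Running the same function over several formations and comparing the average ratings is one plausible way to choose a formation per team, though a real analysis would need to handle players who can cover more than one position.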
5. What are the chances of the best teams winning? (Wired)
‘There is a lot of noise and narratives around the World Cup. Everyone has an opinion. Analysis of the data can either prove or refute these opinions, while at the same time providing a new perspective on something everyone knows so well.
The other great thing about data in sport is that it also allows you to make testable predictions -- the results are there for everyone to see. In this article I utilised our models to make forecasts about how results affected some of the best teams' chances of winning the World Cup.’
-- Omar Chaudhuri, Wired
6. How many club teammates opposed each other? (Reuters)
‘One part of the World Cup that always interests me is that players transition from teammates on professional clubs to opponents on national squads. I pitched this idea to Reuters and they were interested in supporting it. There may be simpler ways to show these overlaps, but using a Voronoi treemap creates a beautiful display and one that evokes the shapes on a football.
It is produced using D3.js and the data came from a number of sources (It's trickier than you might think to find the league of every professional club in the world). Some excellent editors at Reuters -- Matthew Weber and Simon Scarr -- helped to shape a narrative structure to lead readers into the concept.’
-- Andrew Garcia Phillips, Reuters
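The underlying overlap can be sketched without any graphics at all. The Python below is an illustration with invented data, not the Reuters pipeline (which used D3.js): it groups players by club and lists every pair of club teammates who represent different national teams. The `cross_nation_teammates` helper is a hypothetical name.

```python
from collections import defaultdict
from itertools import combinations

# Invented records: (player, club, national team).
players = [
    ("P1", "Club X", "Nation A"),
    ("P2", "Club X", "Nation B"),
    ("P3", "Club X", "Nation A"),
    ("P4", "Club Y", "Nation B"),
    ("P5", "Club Y", "Nation C"),
]


def cross_nation_teammates(players):
    """List (club, player, player) for teammates on different national teams."""
    by_club = defaultdict(list)
    for name, club, nation in players:
        by_club[club].append((name, nation))
    pairs = []
    for club, members in by_club.items():
        # Every unordered pair of club teammates is checked exactly once.
        for (name1, nation1), (name2, nation2) in combinations(members, 2):
            if nation1 != nation2:
                pairs.append((club, name1, name2))
    return pairs


pairs = cross_nation_teammates(players)
for club, first, second in pairs:
    print(f"{club}: {first} and {second} could face each other")
```

These are only potential opponents. Whether two club teammates actually met depends on the fixtures, which this sketch does not model.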
7. What does historical data tell us about each group? (Mundo Deportivo)
‘The first step was to put together an exhaustive compilation of all historical data from the World Cups. Since this resulted in a huge database, we used new narratives to showcase data that otherwise could have been difficult to analyse and understand. The outcome is an illustration of historical data behind all of the teams participating in the Russia World Cup, filtered by groups.
Data was sourced from the FIFA Archives, The Rec.Sport.Soccer Statistics Foundation and Los Mundiales de Fútbol. Once all the data was gathered, I researched specific data that needed to be depicted in order to achieve the final structure.’
-- Ferran Morales, Mundo Deportivo
8. Which teams have the most foreign-born players? (National Geographic)
‘There is a rather shallow notion that surfaces in the United States (my home country) that the World Cup features teams made just of national stereotypes. People joke about 11 stern Germans taking on 11 flashy Spaniards, or a speedy Senegalese team up against a group of tall Dutch players. But when you dig into the data on the origins of international soccer players, you find a lot of diversity in many countries, which I wanted to show to subvert that tired narrative.
My initial idea was to make a map that connected every country that used a foreign-born player during qualifying for 2018 with that player’s place of birth. Ultimately, the map was too complicated and required too much space in the magazine. Instead, I found that a circular “chord diagram” featuring only the 32 teams in the 2018 World Cup focused the graphic effectively and could fit on one page. In the graphic, 32 nodes with 26 connections representing 97 foreign-born players made for a compelling tangle of information, but one that was not so aggressive that it was too much for a reader to untangle. With the circular approach, I also benefited from having room on the corners to provide additional pieces that fed into the main diagram.
Although the superlative of the dataset, France (the origin of 35 qualifying players), is purposefully oriented to the top left of the graphic as a starting point, I hoped readers could enter the graphic from any angle because contextual information surrounds the chord diagram. While these smaller pieces serve as keys for the main graphic, they are also meant to convey their own addition to the reader’s understanding of World Cup soccer. For example, the map describes the colours used in the diagram, but also teaches a reader about the regional qualification phase.
These side nuggets of information were critical because the graphic needed to be understandable and relatable to an audience that was not necessarily soccer-savvy.’
-- Riley D. Champine, National Geographic
9. Who are the most valuable players? (freeCodeCamp)
‘What could I do with access to a register of every time a player touches the ball during one of the World Cup games? Previously I created a visualisation showcasing where each goal had come from, but this time I wanted to use all the data available (not just the goals). Step one was to find out how many times each player had touched the ball, revealing who had more play time. That was interesting, but results got even better by looking at where in the field they were each time - revealing if they were attackers, defenders or in between. With BigQuery I was able to construct this query pretty quickly, and Data Studio helped me produce an interactive dashboard ready to be posted online.’
-- Felipe Hoffa for freeCodeCamp
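The two steps described here, counting touches and then looking at where on the field they happened, can be sketched in plain Python. This is not Felipe Hoffa's BigQuery query. The touch events are invented, the `x` coordinate (0 at a player's own goal line, 100 at the opponent's) is an assumed convention, and the role thresholds in the hypothetical `summarise_touches` helper are arbitrary.

```python
from collections import defaultdict

# Invented touch events: (player, x), with x running from 0 (own goal line)
# to 100 (opponent's goal line). This coordinate system is an assumption.
touches = [
    ("Player A", 80.0),
    ("Player A", 70.0),
    ("Player A", 90.0),
    ("Player B", 20.0),
    ("Player B", 30.0),
    ("Player C", 50.0),
]


def summarise_touches(touches, defend_below=35.0, attack_above=65.0):
    """Count touches per player and label a rough role from the mean position."""
    positions = defaultdict(list)
    for player, x in touches:
        positions[player].append(x)
    summary = {}
    for player, xs in positions.items():
        mean_x = sum(xs) / len(xs)
        if mean_x < defend_below:
            role = "defender"
        elif mean_x > attack_above:
            role = "attacker"
        else:
            role = "in between"
        summary[player] = {"touches": len(xs), "mean_x": mean_x, "role": role}
    return summary


summary = summarise_touches(touches)
for player, stats in summary.items():
    print(player, stats)
```

On real event data the same grouping would run over many thousands of rows, which is where a warehouse such as BigQuery earns its keep.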
Like what you read? Subscribe to our Conversations with Data newsletter for more tips like these and the exclusive opportunity to ask other experts questions about data journalism.
9 questions about the World Cup, and how data journalists answered them - Dissecting, predicting, and querying through data