Politics and probability
Conversations with Data: #60
Welcome to our 60th edition of the Conversations with Data newsletter. With the 2020 US presidential election less than a week away and millions of advance ballots already cast, the world is waiting to see who will take the White House on November 3rd.
To help journalists navigate the political twists and turns of the election, this week's Conversations with Data podcast features Micah Cohen, FiveThirtyEight's managing editor. He talks to us about the power of probability and the uncertainty in election polling data.
What we asked
Tell us about yourself and your work at FiveThirtyEight.
I'm the managing editor at FiveThirtyEight, where I run the newsroom, plan our coverage and help a really talented group of journalists do the best work they can. I've been working with FiveThirtyEight since 2010. Before that, I was working at The New York Times when Nate Silver, the founder of FiveThirtyEight, brought the site there. The Times licensed FiveThirtyEight for three years, and then ESPN bought it in 2013. That was when we began growing, going from a staff of two to 30 people. After graduating from Tulane University, I went to The New York Times and then to FiveThirtyEight.
Talk to us about FiveThirtyEight. How was it founded, and why is it called that?
Nate started blogging about politics during the 2008 cycle and eventually called his blog FiveThirtyEight, after the number of electors in the Electoral College. To win the presidency, you need 270 electoral votes, which is a majority of 538. Nate started the blog because he was frustrated with how the Obama-Clinton Democratic primary was being covered, and he began writing about some common mistakes the media was making in covering the 2008 delegate race.
FiveThirtyEight shows Biden is favoured to win the US election. What exactly does that 'favoured' status mean?
We have a statistical model that takes in data, such as polls and economic indicators, and spits out win probabilities for each candidate, ranging from 0 to 100 percent. The issue is how you communicate what that means to people. Even very numerate people don't always have a good sense of what a probability means. For example, right now Biden has an 84 to 85 percent chance of winning. A lot of people will just round that up to 100 percent and interpret it as Biden being certain to win the election, even though Trump still has roughly a one in six chance. That's about the same odds as getting the bullet in a game of Russian roulette.
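To make that arithmetic concrete, here is a minimal Python sketch (our own illustration, not FiveThirtyEight's code) showing that an 84 percent favourite still loses about 16 percent of the time. That is close to the one-in-six (about 16.7 percent) odds of Russian roulette. The function names and the random seed are hypothetical choices made for illustration.

```python
import random


def underdog_chance(favourite_prob: float) -> float:
    """Probability that the favourite does NOT win."""
    return 1.0 - favourite_prob


def simulate_upsets(favourite_prob: float, n_trials: int = 100_000, seed: int = 538) -> float:
    """Share of simulated races the favourite loses (illustrative only)."""
    rng = random.Random(seed)
    # rng.random() is uniform on [0, 1), so it lands at or above favourite_prob
    # with probability 1 - favourite_prob: that draw counts as an upset.
    losses = sum(1 for _ in range(n_trials) if rng.random() >= favourite_prob)
    return losses / n_trials


biden = 0.84
print(f"Underdog chance: {underdog_chance(biden):.2f}")             # 0.16
print(f"Roughly 1 in {round(1 / underdog_chance(biden))}")         # 1 in 6
print(f"Russian roulette (1 chamber in 6): {1 / 6:.3f}")           # 0.167
print(f"Simulated upset rate: {simulate_upsets(biden):.3f}")       # close to 0.16
```

Run many simulated elections and the underdog wins in roughly one out of every six, which is why an 84 percent chance should not be read as a certainty.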
In 2016, many polls showed Hillary Clinton was favoured to win the US presidential election. Are there any indicators that the 2020 polls could be a repeat of 2016? And has the forecast's methodology, or the way you communicate about it, changed?
To the question about methodology, the answer is no. Our forecast works largely the same way it did in 2016 and in 2012. We make refinements and improvements every election cycle: there are some differences in how the forecast handles economic data, there's a new uncertainty index, and there are COVID-specific elements, such as treating mail-in ballots as a source of uncertainty. But the bulk of the forecasting methodology is the same.
The reason we didn't really need to change anything is that we thought the forecast largely worked the way it was supposed to in 2016. It gave Trump about a one in three chance of winning by election day. It identified his most likely route to winning, which was overperforming in the Electoral College relative to the popular vote.
And by the way, all of that was evident in the data. National polls had Clinton up by a few percentage points, and she won the popular vote by a couple of percentage points. But polling at the time showed Trump performing better in the swing states than he did nationally, so there was reason to think he might overperform in the Electoral College. Now, there were some states where the polls were just off. But that was more of a state-specific problem than a national polling problem. The model is largely the same, but we are communicating what it shows differently this year.
How is FiveThirtyEight communicating that uncertainty differently?
Our forecast starts with a series of maps that are meant to represent the range of likeliest outcomes. If you go to the page, you'll immediately see it says "Biden is favoured". Then you have a bunch of red and blue maps. As of right now, there are more blue (Democratic) maps than red (Republican) maps, so right away you get a sense that Biden is favoured. But these different maps are meant to show you that there is a range of possibilities consistent with the data. And that's really the key.
How does this uncertainty impact the way you visually design the forecast?
The whole reason we do a forecast is that polls are our best way of measuring public opinion. They're also our best tool for understanding the state of an election. But they aren't exactly right. There's a margin of error and there are other sources of error. If the polls were perfect, we wouldn't need a forecast at all. We would just take the polling average in every state and say Biden's going to win this. The forecast is really designed to measure how likely it is that the polls will be wrong.
We are leaning more into the sources of uncertainty. What are the chances that the polls are going to be wrong or that something unexpected is going to happen? And how could that happen? I think the way we write and talk about the forecast, and the way our visual journalists design it, leans into that uncertainty and explains the range of possibilities. We're trying to present a more nuanced picture, not a binary one.
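As a rough illustration of why imperfect polls call for a forecast, the Python sketch below turns a polling lead into a win probability by assuming the true margin is normally distributed around the polling average. This is a simplified textbook approach, not FiveThirtyEight's model; the 2-point lead, the 4-point error and the function names are all hypothetical.

```python
import random
from statistics import NormalDist


def win_probability(poll_lead: float, error_sd: float) -> float:
    """Chance the polling leader actually wins, assuming the true margin is
    normally distributed around the polling average (hypothetical model)."""
    if error_sd == 0:
        # Perfect polls: the leader wins for certain (ties are ignored).
        return 1.0 if poll_lead > 0 else 0.0
    true_margin = NormalDist(mu=poll_lead, sigma=error_sd)
    return 1.0 - true_margin.cdf(0.0)  # P(true margin > 0)


def simulate_win_probability(poll_lead: float, error_sd: float,
                             n_trials: int = 100_000, seed: int = 2020) -> float:
    """Monte Carlo version: add a random polling error to the lead many times."""
    rng = random.Random(seed)
    wins = sum(1 for _ in range(n_trials)
               if poll_lead + rng.gauss(0.0, error_sd) > 0)
    return wins / n_trials


print(win_probability(2.0, 0.0))                       # 1.0: no error, no uncertainty
print(round(win_probability(2.0, 4.0), 2))             # 0.69
print(round(simulate_win_probability(2.0, 4.0), 2))    # close to 0.69
```

With zero polling error the leader wins with certainty, which is the point made above: if the polls were perfect, no forecast would be needed. Widening the assumed error pulls the probability towards 50 percent.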
How are independent voters likely to factor in? Is there any data showing how they might impact the election outcome?
Biden is really crushing Trump with independent voters. And that is one way Biden's polling is really different from Hillary Clinton's in 2016. There are way fewer undecided voters in 2020 than in 2016. In the last election, for instance, there were a ton of undecided voters up until basically the last couple of days of the campaign.
Every election, we ask which group is going to swing the election and determine the outcome. Is it independent voters, NASCAR dads or soccer moms? Normally, a candidate will do better or worse largely across the board. Trump right now, according to the polls, is losing by a wide margin. That's in part because independent voters largely favour Biden. But it's also because every other group of voters has soured on Trump a bit. If you look at Trump's approval rating, it is worse now among suburban white women and among Latino voters than it was four years ago. So, yes, Biden is winning independent voters, but he's winning a bunch of other demographic groups, too.
Finally, what do you think journalists need to be mindful of when reporting the results on election day?
The first thing I would say is to be careful. Take a moment to think about what you are writing. What are you showing? If it's TV, what do you show graphically? What is the potential to confuse things?
At FiveThirtyEight, we are trying to get our readers used to the idea that it might take a while to count the vote, and that if it takes a few days or even a couple of weeks, that is not inherently a sign of a rigged process or a botched election. We are in a pandemic, and the country has had to adjust its election administration accordingly, so it's going to take longer to count the vote this year than it has in other years. So that's one thing.
On another note, it is also important to be more careful about how you talk about the results. Think about how you are showing them, and be much more specific in describing for readers and viewers what kind of results are coming in. Most importantly, explain what we still don't know about the election results, and lead with that. Taking those precautions better prepares readers and viewers for what could be a confusing night and keeps everybody on the same page.
Latest from DataJournalism.com
The more journalists know about polls, how they work and how to evaluate their quality, the closer they come to clarity and accuracy in reporting. In Sherry Ricchiardi's latest Long Read article for DataJournalism.com, she provides useful resources and expert advice on how to cover opinion polling data in the upcoming 2020 presidential elections. Read the full article here.
The final News Impact Summit has been pushed back to 24-26 November. Data Journalism: Build Trust in Media is a free event live-streamed on YouTube, hosted by the European Journalism Centre and supported by the Google News Initiative. The full programme with confirmed speakers will be announced soon. Register here.
Our next conversation
In the next episode of our Conversations with Data podcast, we will speak with economist and journalist Tim Harford about his latest book "How to Make the World Add Up: Ten Rules for Thinking Differently About Numbers". As one of the UK's most respected statistical journalists, he is best known for his long-running Financial Times column "The Undercover Economist" and his BBC radio show More or Less.
As always, don’t forget to let us know what you’d like us to feature in our future editions. You can also read all of our past editions here.
Tara from the EJC Data team,
P.S. Are you interested in supporting this newsletter as well? Get in touch to discuss sponsorship opportunities.