Data journalists get their ideas in a range of ways — from questions and tip-offs to news events and data releases.
But if you’re new to the field you can often struggle to come up with inspiration. If you’re looking for data journalism ideas, then here’s a guide to different ways to generate them — and the types of stories they might produce.
Probably the best way to get started with data journalism is to work from scheduled data releases.
These are datasets typically published by public bodies, such as a national statistics body, ministry or local government, open data portal, or international organisation (the World Bank and UN are two examples).
New data releases solve two challenges with data journalism story ideas:
- The “what’s new?” question (“new data says X” is the obvious new thing, even if the time period covered was last year); and
- Getting hold of the data
(Other sources of ideas, outlined below, will require you to work harder on both challenges).
The downside of data releases as a source of ideas is that the release will also be seen by lots of other journalists — so you’ll have less time to turn that data into a story before it becomes ‘old news’.
For that reason data release-driven stories need to be relatively simple: you’re likely to be analysing data to find out the scale of something; how much it has changed; or how different areas or categories rank in terms of a particular issue (e.g. which area or category is worst affected or where does your local area rank).
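The three calculations described above (scale, change and ranking) can be sketched in a few lines of Python. This is only an illustration: the area names and figures below are invented, and the helper `percentage_change` is a hypothetical name rather than part of any particular tool.

```python
# Hypothetical figures for illustration only: recorded incidents per area
# in two consecutive annual releases. Area names and numbers are invented.
previous = {"Area A": 120, "Area B": 80, "Area C": 200}
latest = {"Area A": 150, "Area B": 60, "Area C": 210}


def percentage_change(old, new):
    """Return the percentage change from old to new."""
    return (new - old) / old * 100


# Scale: how big is the issue overall in the latest release?
total_latest = sum(latest.values())

# Change: how much has the total moved since the previous release?
total_change = percentage_change(sum(previous.values()), total_latest)

# Ranking: which areas have the highest figures in the latest release?
ranking = sorted(latest, key=latest.get, reverse=True)

# Change by area, to see where the figures are rising or falling fastest
change_by_area = {
    area: percentage_change(previous[area], latest[area]) for area in latest
}

print(f"Total in latest release: {total_latest}")
print(f"Change on previous release: {total_change:.1f}%")
print("Areas ranked highest to lowest:", ranking)
print("Change by area:", change_by_area)
```

With real data you would usually rank areas on a rate (for example per 1,000 residents) rather than on raw counts, so that larger areas do not automatically come out on top.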
Data releases are typically published to a pre-announced schedule, which can be found on the organisation’s website by searching for “data release calendar” or similar terms, along with the organisation’s name.
Because the schedule is known in advance, you can prepare before the data arrives. It’s likely that there will be previous releases of the same data that you can look at, for example.
This will help you anticipate what information will be contained, what sort of shape it will be in, and what sorts of techniques you’ll need to use with it.
You’ll also get an idea of the language involved, including any terms you need to better understand (e.g. how is “homelessness” defined? What is a “frequent caller”?). This can lead to background reading and research.
You can make sure that you understand how the data was gathered (for example, is it based on a sample?), and what it can and cannot say as a result.
Finally, you can research the topic itself, too, to understand what might be the most newsworthy dimensions of the data — and which people (experts, politicians, charities, representatives, etc.) you might approach for interviews.
All of this will mean you are better prepared to turn that data into a story more quickly — and do it better — than the competition.
Many data story ideas are driven by a news event. This prompts the journalist to look at data to put that event into context. Typical examples include:
- A physical event occurs (e.g. a particular type of crime; an environmental event; an emergency event; a protest)
- A verbal event where a claim is made (e.g. a politician blames a problem on a particular cause; concerns are raised about something)
- A political event where an announcement is made (e.g. a new policy, new funding, or new laws to tackle a problem)
Different types of news events can generate different questions. Physical events can prompt questions like “How common is this type of event?” or “Are there more of these events than there used to be?”
Similar questions might be asked in relation to claims and announcements, but might also focus more on the basis for those statements.
For example: “Does the data support the claim/concerns?”; “Does the problem justify a new law/policy/funding?”; “Is that funding enough to make an impact?”; “How effective have similar policies/laws been before/elsewhere?”
A news event-driven data story is an example of “moving the story on” — it’s often done in newsrooms by seeking new reaction to a news event from someone newsworthy (e.g. a politician, industry representative, spokesperson or celebrity), or looking for new action being taken (e.g. by politicians or political bodies, charities or businesses), or new information being revealed.
In this case data is providing the new information that moves the story on.
In 2016, for example, five dead sperm whales were washed up on beaches in the east of England within days of each other. That prompted the BBC England Data Unit to look at data on how many of the sea creatures die every year on UK shores.
The data story was completed within 24 hours, and that’s important: the window of opportunity for most news-driven ideas is small. A data journalism story is only likely to be newsworthy during the few days following the event, and in some cases only on the same day, so it will need to be completed quickly.
Most news event-driven data journalism stories, then, are likely to be technically simple and focused on similar calculations to data release-driven stories: finding out the scale of the type of issue that’s been in the news; whether events related to that issue have increased or fallen (or “failed to improve”); or an angle relating to ranking.
Anything more complex means the story will take too long, and it will no longer be newsworthy because that event, claim or announcement will have slipped off the news agenda.
That doesn’t mean the end of your idea. Events can recur, of course, and claims or announcements will often be followed up later by further announcements or claims, so you can prepare data in advance in anticipation of those events.
If you want to spend more time on a data journalism story, one way to come up with ideas is to look for an example of data journalism that can be applied to a different place, category, or time period.
Freedom of Information (FOI) stories are a good example of this.
If someone wrote an FOI story a few years ago about the number of bike thefts in your area, for example, you could:
- Update the same idea by repeating the FOI request, this time asking for the most recent few years
- Apply the story to a different area (you could look at the wider region as a whole, for example, or see how your area ranks nationally)
- Adapt the idea to a different category of crime, such as car theft
Many other data journalism stories can be adapted in the same way.
You might see a story about the increase in floods, for example, and consider adapting it for earthquakes, or bringing it up to date, focusing it on a particular country, or expanding it to a global angle.
The challenge, however, is to make that story newsworthy. Unlike data release-driven ideas, where the newness of the data gives you a news hook, or news event-driven ideas, where the event gives you the topical interest in the subject, an example-driven idea may not necessarily be newsworthy.
FOI stories have an advantage here: they are exclusive, and their news hook tends to be based on the fact that they “reveal” something. A typical intro might use the phrase “figures from X reveal” or “analysis/an investigation by [your publication] has revealed”.
But if you are using other examples as a template for your own it’s important to ask why the original story was published.
It may be that there was some news event, or new dataset, that prompted it.
It may be that the issue is much more important in that area or category than in the one you plan to apply it to.
Understanding what made it newsworthy in the first place can help you to identify if it’s newsworthy now — or what might make it so again.
For example, it might lead you to find out that some new data is out soon, or to prepare a story for the next time the particular type of event occurs (if it’s an event that happens regularly enough to be confident that it will happen again).
One useful strategy when the data isn’t new enough to be newsworthy on its own is to focus the story on some reaction to what you find.
For example, you might get an interview with a charity or politician to comment on what you find, so that your story can lead on their ‘call for action’ or ‘concerns raised after data shows X’. (Occasionally it might go even further and action will be taken after you share your findings, which makes an even stronger story.)
If you can’t find something new about the issue you’re about to dig into, rethink your story. It might be that there’s not a strong enough justification to do it.
Put it to one side and look at other ideas instead — you’ll probably come up with a better one.
Some of the best ideas in data journalism come from a simple question: are women’s pocket sizes really as ridiculous as they seem? How widespread is discrimination against people on benefits in the rental sector? How have Europe’s prisons fared in the Covid-19 pandemic?
The resulting story is likely to be the answer to that question as a news story about what you reveal (e.g. “X% of properties won’t accept people receiving welfare”), or an exploratory feature (e.g. “Here’s how prisons were affected by the pandemic”).
Questions have the benefit of originality, and can be quirky too — but they come with the risk that no data exists to answer that question.
The most important stage with a question-inspired idea, then, is to scope what data exists, and how practical the story might be.
- How would information be recorded about this activity?
- Who would record it?
- Is that one organisation or multiple ones? (For example different private companies)
- How likely would they be to share that data?
- Has anyone conducted surveys about the activity based on a representative sample?
- If direct information doesn’t exist, what proxy data might exist? (i.e. data which indirectly measures that activity, such as people mentioning it or searching for it online, or data about secondary effects of the activity)
- Would I be able to collect enough of this data myself if I have to?
If your question is “How much wrapping paper is thrown away every Christmas?”, for example, you are likely to quickly come up against the problem that when a person throws away wrapping paper no one records it.
The next step might be to identify if an organisation has collected survey data which answers your question.
You might contact charities who have campaigned on the issue of waste (e.g. environmental charities); or search Google Scholar for any academics who have researched the field and might be aware of any data.
Be prepared for any data that you do find this way to be unsuitable for your purposes (at least on its own). Surveys may have been conducted some time ago, or with a very small sample, or in a different country. They might still have some relevance to your story idea (they might be worth a sentence), but you’ll have to admit they’re not strong enough to carry the story on their own.
Another option is proxy data. Proxy data has been widely used to tell different stories about the impact of COVID-19. As I wrote during the first months of the pandemic:
“Air pollution data, for example, can be a proxy for transport activity; energy consumption data can be a proxy for economic activity; waste collection data is a proxy for people moving away or working elsewhere. A spike in people dying at home can raise questions about what that indicates. Social media chatter and search trends are regularly used as proxies for behaviour, too.”
An alternative to proxy data is to collect the data yourself.
In the case of the story about pocket sizes, Jan Diehm and Amber Thomas made measurements themselves of “80 pairs of blue jeans from the most popular and widely available brands in the US … at brick and mortar stores in Nashville, New York, and Seattle.”
For the story on discrimination in the housing market I scraped data from thousands of adverts for rental properties.
When ITV News wanted to do a story on the impact of funding cuts in schools they asked the National Association of Head Teachers to distribute a survey to thousands of school heads, and then asked students on the MA in Data Journalism at Birmingham City University to help analyse the results.
In each case it’s important to collect enough data: spend time considering what is going to be a big and representative enough sample of the phenomenon you’re trying to tell a story about (a survey of a few teachers via your social media isn’t going to cut it, for example).
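One rough way to think about whether a sample is "big enough" is the margin of error for a percentage taken from a survey. The sketch below uses the standard approximation for a simple random sample at a 95% confidence level; the function name `margin_of_error` and the sample sizes are illustrative assumptions, not figures from any of the stories mentioned here.

```python
import math


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate margin of error for a proportion.

    Assumes a simple random sample. z=1.96 corresponds to a 95%
    confidence level, and proportion=0.5 gives the widest (most
    cautious) margin.
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


# Illustrative sample sizes only
for n in (30, 400, 1000):
    points = margin_of_error(n) * 100
    print(f"Sample of {n}: +/- {points:.1f} percentage points")
```

Note that this only describes sampling error. A larger sample does not fix a sample that is unrepresentative in the first place, such as responses gathered only from your own social media followers.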
You might also consider focusing your story on the lack of data itself. In 2016, for example, I worked on a story about how many schools were failing to publish transparency data, while one British Medical Journal investigation focused on the fact that "medical schools are failing to monitor racial harassment and abuse of ethnic minority students". Missing Numbers contains dozens of stories about gaps in public data, and the books Invisible Women and Data Feminism provide many examples of stories focusing on how a lack of data has real world impacts.
None of these are small projects, so it’s important to decide if the story justifies the effort involved (and will still be newsworthy by the time you’ve got the data you need), and if you’re passionate enough about the project to see it through to completion.
Tip-offs are essentially ideas that come from other people: “You should look into X” or “It would be great if someone found out Y” or “My organisation has seen a big increase in Z”.
They are similar to question-driven ideas but have a couple of key qualities which lead me to treat them separately:
- First, tip-offs tend to suggest that there is something of note to find (whereas questions may not lead to something newsworthy); and
- Second, the person doing the tipping off may also be able to provide advice in obtaining data or interviews (see exclusivity-driven ideas below).
But the problems facing tip-offs are also the same: it may be difficult, or not possible, to get hold of the data that you need.
You will need to ask the same questions as with question-based ideas: about how and where such data might be recorded (including surveys); who may hold the data; how likely they would be to share it; what proxy data might exist; and whether it’s practical for you to compile data yourself if none exists.
Sometimes ideas will come from a dataset that you have obtained exclusively. This can happen in a number of ways:
- An organisation approaches you with some data that they’d like to share exclusively
- An individual approaches you with some data that they’ve obtained that no other journalist has access to (for example, a leak, or data that they’ve scraped, or data from some unpublished research)
- You approach an organisation or individual to obtain data for a story
Strictly speaking, in this last type of exclusivity the story idea has already been identified — it’s not generated by the exclusivity of the data. But the exclusivity does give the idea extra value.
In the other types of exclusivity, however, you may still need to find story ideas in the dataset you’ve been given exclusive access to. The provider may think the data is inherently newsworthy, but its exclusivity alone won’t mean it contains a newsworthy story, or justify the time and effort involved in transforming it into one.
Exclusive data from an academic study, or market research by a corporation, will require careful consideration before you commit resources to a story.
And even apparently juicy leaks from a whistleblower can turn out to reveal nothing new — only information that is already public.
The key questions to ask when being offered exclusive data are: “What does this tell me that is newsworthy?” and “What’s their motivation for giving it to me exclusively?”
It’s worth bearing in mind, for example, that exclusivity is sometimes used to try to manipulate journalists (a reporter may be offered an exclusive interview on condition of “copy approval”, for example).
Exclusivity can also be created artificially in order to appeal more to journalists.
Worst of all, it can cause a reporter to treat the material less critically, as the exclusivity of the story transforms it into something ‘owned’ by the news organisation themselves, and that they must defend.
In some cases a sunk cost fallacy can mean effort continues on a story even when it becomes apparent that it has significant flaws.
Suppliers of exclusive data are inevitably going to attach more value to it than the casual reader, either because it’s in their specialist field, or — in the case of a large cache of data scraped by a civic hacker — the skill involved in obtaining it (again, a potential sunk cost fallacy).
The data journalist’s role is to remain sceptical about that value, and focus instead on what value it might have to the reader, in terms of the stories that it contains.
When the Telegraph newspaper obtained a massive leak of information on politicians’ expense claims, for example, they spent a week inputting that data to identify its news value before deciding to commit to it as a source of stories (the data went on to provide so much material that the newspaper dominated the news agenda for six weeks).
Drawing on the 7 common angles for data stories can help you assess the potential editorial value of the dataset. What stories can this data reveal about the scale of something newsworthy? About change?
Can it reveal where places, categories or organisations rank? Unfair variation between different parts of the country or society? Relationships that raise questions? Individual leads?
The dataset may provide the basis for an exploratory feature: when Art UK approached the BBC England Data Unit with an exclusive dataset on 200,000 of the nation’s oil paintings, we wrote a story on different dimensions of art that the data revealed (most were focused on ranking — but it also included the scale of paintings donated in lieu of tax, and the scale of works by unknown artists).
What would a map of literary road trips look like? What if we could test out different conditions for a pandemic outbreak? Could we make a calculator so people know how much food they really need to stockpile?
These are ideas driven by a sense of play — of curiosity, experimentation, and exploration.
Play-driven ideas often draw on the interactive possibilities of data-driven storytelling: the fact that, once information is structured into data, it can be used to create tools, personalised experiences, ergodic storytelling, exploratory maps, simulations and games.
So ideas in this category tend not to revolve around stories as such, but formats where the story is generated by the user themselves through their own play — their user journey.
This is reflected in the way such stories are pitched to users through their headlines — a relatively new approach which you’ll need to get to grips with when explaining your idea.
It might raise a question, rather than answer it, for example. What Happens Next? invites you to explore COVID simulations; Based on a True True Story? invites users to explore visualised timelines of films’ true-ness; and Where My Money Dey? allows you to explore public spending.
It might issue a call to action (sometimes after raising a question), as MIT Technology Review do with Can you make AI fairer than a judge? Play our courtroom algorithm game. Or the BBC do with Check NHS cancer, A&E, ops and mental health targets in your area. The LA Times’s Every shot Kobe Bryant ever took. All 30,699 of them, in contrast, includes an implied call to action (“explore”).
It might simply hint at the journey that you are about to take. Reuters Graphics’s “Breaking the wave”, for example (itself a headline which implies a question), leads on “Measuring the death toll of COVID-19 and how far we are from stopping it.” Similarly, Disease modelers are wary of reopening the country. Here’s how they arrive at their verdict suggests we will be following an exploratory path.
Some simply describe what the format is: Can it pass the Senate? Interactive Australian voting calculator, for example. Or Evacuating Afghanistan: a visual guide to flights in and out of Kabul (note how it still frontloads the headline with a topical hook). Or People of the Pandemic: a hyperlocal cooperative simulation game.
Because of the interactivity involved, play-driven ideas are often (although not always) resource-intensive and technically demanding.
For that reason they tend to suit fields that the author (and audience) are passionate about, such as sport and music — or issues that are long-running (climate change; the pandemic) or regular (and major) enough to justify the investment.
This is why elections, budgets and quadrennial global sporting events tend to feature heavily in this category.
These areas are a good place to start if you want to generate a play-driven idea. How can you make an election or World Cup the basis of an interactive tool? How can you create a new way of looking at, or exploring, a long-running issue? How can you turn your passion for music into a data-driven feature that others can explore?
Pick the idea that suits your skillset and time
Ultimately there are pros and cons to all the sources of ideas outlined here. Some are quick but less impressive; others are complex but potentially eye-catching. The key is to pick a project which is achievable given the time and skills you have at your disposal.
If you’re just starting out with data journalism, data release-driven ideas are one of the best ways to get started: you can plan ahead, work with previous releases, and demonstrate core data journalism skills.
Once you’ve built those core skills you can challenge yourself further in a direction that fits with your professional objectives.
If you’re interested in coding and interactivity, try a play-driven idea; if your focus is more on exclusive newsgathering, try adapting an example of a previous idea using FOI, or cultivating sources who might provide you with exclusive access to a dataset.
And if you’ve already tried a number of the approaches listed here, use the list to identify one you’ve never tried before.