Data Journalism at the BBC
Written by: Andrew Leimdorfer
The term ‘data journalism’ can cover a range of disciplines and is used in varying ways in news organizations, so it may be helpful to define what we mean by ‘data journalism’ at the BBC. Broadly the term covers projects that use data to do one or more of the following:
- Enable a reader to discover information that is personally relevant
- Reveal a story that is remarkable and previously unknown
- Help the reader to better understand a complex issue
These categories may overlap, and in an online environment they can often benefit from some level of visualization.
Make It Personal
On the BBC News website we have been using data to provide services and tools for our users for well over a decade.
The most consistent example, which we first published in 1999, is our school league tables, which use the data published annually by the government. Readers can find local schools by entering a postcode, and compare them on a range of indicators. Education journalists also work with the development team to trawl the data for stories ahead of publication.
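Under the hood, a lookup like this is straightforward. As a minimal sketch, assuming a hypothetical schools.csv with invented column names (this is not the BBC's actual code), a Python version might filter the table by postcode district and sort by the chosen indicator:

```python
import csv

def schools_near(postcode, indicator, path="schools.csv"):
    """Schools in the user's postcode district, best score first.

    Illustrative sketch: assumes a CSV with 'name', 'postcode' and
    numeric indicator columns such as 'gcse_score'.
    """
    district = postcode.strip().upper().split()[0]  # "SW1A 1AA" -> "SW1A"
    with open(path, newline="") as f:
        rows = [r for r in csv.DictReader(f)
                if r["postcode"].upper().startswith(district)]
    return sorted(rows, key=lambda r: float(r[indicator]), reverse=True)

# Top five local schools by the chosen indicator
for school in schools_near("SW1A 1AA", "gcse_score")[:5]:
    print(school["name"], school["gcse_score"])
```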
When we started to do this, there was no official site that provided a way for the public to interrogate the data. But now that the Department for Education has its own comparable service, our offering has shifted to focus more on the stories emerging from the data.
The challenge in this area must be to provide access to data in which there is a clear public interest. A recent example of a project where we exposed a large dataset not normally available to the wider public was the special report Every death on every road. We provided a postcode search allowing users to find the location of all road fatalities in the UK in the past decade.
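A search like this boils down to a geospatial filter. The sketch below is illustrative only: it assumes each collision record carries 'lat' and 'lon' fields and that the user's postcode has already been geocoded to coordinates, neither of which is a detail taken from the actual project.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * 6371 * asin(sqrt(a))  # mean Earth radius ~6,371 km

def fatalities_near(lat, lon, records, radius_km=2.0):
    """Collision records within radius_km of the geocoded postcode."""
    return [r for r in records
            if haversine_km(lat, lon, r["lat"], r["lon"]) <= radius_km]
```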
We visualized some of the main facts and figures emerging from the police data and, to give the project a more dynamic feel and a human face, we teamed up with the London Ambulance Service and BBC London radio and TV to track crashes across the capital as they happened. This was reported live online, as well as via Twitter using the hashtag #crash24, and the collisions were mapped as they were reported.
Simple Tools
As well as providing ways to explore large data sets, we have also had success creating simple tools for users that provide personally relevant snippets of information. These tools appeal to the time-poor, who may not choose to explore lengthy analysis. The ability to easily share a 'personal' fact is something we have begun to incorporate as standard.
A light-hearted example of this approach is our feature The world at 7 billion: What's your number?, published to coincide with the official date on which the world's population exceeded 7 billion. By entering their birth date, the user could find out what 'number' they were, in terms of the global population, when they were born, and then share that number via Twitter or Facebook. The application used data provided by the UN Population Fund. It was very popular, and became the most shared link on Facebook in the UK in 2011.
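The arithmetic behind the feature is essentially interpolation over a published population series. A minimal sketch, using a handful of approximate world population figures purely for illustration (the real application used the UN's full dataset):

```python
from datetime import date

# Approximate world population estimates, for illustration only.
MILESTONES = [
    (date(1960, 1, 1), 3.02e9),
    (date(1975, 1, 1), 4.07e9),
    (date(1990, 1, 1), 5.31e9),
    (date(2000, 1, 1), 6.09e9),
    (date(2011, 10, 31), 7.00e9),  # the 'day of 7 billion'
]

def population_number(birth_date):
    """Estimate world population on a birth date by linear interpolation."""
    for (d0, p0), (d1, p1) in zip(MILESTONES, MILESTONES[1:]):
        if d0 <= birth_date <= d1:
            frac = (birth_date - d0).days / (d1 - d0).days
            return round(p0 + frac * (p1 - p0))
    raise ValueError("birth date outside the table")

print(f"You were roughly person number {population_number(date(1985, 6, 15)):,}")
```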
Another recent example is the BBC budget calculator, which enabled users to find out how much better or worse off they would be when the Chancellor's budget took effect, and then share that figure. We teamed up with the accountancy firm KPMG LLP, which provided us with calculations based on the annual budget, and then we worked hard to create an appealing interface that would encourage users to complete the task.
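KPMG supplied the real modelling, but the general shape of such a calculator is easy to illustrate. The sketch below is a deliberately over-simplified two-band income tax calculation; the rates, band limit and allowance figures are placeholders rather than any actual Budget measure:

```python
def income_tax(gross, personal_allowance, basic_limit=37_400,
               basic_rate=0.20, higher_rate=0.40):
    """Toy two-band income tax: not a faithful model of the UK system."""
    taxable = max(0, gross - personal_allowance)
    return (basic_rate * min(taxable, basic_limit)
            + higher_rate * max(0, taxable - basic_limit))

def budget_impact(gross, allowance_before=6_475, allowance_after=7_475):
    """Annual gain (+) or loss (-) from an illustrative allowance change."""
    return income_tax(gross, allowance_before) - income_tax(gross, allowance_after)

print(f"£{budget_impact(30_000):+,.0f} a year")  # -> £+200 a year
```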
Mining The Data
But where is the journalism in all this? Finding stories in data is a more traditional definition of data journalism. Is there an exclusive buried in the database? Are the figures accurate? Do they prove or disprove a problem? These are all questions a data journalist or computer-assisted reporter must ask. But a great deal of time can be taken up sifting through a massive dataset in the hope of finding something remarkable.
In this area we have found it most productive to partner with investigative teams or programs that have the expertise and time to investigate a story. The BBC current affairs program Panorama spent months working with the Centre for Investigative Journalism, gathering data on public sector pay. The result was a TV documentary and, online, the special report Public Sector pay: The numbers, where all the data was published and visualized with sector-by-sector analysis.
As well as partnering with investigative journalists, having access to numerate journalists with specialist knowledge is essential. When a business colleague on the team analyzed the spending review cuts data put out by the government, he came to the conclusion that the way it was being presented made the cuts sound bigger than they actually were. The result was an exclusive story, Making sense of the data, complemented by a clear visualization, which won a Royal Statistical Society award.
Understanding An Issue
But data journalism doesn't have to be an exclusive that no one else has spotted. The job of the data visualization team is to combine great design with a clear editorial narrative to provide a compelling experience for the user. Engaging visualizations of the right data can be used to give a better understanding of an issue or story, and we frequently use this approach in our storytelling at the BBC. Heat-mapping data over time to give a clear view of change is one technique used in our UK claimant count tracker.
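As a rough illustration of the technique, a heat map of change over time takes only a few lines. This sketch uses Python and matplotlib with randomly generated figures; the regions, months and counts are placeholders, not real claimant data:

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder data: rows are regions, columns are months.
regions = ["North East", "North West", "London", "South East"]
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
counts = np.random.default_rng(0).integers(40_000, 90_000,
                                           size=(len(regions), len(months)))

fig, ax = plt.subplots()
im = ax.imshow(counts, cmap="Reds", aspect="auto")
ax.set_xticks(range(len(months)), labels=months)
ax.set_yticks(range(len(regions)), labels=regions)
fig.colorbar(im, label="Claimants")
ax.set_title("Claimant count by region over time (illustrative data)")
plt.show()
```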
The data feature Eurozone debt web explores the tangled web of inter-country lending. It helps to explain a complicated issue in a visual way, using colour and proportional arrows combined with clear text. An important consideration is to encourage the user to explore the feature, or follow a narrative, and never feel overwhelmed by the numbers.
Team Overview
The team that produces data journalism for the BBC News website comprises about 20 journalists, designers and developers.
As well as data projects and visualizations, the team produces all the infographics and interactive multimedia features on the news website. Together these form a collection of storytelling techniques we have come to call 'visual journalism'. We don't have people who are specifically identified as 'data' journalists, but all editorial staff on the team have to be proficient at using basic spreadsheet applications such as Excel and Google Docs to analyze data.
Central to any data project are the technical skills and advice of our developers and the visualization skills of our designers. While we are all either a journalist, designer or developer 'first', we continue to work hard to increase our understanding of, and proficiency in, each other's areas of expertise.
The core products for interrogating data are Excel, Google Docs and Fusion Tables. To a lesser extent, the team has also used MySQL and Access databases and Solr to interrogate larger data sets, and RDF and SPARQL to begin looking at ways in which we can model events using Linked Data technologies. Developers will also use their programming language of choice, whether that's ActionScript, Python or Perl, to match, parse or generally pick apart a dataset we might be working on. Perl is used for some of the publishing.
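A typical 'pick apart' job is matching one dataset against another and flagging records that fail to join. A minimal Python sketch, with invented filenames and column names:

```python
import csv

def index_by(path, key):
    """Load a CSV and index its rows by one column."""
    with open(path, newline="") as f:
        return {row[key]: row for row in csv.DictReader(f)}

# Hypothetical files: spending records joined to a department lookup.
spending = index_by("spending.csv", "dept_code")
departments = index_by("departments.csv", "code")

for code, row in spending.items():
    dept = departments.get(code)
    if dept is None:
        print("unmatched:", code)  # candidates for manual checking
    else:
        print(dept["name"], row["amount"])
```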
We use Google and Bing Maps and Google Earth, along with Esri's ArcMap, for exploring and visualizing geographical data.
For graphics we use the Adobe Suite, including After Effects, Illustrator, Photoshop and Flash, although we would rarely publish Flash files on the site these days, as JavaScript libraries, particularly jQuery along with Highcharts, Raphael and D3, increasingly meet our data visualization requirements.