The history of data journalism

A historical take on every critical breakthrough from the 1950s until today

It all started with trying to predict the outcome of a US presidential election.

Many practitioners date the beginning of computer-assisted reporting and data journalism to 1952, when the CBS network in the United States tried to use experts with a mainframe computer to predict the outcome of the presidential election.

That’s a bit of a stretch, or perhaps it was a false beginning because they never used the data for the story. It really wasn’t until 1967 that data analysis started to catch on.

In that year, Philip Meyer at The Detroit Free Press used a mainframe computer (known as big iron) to analyse a survey of Detroit residents for the purpose of understanding and explaining the serious riots that erupted in the city that summer. Decades later, The Guardian in the United Kingdom used some of the same approaches to look at racial riots there and cited Meyer’s work.

Meyer went on to work in the 1970s with Philadelphia Inquirer reporters Donald Barlett and James Steele to analyse sentencing patterns in the local court system, and with Rich Morin at The Miami Herald to analyse property assessment records.

Meyer also wrote a book called Precision Journalism that explained and advocated using database analysis and social research methods in reporting. Revisions of the book, now called New Precision Journalism, have been published since then.



Still, only a few journalists used these techniques until the mid-1980s, when Elliot Jaspin in the U.S. received recognition at The Providence Journal Bulletin for analysing databases for stories, including those on dangerous school bus drivers and a political scandal involving home loans.

Jaspin, who had won a Pulitzer Prize for traditional reporting on labour union corruption, had also taken a fellowship at Columbia University to learn how to use data. This was the same university where journalist and professor Steve Ross had been teaching data analysis techniques for years. By the late 1980s, about 50 other journalists across the U.S., often consulting with Meyer, Jaspin, or Steve Doig of the Miami Herald, had begun using data analysis for their stories.

Aiding the efforts of the data journalists of the 1980s were improved personal computers and a much-needed piece of software, Nine Track Express, that Jaspin and journalist-programmer Daniel Woods wrote to make it easier to transfer computer tapes (that contained nine “tracks” of data) to personal computers using a portable tape drive.

This was a remarkable breakthrough because it allowed journalists to circumvent the internal bureaucracies and delays involved in using mainframes at newspapers and universities and instead do their work at their desks.

In 1989, U.S. journalism recognised the value of computer-assisted reporting when it gave a Pulitzer to The Atlanta Journal-Constitution for stories on racial disparities in home loans. The project was one of the first collaborations on data stories that involved an investigative reporter, a data reporter and college professors.

During the same year, Jaspin established at the Missouri School of Journalism what is now known as the National Institute for Computer-Assisted Reporting (NICAR). Then, in 1990, Indiana University professor James Brown held the first computer-assisted reporting conference in Indianapolis, Indiana, and continued to hold them for several years.

In the 1990s through early in the 21st Century, the use of computer-assisted reporting blossomed, primarily due to the seminars conducted at Missouri and worldwide by Investigative Reporters and Editors (IRE) and NICAR.

IRE held its first computer-assisted reporting conference in 1993, and after that the conferences were a joint project of IRE and NICAR. The growth of computer-assisted reporting was aided by the publication of my book in 1996, the first on doing CAR, “Computer-Assisted Reporting: A Practical Guide,” now in its 5th edition.

I wrote the book so that it could be used as a textbook for university classes, but also for the lone and lonely practitioner in newsrooms that did not realise the power of data and thought having a “nerd” in the corner of the newsroom sufficed for what was an ongoing revolution in journalism.



After NICAR was created in 1994, training director Jennifer LaFleur and I initiated an ambitious on-the-road programme that eventually included up to 50 seminars a year with the help of colleagues across the country who volunteered their expertise and their time.

The creation of the on-the-road training was bolstered by the advent of the World Wide Web, which helped journalists immensely in their understanding of, and comfort with, the digital world and data. By 1996 word of the U.S. successes had reached other countries, and foreign journalists began attending the “boot camps” (intense, week-long seminars) at NICAR.

In addition, IRE, with the support of the McCormick Foundation, had set up a programme in Mexico City that did data training in Latin America, which was led by the programme’s director Lise Olsen, who travelled and trained throughout the continent of South America.


Going global

While journalists outside the U.S. at first doubted they could obtain data in their own countries in the 1990s, the training showed them how international or U.S. data could be used initially for stories in their countries, how they could build their own datasets, and how they could find data collected and stored by their governments.

As a result of the extensive training efforts, journalists had produced stories by 1999 involving data analysis in an array of countries, including Finland, Sweden, New Zealand, Venezuela, Argentina, the Netherlands, Norway, Brazil, Mexico, Russia, Bosnia, and Canada.

Meanwhile, in London in 1997, journalism professor Milverton Wallace began holding an annual conference called NetMedia that offered sessions on the Internet and classes in computer-assisted reporting led by NICAR and Danish journalists.

The classes covered the basic uses of the Internet, spreadsheets, and database managers, and they were well-attended by journalists from the UK, other European countries, and Africa.



In Denmark, journalists Nils Mulvad and Flemming Svith, who had gone to a NICAR boot camp in Missouri in 1996, organised seminars with NICAR in 1997 and 1998.

They also wrote a Danish handbook on computer-assisted reporting and created the Danish International Center for Analytical Reporting (DICAR) in 1998 with Tommy Kaas as president. This led to them also co-organising the first Global Investigative Journalism Conference with IRE in 2001.

CAR also became a staple of conferences in Sweden, Norway, Finland, and the Netherlands, with Helena Bengtsson from Sweden and John Bones from Norway.

In Brazil, the investigative journalism association Abraji was formed in 2002 with training in data journalism as part of its core mission. Two key leaders in data journalism training by Abraji in Brazil were Jose Roberto de Toledo and Marcelo Soares.

Data journalism comes of age

The early years of the 21st century also saw the Global Investigative Journalism Network begin to play a crucial part in the movement, starting with its first conference in 2001 in Copenhagen that offered a strong computer-assisted reporting track and hands-on training in conjunction with sessions on traditional investigative reporting.

Through the global investigative conferences, the use of data quickly spread across Eastern Europe. There, Drew Sullivan, one of the original NICAR trainers and data administrators, formed the Organized Crime and Corruption Reporting Project, which has become a leader in data journalism.

He and Romanian journalist Paul Radu were strong proponents and organisers of data training sessions and projects. Seminars also were given initially in China through the University of Missouri and in India through the World Press Institute, led by John Ullmann, who had been IRE’s first full-time executive director.

Ullmann also oversaw training in Latin America, recruiting me and other NICAR trainers to assist him.

During the same period, Doig, a pioneer in CAR and later the Knight Chair in Computer-Assisted Reporting at Arizona State University, travelled internationally to teach CAR, as did additional NICAR training directors: Sarah Cohen, Andy Lehren, Jo Craven McGinty, Tom McGinty, Ron Nixon, and Neil Reisner, all practising journalists who went on to work at The New York Times, The Wall Street Journal, The Washington Post, and the Miami Herald.


Visualisation of data increases

Visualisation of data in charts and maps had been on the rise for some time, inspired by a map by Doig in 1992 for the Miami Herald. Showing the deep value of data visualisation for analysis, Doig created a map of hurricane wind speeds and building damage in the Miami area after Hurricane Andrew.

The map revealed a pattern of severe property damage where wind speeds had been low. Following up on that revelation, reporters found that shoddy construction and sloppy building inspections had led to the damage.

In 2005, the visualisation of data for news stories got another boost when U.S. programmer Adrian Holovaty created a Google mash-up of Chicago crime data. The project spurred more interest in journalism among computer programmers and in mapping.

Holovaty and his team of coders then created the now-defunct EveryBlock in 2007, which used more local data for online maps in the U.S., but the project later ran into criticism for not checking the accuracy of government data more thoroughly.

In 2007, the open data movement in the U.S. began in earnest, spawning other such efforts worldwide. The movement increased accessibility to government data internationally, although the need remained to have freedom of information laws to get data not released by the governments.

By 2009, the increasing number of computer programmers and coders in journalism resulted in the creation of Hacks/Hackers, which would encourage more sharing between journalists and coders and ease some of the culture clash between the two groups.

Aron Pilhofer, then of The New York Times and now at Temple University, and Rich Gordon from Northwestern University’s Medill School of Journalism, had pushed for creation of “a network of people interested in Web/digital application development and technology innovation supporting the mission and goals of journalism.”

At the same time in Silicon Valley, Burt Herman brought journalists and technologists together. The three then joined to create “Hacks/Hackers.” The result has been an increasing technological sophistication within newsrooms that has increased the ability to scrape data from Web sites and make it more manageable, visual, and interactive.

Another outcome of the journalist-programmer mashup was the new respect among coders for knowing how flawed databases are, and for ensuring the integrity of the data.

As was well-said by Marcos Vanetta, a Mozilla OpenNews fellow who worked at The Texas Tribune: “Bugs are not optional… In software, we are used to making mistakes and correcting them later. We can always fix that later and in the worst case, we have a backup. In news, you can’t make mistakes -- there is a reputation to take care of. The editorial team is not as used to failure as developers are.”



More breakthroughs

The years 2009, 2010, and 2011 also were breakthrough years for using data for journalism. In Canada in 2009, Fred Vallance-Jones and David McKie published “Computer-Assisted Reporting: A Comprehensive Primer” with a special emphasis on CAR in Canada.

This was also the year that journalist Simon Rogers launched The Guardian's data blog.

The European Journalism Centre began its data-driven journalism programme that has organised workshops throughout Europe. This led to the establishment of DataJournalism.com for online training courses and other resources.

Journalist Paul Bradshaw became recognised as a pioneer in data journalism in the United Kingdom. In 2010, WikiLeaks released its “Afghan War Diaries”, composed of secret documents, and then the Iraq War Diaries, requiring journalists throughout the world to deal with enormous amounts of data in text.

This was followed in 2011 by The Guardian’s impressive series using data and social media to analyse city racial riots in the United Kingdom. Journalist and author Brigitte Alfter then founded the first Dataharvest conference, which is now led by the Arena for Journalism in Europe.

The same year, work began in London on the first Data Journalism Handbook (now in a second edition and available in several languages), which was written by a consortium of contributors from around the world.

Also in the United Kingdom, the Centre for Investigative Journalism, led by Gavin MacFadyen, teamed up in its early days with IRE to offer classes in data journalism during its summer school and later ran a strong programme on its own with the assistance of veteran CAR trainer David Donald.


Data journalism in the global south

Meanwhile, at Wits University in South Africa, Anton Harber and Margaret Renn substantially increased the data sessions at the annual Power Reporting Conference, now the African Investigative Journalism Conference.

Code For Africa founder Justin Arenstein and his team also paved the way for data journalists on the continent. In 2012, he launched Africa's largest data journalism and civic tech lab, covering stories involving the environment and climate change, women, gender, and health and science.

In Asia, journalists in countries including India, Malaysia, the Philippines, and South Korea began using digital tools, especially those for visualisation, and data for high-impact stories, and began exchanging techniques and story ideas at GIJN’s biannual Asia conferences.



Journalists also began incorporating social media into their investigations more often. One striking story, using social network analysis, was done by journalists in South Korea, who uncovered an attempt by intelligence officers to undermine elections through social media propaganda. Inspiring data-led interactive pieces also came out of publications like South China Morning Post.

In the Middle East, Egyptian data journalist Amr Eleraqi set up the Infotimes in 2012 followed by the Arab Data Journalists' Network five years later. He began teaching the first Arabic-led data journalism training programme of its kind in the region. Meanwhile, IJNET and Arab Reporters for Investigative Journalism (ARIJ) have continued to engage in offering training opportunities to Arab journalists in recent years.

In Latin America, Giannina Segnini, now at Columbia University, led a team of journalists and computer engineers at La Nación in Costa Rica to produce stories by gathering, analysing, and visualising public databases.

Meanwhile, in Brazil, Natalia Mazotte from Open Knowledge Brasil launched Escola de Dados (the School of Data's Brazil chapter) in 2012 to train journalists through its data literacy programme. By 2015, Abraji had created online courses in data journalism. A year later, Brazil's Coda Festival (Coda.Br) launched and grew to become the largest data journalism conference in Latin America.

Across the global south, data journalist Eva Constantaras began to develop training curricula for investigative and data journalism in high-risk environments with limited data access on behalf of Internews. Journalists have benefited from this data journalism training in a range of countries, including Afghanistan, India, Kenya, Kyrgyzstan and Myanmar.


A revolution in journalism

The use of data by journalists and in the digital tools journalists use has vastly expanded since 2015. Journalists have probed deeper into the analysis of unstructured data -- text, video, and sound -- and woven those media into compelling investigative stories.

They have more routinely managed gigabytes of data for stories, organised massive data leaks with agility, and become more sophisticated in visualising data through maps, social network analysis, or change over time, in both newsgathering and presentation.

They have conducted both traditional and innovative surveys to collect data to uncover social injustice. And the education and training -- and the syllabi and curricula at universities -- have become more focused and rigorous, thus producing new generations of data-savvy journalists.

The result has been an ever-growing stream of data-driven stories by small and large newsrooms -- often in collaborations -- that provide not only context and depth to stories, but also real facts, tips, surprises and epiphanies for journalists and their audiences.

By 2020, the COVID-19 pandemic revealed the breadth and depth of the skills journalists had accumulated. Throughout the world, journalists collected, analysed, and visualised pandemic data on a daily basis, often far exceeding what public health officials offered and, in fact, exposing the shortcomings of the data on which policy and practice were being decided.

The use of data by journalists has now become so prevalent it is easier to keep track of the progress and new directions by major categories. Among them:

  • The collaborations of journalists, sometimes with universities, who use huge datasets, including leaks of data. Collaboration has become nearly a standard practice, and the high-profile International Consortium of Investigative Journalists (ICIJ), the Big Local News Project in the U.S., and Connectas in Latin America are just a few examples of ongoing collaborations.

  • The achievements allowed by free software that breaks down the income barriers to entry into the field. The software includes a variety of tools to scrape, analyse and/or visualise data, among them Google tools, Tableau, Datawrapper and many others.

  • The exploration of using Artificial Intelligence or Machine Learning to discern patterns and outliers for further reporting, such as Reuters’ project called Lynx Insights, an automation tool designed to help reporters accelerate the production of their existing stories or spot new ones.

  • The melding and analysis of satellite imagery, open-source video and photographs, social media, crowdsourcing and data for what are sometimes called visual or forensic investigations that are done by groups like Bellingcat.

  • Surveys through emails or mobile phones, such as the “Forced Out” project, which used a mobile phone survey to create a database from interviews with thousands of displaced people across South Sudan.

These advances, however, are not replacing but augmenting the original uses of computers and data for journalism that began by applying social science methods and statistical and data analysis to government and business corruption, health and environmental stories, and societal issues.

The use of data has broadened over the years from counting instances of incidents and accidents in spreadsheets, to using database managers to match apparently unrelated datasets, to mapping data geographically and in social networks, to web scraping, to more efficient data cleaning, to better surveys, crowdsourcing and audience interaction, and to text mining with algorithms.

But all of the work is still in the service of finding patterns, trends and outliers that lead to new knowledge and better news stories in the public interest. Over the decades, there also has been much discussion on what to call the use of data for high-quality journalism and various branding efforts to label it.

But whether it is called “precision journalism,” “computer-assisted reporting,” “data journalism,” “data-driven journalism,” or “computational journalism,” the good news is that it is not only here to stay, but will continue to become more critical to revealing truths, holding the powerful accountable, and protecting those who otherwise would be exploited.
