Data on the crime beat

Conversations with Data: #27

If it bleeds, it leads…or so they say.

While shocking crimes may carry headlines and capture the public’s imagination, the crime beat encompasses so much more than murder mysteries and whodunnit stories.

In this edition of Conversations with Data, we’ll be looking at other important angles for crime coverage and the ways that data can inform them in each phase of the reporting cycle.

What you said about story angles

As Albert Bowden told us: “Criminal Justice data is inherently limited, your stories don't have to be”. After speaking to crime reporters across the globe, we have to agree. Here’s a snapshot of four story angles, and how to approach them.

1. Hate crimes

In the United States, there is no reliable national data on hate crimes or other prejudice-driven incidents, despite anecdotal evidence of their prevalence. Step in: ProPublica’s Documenting Hate project. Bringing together a coalition of news organisations, the project aims to build a comprehensive database on hate incidents. Rachel Glickhouse, Partner Manager for the project, told us more about reporting on this angle:

“In many cases, it’s up to officers to check the correct box or pick the correct dropdown, and those internal numbers eventually make their way to state and federal agencies. That means there’s a lot of bad data -- a lot of undercounting, especially since more than half of US hate crime victims don’t report to police at all -- but also overcounting if they check the wrong boxes. Here’s an example of that problem. We also know that many police aren’t trained on how to investigate these crimes, and that some police aren’t necessarily invested in taking these crimes seriously.”

So, when investigating hate crimes, ask questions like:

  • Does your department receive training on how to investigate and track hate crimes?
  • How do you track hate crimes internally? Is there a box to check on police reports?

She also suggests requesting data for the previous year: “Based on their response and how difficult it is for them to pull that information, you’ll get a sense of how they’re tracking these crimes”.

2. Cyber crime

In 2016, Norwegian outlet VG began investigating a sinister dark web website and its Nordic users. By monitoring traffic and obtaining information from police, they eventually uncovered an Australian-led, international operation into the world’s largest online community of child sexual abusers. Håkon Høydal, one of the journalists behind the project, shared his learnings from Breaking the Dark Net:

  • “Closely collaborate with data scientists. While you as a journalist may know the target and the goal, they may know where to find and how to extract the data you need to get there.
  • Find 'live' data if possible. This allows you to follow trends and operations in real time. This was helpful in our contact with sources during the investigation.
  • Keep looking. The first couple of months of research led us to a case we couldn’t follow. But it gave us valuable knowledge that led us to the Australian operation.”

3. Witness intimidation

Sometimes murder isn’t the story. After noticing that cases were being dismissed because of witness no-shows, Ryan Martin, from the Indianapolis Star, started wondering: Could witness intimidation be to blame? To answer this question, Ryan and his reporting partner started collecting court and police records for every homicide investigation over a three-year period.

“We input the interesting and relevant data into two spreadsheets. Then we ran our own analysis. Our reporting led us to a four-part series called Code of Silence. After its publication, the city created a $300,000 witness protection program to finally address the problem of witness intimidation,” he explained.

His tip: “Finding your next great investigative story on the crime beat begins with collecting your own data.”

4. Treatment in prisons

In another angle from ProPublica, and their partner The Sacramento Bee, the Overcorrection project looks at resources, safety, and crowding in California’s jails. Although the project has only begun publishing this year, reporter Jason Pohl shared some insights from their progress so far:

“California collects death-in-custody data and details, and this information is maintained in a database online (a records request yielded a more complete and updated version). This was our starting point. Unfortunately, many states do not collect this information at all, so some reporters have had to create innovative ways to get records and build databases of their own. From here, we worked to clean, code, and analyse state-wide and county-by-county. Those findings will help inform our reporting going forward.”
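The "clean, code, and analyse" step Jason describes could look something like the minimal sketch below. It is not the Overcorrection team's code: the field names ("county", "year", "manner"), the placeholder county names, and the three sample rows are all invented for illustration.

```python
from collections import Counter

# Hypothetical death-in-custody rows. The fields and values are made up
# for this sketch and do not reflect the real California schema.
raw_rows = [
    {"county": " alpha ", "year": "2017", "manner": "Suicide"},
    {"county": "Alpha", "year": "2018", "manner": "natural "},
    {"county": "BETA", "year": "2018", "manner": "Homicide"},
]


def clean(row):
    """Standardise spelling and types so that rows can be grouped reliably."""
    return {
        "county": row["county"].strip().title(),
        "year": int(row["year"]),
        "manner": row["manner"].strip().lower(),
    }


cleaned = [clean(row) for row in raw_rows]

# County-by-county view: number of deaths recorded per county.
by_county = Counter(row["county"] for row in cleaned)

# State-wide view: number of deaths by coded manner of death.
statewide_by_manner = Counter(row["manner"] for row in cleaned)

print(by_county)
print(statewide_by_manner)
```

Raw counts like these are only a starting point. Comparing counties fairly would also need a denominator, such as each jail's average daily population, which this sketch leaves out.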

What you said about crime data

No matter how good your angle is, your story’s success depends on obtaining, understanding, and accurately representing its underlying data.

Phase 1: Obtaining data

So, where’s the best place to start looking for data? Sylke Gruhnwald, from the Swiss magazine Republik, sent us an excellent checklist for following the paper trail in white-collar crime investigations:

  • “First things first: Check the international press archive for past coverage, and note the names of persons and organisations mentioned.
  • Check these names against local and national commercial registries.
  • If you have access to international business registries, check there. If not, you may get access through your university library. And don't forget to check ICIJ's databases, e.g. the Offshore Leaks Database here.
  • Gather all available public records on the persons and/or organisations that you are investigating.
  • Read and learn to understand financial statements, financial reports by auditors, and so on.
  • Build your own database. Helpful insights on how to build your own database are shared by the authors of Story-Based Inquiry here.”
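Checking names from press coverage against a registry export is one step in this checklist that lends itself to a small script. The sketch below assumes you have already saved the registry as a list of rows with "name" and "id" fields. Those field names, the company names, and the IDs are all hypothetical.

```python
import unicodedata


def normalise(name):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    kept = "".join(c if c.isalnum() or c.isspace() else " " for c in no_accents)
    return " ".join(kept.lower().split())


def match_names(names_from_press, registry_rows):
    """Return, for each name, the registry rows whose normalised name is identical."""
    index = {}
    for row in registry_rows:
        index.setdefault(normalise(row["name"]), []).append(row)
    return {name: index.get(normalise(name), []) for name in names_from_press}


# Invented example data, for illustration only.
names = ["Müller Holding AG", "Example Trading Ltd."]
registry = [
    {"name": "MULLER HOLDING AG", "id": "X-001"},
    {"name": "Other Co", "id": "X-002"},
]

hits = match_names(names, registry)
for name, rows in hits.items():
    print(name, "->", [row["id"] for row in rows] or "no exact match")
```

Exact matching on normalised names will miss abbreviations, transliterations, and misspellings, so an empty result is a prompt for a manual search rather than proof of absence.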

More often than not, crime stories will rely on public records or access requests. If you come up against challenges, SpotCrime’s Brittany Suszan reminded us that “journalists (and the public) have the right to access data in a timely manner and machine readable format -- especially if a government vendor already has access”.

“Don’t let a vendor control your public crime data, don’t let your police agency give preferential access to a private vendor (more here),” she continued.

Phase 2: Analysis

As a first piece of advice, Paul Bradshaw highlighted the importance of understanding “the difference between the two key pieces of crime data: 1) data on recorded crime, and 2) data on experiences of crime. The second type of data is important because many crimes don't get reported, and particular types of crime are more under-reported than others.”

In differentiating these data types, Wouter van Loon, from Dutch outlet NRC, provided some examples of crimes that are, and aren’t, as likely to be reported. Take the drug trade: “none of the concerned people, sellers or buyers, have a reason to go to the police. So we only have information about the drug trade when the police find an armload of drugs.”

And then there are crimes that are easier to measure: “Murders, for example: there is always someone missing, and most of the time there is a body. However, even then there are pitfalls. There is a jungle of terminology around crime statistics. It is easy to confuse the number of suspicions, convictions, convicted perpetrators, victims, registered crimes, attempted crimes, and crimes which are actually committed.”

Alain Stephens, West Coast Correspondent for The Trace, agreed. To ensure you’re understanding data correctly, he recommends learning the lingo or getting an insider to translate for you.

“Without knowing their vernacular, numbers can lie to you: Increased drug arrests can look like a spike of a dangerous new narcotic, when really you’re looking at a concentrated enforcement effort on the side of the police. High conviction rates may have little to do with a prosecutor’s skill, and more to do with them dropping cases they feel won’t tick up their win ratio”, he explained.

Another strategy, used by Clementine Jacoby of Recidiviz, is to physically map out how a particular flow through the justice system is modeled in the dataset that you’re working with. When they looked at technical revocations, which is where a person is sent back to prison for a technical rule violation, they “picked a single person in the dataset, looked at all of their historical records, and manually walked through their history. This simple exercise helped us get clear on how the revocation process worked in that state and how the data represented it”.
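A rough way to try this single-person walk-through on your own data is sketched below. It is not Recidiviz's code: the record layout ("person_id", "date", "event", "reason") and every row are invented, and real corrections data will model admissions and releases differently.

```python
from datetime import date

# Hypothetical event rows, deliberately stored out of chronological order.
events = [
    {"person_id": "P1", "date": date(2018, 2, 2), "event": "admission",
     "reason": "technical revocation"},
    {"person_id": "P2", "date": date(2017, 1, 10), "event": "admission",
     "reason": "new sentence"},
    {"person_id": "P1", "date": date(2016, 3, 1), "event": "admission",
     "reason": "new sentence"},
    {"person_id": "P1", "date": date(2017, 9, 15), "event": "release",
     "reason": "parole"},
]


def timeline(rows, person_id):
    """Return one person's records in chronological order."""
    mine = [row for row in rows if row["person_id"] == person_id]
    return sorted(mine, key=lambda row: row["date"])


def revocations_after_release(history):
    """Flag admissions for a technical revocation that directly follow a release."""
    flagged = []
    for previous, current in zip(history, history[1:]):
        if (previous["event"] == "release"
                and current["event"] == "admission"
                and current["reason"] == "technical revocation"):
            flagged.append(current)
    return flagged


history = timeline(events, "P1")
for row in history:
    print(row["date"].isoformat(), row["event"], row["reason"])

print(len(revocations_after_release(history)), "technical revocation(s) found")
```

Reading the printed timeline line by line and asking whether each step makes sense is the point of the exercise. The flagging function only checks one assumed pattern and would need adapting to how a given state records revocations.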

Phase 3: Reporting

Being able to understand and explain your data is also important to defend your story’s findings -- as Lauryn Schroeder, the San Diego Union-Tribune’s Watchdog Reporter, knows all too well:

“Our Crime Counts project was based on records that came directly from local police agencies, but we analysed it in a much more granular way. This led to a first-of-its-kind series on violent crime at nearly the street level, but, as a result, we encountered a lot of push back from police and local officials who either couldn’t understand what we did or weren’t able to reproduce it -- or both. During our reporting phase, I spent most of my time defending our methodology and explaining how we got the results we did to convince sources to contribute or comment. There was a lot of ‘if your findings are right’ and ‘we can't comment on something we don't know is true’. Providing original files and the code isn’t always enough, and every data journalist should be prepared for that.”

And, finally, going back to the ‘bleeds/leads’ adage, journalists should be wary of sensationalising findings. Focus on “news not narrative”, said Andrew Guthrie Ferguson, author of The Rise of Big Data Policing.

“Scandal may sell but it does not tell the story readers deserve. Too many news stories about big data policing rely on fear-inducing tropes and not facts. By recycling the same metaphors (‘Minority Report’) and examples (the same few studies), the narrative of data-driven danger generates clicks and criticism, but not critical thinking. The real stories are more nuanced and harder to frame as good or bad, transformative or dystopian.”

For more tips, or to source the community’s help with an upcoming crime project, we’ll be continuing the discussion in our forums here.

Our next conversation

This week, more than 400 million voters across Europe will have the opportunity to select the next European Parliament. The election is the first since the rise of populism and, given the Parliament’s long-term trend of voter disengagement, we thought we’d take a look at how data journalists are informing the public’s vote. Submit your favourite examples of data election reporting or visualisations, or tell us about a piece you’ve produced.

As always, don’t forget to let us know what you’d like us to feature in our future editions.

Until next time,

Madolyn from the EJC Data team
