Making Data with Readers at La Nación
Written by Flor Coelho
Abstract
Using civic marathons and the open-source platform Vozdata to collaborate with readers, universities and NGOs around large data-driven investigations.
Keywords: civic marathons, crowdsourcing, investigative journalism, open source, researcher–journalist collaborations, open data
At La Nación we have produced large data-driven investigations by teaming up with our readers. This chapter takes a look behind the scenes at how we have organized reader participation around some of these projects, including through setting goals, supporting investigative communities, and nurturing long-term collaborations with our readers and other external organizations and partners.
In such projects, our goal is often to tackle the “impossible” by using technology to facilitate large-scale collaborations, enabling users to engage with investigative journalism and with the process of making official data public.
For example, we spent around five years transcribing 10,000 PDFs of Senate expenses, two years listening to 40,000 intercepted phone calls and a couple of months digitizing more than 20,000 hand-written electoral forms.1
For these kinds of crowdsourcing initiatives, we relied on the online collaborative platform Vozdata. The platform was inspired by The Guardian’s MPs’ expenses and ProPublica’s “Free the Files” crowdsourcing campaigns and was developed with the support of Knight-Mozilla OpenNews and CIVICUS, a global alliance of civil society organizations and activists. The software behind Vozdata was open-sourced as Crowdata.2
Organizing Participation
For these projects our collaborators were mainly journalism students, civic volunteers, transparency NGOs and retired citizens. They have different motivations to participate depending on the project. These may include contributing to public interest projects, working with our data team and getting to know other people at our meetups.
Vozdata has teams and live ranking features. We have been exploring how these can enhance participation through “gamification.” We had excellent results in fostering civic participation in this way around Argentina’s national holidays. Participation in the construction of collaborative databases is mostly undertaken remotely (online).
But we have also encouraged users to participate in “offline” civic events held at La Nación or during hackathons at various events. Sometimes we have built open (i.e., freely reusable) databases with journalism students at partner universities.
While hackathons usually last one or two days, our online marathons can continue for months. A progress bar shows how many documents have been completed and what percentage remains.
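As a rough illustration of the progress bar and live ranking described above, the figures could be computed along the following lines. This is a minimal sketch, not Crowdata's actual code: the function names `progress` and `team_ranking`, and the shape of the submission records (pairs of team name and document ID), are assumptions made for the example.

```python
from collections import Counter


def progress(completed, total):
    """Return the completed count plus the percentages done and remaining."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= completed <= total:
        raise ValueError("completed must be between 0 and total")
    pct_done = round(100 * completed / total, 1)
    return {
        "completed": completed,
        "pct_done": pct_done,
        "pct_remaining": round(100 - pct_done, 1),
    }


def team_ranking(submissions, top=3):
    """Rank teams by number of submissions.

    `submissions` is an iterable of (team, document_id) pairs
    (a hypothetical record format chosen for this sketch).
    """
    counts = Counter(team for team, _doc in submissions)
    return counts.most_common(top)
```

For instance, `progress(2500, 10000)` reports 25.0% done and 75.0% remaining. Recomputing the ranking on every new submission is what would make it "live"; a real platform would more likely keep running counters in its database.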
Setting Big Goals
The main role of collaborators in the Senate Expenses and Electoral Telegrams projects was to gather specific structured data from the documents provided. This involved over a thousand unique users. As well as extracting these details, readers also had the opportunity to flag data as suspicious or unacceptable and leave a comment with additional information. The latter feature was rarely used.
When you have a deadline to finish a crowdsourcing project, you may not reach your target. That happened to us in the Electoral Telegrams project. The election day was approaching and we needed to publish some conclusions. While some provinces reached 100%, many had only completed 10% to 15% of the files, which we acknowledged when we published.
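The uneven coverage described above can be reported with a simple per-province calculation, so that incomplete provinces are acknowledged explicitly at publication time. The sketch below is illustrative only: the function names, the input format and the example figures are hypothetical and are not taken from the Electoral Telegrams data.

```python
def coverage_by_province(counts):
    """Map each province to the percentage of its files completed.

    `counts` maps province -> (files_done, files_total).
    """
    coverage = {}
    for province, (done, total) in counts.items():
        if total <= 0:
            raise ValueError(f"{province}: total must be positive")
        coverage[province] = round(100 * done / total, 1)
    return coverage


def incomplete(coverage, threshold=100.0):
    """List (province, percent) pairs below `threshold`, lowest first."""
    return sorted(
        ((p, pct) for p, pct in coverage.items() if pct < threshold),
        key=lambda item: item[1],
    )


# Hypothetical figures, for illustration only.
example = {
    "Province A": (500, 500),
    "Province B": (120, 1000),
    "Province C": (75, 500),
}
print(incomplete(coverage_by_province(example)))
```

Publishing the list returned by `incomplete` alongside the story is one way to make the limits of a partially completed dataset visible to readers.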
Supporting Investigative Communities
For Prosecutor Nisman’s 40,000 files investigation, we worked with a trusted network of a hundred collaborators. Many audio files related to private conversations (e.g., family dialogues) held by the Iranian agent whose phone was tapped by a federal court. A group of six volunteers got really deep into the investigation. We created a WhatsApp group where anyone could suggest leads and curiosities.
One of our volunteers resolved a mystery that kept us busy for a couple of months. We had flagged several conversations where two people talked in code using nicknames and numbers (e.g., “Mr. Dragon, 2000”). Many volunteers had heard and transcribed such recordings. We thought about making a separate database to analyze the code behind them. One day, a volunteer discovered that the conversations were about betting on horse races! A quick Google search confirmed many names as racing horses.
You always have power users. But, depending on the scale of the project, the combined work of many volunteers who each handle a few documents usually exceeds the “superuser contribution.”
Nurturing Collaborations
Our advice for journalists and organizations who want to involve their readers in data investigations is to appoint a dedicated community manager to organize and deliver communications through collaborative spreadsheets (e.g., Google Sheets), mailing lists and social media.
Large collections of documents can be a good place to start: the learning curve is short and participants feel part of something bigger. It’s also valuable to support collaborators with video tutorials or contextual introductions, either in-house at your organization or at dedicated events.
When we won an award related to these collaborative projects, we hosted a breakfast to share the prize with the volunteers. These are long-term relationships with your readers, so we made sure to dedicate time and energy to meeting up at events, visiting universities, giving interviews for student projects and so on.
Regarding partnerships with universities, professors usually act as nodes. Every year they have a new class of students who are usually eager to team up with us on collaborative projects (plan for this in advance!).
Transparency NGOs can also demonstrate the benefits of these projects. In our platform, every task can be registered, so they can easily showcase projects and media recognition for their donors.
When publishing outputs and stories, we recommend acknowledging the collaboration process and participant organizations on every platform (print, online and social media) and in mailouts. Emphasizing the collective character of such projects can send a stronger message to those we want to hold accountable.
Conclusion
To make data with readers, it is vital to allocate time and resources to engage with your community, deal with requests, analyze outputs, enjoy interactions and participate in events.
Volunteers classify documents because they think a project matters. For governments and those being reported on, it is a sign that the project is not only a press concern, but also affects civil society. Through such projects, participants can become passionate advocates and online distributors of the content.
Footnotes
1. blogs.lanacion.com.ar/projects/data/argentina, blogs.lanacion.com.ar/projects/data/prosecutor, blogs.lanacion.com.ar/projects/data/vozdata
2. blogs.lanacion.com.ar/projects/data/vozdata--ii-civic, theguardian.com/news/datablog/2009/jun/18/mps-expenses-houseofcommons, propublica.org/series/free-the-files, github.com/crowdata/cr...