Since the start of the COVID-19 pandemic in early 2020 there have been widespread concerns around what the World Health Organisation has described as an “infodemic” of misinformation and conspiracy claims.
Many journalists and media outlets have reported on how problematic claims have spread online, including on major social media platforms such as Facebook, Twitter, YouTube and Spotify, as well as emerging “alt tech” platforms.
We have recently been exploring how journalists, researchers and students can work together to use digital methods and data for investigating the infodemic.
Over the past year we’ve been working with institutions associated with the Public Data Lab – an interdisciplinary network exploring what difference the digital makes in attending to public problems – on projects and investigations into conspiracy content on Amazon. This resulted in stories such as:
- “Conspiracy theories run wild on Amazon” (POLITICO Europe)
- “Amazon is pushing readers down a 'rabbit hole' of conspiracy theories about the coronavirus” (BuzzFeed News)
- “How Amazon became an engine for anti-vaccine conspiracy theories” (Fast Company)
- “Amazon is helping fund conspiracy theories” (Media Matters)
These collaborations draw on approaches that are documented in the new edition of the Data Journalism Handbook (in a section on “investigating data, platforms and algorithms”) which two of us co-edited, as well as in the Public Data Lab’s Field Guide to “Fake News”.
In this piece, we’ll take a behind-the-scenes look at how digital methods and data can be used to investigate troubling content on Amazon. As many of the lines of inquiry below are based on data that has been obtained through manual analysis and scraping of content and interfaces rather than through Application Programming Interfaces (APIs), this may be taken as a contribution to “post-API” investigations, as well as to thinking about how journalists, researchers and students can collaborate around digital investigations.
These projects were undertaken with a group of researchers and students at the Department of Digital Humanities, King’s College London; the Digital Methods Initiative, the Open Intelligence Lab and the Media Studies Department at the University of Amsterdam; DensityDesign Lab in Milan, and other institutions associated with the Public Data Lab, as part of an ongoing initiative on “engaged research-led teaching”.
The collaboration process involved researchers providing initial “project briefs” and “project pitches” in consultation with journalists and others interested in investigating the infodemic, each proposing steps and approaches for following different lines of inquiry such as de-platforming, monetisation, conspiracies and spirituality, creative hashtagging and conspiracy aesthetics.
These briefs served as the basis for a first round of student projects at King’s College London in autumn 2020. The Amazon books project was further developed at the Digital Methods Winter School 2021 in Amsterdam.
We then provided journalists with packages of materials from these projects, including slide decks, visualisations and observations, along with a folder of documented datasets and research notes. Our journalistic collaborators then built on and incorporated these materials into their reporting, sometimes with further exchanges and follow-on research. We have indicated below how the different recipes surfaced in reporting with links and quotes.
What counts as a COVID-19 conspiracy book?
We began the project by looking for books which promoted COVID-19 conspiracy claims. But what counts as a COVID-19 conspiracy book?
The team was lucky enough to be working with conspiracy researcher Peter Knight from the University of Manchester, who suggested five key features of conspiracy based on research in this area:
Five key features of conspiracies from Peter Knight:
- Nothing happens by accident (deliberate secret planning)
- Nothing is as it seems (appearances are deceptive, official version is a lie)
- Everything is connected
- Tone/style of conspiracy theories (e.g. apocalyptic, manichean)
- Assumption of going against received wisdom
While conspiracy books are a thriving literary genre, how might one identify books which are mainly about promoting COVID-19 conspiracy claims? We looked for the presence of pandemic related keywords and themes, the presence of actors or entities that were related to COVID-19 conspiracy theories, and how books were being read as connected to the pandemic.
Based on this we considered three ways in which books can be considered to be about COVID-19 conspiracies:
1. According to writers – books which are explicitly written about COVID-19 conspiracies;
2. According to readers – books which are (sometimes retrospectively) read as being related to COVID-19 conspiracy claims;
3. According to algorithms – books that are algorithmically associated with COVID-19 conspiracies through an interplay between recommendation features and user practices.
As we wanted to focus on books that were explicitly conspiratorial, we took the decision to not list books that were sceptical of the pandemic, vaccines or lockdowns. This was because this scepticism did not qualify as conspiratorial as such, as per the features mentioned above. However, such books made plenty of appearances during the course of our investigation.
What kinds of COVID-19 conspiracy books can be found on Amazon?
Based on this narrower, writerly understanding of COVID-19 conspiracy books, we queried for keywords, followed “related books” and made an initial list of 18 conspiracy books on Amazon.com.
Six of these have subsequently been taken down or are no longer available. Over the course of the project, we were able to identify others by following this initial set of books.
What are these books about? We experimented with several techniques for highlighting key themes based on the analysis of the full texts of the books, such as through word trees (which can be generated using this free tool):
While these graphics give an indication of prominent themes and concerns, there is no substitute for reading the books. So a group of researchers and students read the full texts of the books, created in-depth profiles of each of them, and examined all of their comments and reviews on Amazon sites.
We also explored which kinds of themes were most prominent across the books, drawing on these readings and profiles, as well as thinking along with a list of COVID-19 conspiracy themes which had been identified in the course of the Infodemic research project. This was used to create a network showing which of the books shared which themes.
These lists, and the approach of identifying and discovering COVID-19 conspiracy books by following recommendations, were taken up by POLITICO Europe:
To determine how widespread disinformation was on Amazon, POLITICO Europe worked with researchers from King's College London and the University of Amsterdam who started with 16 widely available QAnon and COVID-19 conspiracy books on the e-commerce giant. The academics then relied on the company's own recommendations — based on automated algorithms that serve up other titles that may be interesting to its customers — to compile a list of more than 100 books with ties to disinformation and conspiracy theories.
POLITICO Europe also conducted a separate review of Amazon's U.S., British, German and French sites by searching for books associated with QAnon and COVID-19, and similarly used the company's own recommendations to put together a list of 70 different books.
While the English-language online marketplaces had the most conspiracy theory content, Amazon's German and French versions also listed reams of such material, often associated with local groups like Germany's right-wing identitarian movement and Didier Raoult, the French doctor who promoted an antimalarial drug to treat COVID-19.
- Query for COVID related keywords and make an initial list of problematic/conspiracy books.
- Look at recommendations associated with these books on Amazon sites to find more books.
- Qualitative analysis to classify them.
- Identify key themes and visualise using Gephi, an open source network analysis tool.
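The last step of this recipe can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in the project: the book titles and themes in `book_themes` are hypothetical placeholders for the results of the qualitative coding, and the function names are our own. It writes an edge list with `Source`, `Target` and `Weight` columns, which Gephi's spreadsheet importer can read.

```python
import csv
import io
from itertools import combinations

# Hypothetical coding results: book title -> themes assigned during close reading.
book_themes = {
    "Book A": {"5G", "vaccines"},
    "Book B": {"vaccines", "great reset"},
    "Book C": {"5G"},
}

def bipartite_edges(book_themes):
    """Return sorted (book, theme) pairs, one edge per assigned theme."""
    return sorted(
        (book, theme)
        for book, themes in book_themes.items()
        for theme in themes
    )

def shared_theme_edges(book_themes):
    """Return (book_1, book_2, n_shared) for book pairs sharing at least one theme."""
    edges = []
    for a, b in combinations(sorted(book_themes), 2):
        shared = book_themes[a] & book_themes[b]
        if shared:
            edges.append((a, b, len(shared)))
    return edges

def to_gephi_csv(edges, header):
    """Serialise an edge list as CSV text with the given header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(edges)
    return buf.getvalue()

# Book-theme network (bipartite) and book-book network (shared themes).
print(to_gephi_csv(bipartite_edges(book_themes), ["Source", "Target"]))
print(to_gephi_csv(shared_theme_edges(book_themes), ["Source", "Target", "Weight"]))
```

The bipartite version keeps themes visible as nodes, which suits exploratory reading of the graph; the book-to-book projection is more compact but hides which themes are doing the connecting.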
How and where does COVID-19 conspiracy content appear on Amazon?
How do COVID-19 conspiracy books appear on Amazon websites? In order to explore this, we created multiple fresh browser profiles on multiple devices and used multiple web proxy servers so that we could also understand the extent to which results were being personalised based on location, IP, cookies and previous browsing activity. We queried for several topical keywords such as “covid”, “covid-19”, “coronavirus”, “vaccine” and “lockdown” and recorded what we found in the first page of results (normally returning a maximum of 16 items).
As well as identifying a number of conspiracy books listed in the first page of these search results (📕), we also found that the most common way to encounter these books was through Amazon’s recommendations from other listings (👉🏼). There were also many conspiracy-themed comments (💬), including some which indicated that a book which was not about COVID-19 conspiracies according to its writer was nevertheless being read as a source book by readers of conspiracy books.
Drawing on this work, BuzzFeed News reporters Craig Silverman and Jane Lytvynenko commented in their piece:
[...] COVID conspiracy books have appeared on the first page of search results for basic terms like “covid,” “covid-19,” and “vaccine.” Amazon also recommended conspiracy books when the researchers browsed non-conspiratorial books about the virus and related topics.
- Query for COVID-19 related terms.
- Obtain a list of books per website.
- Interface analysis to see whether there is conspiracy content, taking notes and screenshots on book content, recommendations, comments and other material.
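Once first-page results have been recorded by hand for each browser profile, one simple way to gauge personalisation is to compare the result lists pairwise. The sketch below uses Jaccard overlap for this, which is our own illustrative choice rather than a measure reported in the project; the profile names, book titles and the `conspiracy_books` set are hypothetical.

```python
def jaccard(a, b):
    """Overlap between two result lists, ignoring order: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty pages are treated as identical
    return len(a & b) / len(a | b)

# Hypothetical first-page results for the query "covid" from two fresh profiles.
first_page = {
    "profile_uk": ["Book 1", "Book 2", "Book 3", "Book 4"],
    "profile_us": ["Book 2", "Book 3", "Book 4", "Book 5"],
}

# Hypothetical titles previously classified as conspiracy books.
conspiracy_books = {"Book 4", "Book 9"}

# How similar are the two first pages?
overlap = jaccard(first_page["profile_uk"], first_page["profile_us"])

# Which classified conspiracy books show up on each profile's first page?
flagged = {
    profile: [book for book in books if book in conspiracy_books]
    for profile, books in first_page.items()
}

print(round(overlap, 2), flagged)
```

A score near 1 suggests that the profiles were shown much the same first page; lower scores are a prompt to go back to the screenshots and notes, not a finding in themselves.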
Which platforms are mentioned in the book reviews?
We were curious whether the comments and reviews on COVID-19 conspiracy books might give a sense of where readers were coming from or going to. This might also suggest which kinds of platforms are hosts to more lively COVID-19 conspiracy communities.
We queried the comments of our list of COVID-19 conspiracy books for the names of various social media platforms, websites and alt-tech platforms, and found that YouTube was most frequently mentioned.
- Start with a seed list of conspiracy books.
- Obtain associated metadata.
- Explore and query for platform names and other online spaces / alternative platforms.
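Querying reviews for platform names can be done with a short script once the review texts have been collected. The following is a minimal sketch, assuming the reviews are available as a list of strings; the `PLATFORMS` list and the example reviews are hypothetical, and the function counts reviews that mention a platform at least once rather than total occurrences.

```python
import re
from collections import Counter

# Hypothetical list of platform names to look for; extend as needed.
PLATFORMS = ["YouTube", "Facebook", "Twitter", "Telegram", "BitChute"]

def count_platform_mentions(reviews, platforms=PLATFORMS):
    """Count how many reviews mention each platform (case-insensitive, whole word)."""
    counts = Counter()
    for name in platforms:
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        counts[name] = sum(1 for text in reviews if pattern.search(text))
    return counts

# Hypothetical review texts standing in for the collected comments.
reviews = [
    "Saw the author on YouTube before the video was removed.",
    "Found this via a youtube interview and a Telegram channel.",
    "Great read.",
]

mentions = count_platform_mentions(reviews)
for platform, n in mentions.most_common():
    print(platform, n)
```

Whole-word matching avoids some false positives, but short or ambiguous names (and nicknames such as "YT") still need manual checking against the review text.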
To what extent does the availability of COVID-19 conspiracy books differ among national Amazon marketplaces and subsidiaries?
We also looked into differences in availability amongst the list of books that we started with across Amazon marketplaces and subsidiary companies. We found that there were significant differences in availability across these websites and services.
While in some countries none of the books was available (e.g. UAE, China), in others more than two-thirds were still available (e.g. Canada, France, Germany, Italy, Japan, Spain, United States). All of the books were available on Goodreads, and none were available on Audible.
When books were taken down, we found that Amazon sites and services displayed different pages and messages. Sometimes this would include recommendations for other COVID-19 conspiracy books or query terms.
As commented in the BuzzFeed News piece:
[...] this feature is not consistent across Amazon’s international stores. Of its English-language stores, Amazon Canada and Singapore did not display government resources when searching for “covid” or “vaccine.” The company’s store in the United Arab Emirates showed them when searching for “covid” but not for “vaccine.”
Fast Company incorporated the graph into their piece together with the comment that “the result is a moderation process that often appears haphazard and unclear”.
- Start with a list of conspiracy books.
- Query for each of them on Amazon national sites.
- Analyse differences in moderation practices and what pages are displayed to visitors per book per site.
- Check its availability on Goodreads and Audible.
- If the books are unavailable, check the recommendations.
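The comparison across marketplaces can be summarised from a simple table of manual observations. The sketch below assumes one record per book per site with a status noted by hand; the site names, book titles and status labels are hypothetical placeholders, not the project's data.

```python
from collections import Counter

# Hypothetical manual observations: (site, book, status noted by the researcher).
observations = [
    ("amazon.com", "Book A", "available"),
    ("amazon.com", "Book B", "removed"),
    ("amazon.com", "Book C", "available"),
    ("amazon.ae", "Book A", "removed"),
    ("amazon.ae", "Book B", "removed"),
    ("amazon.ae", "Book C", "removed"),
]

def availability_share(observations):
    """Return, per site, the fraction of checked books recorded as available."""
    totals, available = Counter(), Counter()
    for site, _book, status in observations:
        totals[site] += 1
        if status == "available":
            available[site] += 1
    return {site: available[site] / totals[site] for site in totals}

shares = availability_share(observations)
for site, share in sorted(shares.items()):
    print(f"{site}: {share:.0%} of checked books available")
```

Keeping the raw status labels in the table (rather than a yes/no flag) makes it easier to return later to what each site actually displayed when a book was taken down.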
How do Amazon recommendations bring users to and from COVID-19 conspiracy books?
Given the role of recommendation algorithms in displaying COVID-19 conspiracy books in relation to innocuous query terms (e.g. “covid”, “coronavirus”), we decided to explore this in more depth.
Starting with the query "covid" on Amazon.com, we made a list of all of the books that appeared in the search results, and then expanded it to include all of the books that appeared in the recommendations associated with these books, and then again to include all of the recommended books associated with those books.
Based on these lists, we made a network showing how books were associated through their recommendations. Conspiracy books and other books sceptical of COVID-19 appeared at the centre of the resulting network of recommendations.
This further suggests that Amazon's recommendation features and algorithms play a significant role in facilitating access to conspiracy books, taking the first page of search results for “covid” on Amazon.com as a starting point.
Drawing on this work, POLITICO Europe commented:
The company's recommendation engine, an automated tool that offers up other titles people may be interested in, based on others' purchasing histories, similarly pushes people towards conspiracy theories and disinformation.
In their piece, BuzzFeed noted:
The problem highlights how Amazon’s search and book promotion mechanisms often direct customers to COVID-19 conspiracy titles.
- Take a list of books appearing in search for “covid” on Amazon.com.
- Obtain books suggested through “customers also viewed” recommendations.
- Obtain recommendations associated with a list of books from the last step.
- Combine and organise data for visual network exploration (Table2Net may be helpful here).
- Visualise with Gephi, an open source network analysis tool.
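The two-step expansion in this recipe is a form of snowballing, which can be sketched as follows. This is an illustration under stated assumptions: the `recommendations` dictionary is a hypothetical stand-in for recommendation lists recorded from book pages, and no data collection from Amazon is shown. The resulting edge list can be saved as a CSV with `Source` and `Target` columns for Gephi.

```python
from collections import Counter

# Hypothetical recorded data: book -> books shown in its recommendations.
recommendations = {
    "Seed 1": ["Book X", "Book Y"],
    "Seed 2": ["Book Y"],
    "Book X": ["Book Z"],
    "Book Y": ["Book Z", "Seed 1"],
    "Book Z": ["Book Q"],
}

def snowball(seeds, recommendations, depth=2):
    """Follow recommendations outward from the seeds for `depth` steps.

    Returns directed (source, target) edges. Books first reached in the
    final step are included as targets, but their own recommendations
    are not followed.
    """
    edges = []
    seen = set(seeds)
    frontier = list(seeds)
    for _ in range(depth):
        next_frontier = []
        for book in frontier:
            for rec in recommendations.get(book, []):
                edges.append((book, rec))
                if rec not in seen:
                    seen.add(rec)
                    next_frontier.append(rec)
        frontier = next_frontier
    return edges

# Seeds stand in for the books on the first page of results for "covid".
edges = snowball(["Seed 1", "Seed 2"], recommendations, depth=2)

# In-degree: how many recorded listings recommend each book.
in_degree = Counter(target for _source, target in edges)
print(in_degree.most_common())
```

In-degree is only a rough first indicator of which books sit at the centre of the network; the layout and visual exploration in Gephi remain the main way of reading the graph.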
How do Amazon-owned websites facilitate engagement with COVID conspiracy content?
We also noticed that there were other features of Amazon websites that were involved in the promotion of conspiracy books.
One prominent feature was the “best-seller” ranking. We looked at which of the COVID-19 conspiracy books featured in best-seller rankings and discovered that quite a few were included in the top 50 rankings across a number of categories.
As the BuzzFeed News piece commented:
Despite being filled with misinformation about the pandemic, Icke’s book "The Answer" at one point ranked 30th on Amazon.com’s bestseller list for Communication & Media Studies. Its popularity is partly thanks to the e-commerce giant’s powerful recommendation algorithms that suggest "The Answer" and other COVID conspiracy theory books to people searching for basic information about the coronavirus, according to new research shared exclusively with BuzzFeed News.
One top-ranked book that promises “the other side of the story” of vaccine science is #1 on Amazon’s list for “Health Policy.” Next to it, smiling infants grace the cover of the top-selling book in “Teen Health,” co-authored by an Oregon paediatrician whose license was suspended last year over an approach to vaccinations that placed “many of his patients at serious risk of harm.” Another book, Anyone Who Tells You That Vaccines Are Safe and Effective Is Lying, by a prominent English conspiracy theorist, promises “the facts about vaccination — so that you can make up your own mind.” There are no warning notices or fact checks—studies have shown no link between vaccines and autism, for instance—but there are over 1,700 five-star ratings and a badge: the book is #1 on Amazon’s list for “Children’s Vaccination & Immunization.”
Tips for doing digital investigations together
Beyond the specific outcomes and outputs from these digital investigations, we are interested in how researchers, students and journalists can work together on digital investigations, which is also discussed more extensively in the Data Journalism Handbook. We have also been exploring this as part of work on “engaged research-led teaching”.
We conclude with a few things that we learned through the process of organising collaborative digital investigations this year:
Following many years of organising and contributing to Digital Methods Winter and Summer Schools in Amsterdam, we often use “data sprints” or workshops in order to set aside time for focused group work, accompanied by moments for collective reflection and feedback (e.g. sharing slides with collaborators to see what would be of most interest to them).
Rather than assuming that questions and problems are fixed from the outset, expect that they will change over the course of your project.
There is much to be gained from taking a qualitative and interpretive approach towards your material rather than mainly focusing on statistical or computational techniques. This is especially the case when it comes to thinking about how things are sorted out on the web and social media (e.g. what counts as a conspiracy theory).
You may wish to consider keeping research diaries and documenting datasets so it is possible for others to retrace your steps later on.
In complement to API-based tools and applications, manual analysis of user interfaces and gathering your own data can provide many insights for digital investigations.
Having worksheets and "recipes" for doing digital investigations on various platforms can help to get things started and can be adapted as needed and shared with collaborators.
Visualisations can be part of the process of data exploration rather than just to summarise or present findings. Simple visualisations can be prototyped in spreadsheets (e.g. rankings, emoji graphs) or with free open source tools such as Raw Graphs, Datawrapper or Gephi.
This approach aims to support making space in universities and classrooms for experimental, creative and critically engaged digital investigations, without taking for granted the questions, data, methods, materials and means through which they are produced.
Further reading and resources:
This piece builds on “engaged research-led teaching” activities with the Public Data Lab network, in collaboration with the AHRC-funded Infodemic project. This includes projects with researchers, journalists and students at the Department of Digital Humanities, King’s College London as well as at the Digital Methods Winter School 2021 at the University of Amsterdam. Thanks to all of the students, researchers, journalists and workshop participants who contributed to the development of these investigations.