11a. Case study: Attributing Endless Mayfly

Written by: Gabrielle Lim

Gabrielle Lim is a researcher at the Technology and Social Change Research Project at Harvard Kennedy School’s Shorenstein Center and a fellow with Citizen Lab. She studies the implications of censorship and media manipulation for security and human rights.

In April 2017, an inauthentic article spoofing British news outlet The Independent was posted to Reddit. This article falsely quoted former U.K. Deputy Prime Minister Nick Clegg as saying that then-Prime Minister Theresa May was “kissing up to Arab regimes.” Savvy Redditors were quick to call out the post as dubious and false. Not only was it hosted on indepnedent.co as opposed to www.independent.co.uk, but the original poster was a shallow persona who had also posted several other inauthentic articles on Reddit.
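Lookalike domains such as indepnedent.co (and bloomberq.com, which appears later in this case study) differ from the real outlet's domain by only one or two keystrokes, so an edit-distance check can flag them. The case study does not say how such domains were found. The following is an illustrative sketch, not Citizen Lab's method. It compares the leftmost domain label using optimal string alignment distance, which counts a swap of two adjacent letters ("ne" for "en") as a single edit. The helper names `osa_distance`, `first_label` and `flag_lookalikes` are hypothetical. The label extraction is deliberately naive: a real tool would consult the Public Suffix List.

```python
from itertools import product


def osa_distance(a, b):
    """Optimal string alignment distance: insert, delete, substitute,
    or swap two adjacent characters each cost one edit."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[len(a)][len(b)]


def first_label(domain):
    """Naive: drop a leading 'www.' and keep the leftmost label (assumption)."""
    domain = domain.lower()
    if domain.startswith("www."):
        domain = domain[4:]
    return domain.split(".")[0]


def flag_lookalikes(candidates, legitimate, max_distance=2):
    """Return (candidate, legitimate, distance) for near-miss labels."""
    hits = []
    for cand, legit in product(candidates, legitimate):
        dist = osa_distance(first_label(cand), first_label(legit))
        if 0 < dist <= max_distance:
            hits.append((cand, legit, dist))
    return hits


if __name__ == "__main__":
    print(flag_lookalikes(["indepnedent.co", "bloomberq.com"],
                          ["www.independent.co.uk", "bloomberg.com"]))
    # [('indepnedent.co', 'www.independent.co.uk', 1), ('bloomberq.com', 'bloomberg.com', 1)]
```

The function compares only the first label, so a spoof that keeps the outlet's name but changes the suffix scores 0 and is not flagged. That case needs a separate same-label, different-suffix check.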

Starting from that initial inauthentic article, domain and persona, researchers at Citizen Lab spent the next 22 months tracking and investigating the network behind this multifaceted online information operation. The operation, called Endless Mayfly, aimed to target journalists and activists with inauthentic websites that spoofed established news outlets, and to disseminate false and divisive information.

Broadly speaking, the network would spoof a reputable news outlet with an inauthentic article, amplify it through a network of websites and fake Twitter personas, and then either delete or redirect the inauthentic article once some online buzz was created. Below is an example of a spoofed article that masqueraded as Bloomberg.com by typosquatting on bloomberq.com:

This image shows two fake online personas affiliated with Endless Mayfly tweeting a link to a copycat version of the Daily Sabah, a Turkish news outlet. Note that the persona on the right, “jolie prevoit,” is using a photo of actor Elisha Cuthbert as its profile photo.

By the time we published our report in May 2019, our dataset included 135 inauthentic articles, 72 domains, 11 personas, one fake organization and a pro-Iran publishing network that amplified the falsehoods found in the inauthentic articles. In the end, we concluded with moderate confidence that Endless Mayfly was an Iran-aligned information operation.

Endless Mayfly illustrates how you can combine network and narrative analysis with external reporting to arrive at attribution. It also highlights the difficulty of attributing information operations to a specific actor, why multiple indicators are required, and how to use a confidence level to express how certain you are of the attribution.

Ultimately, attribution is a difficult task, often constrained by imperfect information unless you’re able to elicit a confession or secure definitive proof. This is why, in many media manipulation cases, attribution is expressed as a probabilistic estimate.

Triangulating multiple data points and analyses

Due to the clandestine nature of information operations, the ability of actors to run “false flag” campaigns, and the ephemeral nature of evidence, attribution should rest on a combination of analyses and evidence. With Endless Mayfly, we concluded with moderate confidence that it was an Iran-aligned operation because of indicators derived from three types of analysis:

  1. Narrative analysis
  2. Network analysis
  3. External reporting and analysis

1. Narrative analysis

Using content and discourse analysis on the 135 inauthentic articles collected in our investigation, we determined that the narratives being propagated were aligned with Iran’s interests. Each article was coded into categories that were determined after an initial reading of all the articles. Two rounds of coding were conducted: The first round was executed independently by two researchers, and a second round was conducted together by the same researchers to resolve any discrepancies. This table represents the results of our coding process.
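A small script that lists where the coders disagreed can support the two-round coding process. The sketch below assumes each coder records a set of narrative categories per article. The article IDs and category labels are hypothetical placeholders, not Endless Mayfly's actual codebook. The script reports simple percent agreement (the share of articles coded identically) and lists the articles that need to be reconciled in the second round.

```python
# Hypothetical example data: article ID -> set of narrative categories.
CODER_A = {
    "art-001": {"anti-Saudi"},
    "art-002": {"anti-Saudi", "anti-Israel"},
    "art-003": {"anti-US"},
    "art-004": {"anti-Saudi"},
}
CODER_B = {
    "art-001": {"anti-Saudi"},
    "art-002": {"anti-Israel"},
    "art-003": {"anti-US"},
    "art-004": {"anti-Saudi"},
}


def compare_coders(coder_a, coder_b):
    """Return (percent agreement, article IDs needing reconciliation)."""
    if not coder_a or coder_a.keys() != coder_b.keys():
        raise ValueError("both coders must code the same, non-empty set of articles")
    # An article is a disagreement if the two category sets differ at all.
    disagreements = sorted(aid for aid in coder_a if coder_a[aid] != coder_b[aid])
    agreement = 1 - len(disagreements) / len(coder_a)
    return agreement, disagreements


if __name__ == "__main__":
    agreement, to_reconcile = compare_coders(CODER_A, CODER_B)
    print(f"agreement: {agreement:.0%}; reconcile in round two: {to_reconcile}")
```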

After all the articles were coded, we were able to determine the most common narratives propagated by Endless Mayfly. We then compared these narratives with our preliminary research on the region. That research involved extensive work to understand the region’s rivalries and alliances, its geopolitical interests and threats, and its history of information controls. This context was necessary to interpret the evidence and situate the narratives in the broader political landscape. With the results of the coding in hand, we determined that these narratives were most likely serving the interests of Iran.
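Once discrepancies are resolved, ranking the narratives is a straightforward count. The minimal sketch below assumes the reconciled coding maps each hypothetical article ID to a set of category labels. The counts only show which narratives dominate. Deciding whose interests they serve still depends on regional context that no tally can supply.

```python
from collections import Counter

# Hypothetical reconciled coding: article ID -> agreed set of categories.
RECONCILED = {
    "art-001": {"anti-Saudi"},
    "art-002": {"anti-Saudi", "anti-Israel"},
    "art-003": {"anti-Saudi", "anti-US"},
    "art-004": {"anti-Israel"},
}


def top_narratives(reconciled, n=3):
    """Count how many articles carry each category; return the n most common."""
    counts = Counter(cat for cats in reconciled.values() for cat in cats)
    return counts.most_common(n)


if __name__ == "__main__":
    for category, count in top_narratives(RECONCILED):
        print(f"{category}: {count} article(s)")
```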

2. Network analysis

Network analysis was carried out to determine which domains or platforms were responsible for amplifying the content. For Endless Mayfly, two networks were involved in disseminating the inauthentic articles and their falsehoods: a network of pro-Iran websites, and a cluster of pro-Iran personas on Twitter. Both factored into Endless Mayfly’s attribution because they consistently pushed stories that were in line with official Iranian policies, public statements and positions with regard to Saudi Arabia, Israel and the United States.

The publishing network — The publishing network consisted of a number of seemingly pro-Iran websites portraying themselves as independent news outlets. In total, we found 353 webpages across 132 domains that referenced or linked back to Endless Mayfly’s inauthentic articles. This process involved a Google search of all the inauthentic articles’ URLs and their headlines. In addition, we scanned the links tweeted by the personas in our network, identifying webpages that contained references or links to the articles.
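The sketch below shows one way to tally which domains reference the inauthentic articles most often. It assumes the search and tweet-link collection has already produced (referencing page URL, inauthentic article URL) pairs. The page URLs and article paths are hypothetical placeholders; only the indepnedent.co and bloomberq.com domains come from the case study. Duplicate pairs are dropped, so a page found by both the Google search and the tweet scan is not counted twice.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical (referencing page, inauthentic article) pairs.
REFERENCES = [
    ("https://www.site-a.example/post/1", "https://indepnedent.co/story"),
    ("https://site-a.example/post/2", "https://bloomberq.com/story"),
    ("https://site-a.example/post/2", "https://indepnedent.co/story"),
    ("https://site-b.example/item?id=9", "https://bloomberq.com/story"),
    ("https://site-b.example/item?id=9", "https://bloomberq.com/story"),  # found twice
]


def domain_of(url):
    """Hostname without a leading 'www.' (naive normalization)."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def summarize_references(references, top_n=10):
    """Return (page count, domain count, top domains by reference count)."""
    unique = set(references)  # drop pairs surfaced by more than one method
    pages = {page for page, _ in unique}
    domains = {domain_of(page) for page in pages}
    per_domain = Counter(domain_of(page) for page, _ in unique)
    return len(pages), len(domains), per_domain.most_common(top_n)


if __name__ == "__main__":
    n_pages, n_domains, top = summarize_references(REFERENCES)
    print(f"{n_pages} pages across {n_domains} domains; top: {top}")
```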

Following this process, we identified the top 10 domains that most frequently referenced the inauthentic articles. Of these 10 domains, eight shared the same IP address or registration details, indicating they may be controlled by the same actor. The content of these sites was also skewed toward promoting Iranian interests. For example, IUVM Press, which linked to or referenced Endless Mayfly’s inauthentic articles 57 times, hosted a PDF document titled “Statute” that explicitly stated they are against “the activities and projects of global arrogance states, the imperialism and Zionism,” and that “The headquarters of the Union is located in the Tehran — capital of Islamic Republic of Iran.”
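To check whether the top domains share infrastructure, you can group domains that have any IP address or registration detail in common. The grouping should also catch indirect links, where A shares an IP address with B and B shares a registrant with C. A union-find structure handles those chains. This is a sketch under stated assumptions. The domains, the IP addresses (taken from reserved documentation ranges) and the registrant values are all hypothetical; in practice they would come from DNS and WHOIS lookups. A shared IP address on a large hosting provider is weak evidence on its own, so a cluster suggests, rather than proves, common control.

```python
from collections import defaultdict

# Hypothetical infrastructure records (documentation-range IPs, made-up registrants).
RECORDS = {
    "a.example": {"ip": "192.0.2.1", "registrant": "registrant-1"},
    "b.example": {"ip": "192.0.2.1", "registrant": "registrant-2"},
    "c.example": {"ip": "198.51.100.7", "registrant": "registrant-2"},
    "d.example": {"ip": "203.0.113.9", "registrant": "registrant-3"},
}


def cluster_by_shared_infrastructure(records):
    """Group domains linked, directly or via a chain, by any shared field value."""
    parent = {domain: domain for domain in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}  # (field, value) -> first domain observed with that value
    for domain, attrs in records.items():
        for field, value in attrs.items():
            if value is None:
                continue  # missing data links nothing
            key = (field, value)
            if key in first_seen:
                union(first_seen[key], domain)
            else:
                first_seen[key] = domain

    clusters = defaultdict(list)
    for domain in records:
        clusters[find(domain)].append(domain)
    # Largest clusters first; ties broken alphabetically.
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: (-len(c), c))


if __name__ == "__main__":
    for cluster in cluster_by_shared_infrastructure(RECORDS):
        print(cluster)
```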

The persona network — Similar to the inauthentic articles and the publishing network, the personas affiliated with Endless Mayfly on Twitter were decidedly critical of Saudi Arabia, Israel and Western nations in general. An analysis of their Twitter activity found these accounts pushed a combination of credible and inauthentic articles that were highly critical of Iran’s political rivals. Take, for example, the Twitter account for the “Peace, Security, Justice Community,” a fake organization identified by our investigation, shown below. Not only did it propagate content that was against Saudi Arabia, Israel and the U.S., the profile photo and header image also targeted Saudi Arabia. Note the cross hairs over Saudi Arabia in the profile photo, and the map used in the header image. The account’s bio also explicitly calls out Saudi Arabia and Wahhabi ideology as the cause of extremism.

Similarly, this tweet from another Endless Mayfly persona, “Mona A. Rahman,” mentions journalist and Saudi critic Ali al-Ahmed while criticizing Saudi Arabia’s crown prince, Mohammad bin Salman.

3. External reporting and analysis

We also compared our findings and data with external reporting. For example, following a tip from FireEye in August 2018, Facebook deactivated some accounts and pages linked to the publishing network used by Endless Mayfly. In its analysis, FireEye identified several domains that belonged to the publishing network we had found, such as institutomanquehue.org and RPFront.com. Like us, FireEye concluded with moderate confidence that the “suspected influence operation” appears to originate from Iran. Facebook, in its announcement, similarly noted that the operations most likely originated from Iran.

In addition, Twitter released a dataset of Iran-linked accounts that had been suspended for “coordinated manipulation.” Although accounts with fewer than 5,000 followers at the time of suspension were anonymized, we were able to identify one Endless Mayfly persona (@Shammari_Tariq) in Twitter’s dataset.

The assessments by Twitter, Facebook and FireEye were useful in corroborating our hypothesis because they surfaced evidence that was not part of our data collection efforts, and overlapped with Endless Mayfly assets we identified. For example, FireEye’s analysis identified phone numbers and registration information connected to Twitter accounts and domains associated with Endless Mayfly — evidence that was not part of our dataset. Likewise, Facebook and Twitter presumably had account registration information, such as IP addresses, that we don’t have access to. The additional data points identified by these external reports therefore helped expand the body of evidence.
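Comparing external reports against your own dataset is essentially set arithmetic. The overlap shows corroboration, and the remainder shows new evidence you did not collect yourself. In the sketch below, the domains and the @Shammari_Tariq handle come from the case study. The phone-number indicator and the second Twitter handle are hypothetical placeholders. The report contents are illustrative and do not reproduce the actual FireEye or Twitter data. Identifiers are lowercased because domain names and Twitter handles are case-insensitive.

```python
# Illustrative data only: "phone:hypothetical-1" and "@hypothetical_account"
# are placeholders, not real indicators.
OWN_ASSETS = {"indepnedent.co", "institutomanquehue.org", "RPFront.com", "@Shammari_Tariq"}
EXTERNAL_REPORTS = {
    "FireEye": {"institutomanquehue.org", "rpfront.com", "phone:hypothetical-1"},
    "Twitter": {"@Shammari_Tariq", "@hypothetical_account"},
}


def corroborate(own_assets, external_reports):
    """For each report, split its indicators into overlap with our data and new leads."""
    own = {asset.lower() for asset in own_assets}  # domains and handles are case-insensitive
    result = {}
    for source, assets in external_reports.items():
        theirs = {asset.lower() for asset in assets}
        result[source] = {
            "overlap": sorted(own & theirs),  # corroborates our findings
            "new": sorted(theirs - own),      # evidence outside our collection
        }
    return result


if __name__ == "__main__":
    for source, split in corroborate(OWN_ASSETS, EXTERNAL_REPORTS).items():
        print(source, split)
```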

Arriving at moderate confidence

In Endless Mayfly’s case, the evidence we collected — the pro-Iran narratives, personas and publishing network — pointed to Iran as a plausible source of the information operation. This body of evidence was then compared to credible external reporting and research from FireEye, Facebook and Twitter, which corroborated our findings. Each individual piece of evidence, while insufficient on its own for attribution, helped confirm and strengthen our hypothesis when assessed holistically, and when compared to the totality of the evidence our investigation surfaced.

Despite the multiple indicators pointing to Iran, we still did not have definitive evidence. We therefore used a cyber-attribution framework that is common within the intelligence community. It relies on multiple indicators and probabilistic confidence levels (low, moderate, high), allowing researchers to convey their findings while qualifying their uncertainty.

Ultimately, we concluded with moderate confidence that Endless Mayfly is an Iran-aligned operation. The U.S. Office of the Director of National Intelligence defines moderate confidence as meaning “the information is credibly sourced and plausible but not of sufficient quality or corroborated sufficiently to warrant a higher level of confidence.” We did not opt for a higher level of confidence because we felt the evidence was insufficient to completely rule out two alternatives. One was a false flag operation, in which someone tried to make it look like Iran was behind the operation. The other was a third party sympathetic to Iranian interests.

Attributing information operations like Endless Mayfly will almost always rely on incomplete and imperfect information. Attaching confidence levels to findings is therefore an important component of attribution, because it builds caution into the conclusion. Incorrect attribution or an inflated confidence level can have dire consequences, especially if government policies and retaliatory measures result from the faulty assessment. To avoid hasty and poor attribution practices, it’s important to consider multiple indicators, types of evidence and analyses. It is equally important to use a confidence level that accounts for alternative hypotheses and missing data.
