Name: DataJournalism.com

3a. Case study: Finding evidence of automated Twitter activity during the Hong Kong protests

Written by: Charlotte Godart, Johanna Wild

Charlotte Godart is an investigator and trainer for Bellingcat. Before Bellingcat, she was at the Human Rights Center at UC Berkeley, working within its Investigations Lab, teaching students to conduct open-source research on global conflicts for international humanitarian entities.

Johanna Wild is an open-source investigator at Bellingcat, where she also focuses on tech and tool development for digital investigations. She has an online journalism background and previously worked with journalists in (post-)conflict regions. One of her roles was to support journalists in Eastern Africa to produce broadcasts for the Voice of America.

In August 2019, Twitter announced the removal of thousands of accounts it said helped spread disinformation about the Hong Kong protests and were part of “a coordinated state-backed operation.” Soon after, Facebook and YouTube released statements saying they had also removed accounts engaging in coordinated inauthentic behavior about the protests.

Unlike Facebook and YouTube, Twitter released a list of the accounts it removed, offering an opportunity to further investigate the activity. Together with a participant in a Bellingcat workshop, our team decided to investigate the remaining Twitter content about the protests in Hong Kong to try to identify signs of coordinated inauthentic behavior.

Finding suspicious activity

We started by searching for relevant hashtags about the protests. A simple keyword search for “Hong Kong Riots” brought up many tweets, some containing multiple hashtags.

We wanted to focus on pro-China accounts and content, since these were the ones Twitter had already found engaging in inauthentic activity. We tried keyword formulations like:

“Shame on Hong Kong” -police -government

This search yields results that contain the phrase “Shame on Hong Kong” but not the words police or government. The goal was to filter out tweets such as “shame on hong kong police” and keep tweets such as “shame on hong kong protesters.” Other search terms were “Hong Kong roaches” and “Hong Kong mobs,” which were common descriptors of the protesters by pro-Chinese Twitter accounts.

After using those and other search terms, we examined recent tweets about Hong Kong that received many retweets and likes. You can filter by engagement simply by adding “min_retweets:500” or “min_faves:500” to your search query. This will get only tweets with at least 500 retweets or likes.
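The search operators described above can also be assembled programmatically when many variations need to be tried. The following is a minimal sketch, not part of the original investigation; the function name `build_search_query` and its parameters are hypothetical, and it only builds the text that would be typed into Twitter's search box.

```python
from urllib.parse import quote


def build_search_query(phrase, exclude=(), min_retweets=None, min_faves=None):
    """Assemble a Twitter search string from the operators described in the text."""
    parts = [f'"{phrase}"']                     # exact-phrase match
    parts += [f"-{word}" for word in exclude]   # drop tweets containing these words
    if min_retweets is not None:
        parts.append(f"min_retweets:{min_retweets}")
    if min_faves is not None:
        parts.append(f"min_faves:{min_faves}")
    return " ".join(parts)


query = build_search_query(
    "Shame on Hong Kong",
    exclude=["police", "government"],
    min_retweets=500,
)
print(query)         # text to paste into the search box
print(quote(query))  # percent-encoded form, for use inside a URL
```

Keeping the phrase, the exclusions and the engagement thresholds as separate arguments makes it easy to loop over other terms mentioned above, such as “Hong Kong roaches” or “Hong Kong mobs.”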

We then looked at the Twitter accounts that had interacted with those tweets. One example was a tweet from verified user Hu Xijin, editor-in-chief of the Chinese and English editions of the Global Times, a Chinese state-run media outlet.

We clicked on the “Retweets” and “Likes” hyperlinks next to each engagement number to display a list of accounts that performed the relevant action.

Our hypothesis was that inauthentic pro-China accounts would amplify tweets from prominent Chinese state media personnel. In this case, a lot of usernames stood out because they had an eight-digit number after the name, which indicated that the user had accepted the default username generated by Twitter when they signed up. That warranted further research into their behavior and characteristics.

As we examined these accounts, we saw they had a tiny number of followers, were following few accounts, offered no bio, were retweeting other people’s tweets and sending almost none of their own, and almost exclusively promoted content in opposition to the protests.
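The warning signs listed above were checked by hand, but they can be written down as a simple checklist. The sketch below is an illustration under stated assumptions: the `Account` record, its field names and every threshold are hypothetical choices, not values used in the investigation and not a Twitter API schema.

```python
from dataclasses import dataclass


@dataclass
class Account:
    # Hypothetical record; field names are illustrative only.
    username: str
    followers: int
    following: int
    bio: str
    tweets: int     # original tweets written by the account
    retweets: int   # retweets of other people's tweets


def suspicion_flags(acct, max_followers=10, max_following=10, min_retweet_share=0.9):
    """Return the warning signs an account exhibits (thresholds are assumptions)."""
    flags = []
    if acct.followers <= max_followers:
        flags.append("few followers")
    if acct.following <= max_following:
        flags.append("follows few accounts")
    if not acct.bio.strip():
        flags.append("empty bio")
    total = acct.tweets + acct.retweets
    if total and acct.retweets / total >= min_retweet_share:
        flags.append("mostly retweets")
    return flags


# Invented accounts for illustration only.
sample = Account("example_user", followers=2, following=5, bio="", tweets=1, retweets=40)
normal = Account("established_user", followers=1200, following=300,
                 bio="Reporter", tweets=500, retweets=100)
print(suspicion_flags(sample))
print(suspicion_flags(normal))
```

No single flag proves anything on its own. As in the manual review, it is the combination of signs, together with the content an account promotes, that justifies a closer look.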

We also noticed that the creation dates for these accounts were very recent, around August 2019. Because Twitter released a list of the pro-China accounts it removed, we could check the creation dates on those accounts and see if they showed a similar trend.

With the help of Luigi Gubello, a coder who is engaged in the online open-source community, we used a simple Python script (the code is available on his GitHub) to identify patterns in the data. A graph of the creation dates showed that the removed accounts were all created in recent months, which aligned with the characteristics of the set of active accounts we were investigating.
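The script itself is not reproduced here. As a rough idea of what this kind of analysis involves, the sketch below counts accounts by creation month. It assumes the creation timestamps have already been loaded as strings in a hypothetical `YYYY-MM-DD HH:MM` format; the function name and the sample values are invented for illustration and are not taken from Twitter's dataset.

```python
from collections import Counter
from datetime import datetime


def creations_per_month(created_at_values, fmt="%Y-%m-%d %H:%M"):
    """Count accounts per creation month, returned in chronological order."""
    counts = Counter(
        datetime.strptime(value, fmt).strftime("%Y-%m") for value in created_at_values
    )
    return sorted(counts.items())


# Made-up timestamps purely for illustration.
sample = ["2019-08-16 09:12", "2019-08-16 09:47", "2019-07-02 18:30", "2017-03-11 06:05"]
for month, n in creations_per_month(sample):
    print(month, "#" * n)   # crude text histogram
```

A spike in one or two recent months, compared with a flat spread over several years, is the kind of pattern that would stand out in such a count.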

Automating the process

Now that we had identified a sample of tweets that exhibited suspicious characteristics and behavior, we wanted to conduct a much larger analysis. This required some automation. One Bellingcat workshop participant had a background in software development, so he wrote a small piece of JavaScript code, built around the regular expression (\w+\d{8}), to perform two functions: extract the usernames of accounts that had retweeted or liked a specific tweet, and then quickly filter the username list so that it focused only on the usernames that matched a pattern. The pattern he filtered for was a name followed by an eight-digit number.

Once he loaded this script in the Chrome developer tools console, which provides web developer tools directly in the browser, it ran in the background whenever he clicked on the “Retweets” or “Likes” hyperlink for a specific tweet. It then returned results that highlighted usernames fitting the pattern.
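The filtering step can be illustrated outside the browser. The sketch below is not the participant's JavaScript; it applies the same regular expression in Python to a list of usernames that is assumed to have been collected already. The `STRICT` variant is an added assumption: because `\w` also matches digits, the quoted pattern accepts names ending in more than eight digits, and the stricter form requires exactly eight.

```python
import re

LOOSE = re.compile(r"\w+\d{8}")           # pattern quoted in the text
STRICT = re.compile(r"\w*[^\W\d]\d{8}")   # assumption: exactly eight trailing digits


def matching_usernames(usernames, pattern=LOOSE):
    """Keep usernames that match the pattern from start to end."""
    return [name for name in usernames if pattern.fullmatch(name)]


# Invented handles for illustration only.
handles = ["Alice12345678", "newsdesk", "Bob2019", "user123456789"]
print(matching_usernames(handles))           # loose pattern
print(matching_usernames(handles, STRICT))   # stricter variant
```

With the loose pattern, both `Alice12345678` and `user123456789` are kept; the stricter variant keeps only the first. Which behavior is preferable depends on how many false positives the later manual review can absorb.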

We could now use his script to examine the accounts interacting with other prominent pro-China tweets. In the midst of the Hong Kong protests, Chinese American actress Liu Yifei shared a Weibo post in support of the police, which led some people on social networks to call for a boycott of her new movie, “Mulan.” However, we also noticed that many Twitter accounts supported the actress and her movie using the hashtag #SupportMulan. (CNN also reported on this.) We decided to use the script to examine the users who retweeted or liked the pro-Mulan tweets.

We collected the account names that fit our pattern and then identified their creation dates. This revealed that most of the accounts were created on Aug. 16.

We gathered the exact creation date and time of each account by hovering over the “joined” information on its profile.

With the set of accounts in front of us, we began the manual analysis of the content they were sharing. It quickly became clear that the accounts in our list had all tweeted in favor of Yifei and against the Hong Kong protesters.

Many of the accounts in our list became inactive after Aug. 17 or 18, which again showed an element of coordination. We don’t know exactly why they went dormant, but it’s possible that Twitter required additional verification steps for the creators to log in and they were unable to comply. Another option is that they simply stopped tweeting because the account creators did not want to raise further suspicion after Twitter started suspending pro-China accounts.

However, a few months later, we noticed that several of the accounts were active again. This time they spread positive messages about Yifei and her film, “Mulan.”

We also found pro-“Mulan” accounts with other username patterns or creation dates that were continuously spreading messages in favor of Yifei. We found them by searching for tweets that included hashtags like #SupportMulan or #liuyifei.

It seems the accounts changed their strategy from criticizing the Hong Kong protesters to promoting the actress and her movie, perhaps to avoid being blocked from Twitter.

The case study shows how it’s possible to combine manual and automated techniques to quickly uncover a network of suspicious Twitter accounts. It also illustrates that it’s useful to look for additional accounts and activity even after a platform announces a takedown of accounts.

Here, we were able to use some simple search techniques and account details to identify a larger set of accounts that showed strong indicators of being engaged in coordinated inauthentic activity.
