Name: DataJournalism.com

3. Spotting bots, cyborgs and inauthentic activity

Written by: Johanna Wild, Charlotte Godart

Charlotte Godart is an investigator and trainer for Bellingcat. Before Bellingcat, she was at the Human Rights Center at UC Berkeley, working within its Investigations Lab, teaching students to conduct open-source research on global conflicts for international humanitarian entities.

Johanna Wild is an open-source investigator at Bellingcat, where she also focuses on tech and tool development for digital investigations. She has an online journalism background and previously worked with journalists in (post-)conflict regions. One of her roles was to support journalists in Eastern Africa to produce broadcasts for the Voice of America.

In late August 2019, Benjamin Strick, a Bellingcat contributor and BBC Africa Eye investigator, was analyzing tweets spreading the hashtags #WestPapua and #FreeWestPapua when he noticed accounts exhibiting abnormal behavior. These accounts were all spreading Indonesian pro-government messages at a moment when the conflict in West Papua was gaining international visibility: A local independence movement had taken to the streets to fight for freedom from Indonesian control, leading to violence between the Indonesian police and protesters.

The accounts Strick saw exhibited multiple odd similarities. Soon, he would realize that these were the early indicators of coordinated inauthentic behavior. But at first, he started by noticing the small stuff.

For one, many of the accounts had stolen profile pictures. Take, for instance, one account that claimed to belong to someone named Marco.

Using Yandex’s reverse image search tool, Strick found that the account’s profile picture had been previously used on other websites under different names. None of the accounts using the photo were for a real person named “Marco.” This proved that the accounts were, at the very least, misleading about their true identities.

Beyond faking their identities, Strick also found the accounts published similar or even identical content while often retweeting one another. Even more striking was that some of them showed precise synchronization in the timecode patterns of their tweets. For example, @bellanow1 and @kevinma40204275 mostly published their tweets at minute 7 or minute 32 of any particular hour.

It’s unlikely that a human would adopt this kind of tweet rhythm. This synchronization across multiple accounts, combined with their misleading photos, suggested the accounts were not linked to real identities, and could be automated. By analyzing suspicious account patterns such as these, Strick eventually concluded that the accounts were part of a pro-Indonesian Twitter bot network that was spreading one-sided and misleading information about the conflict in West Papua. (You can read more about the larger network these accounts were part of in the chapter 11b case study, “Investigating an Information Operation In West Papua.”)
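The minute-of-hour pattern described above can be checked with a few lines of Python. This is a minimal sketch, assuming you have already collected an account's tweet timestamps as `datetime` objects; the function names and the sample timestamps are invented for illustration and are not taken from Strick's investigation.

```python
from collections import Counter
from datetime import datetime


def minute_profile(timestamps):
    """Count how many tweets fall on each minute of the hour (0-59)."""
    return Counter(ts.minute for ts in timestamps)


def top_minute_share(timestamps, top_n=2):
    """Return the share of tweets posted in the top_n most-used minutes."""
    counts = minute_profile(timestamps)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(n for _, n in counts.most_common(top_n))
    return top / total


# Hypothetical sample: most tweets land on minute 7 or minute 32.
sample = [
    datetime(2019, 8, 28, 9, 7),
    datetime(2019, 8, 28, 11, 32),
    datetime(2019, 8, 29, 14, 7),
    datetime(2019, 8, 29, 16, 32),
    datetime(2019, 8, 30, 8, 7),
    datetime(2019, 8, 30, 10, 15),
]

share = top_minute_share(sample)
print(minute_profile(sample).most_common(2))
print(f"{share:.0%} of tweets fall in the two most-used minutes")
```

If posting times were spread evenly across the hour, any two minutes would account for roughly 3% of tweets (2 out of 60). A much higher share is a reason to look closer, not proof of automation.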

What’s a bot? The answer is more complicated than you might think

The West Papua case is far from being the only information operation to use social bots. Other operations have been much more widely publicized and criticized, although at their core they contain similarities in how they operate.

A bot is a software application that can automatically perform tasks assigned to it by humans. Whether a bot does good or bad completely depends on the intentions of its “owner.”

The bots most often referred to in public debates are social bots, active on social networks including Facebook, Twitter and LinkedIn. On these platforms, they can be used to spread specific ideological messages, often with the aim to make it look as if there is a groundswell of support for a particular topic, person, piece of content or hashtag.

Social media bots tend to fall into three main categories: the scheduled bot, the watcher bot and the amplifier bot. It’s important to know which kind of bot you’re interested in because each type has a specific purpose. With each purpose comes a different language and communication pattern. In the context of disinformation, we’re most interested in looking into the amplifier bot.

The amplifier bot exists to do exactly what it sounds like: amplify and spread content, with the goal of shaping online public opinion. It can also be used to make individuals and organizations appear to have a larger following than they really do. Its power comes in numbers. A network of amplifier bots can attempt to influence hashtags, spread links or visual content, or gang up to mass spam or harass an individual online in an attempt to discredit them or to make them seem controversial or under siege.

By working together in large numbers, amplifier bots seem more legitimate and therefore help shape the online public opinion landscape. Amplifier bots that spread disinformation do it mainly through hashtag campaigns or by sharing news in the form of links, videos, memes, photos or other content types. Hashtag campaigns involve bots constantly tweeting the same hashtag, or set of hashtags, in coordination. The goal is often to trick Twitter’s trending algorithm into adding a specific hashtag to the trending topics list. An example is “#Hillarysick,” which was propagated widely by bots after Hillary Clinton stumbled in September 2016, shortly before the presidential election. (It’s also important to note that hashtag campaigns don’t require bots, and can be more effective without them. See this investigation of human “hashtag mills” in Pakistan from Dawn.)
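Coordinated hashtag use of this kind can be surfaced by counting how many distinct accounts push the same tag. The sketch below assumes tweets have been collected as `(account, text)` pairs; the helper names, the `min_accounts` cutoff and the sample tweets are all made up for illustration.

```python
import re
from collections import defaultdict

# Simple hashtag matcher; real platforms apply more detailed rules.
HASHTAG = re.compile(r"#(\w+)")


def accounts_per_hashtag(tweets):
    """Map each lowercased hashtag to the set of accounts that used it."""
    usage = defaultdict(set)
    for account, text in tweets:
        for tag in HASHTAG.findall(text):
            usage[tag.lower()].add(account)
    return dict(usage)


def widely_pushed(tweets, min_accounts=3):
    """Return hashtags used by at least min_accounts distinct accounts."""
    usage = accounts_per_hashtag(tweets)
    return {tag: len(accts) for tag, accts in usage.items()
            if len(accts) >= min_accounts}


# Invented sample data.
sample_tweets = [
    ("acct_1", "Everything is calm #ExampleTag"),
    ("acct_2", "Great news today #exampletag #Other"),
    ("acct_3", "#ExampleTag trending now"),
    ("acct_1", "Again #ExampleTag"),
]

print(widely_pushed(sample_tweets))
```

A high count alone does not separate bots from motivated humans, so treat the result as a list of hashtags worth examining manually.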

Purchasing and creating bots is relatively easy. Countless sites will sell you your own bot army for just a couple of hundred dollars or even less. But a sophisticated, humanlike botnet is much harder to create and maintain.

How to recognize bots

Developers and researchers have created many tools to help assess whether an account might be automated. These tools can be useful in gathering information, but a score from one tool is by no means definitive and should never form the sole basis of any reporting or conclusion.

One of the most well-known tools is Botometer, created by researchers at Indiana University. Based on various criteria, it calculates a score for how likely it is that a Twitter account and its followers are bots.

For Reddit, Jason Skowronski has created a real-time dashboard. After you set it up for a chosen subreddit, it tries to assess whether the comments were made by bots, trolls or humans.

While there are exceptions, most publicly available bot detection tools have been created for Twitter. The reason is that many social networks — including Facebook — restrict their APIs (application programming interfaces) in a way that prevents the public from analyzing and using their data to create such public tools.

As noted earlier, bot detection tools are a great starting point but they should not be your sole evidence. One reason for their varying degree of accuracy is there is simply no universal list of criteria for recognizing bots with 100% certainty. There’s also little agreement about how to classify something as a bot. Researchers at the Oxford Internet Institute’s Computational Propaganda Project classify accounts that post more than 50 times a day as having “heavy automation.” The Atlantic Council’s Digital Forensics Research Lab considers “72 tweets per day (one every ten minutes for twelve hours at a stretch) as suspicious, and over 144 tweets per day as highly suspicious.”
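The two sets of thresholds quoted above can be written down as a small helper, which also makes the disagreement between them visible. This is a sketch only: the function name and labels are invented, and treating exactly 72 tweets per day as "suspicious" is an assumption about how to read the DFRLab wording.

```python
def automation_flags(tweets_per_day):
    """Return the labels that the thresholds quoted above would attach."""
    flags = []
    # Oxford Internet Institute: more than 50 posts a day.
    if tweets_per_day > 50:
        flags.append("heavy automation (OII)")
    # DFRLab: 72 per day is suspicious, over 144 is highly suspicious.
    # Assumption: 72 itself counts as suspicious.
    if tweets_per_day > 144:
        flags.append("highly suspicious (DFRLab)")
    elif tweets_per_day >= 72:
        flags.append("suspicious (DFRLab)")
    return flags


for rate in (39.2, 60, 100, 150):
    print(rate, automation_flags(rate))
```

An account posting 60 times a day is flagged under one definition and not the other, which is one reason a single cutoff should not settle the question.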

It can often be challenging to determine whether a disinformation campaign is conducted by social bots or by humans who are motivated or paid to post large amounts of content about a specific topic. The BBC, for instance, found that accounts that posted similar Facebook messages amplifying favorable content about Boris Johnson in November 2019 were managed by people who pretended to be social bots.

You might also encounter cyborgs, social media accounts that are partly automated and partly managed by humans, which display a combination of natural and inauthentic behavior. Journalists must avoid falsely labeling suspicious accounts as bots without proper evidence and analysis, as a mistaken accusation can undermine your credibility.

One way to deal with these different types of bots, cyborgs and hyperactive human accounts is to focus your investigation on monitoring all inauthentic or bot-like behavior, instead of trying to identify only one type of suspicious account.

For example, Bot Sentinel provides a publicly available database containing (U.S.) Twitter accounts that exhibit suspicious behavior. Its creators decided to collect “accounts that were repeatedly violating Twitter rules” instead of specifically searching for social bots.

Steps to investigate inauthentic behavior

In general, we suggest the following approach for identifying inauthentic and potentially automated behavior on social networks:

1. Manually check the accounts for suspicious behavior.

2. Combine this with the use of tools or more technical network analyses.

3. Investigate their activity, content and network of other accounts they interact with. Combine this with traditional investigation techniques, such as trying to contact them or people they claim to know.

4. Consult with outside experts who specialize in bots and inauthentic activity.

To learn how to manually assess suspicious accounts, it’s important to understand the typical warning signs of automated accounts on Twitter, or other social networks.

Every social media bot needs an identity. Bot creators want to make their accounts appear as convincing as possible, but it takes time to set up and maintain credible-looking profiles, in particular if the goal is to run a large bot network. The more accounts someone has, the more time-consuming it is to create and manage them in a way that makes them seem authentic. This is where these accounts slip up. In many cases, their creators do the bare minimum to establish a profile, and a good investigator can detect this.

Here are a few things to look for:

No real profile picture

A stolen profile picture (as seen in Benjamin Strick’s West Papua investigation) or no profile picture at all can be an indicator of inauthenticity. Since bot creators want to create many accounts at once, they have to obtain a collection of photos and often copy them from other websites. However, doing so creates inconsistencies. For instance, an account with the profile photo of a male but a username implying that a female is the owner of the account could be a signal that something isn’t right. To get around this issue, many bot creators choose cartoons or animals as profile pictures, but again this tactic becomes another pattern to use to detect inauthentic or bot accounts.

Automatically created usernames

Next, look out for names and usernames. Every Twitter handle is unique, which means the username you want is often already taken. This is an inconvenience to the average person, but becomes a real challenge when you’re trying to create 50, 500 or 5,000 accounts in a short period of time.

Bot creators often deploy a strategy to help them easily find unused usernames: They use scripts that automatically generate usernames according to fixed criteria.

When you notice several Twitter accounts with handles consisting of the same number of characters and digits, you can manually search for more accounts with that pattern in each of the accounts’ followers list to potentially identify a network.

In this example, the accounts have something else in common: They all were created in September 2019. When combined with other signals, this can be an indicator that the accounts were all created at the same time by the same person.
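One way to look for handles that share the same number of characters and digits is to reduce each handle to its "shape" and group accounts by it. The sketch below assumes handles made of letters followed by digits; the regular expression, function names and sample handles are invented for illustration, and real generated usernames may follow other schemes.

```python
import re
from collections import defaultdict

# Assumed pattern: a run of letters/underscores followed by a run of digits.
HANDLE_SHAPE = re.compile(r"^([A-Za-z_]+)(\d+)$")


def handle_shape(handle):
    """Return (letter_count, digit_count), or None if the handle does not fit."""
    match = HANDLE_SHAPE.match(handle.lstrip("@"))
    if match is None:
        return None
    return len(match.group(1)), len(match.group(2))


def group_by_shape(handles, min_size=2):
    """Group handles by shape and keep groups with at least min_size members."""
    groups = defaultdict(list)
    for handle in handles:
        shape = handle_shape(handle)
        if shape is not None:
            groups[shape].append(handle)
    return {shape: hs for shape, hs in groups.items() if len(hs) >= min_size}


# Invented handles, not real accounts.
followers = [
    "@sampleaa12345678",
    "@samplebb23456789",
    "@samplecc34567890",
    "@bellingcat",
    "@news_desk",
]

print(group_by_shape(followers))
```

Many genuine users also have handles with trailing digits, so a shared shape only becomes meaningful alongside other signals such as matching creation dates.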

Account activity does not fit age

You should become even more suspicious if a new account already has a relatively large number of followers or if it has published a large number of tweets within a short time. The same applies if an older account has very few followers despite being very active.

If you come across such an account, analyze the account’s tweet activity more deeply. Take the number of tweets located at the top of the page, and divide this by the number of days the account has been active. For example, take an account that has 3,489 tweets as of Nov. 11, 2019, and was created on Aug. 15, 2019. Divide 3,489 by 89 (the days it’s been active, counting both the creation date and Nov. 11), and you get 39.2 tweets per day.

Looking at the tweets made over the lifetime of the account, does the number seem too high, unrealistic or not maintainable?
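The same calculation can be scripted when you need to check many accounts. This is a minimal sketch with an invented function name; it counts both the creation date and the check date as active days, which is an assumption chosen to match the 89 days used in the example above.

```python
from datetime import date


def tweets_per_day(tweet_count, created, as_of):
    """Average tweets per day over the account's lifetime."""
    if as_of < created:
        raise ValueError("as_of must not be earlier than the creation date")
    # Count the creation date and the as_of date as active days.
    days_active = (as_of - created).days + 1
    return tweet_count / days_active


rate = tweets_per_day(3489, date(2019, 8, 15), date(2019, 11, 11))
print(round(rate, 1))  # 39.2
```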

Suspicious tweet patterns

Another element to examine is tweet rhythm. Humans might show slight preferences for the days and times they usually tweet, but it is unlikely that a person posts consistently only on Monday, Tuesday and Wednesday and is completely silent on all other days of the week over a long period of time.
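A rough check for this kind of weekly rhythm is to count tweets per weekday and list the days with no activity at all. The sketch assumes timestamps are `datetime` objects in one consistent time zone; the function names and sample dates are invented for illustration.

```python
from collections import Counter
from datetime import datetime

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def weekday_counts(timestamps):
    """Count tweets per weekday, including days with zero tweets."""
    counts = Counter(ts.weekday() for ts in timestamps)  # Monday is 0
    return {name: counts.get(i, 0) for i, name in enumerate(DAY_NAMES)}


def silent_days(timestamps):
    """Return the weekdays on which the account never posted."""
    return [day for day, n in weekday_counts(timestamps).items() if n == 0]


# Hypothetical account that posts only Monday to Wednesday.
sample = [
    datetime(2019, 11, 11, 9, 0),   # Monday
    datetime(2019, 11, 12, 9, 0),   # Tuesday
    datetime(2019, 11, 13, 9, 0),   # Wednesday
    datetime(2019, 11, 18, 9, 0),   # Monday
    datetime(2019, 11, 19, 9, 0),   # Tuesday
    datetime(2019, 11, 20, 9, 0),   # Wednesday
]

print(weekday_counts(sample))
print(silent_days(sample))
```

The result only means something over a long period and a large number of tweets; a handful of posts will always leave some days empty.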

If you want to see these patterns visualized for one specific account, check out the account analysis tool built by Luca Hammer.

Visualization as part of your investigation

To get a better understanding of the activity of a whole bot network, you can use a visualization platform like Gephi. Bellingcat contributor Benjamin Strick used this tool to analyze the connections between Twitter accounts belonging to a pro-Indonesian bot network.

By looking at the visual representation of the connections between a large number of Twitter accounts, he noticed that the structure on the left side of the picture (in red) stood out.

By zooming in on this area, he could see which Twitter accounts were part of this specific structure.

Each red circle represents a Twitter account and the lines are the relationships between them. Usually, smaller accounts are arranged around a bigger circle in the middle, which means that they all interact with the influential account. The accounts in the structure above, however, did not interact in that way with one another. This led Strick to analyze those abnormal accounts’ behavior.
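To explore a network like this yourself, you first need the interactions in a form a visualization tool can read. The sketch below writes an edge list with `Source`, `Target` and `Weight` columns, a layout that Gephi's spreadsheet import accepts for edge tables. It assumes you have already collected interactions as `(source, target)` pairs, for example who retweeted whom; the function name and account names are placeholders, and this is not a description of how Strick prepared his data.

```python
import csv
import io
from collections import Counter


def write_edge_list(interactions, out):
    """Write (source, target) pairs as a weighted edge list in CSV form."""
    # Repeated interactions between the same pair become the edge weight.
    weights = Counter(interactions)
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(weights.items()):
        writer.writerow([source, target, weight])


# Placeholder interactions.
sample = [
    ("account_a", "account_b"),
    ("account_a", "account_b"),
    ("account_c", "account_b"),
    ("account_b", "account_a"),
]

buffer = io.StringIO()
write_edge_list(sample, buffer)
print(buffer.getvalue())
```

To produce a file instead of printing, pass an object opened with `open("edges.csv", "w", newline="")` in place of the `StringIO` buffer.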

The future of social bots: Can we out-trick them?

The technology behind social bots has become much more advanced in the last few years, allowing these small software applications to become more adept at simulating human behavior. We are getting to the point where people are predicting that artificial users could engage in sophisticated online communications without their human counterparts realizing that they’re actually having a long conversation with a bot.

However, as of now there is no proof that high-level, machine-learning-empowered social bots exist or are being deployed. It seems that many disinformation campaigns are still receiving support from less-complex amplifier bots.

“I don’t think that there are many sophisticated social bots out there that are able to have real conversations with people and to convince them of certain political positions,” said Dr. Ole Pütz, a researcher for the project “Unbiased Bots that Build Bridges” at Bielefeld University in Germany.

According to him, the best way to help the public recognize inauthentic behavior on social networks is to use a detection method that catalogs and weighs all the factors that make an account suspicious. As an example, he says, “This account uses a script to retweet news, it automatically follows others, and that one never uses speech patterns that humans would normally use.”
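The idea of cataloging and weighing factors can be sketched as a simple checklist score. Everything in this example is hypothetical: the signal names are drawn loosely from the warning signs discussed in this chapter, and the weights are arbitrary placeholders, not values proposed by Pütz or any detection tool.

```python
# Hypothetical weights; a real project would need to justify each one.
SIGNAL_WEIGHTS = {
    "stolen_or_missing_profile_picture": 2,
    "patterned_username": 1,
    "activity_does_not_fit_age": 2,
    "synchronized_posting_times": 3,
    "identical_content_across_accounts": 3,
}


def suspicion_report(observed):
    """Summarize which signals were observed and their combined weight."""
    unknown = set(observed) - set(SIGNAL_WEIGHTS)
    if unknown:
        raise ValueError(f"unknown signals: {sorted(unknown)}")
    return {
        "signals": sorted(observed),
        "score": sum(SIGNAL_WEIGHTS[s] for s in observed),
        "max_score": sum(SIGNAL_WEIGHTS.values()),
    }


report = suspicion_report({"patterned_username", "synchronized_posting_times"})
print(report)
```

Keeping the list of observed signals next to the score matters more than the number itself, because it lets readers see why an account was considered suspicious.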

For now, a methodical analysis of account behavior, content, interactions and patterns remains the best approach for identifying inauthentic behavior.

In our case study chapter, we provide a more in-depth and technical explanation of how we analyzed the different factors in a suspicious Twitter network related to the Hong Kong protests.
