COVID-19 Vaccine Hesitancy on Social Media: Building a Public Twitter Data Set of Antivaccine Content, Vaccine Misinformation, and Conspiracies

University of Southern California
"The vaccine-related misinformation on social media may exacerbate the levels of vaccine hesitancy, hampering progress toward vaccine-induced herd immunity, and could potentially increase the number of infections related to new COVID-19 variants."
Although the inoculation of large populations is increasingly important during the COVID-19 pandemic, antivaccine narratives are spreading rapidly, leading some to reject the vaccine. Social media can amplify the effects of antivaccination misinformation; multiple studies have linked susceptibility to misinformation with both vaccine hesitancy and a reduced likelihood of complying with health guidance measures. This paper describes a data set of Twitter posts and Twitter accounts that publicly exhibit a strong antivaccine stance. The data set, which includes tweet IDs of publicly available posts, is available via the AvaxTweets data set GitHub repository. The researchers characterise the collected accounts in terms of prominent hashtags, shared news sources, and most likely political leaning.
The researchers started the ongoing data collection on October 18, 2020, leveraging the Twitter streaming application programming interface (API) to follow a set of specific antivaccine-related keywords. Then, using the Academic Track Twitter API, they collected the historical tweets of the accounts that engaged in spreading antivaccination narratives between October 2020 and December 2020. The political leaning of each account was estimated from the political bias of the media outlets it shared, rated on a five-point scale: left, lean left, centre, lean right, right. ("Knowing users' position on a political spectrum can be useful in identifying their most likely moral values and possible stances toward specific societal issues. This knowledge can be used to design appropriate future messaging and campaigns.")
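The leaning estimate described above can be illustrated with a minimal sketch. It assumes that each outlet's bias is mapped to a number from -2 (left) to 2 (right) and that an account's leaning is the rounded mean over the outlets it shared; the averaging rule, the function names, and the placeholder domains are all assumptions made for illustration, not the researchers' actual procedure or data.

```python
from urllib.parse import urlparse

# Hypothetical domain-to-bias lookup (placeholder domains, not real ratings).
# A real analysis would load a published media-bias rating list instead.
BIAS_SCORES = {
    "left-outlet.example": -2,
    "lean-left-outlet.example": -1,
    "centre-outlet.example": 0,
    "lean-right-outlet.example": 1,
    "right-outlet.example": 2,
}

# The five categories named in the summary, keyed by numeric score.
LABELS = {-2: "left", -1: "lean left", 0: "centre", 1: "lean right", 2: "right"}


def domain_of(url):
    """Return the lower-cased host of a URL without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def estimate_leaning(shared_urls):
    """Estimate an account's leaning from the URLs it shared.

    Assumption: the leaning is the rounded mean bias score of all shared
    URLs whose domain appears in BIAS_SCORES. Returns None when no shared
    URL belongs to a rated outlet.
    """
    scores = [
        BIAS_SCORES[domain]
        for domain in map(domain_of, shared_urls)
        if domain in BIAS_SCORES
    ]
    if not scores:
        return None
    mean = sum(scores) / len(scores)
    return LABELS[round(mean)]


# Example: two links to a "right" outlet, one to a "lean right" outlet,
# and one to an unrated site that is ignored.
example_urls = [
    "https://www.right-outlet.example/story-a",
    "http://lean-right-outlet.example/story-b",
    "https://right-outlet.example/story-c",
    "https://unrated-site.example/story-d",
]
print(estimate_leaning(example_urls))
```

Averaging is only one possible aggregation; a majority vote over categories or a weighting by share count would be equally plausible readings of the summary.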
This process led to two curated, publicly available Twitter data collections, which may help researchers seeking to understand vaccine hesitancy through the lens of social media:
- A streaming keyword-centred data collection with more than 1.8 million tweets created by 719,000 unique accounts between October 18, 2020, and April 21, 2021. The number of relevant tweets in the streaming collection increased gradually from the start date. In addition to hashtags such as #vaccine and #covid19, there was a high proportion of hashtags that carry strong antivaccine sentiment, such as #novaccineforme, #vaxxed, and #vaccineinjury, as well as a large set of common hashtags related to debunked conspiracy theories claiming that there is a global plot to reduce the world population through vaccination. The majority of tweets originated from countries with predominantly English-speaking populations. According to the researchers, vaccine hesitancy is fuelled by misinformation originating from websites of already questionable credibility.
- A historical account-level data collection with more than 135 million tweets published by over 78,000 unique accounts from March 3, 2007, to February 8, 2021. The accounts engaged in antivaccination narratives lean toward the right (conservative) end of the political spectrum, and far-right news media sites appear frequently in the account collection.
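Characterising a collection by its prominent hashtags, as described for the streaming data above, amounts to a frequency count. The sketch below is a minimal illustration, assuming tweet texts are already available as plain strings; the helper name `top_hashtags` and the sample texts are invented for this example and are not taken from the data set.

```python
import re
from collections import Counter

# Matches a '#' followed by word characters; the captured group is the tag.
HASHTAG_RE = re.compile(r"#(\w+)")


def top_hashtags(tweet_texts, n=3):
    """Count hashtags case-insensitively and return the n most frequent."""
    counts = Counter()
    for text in tweet_texts:
        counts.update(tag.lower() for tag in HASHTAG_RE.findall(text))
    return counts.most_common(n)


# Invented sample texts, used only to exercise the function.
sample_texts = [
    "No thanks #NoVaccineForMe #vaccine",
    "Latest #vaccine news",
    "#Vaccine rollout and #covid19",
]
print(top_hashtags(sample_texts, n=1))
```

Lower-casing before counting treats #Vaccine and #vaccine as the same tag, which is usually what is wanted when ranking hashtags by prominence.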
In conclusion: "The data sets collected and provided here could be useful for researchers interested in tracking the longitudinal characteristics of accounts engaging with antivaccine narratives. It can help provide better insights into the socioeconomic, political, and cultural determinants of vaccine hesitancy."
JMIR Public Health Surveillance 2021 (Nov 17); 7(11):e30642.