Share your Twitter datasets!


Joseph Noonan


September 29, 2023

Twitter research as we knew it is dead. The academic API has been dismantled, and researchers are left with a cost-prohibitive API option aimed at for-profit companies. Researchers can no longer programmatically access Twitter at scale. This affects future research: the things that we could do before are no longer possible. It also affects the reproducibility of old research, the longevity of the data that this research is built upon, and, in turn, the new questions that could be answered with old datasets. The terms of service of the Twitter API limited researchers to sharing only tweet IDs, which were then used to redownload the tweets still available on the platform. This means that when you download a Twitter dataset used for research, all you get is a column of IDs ready to be redownloaded. Redownloading requires access to the Twitter API, and we no longer have reasonable access to it. As a result, these datasets sit in data repositories, gathering digital dust, useless. Like floppy disks decaying in the closet, Twitter datasets have become impossible to use.

This is not an insignificant amount of data. This is the loss of billions of tweets collected during the largest pandemic in the last 100 years, millions of tweets from MPs across Europe, violent rhetoric before January 6th, and the reactions by co-partisans after January 6th. This is just a smattering of the thousands of social science studies conducted using Twitter data.

My suggestion is simple: researchers should share the full textual content of their Twitter datasets.

While this suggestion is simple, the implementation is not. It needs to be done in line with rigorous ethical guidelines and in compliance with the law. The law varies from jurisdiction to jurisdiction, with the European Union's GDPR making this a difficult, if not impossible, task. However, the old Twitter terms of service should not prevent researchers from sharing this extremely valuable data. I believe it is worth the effort to preserve this type of data. It is of historical, political, social, and scholarly relevance. We should not allow a private company’s terms of service to dictate how we preserve and share these data.