The End of the Rehydration Era - The Problem of Sharing Harmful Twitter Research Data

Dennis Assenmacher, Indira Sen, Leon Fröhling, Claudia Wagner

January 2023

Abstract

Social media research is currently confronted with a data-sharing problem, as social media platforms prohibit full data distribution in their terms of service. Until recent changes to the platform, Twitter was an exception, allowing academics to legally share Tweet and user IDs with peers, who could then re-collect the data using the Academic API endpoints. This work investigates how Twitter data is currently shared in two domains of harmful online communication: abusive language and social bot detection. We find that the frequently used intermediate strategy of sharing Twitter IDs suffers from substantial data loss, which makes computational results incomparable. Moreover, recent changes in the API result in additional expenses and increased collection time, which may affect the feasibility of research projects. All of these aspects further fuel the reproducibility crisis that social media analytics currently faces. To improve the situation, we propose several best practices for research projects that use ID-based datasets in their experiments and provide recommendations for researchers who want to share their Twitter data with peers.

Type

Conference paper

Publication

Proceedings of the 17th International Conference on Web and Social Media. NEATCLasS, Association for the Advancement of Artificial Intelligence (AAAI)