Social media platforms are essential for information sharing and, thus, prone to coordinated dis- and misinformation campaigns. However, research in this area is hampered by strict data sharing regulations imposed by the platforms, resulting in a lack of benchmark data. Previous work has focused on circumventing these rules by either pseudonymizing the data or sharing fragments. In this work, we address this benchmarking crisis by presenting a methodology for creating artificial campaigns from original campaign building blocks. We conduct a proof-of-concept study using the freely available generative language model GPT-Neo and demonstrate that the campaign patterns can be flexibly adapted to an underlying social media stream and can evade state-of-the-art campaign detection approaches based on stream clustering. Thus, we not only provide a framework for artificial benchmark generation but also demonstrate the possible adversarial nature of such benchmarks, which can be used to challenge and advance current campaign detection methods.
Proceedings of the 16th International Conference on Web and Social Media – 1st Workshop on Novel Evaluation Approaches for Text Classification Systems on Social Media (NEATCLasS)