Hello there, I am Dennis, a computational social scientist currently working as a Post-Doc at GESIS - Leibniz Institute for the Social Sciences, in the Department of Computational Social Science. In 2022 I received my doctoral degree from the University of Münster, working at the Chair of Data Science: Statistics and Optimization. My research interests focus on harmful communication in online media. I am particularly interested in state-of-the-art computational methods from the NLP domain, with a specific focus on developing detection mechanisms for this harmful content. In the past, I have published several papers on different forms of harmful online communication, such as social bots (automated social media accounts) and abusive language.
Building sophisticated models that are able to detect harmful online content relies not only on state-of-the-art machine-learning methods but also on the underlying training data. Data quality, including data availability and appropriate construct definitions, is therefore an important factor that affects the predictive power of such models, especially where social media data are concerned. I explore these data quality aspects in my research from different perspectives.
If you are interested in my work, I recommend checking out the publications section.
Apart from my academic life, I am quite interested in 3D printing and microcontroller DIY projects. I am an avid fan of flight simulation, and on rainy weekends you will probably find me in my (virtual) A320 approaching EGLL on 27L. Maybe someday I will fly my own plane (on a smaller scale, of course).
Recent advances in the field of generative artificial intelligence (AI) have blurred the lines between authentic and machine-generated content, making it almost impossible for humans to distinguish between such media. One notable consequence is the use of AI-generated images for fake profiles on social media. While several types of disinformation campaigns and similar incidents have been reported in the past, a systematic analysis has been lacking. In this work, we conduct the first large-scale investigation of the prevalence of AI-generated profile pictures on Twitter. We tackle the challenges of a real-world measurement study by carefully integrating various data sources and designing a multi-stage detection pipeline. Our analysis of nearly 15 million Twitter profile pictures shows that 0.052% were artificially generated, confirming their notable presence on the platform. We comprehensively examine the characteristics of these accounts and their tweet content, and uncover patterns of coordinated inauthentic behavior. The results also reveal several motives, including spamming and political amplification campaigns. Our research reaffirms the need for effective detection and mitigation strategies to cope with the potential negative effects of generative AI in the future.
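To make the idea of a multi-stage detection pipeline concrete, below is a minimal, purely illustrative Python sketch of how filtering stages could be chained so that only a small fraction of millions of profile pictures ever reaches the most expensive detector. The stage implementations are hypothetical placeholders, not the pipeline used in the study.

```python
# Illustrative only: chaining cheap pre-filters before an expensive classifier.
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Stage:
    name: str
    keep: Callable[[str], bool]  # returns True if the image stays a candidate


def run_pipeline(image_paths: Iterable[str], stages: List[Stage]) -> List[str]:
    """Pass image paths through each stage, keeping only surviving candidates."""
    candidates = list(image_paths)
    for stage in stages:
        candidates = [p for p in candidates if stage.keep(p)]
        print(f"{stage.name}: {len(candidates)} candidates remaining")
    return candidates


# Hypothetical stages: in practice these would be a face detector and a
# classifier for AI-generated images, not simple filename checks.
stages = [
    Stage("face pre-filter", lambda path: path.endswith((".png", ".jpg"))),
    Stage("generated-image classifier", lambda path: "fake" in path),
]

flagged = run_pipeline(["user1_fake.png", "user2.jpg", "user3.gif"], stages)
print(flagged)  # -> ['user1_fake.png']
```

The point of such a cascade is purely practical: cheap stages shrink the candidate set so that the costly classification step remains feasible at the scale of a platform-wide measurement.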
Machine learning (ML)-based content moderation tools are essential to keep online spaces free from hateful communication. Yet ML tools can only be as capable as the quality of the data they are trained on allows them to be. While there is increasing evidence that they underperform in detecting hateful communication directed towards specific identities and may discriminate against them, we know surprisingly little about the provenance of such bias. To fill this gap, we present a systematic review of the datasets for the automated detection of hateful communication introduced over the past decade, and unpack the quality of the datasets in terms of the identities that they embody: those of the targets of hateful communication that the data curators focused on, as well as those unintentionally included in the datasets. We find, overall, a skewed representation of selected target identities and mismatches between the targets that research conceptualizes and ultimately includes in datasets. Yet, by contextualizing these findings in the language and location of origin of the datasets, we highlight a positive trend towards the broadening and diversification of this research space.
The characterization and detection of social bots with their presumed ability to manipulate society on social media platforms have been subject to many research endeavors over the last decade, leaving a research gap on the impact of bots and accompanying phenomena on platform users and society. In this systematic data-driven study, we explore the users’ perception of the construct bot at a large scale, focusing on the evolution of bot accusations over time. We create and analyze a novel dataset consisting of bot accusations that have occurred on the social media platform Twitter since 2007, providing insights into the meanings and contexts of these particular communication situations. We find evidence that over time the term bot has moved away from its technical meaning to become an ‘insult’ specifically used in polarizing discussions to discredit and ultimately dehumanize the opponent.
NLP models are used in a variety of critical social computing tasks, such as detecting sexist, racist, or otherwise hateful content. Therefore, it is imperative that these models are robust to spurious features. Past work has attempted to tackle such spurious features using training data augmentation, including Counterfactually Augmented Data (CADs). CADs introduce minimal changes to existing training data points and flip their labels; training on them may reduce model dependency on spurious features. However, manually generating CADs can be time-consuming and expensive. Hence, in this work, we assess whether this task can be automated using generative NLP models. We automatically generate CADs using Polyjuice, ChatGPT, and Flan-T5, and evaluate their usefulness in improving model robustness compared to manually generated CADs. By testing both model performance on multiple out-of-domain test sets and individual data point efficacy, our results show that while manual CADs are still the most effective, CADs generated by ChatGPT come a close second. One key reason for the lower performance of automated methods is that the changes they introduce are often insufficient to flip the original label.
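As a concrete illustration of automated CAD generation, the sketch below prompts an instruction-tuned model (Flan-T5 via the Hugging Face transformers pipeline) to minimally rewrite a training example so that its label flips. The prompt wording, model checkpoint, and label names are assumptions for illustration, not the exact setup used in the paper.

```python
# Illustrative sketch: generating a counterfactually augmented data point (CAD)
# with an instruction-tuned generative model. Prompt and checkpoint are
# assumptions, not the configuration from the paper.
from transformers import pipeline

generator = pipeline("text2text-generation", model="google/flan-t5-base")


def generate_cad(text: str, original_label: str, target_label: str) -> str:
    """Ask the model for a minimally edited version of `text` whose label
    should flip from `original_label` to `target_label`."""
    prompt = (
        f"Rewrite the following text with as few changes as possible so that "
        f"it is {target_label} instead of {original_label}: {text}"
    )
    return generator(prompt, max_length=64)[0]["generated_text"]


# Example usage: create a non-hateful counterfactual for a hateful post.
counterfactual = generate_cad(
    "All people from that group are worthless.", "hateful", "not hateful"
)
print(counterfactual)
```

Whether the model's edit actually flips the label is exactly the failure mode noted above, so in practice each automatically generated counterfactual would still need to be validated.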
Are you interested in collaborating on research into harmful online communication? You can always reach me through Twitter or ResearchGate. You can also drop me an e-mail using the following address