Hello there, I am Dennis, a computational social scientist currently working as a Post-Doc at GESIS - Leibniz Institute for the Social Sciences, in the department for Computational Social Science. In 2022 I received my doctoral degree from the University of Münster, where I worked at the Chair of Data Science: Statistics and Optimization. My research interests focus on harmful communication
in online media. I am particularly interested in state-of-the-art computational methods from the NLP domain, with a focus on developing detection mechanisms for such harmful content. In the past, I have published various papers on different forms of harmful online communication, such as social bots (automated social media accounts) and abusive language.
Building sophisticated models that can detect harmful online content relies not only on state-of-the-art machine-learning methods but also on the underlying training data. Data quality, including data availability and appropriate construct definitions, is therefore an important variable that impacts the predictive power of models, especially when social media data are concerned. I explore all of these data quality aspects in my research from different perspectives.
If you are interested in my work, I recommend checking out the publications section.
Apart from my academic life, I am quite interested in 3D printing and micro-controller DIY projects. I am an avid fan of flight simulation, and on rainy weekends you will probably find me in my (virtual) A320 approaching EGLL on 27L. Maybe someday, I will be flying my own plane (on a smaller scale, of course).
NLP models are used in a variety of critical social computing tasks, such as detecting sexist, racist, or otherwise hateful content. It is therefore imperative that these models are robust to spurious features. Past work has attempted to tackle such spurious features using training data augmentation, including Counterfactually Augmented Data (CADs). CADs introduce minimal changes to existing training data points and flip their labels; training on them may reduce model dependency on spurious features. However, manually generating CADs can be time-consuming and expensive. Hence, in this work, we assess whether this task can be automated using generative NLP models. We automatically generate CADs using Polyjuice, ChatGPT, and Flan-T5, and evaluate their usefulness in improving model robustness compared to manually generated CADs. By testing both model performance on multiple out-of-domain test sets and individual data point efficacy, our results show that while manual CADs are still the most effective, CADs generated by ChatGPT come a close second. One key reason for the lower performance of automated methods is that the changes they introduce are often insufficient to flip the original label.
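To make the idea of a CAD concrete, here is a minimal toy sketch in Python. The example sentence, label names, and the `make_cad` helper are purely illustrative assumptions for this page; in the actual study, CADs are written by human annotators or generated by models such as Polyjuice, ChatGPT, and Flan-T5, not by simple string substitution.

```python
# Toy illustration of a Counterfactually Augmented Data (CAD) point:
# a minimal edit to an existing training example that flips its label.
# The data, labels, and helper below are hypothetical.

def make_cad(text, replacements, new_label):
    """Apply minimal token substitutions and assign the flipped label."""
    for old, new in replacements.items():
        text = text.replace(old, new)
    return text, new_label

original_text, original_label = "Women don't belong in engineering", "sexist"

cad_text, cad_label = make_cad(
    original_text,
    {"don't": "do"},   # single-word minimal change
    "not_sexist",      # the flipped label
)

print(cad_text, "->", cad_label)
```

The key property the abstract highlights is exactly this coupling: the edit must be small, yet large enough that the new label genuinely applies, which is where automated generators often fall short.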
The characterization and detection of social bots, with their presumed ability to manipulate society on social media platforms, have been subject to many research endeavors over the last decade, leaving a research gap regarding the impact of bots and accompanying phenomena on platform users and society. In this systematic data-driven study, we explore users’ perception of the construct ‘bot’ at a large scale, focusing on the evolution of bot accusations over time. We create and analyze a novel dataset consisting of bot accusations that have occurred on the social media platform Twitter since 2007, providing insights into the meanings and contexts of these particular communication situations. We find evidence that over time the term bot has moved away from its technical meaning to become an ‘insult’ specifically used in polarizing discussions to discredit and ultimately dehumanize the opponent.
Are you interested in a collaboration on investigating harmful online communication? You can always reach me through Twitter or ResearchGate. You can also drop me an e-mail using the following address