The distribution of hate speech and its implications for content moderation

Abstract

Hate speech is widely seen as a significant obstacle to constructive online discourse, but the most effective strategies to mitigate its effects remain unclear. We argue that understanding its distribution across users is key to developing and evaluating effective content moderation strategies. We address this missing link by first examining the distribution of hate speech in five original datasets that collect user-generated posts across multiple platforms (social media and online newspapers) and countries (Switzerland and the United States). Across these diverse samples, the vast majority of hate speech is produced by a small fraction of users. Second, results from a pre-registered field experiment on Twitter indicate that counterspeech strategies achieve only small reductions in future hate speech, mainly because this approach proves ineffective against the most prolific contributors of hate. These findings suggest that complementary content moderation strategies may be necessary to effectively address the problem.

Publication
Political Science Research and Methods, Conditionally Accepted
Gloria Gennaro
Assistant Professor

Political Economy, Comparative Politics, Text as Data.