Collective moderation of hate, toxicity, and extremity in online discussions
Hate speech - a definition

Insults, discrimination, or intimidation, spreading fearful, negative, and harmful stereotypes, calling for exclusion or segregation, inciting hatred, and encouraging violence against individuals or groups on the grounds of their supposed race, ethnic origin, gender, religion, or political beliefs.

Source: Citizens for Justice and Peace website.
Why is hate speech a problem?
  • Hate speech is widespread: 3 in 10 people in Austria report having come across content online in the last 3 months that they consider hostile or degrading [1].
  • Hate speech threatens civic discourse: Targets of online hate leave social media [2] or restrict their expression of opinion [3].
  • Online hate turns into offline violence: A study from Germany estimates that 10% of violent crimes against refugees can be attributed to hate comments on Facebook [4].
[1] Press release by Statistik Austria, Ein Drittel der Bevölkerung berichtet von Hass im Netz (2023).
[2] Andrea Siegel, Online hate speech, Social Media and Democracy (2020).
[4] Müller & Schwarz, Fanning the Flames of Hate: Social Media and Hate Crime, Journal of the European Economic Association (2020).
What can we do against online hate speech?
  • Delete content, block accounts (censoring)
  • Professional moderation (expensive)
  • Criminal prosecution (tedious)
  • Authentication with real names (dangerous for dissidents)
  • Collective moderation
Source: STANDARD / Lukas Friesenbichler; Getty Images, Adobe Stock
Outline
  • Counter speech as collective moderation
  • Reconquista Germanica vs. Reconquista Internet - a case study
  • Under the hood: using machine learning for text classification
  • Staying constructive, being sarcastic or insulting: what helps against online hate?
Source: Jon Tyson, Unsplash.
Counter speech as collective moderation
Counter speech - a definition
  • Counter speech counters hate by presenting an alternative narrative.
  • Counter speech aims to reduce hate and increase the quality of discussions.
  • Counter speech is a bottom-up reaction by citizens.
Source: Amadeu Antonio Stiftung website.
Hate and counter speech in the wild

hate speech

spreading misinformation about members of a religion

counter speech

providing an alternative narrative by giving facts

Example tweets were anonymised to protect the identities of Twitter users.
Does counter speech work?
  • Counter speech can reduce the prevalence of hateful comments, especially if the counter speakers involved are organized [1].
  • Counter speech can improve the quality of online discussions [2].
  • An experiment conducted with Swiss Twitter users shows that especially empathy-based counter speech reduces the prevalence of hate [3].

Which counter speech strategies are most effective?

[1] Garland et al., Impact and dynamics of hate and counter speech online, EPJ Data Science (2022).
[2] Cathy Buerger, Counterspeech: A Literature Review, SSRN (2021).
Reconquista Germanica vs. Reconquista Internet - a case study
Reconquista Germanica
  • right-wing extremist group
  • founded before the German federal elections in 2017
  • about 5000 members
  • aim: attacking Twitter accounts of newspapers and journalists by posting hate
Reconquista Internet
  • civil rights movement
  • founded in April 2018
  • 4000-5000 members
  • motto: For love and reason on the Internet and a civilization of social discourse on social networks
Measuring the effectiveness of counter speech

A "conversation tree" on Twitter.
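A conversation tree can be represented by linking every reply to the tweet it answers. The sketch below is only an illustration of that data structure: the tweet IDs, the `(tweet_id, parent_id)` input format and both function names are assumptions, not the format used in the study.

```python
from collections import defaultdict


def build_conversation_tree(tweets):
    """Group replies under the tweet they answer.

    `tweets` is an iterable of (tweet_id, parent_id) pairs (hypothetical
    format); parent_id is None for the tweet that starts the conversation.
    """
    children = defaultdict(list)
    root = None
    for tweet_id, parent_id in tweets:
        if parent_id is None:
            root = tweet_id
        else:
            children[parent_id].append(tweet_id)
    return root, dict(children)


def reply_chains(root, children):
    """Return every root-to-leaf path, i.e. each reply sequence in the tree."""
    chains = []
    stack = [[root]]
    while stack:
        path = stack.pop()
        replies = children.get(path[-1], [])
        if not replies:
            chains.append(path)  # reached a tweet nobody replied to
        for reply in replies:
            stack.append(path + [reply])
    return sorted(chains)


# Made-up example: a news tweet, two hateful replies, one counter reply.
tweets = [
    ("news", None),
    ("hate1", "news"),
    ("counter1", "hate1"),
    ("reply1", "counter1"),
    ("hate2", "news"),
]
root, children = build_conversation_tree(tweets)
chains = reply_chains(root, children)
```

Each chain is one hate-counter-reply sequence that can be analysed on its own, while the whole tree is the unit for conversation-level statistics.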

Which counter speech strategies are there?

We identified existing strategies based on a literature review [1, 2, 3, 4] and deep-reading of 1000 randomly selected tweets.

[1] Benesch et al., Considerations for successful counterspeech, Dangerous Speech Project; University of Connecticut (2016).
Under the hood: using machine learning for text classification

Machine learning for pattern recognition
  • Challenge: We want to automatically detect counter-speech strategies (+hate, toxicity and "extremity" of users) in 1.3 million tweets.
  • Solution: We fine-tune a transformer (a machine learning model for sequence prediction) on example tweets, so that it can identify similar tweets with a certain probability.
Transformer. Source: RONGWALLE Robot Toys for Kids, Amazon.
How to fine-tune a transformer model
Staying constructive, being sarcastic or insulting: what helps against online hate?

Three indicators of "effectiveness"

Reduction of hate: We classify tweets as hateful with a custom machine learning model and measure prevalence.
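Prevalence can be read as the share of tweets that the classifier flags as hateful. A minimal sketch, assuming the classifier returns one probability per tweet and that 0.5 is the decision threshold (both the threshold and the example scores are assumptions for illustration):

```python
def hate_prevalence(hate_probabilities, threshold=0.5):
    """Share of tweets whose hate probability reaches the threshold."""
    if not hate_probabilities:
        return 0.0
    flagged = sum(1 for p in hate_probabilities if p >= threshold)
    return flagged / len(hate_probabilities)


# Made-up classifier scores for tweets before and after a counter tweet.
before = [0.9, 0.8, 0.2, 0.7]
after = [0.6, 0.1, 0.3, 0.2]

reduction = hate_prevalence(before) - hate_prevalence(after)
```

A positive `reduction` means that fewer tweets were flagged as hateful after the counter tweet than before it.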

Reduction of toxicity: A toxic comment is a rude, disrespectful or inappropriate comment that could cause someone to leave a discussion. We identify toxic tweets with the Perspective API, a classifier trained by Google Jigsaw.
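A request to the Perspective API is a JSON document that names the comment text and the requested attribute; the score is nested in the response. The sketch below only builds the request body and reads a mocked response, without sending anything over the network. The language code `"de"`, the helper names and the mocked score are assumptions for illustration.

```python
import json


def build_request(text, language="de"):
    """Build the JSON body for a Perspective API `comments:analyze` call."""
    return {
        "comment": {"text": text},
        "languages": [language],  # assumption: German-language tweets
        "requestedAttributes": {"TOXICITY": {}},
    }


def toxicity_score(response):
    """Extract the overall toxicity score (between 0 and 1) from a response."""
    return response["attributeScores"]["TOXICITY"]["summaryScore"]["value"]


# Mocked response with a made-up score; a real call needs an API key.
mock_response = json.loads(
    '{"attributeScores": {"TOXICITY": '
    '{"summaryScore": {"value": 0.83, "type": "PROBABILITY"}}}}'
)

body = build_request("example tweet text")
score = toxicity_score(mock_response)
```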

Reduction of political extremity: We measure the extremity of users using a custom machine learning model trained on the account characteristics of Reconquista Internet (RI) and Reconquista Germanica (RG) members.

Illustrations: Stable Diffusion Online, 2023.



Statistical analysis

Micro scale: pairwise analysis of individual hate-counter-reply sequences.

Nonparametric matching
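The idea behind matching is to compare sequences that are similar except for the counter speech strategy used. The sketch below uses a deliberately simple variant, matching within strata of the initial tweet's hate score; it is not necessarily the procedure used in the study. The field names, the bin width and the data are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def matched_effect(sequences, bin_width=0.2):
    """Average difference in reply hate between matched sequences.

    Sequences are grouped into strata by the hate score of the initial
    tweet. Within each stratum, replies to counter tweets that use the
    strategy are compared with replies to counter tweets that do not.
    Strata without both groups are dropped.
    """
    strata = defaultdict(lambda: {True: [], False: []})
    for seq in sequences:
        stratum = int(seq["hate_score"] / bin_width)
        strata[stratum][seq["uses_strategy"]].append(seq["reply_hate"])

    weighted_sum, total_weight = 0.0, 0
    for groups in strata.values():
        if groups[True] and groups[False]:
            difference = mean(groups[True]) - mean(groups[False])
            weight = len(groups[True])  # weight by number of treated cases
            weighted_sum += difference * weight
            total_weight += weight
    return weighted_sum / total_weight if total_weight else None


# Made-up hate-counter-reply sequences.
sequences = [
    {"hate_score": 0.90, "uses_strategy": True, "reply_hate": 0.4},
    {"hate_score": 0.85, "uses_strategy": False, "reply_hate": 0.8},
    {"hate_score": 0.30, "uses_strategy": True, "reply_hate": 0.2},
    {"hate_score": 0.35, "uses_strategy": False, "reply_hate": 0.4},
    {"hate_score": 0.55, "uses_strategy": True, "reply_hate": 0.1},  # unmatched
]
effect = matched_effect(sequences)
```

A negative `effect` means that replies following the strategy were, on average, less hateful than replies in comparable sequences without it.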

Meso scale: time-resolved statistical model of all tweets in a conversation tree.

Autoregressive Distributed Lag (ARDL) model
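For orientation, a generic ARDL(p, q) model has the textbook form below. The symbols are assumptions for illustration: y_t stands for the outcome in time step t (for example the share of hateful tweets), x_t for the share of tweets using a given counter speech strategy, and p and q for the numbers of lags. The exact specification used in the study may differ.

```latex
y_t = c + \sum_{i=1}^{p} \phi_i \, y_{t-i} + \sum_{j=0}^{q} \beta_j \, x_{t-j} + \varepsilon_t
```

The coefficients \phi_i capture how strongly the outcome depends on its own past, and the coefficients \beta_j capture the current and delayed association with the counter speech strategy.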

Macro scale: ARDL model of all tweets in the observation period (2015 to 2018).

Interpretation of results

For every tweet we get a probability that it contains a certain strategy.

For a given strategy we can only say that it is more effective against hate, toxicity or extremity on average.

We cannot say anything about the effect of individual tweets.
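How a classifier arrives at one probability per strategy can be sketched with a softmax over raw model scores. This is an assumption for illustration, since the output layer of the model is not described here. The strategy labels and scores are made up.

```python
import math


def strategy_probabilities(scores):
    """Softmax: turn raw scores into probabilities that sum to 1."""
    peak = max(scores.values())  # subtract the maximum for numerical stability
    exps = {label: math.exp(s - peak) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def average_probability(tweet_probabilities, strategy):
    """Mean probability of one strategy over many tweets."""
    values = [probs[strategy] for probs in tweet_probabilities]
    return sum(values) / len(values)


# Hypothetical raw scores for two tweets and three illustrative strategies.
tweet_scores = [
    {"opinion": 2.0, "sarcasm": 0.5, "insult": -1.0},
    {"opinion": 0.0, "sarcasm": 0.0, "insult": 0.0},
]
tweet_probabilities = [strategy_probabilities(s) for s in tweet_scores]
mean_opinion = average_probability(tweet_probabilities, "opinion")
```

Because each tweet only carries a probability, statements about a strategy are made over many tweets, as in `average_probability`, and not about a single tweet.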

Source: M. Fatchurofi, New York Times.
Summary
  • Counter speech is an effective, self-organised strategy of citizens to counter hate. It is an alternative to censoring, professional moderation and real-name authentication.
  • Digital trace data and machine learning can help to understand complex societal phenomena that are hard to study experimentally.
  • Stating an opinion (maybe with a touch of humor) is the most effective strategy against hate, toxicity and extremity.