Hate Speech Detection

Hate speech detection is an umbrella term covering several related tasks. The most common is offensive language identification, which aims at detecting whether a text is offensive or not (e.g. any comment that should be moderated on a social media platform, such as one containing bad language or attacking an individual). Once a text has been detected as offensive, one can further detect whether its content is hateful or not; a minimal two-step sketch follows the definitions below.

Here are definitions of these concepts:

  • offensive: contains profanity or an insult

  • hateful: targets a group or an individual with the intent to cause harm or social chaos.
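
The two tasks can be chained: first decide whether a comment is offensive at all, and only then analyse it further. Below is a minimal sketch of such a cascade using the DaNLP loaders documented in this section; the assumption that predict() returns the plain label string "OFF"/"NOT" should be checked against your installed version.

from danlp.models import load_bert_offensive_model, load_bert_hatespeech_model

# load the two models documented below
offensive_model = load_bert_offensive_model()
hatespeech_model = load_bert_hatespeech_model()

def moderate(text):
    # step 1: offensive language identification
    # (predict() returning the plain label "OFF"/"NOT" is an assumption)
    if offensive_model.predict(text) == "NOT":
        return "NOT"
    # step 2: only offensive texts are analysed further for hate speech
    return hatespeech_model.predict(text)

print(moderate("Han ejer ikke respekt for nogen eller noget..."))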

| Model             | Train Data        | License         | Trained by          | Tags      | DaNLP |
|-------------------|-------------------|-----------------|---------------------|-----------|-------|
| BERT Offensive    | DKHate            | CC BY 4.0       | Alexandra Institut  | OFF / NOT | ✔️    |
| A&ttack           | Facebook comments | CC BY-NC-SA 4.0 | Analyse & Tal       | OFF / NOT |       |
| BERT HateSpeech   | Facebook comments | CC BY 4.0       | Alexandra Institut  | OFF / NOT | ✔️    |
| ELECTRA Offensive | Facebook comments | CC BY 4.0       | Alexandra Institut  | OFF / NOT | ✔️    |

Use cases

Hate speech detection is mostly used to support moderators of social media platforms.

Models

🔧 BERT Offensive

The offensive language identification model solves the binary classification problem of deciding whether a text is offensive or not (i.e. whether it contains profanity or an insult). Given a text, it predicts one of two classes: OFF (offensive) or NOT (not offensive). Its architecture is based on BERT (Devlin et al., 2019). In particular, it is based on the pretrained Danish BERT trained by BotXO and fine-tuned on the DKHate data using the Transformers library.

The BERT Offensive model can be loaded with the load_bert_offensive_model() method. Please note that it can take at most 512 tokens as input at a time; longer sentences are automatically truncated.

Below is a small snippet for getting started using the BERT Offensive model.

from danlp.models import load_bert_offensive_model

# load the offensive language identification model
offensive_model = load_bert_offensive_model()

sentence = "Han ejer ikke respekt for nogen eller noget... han er megaloman og psykopat"

# apply the model to the sentence to get the class to which it belongs
pred = offensive_model.predict(sentence)
# or get the probability of the sentence belonging to each class
proba = offensive_model.predict_proba(sentence)
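
Continuing the snippet above, one can inspect the two outputs side by side. Note that pairing the probabilities with the OFF/NOT labels in the order shown below is an assumption, not documented behaviour; print proba to check the actual structure in your installed version.

print(pred)  # the predicted label, e.g. "OFF"
# the OFF/NOT ordering of the probabilities below is an assumption --
# inspect proba to confirm the order in your version
for label, p in zip(["OFF", "NOT"], proba):
    print(f"{label}: {p:.3f}")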

🔧 BERT HateSpeech

The BERT HateSpeech model can detect offensive language and hate speech.

It has been developed in collaboration with Danmarks Radio (DR). It is based on the pre-trained Danish BERT trained by BotXO and fine-tuned on Facebook data (not publicly available) annotated by DR.

It can predict:

  • whether a text is offensive or not: OFF (offensive) or NOT (not offensive);

  • the hate speech category it falls into: Særlig opmærksomhed (special attention), Personangreb (personal attack), Sprogbrug (language use), Spam & indhold (spam & content).

The BERT HateSpeech model can be loaded with the load_bert_hatespeech_model() method. Please note that it can take at most 512 tokens as input at a time; longer sentences are automatically truncated.

Below is a small snippet for getting started using the BERT HateSpeech model.

from danlp.models import load_bert_hatespeech_model

# load the HateSpeech model
hatespeech_model = load_bert_hatespeech_model()

sentence = "Han ejer ikke respekt for nogen eller noget... han er megaloman og psykopat"

# apply the model to the sentence to get the class to which it belongs
pred = hatespeech_model.predict(sentence)
# or get the probability of the sentence belonging to each class
proba = hatespeech_model.predict_proba(sentence)
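
Continuing the snippet above, the model can be applied to several comments in turn. Since the exact return structure of predict() (an OFF/NOT label plus a category) is not documented here, the sketch simply prints it for inspection; the second comment is a hypothetical harmless example.

sentences = [
    "Han ejer ikke respekt for nogen eller noget... han er megaloman og psykopat",
    "Sikke en dejlig dag i dag",  # hypothetical harmless comment ("what a lovely day today")
]

# predict() is called once per sentence; batching behaviour is not assumed here
for s in sentences:
    print(s, "->", hatespeech_model.predict(s))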

🔧 ELECTRA Offensive

The ELECTRA Offensive model can detect offensive language.

It has been developed in collaboration with Danmarks Radio (DR). It is based on the pre-trained Danish Ælæctra and fine-tuned on Facebook data (not publicly available) annotated by DR.

It can predict whether a text is offensive or not: OFF (offensive) or NOT (not offensive).

The ELECTRA Offensive model can be loaded with the load_electra_offensive_model() method. Please note that it can take at most 512 tokens as input at a time; longer sentences are automatically truncated.

Below is a small snippet for getting started using the ELECTRA Offensive model.

from danlp.models import load_electra_offensive_model

# load the model
offensive_model = load_electra_offensive_model()

sentence = "Han ejer ikke respekt for nogen eller noget... han er megaloman og psykopat"

# apply the model to the sentence to get the class to which it belongs
pred = offensive_model.predict(sentence)
# or get the probability of the sentence belonging to each class
proba = offensive_model.predict_proba(sentence)
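
For the moderation use case mentioned earlier, one may only want to flag comments when the model is confident. Continuing the snippet above, here is a hedged sketch: the 0.8 threshold is an illustrative choice, and treating max(proba) as the confidence of the predicted class is an assumption about the shape of predict_proba's output.

FLAG_THRESHOLD = 0.8  # illustrative value; tune on held-out moderation data

comments = [
    "Han ejer ikke respekt for nogen eller noget... han er megaloman og psykopat",
    "Tak for en god diskussion",  # hypothetical harmless comment ("thanks for a good discussion")
]

for comment in comments:
    label = offensive_model.predict(comment)
    proba = offensive_model.predict_proba(comment)
    # flag for human review only when predicted OFF with high confidence
    # (max(proba) as the predicted-class probability is an assumption)
    if label == "OFF" and max(proba) >= FLAG_THRESHOLD:
        print("flag for review:", comment)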

A&ttack (Analyse & Tal)

The A&ttack model detects whether a text is offensive or not. It has been developed by Analyse & Tal and is based on the pretrained Ælæctra model. It has been trained on social media data (Facebook, 67,188 tokens). See the GitHub repo for more details and the project report.

📈 Benchmarks

See detailed scoring of the benchmarks in the example folder.

The benchmarking has been performed on the test part of the DKHate dataset.

The scores presented here describe the performance (F1) of the models for the task of offensive language identification.

| Model             | F1 (OFF) | F1 (NOT) | AVG F1 | Sentences per second (CPU*) |
|-------------------|----------|----------|--------|-----------------------------|
| BERT Offensive    | 61.9     | 95.4     | 78.7   | ~6                          |
| A&ttack (A&T)     | 34.2     | 91.4     | 62.8   | ~2                          |
| BERT HateSpeech   | 46.2     | 92.8     | 69.5   | ~10                         |
| ELECTRA Offensive | 48.6     | 93.9     | 71.2   | ~60                         |

*Sentences per second is based on a MacBook Pro with an Apple M1 chip.
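
The per-class and average scores in the table can be reproduced from gold labels and predictions with scikit-learn; AVG F1 is the unweighted (macro) mean of the OFF and NOT scores. A minimal sketch with toy labels (illustrative only, not the DKHate test set):

from sklearn.metrics import f1_score

# toy gold labels and predictions, for illustration only
y_true = ["OFF", "NOT", "NOT", "OFF", "NOT"]
y_pred = ["OFF", "NOT", "OFF", "NOT", "NOT"]

# per-class F1, in the order given by `labels`
f1_off, f1_not = f1_score(y_true, y_pred, labels=["OFF", "NOT"], average=None)
# the AVG F1 column is the unweighted mean of the two per-class scores
f1_avg = f1_score(y_true, y_pred, labels=["OFF", "NOT"], average="macro")
print(f"OFF: {f1_off:.1%}  NOT: {f1_not:.1%}  AVG: {f1_avg:.1%}")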

The evaluation script hatespeech_benchmarks.py can be found here.

🎓 References