Title: Hate Speech Detection in Turkish News Using a Transformer-based Model Enhanced with Linguistic Features
Advisor: Arzucan Özgür
Abstract: Hate speech directed at ethnicities, nationalities, religious identities, and specific groups has increased not only in social media but also in print media. This creates a need for automated hate speech detection systems that can quickly review print media content and filter out items containing hate speech before they reach readers. However, most existing automatic hate speech detection models do not consider the target-group-specific discourse that is often used in news articles. Moreover, few datasets in the hate speech domain include Turkish print media articles. In this study, a new BERT-based model enriched with a set of target-oriented linguistic features is proposed for hate speech detection. The effects of weighting different BERT hidden vectors are also investigated, as an alternative to the classical approach of using only the first hidden vector of the BERT encoder. New BERT-based models that integrate different attention techniques for combining the hidden vectors are proposed. A new preprocessed Turkish hate speech dataset is also published, in which the target group of every hate speech article is annotated. Experiments on a comprehensive Turkish dataset of news articles labeled for hate speech show that the proposed models achieve competitive accuracy and F1-score compared to previous approaches.
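Illustrative sketch: The abstract does not specify which attention techniques or linguistic features are used, so the following is only a minimal sketch of the general idea, under stated assumptions. It pools all encoder hidden vectors with softmax attention weights instead of taking only the first hidden vector, then concatenates the pooled vector with a linguistic feature vector. The function names, the dot-product scoring, the query vector (which would normally be learned), and the toy numbers are all hypothetical and are not taken from the thesis.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_vectors, query):
    """Return the attention-weighted sum of hidden vectors and the weights.

    Assumption: each hidden vector is scored by its dot product with a
    query vector; a trained model would learn this query.
    """
    scores = [sum(h_j * q_j for h_j, q_j in zip(h, query)) for h in hidden_vectors]
    weights = softmax(scores)
    dim = len(hidden_vectors[0])
    pooled = [sum(w * h[j] for w, h in zip(weights, hidden_vectors)) for j in range(dim)]
    return pooled, weights


def build_classifier_input(hidden_vectors, query, linguistic_features):
    """Concatenate the pooled representation with hand-crafted features."""
    pooled, _ = attention_pool(hidden_vectors, query)
    return pooled + list(linguistic_features)


# Toy stand-ins for encoder outputs: three token vectors of size two.
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [0.0, 0.0]          # hypothetical query; zeros give uniform weights
features = [1.0, 0.0, 3.0]  # hypothetical target-oriented feature values

pooled, weights = attention_pool(hidden, query)
combined = build_classifier_input(hidden, query, features)
```

In this sketch the combined vector would be passed to a classification layer. Using only the first hidden vector corresponds to the special case where all attention weight falls on the first position.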
 
            

