TY - GEN
T1 - All-in-One Hate Speech Detectors May not be what You Want
AU - Bouazizi, Mondher
AU - Niida, Natsuho
AU - Ohtsuki, Tomoaki
N1 - Publisher Copyright: © 2021 ACM.
PY - 2021/1/16
Y1 - 2021/1/16
N2 - The detection of hate speech has been an increasingly active research topic. State-of-the-art systems for automatically detecting hateful content report almost perfect performance on common data sets. However, "hate speech" is a very subjective term, and people with different backgrounds have different levels of tolerance for what constitutes hate. In this paper, we show the limitations of having a single classifier handle the problem of hate speech detection. We then propose building classifiers customized for different people instead of a single classifier. The main obstacle to achieving this goal is the scarcity of data. Therefore, we use transfer learning to overcome this issue, using a very limited amount of annotated data to build these customized classifiers. In the first stage, we build a classifier, trained on a large data set, that classifies tweets into 3 classes (hate, offensive, and clean), which we refer to as the general classifier. In the second stage, we ask 3 annotators with different backgrounds to re-annotate a small subset of tweets (600 tweets) from the original data set. We refer to this newly created data set as "the customized data set." We then fine-tune the general classifier on the customized data set to build a customized classifier for each annotator. The classification accuracy on the corresponding customized data sets is 0.08, 0.06, and 0.11 higher than that of the general classifier. The results show that it is possible to start with a general classifier and adjust it to each individual despite the very limited amount of training data available for that individual.
AB - The detection of hate speech has been an increasingly active research topic. State-of-the-art systems for automatically detecting hateful content report almost perfect performance on common data sets. However, "hate speech" is a very subjective term, and people with different backgrounds have different levels of tolerance for what constitutes hate. In this paper, we show the limitations of having a single classifier handle the problem of hate speech detection. We then propose building classifiers customized for different people instead of a single classifier. The main obstacle to achieving this goal is the scarcity of data. Therefore, we use transfer learning to overcome this issue, using a very limited amount of annotated data to build these customized classifiers. In the first stage, we build a classifier, trained on a large data set, that classifies tweets into 3 classes (hate, offensive, and clean), which we refer to as the general classifier. In the second stage, we ask 3 annotators with different backgrounds to re-annotate a small subset of tweets (600 tweets) from the original data set. We refer to this newly created data set as "the customized data set." We then fine-tune the general classifier on the customized data set to build a customized classifier for each annotator. The classification accuracy on the corresponding customized data sets is 0.08, 0.06, and 0.11 higher than that of the general classifier. The results show that it is possible to start with a general classifier and adjust it to each individual despite the very limited amount of training data available for that individual.
KW - Deep Learning
KW - Hate Speech Detection
KW - Transfer Learning
KW - Twitter
UR - http://www.scopus.com/inward/record.url?scp=85112532542&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85112532542&partnerID=8YFLogxK
U2 - 10.1145/3451471.3451498
DO - 10.1145/3451471.3451498
M3 - Conference contribution
AN - SCOPUS:85112532542
T3 - ACM International Conference Proceeding Series
SP - 165
EP - 170
BT - ICSIM 2021 - Proceedings of the 2021 4th International Conference on Software Engineering and Information Management
PB - Association for Computing Machinery
T2 - 4th International Conference on Software Engineering and Information Management, ICSIM 2021
Y2 - 16 January 2021 through 18 January 2021
ER -