TY - JOUR
T1 - Multi-class sentiment analysis on Twitter
T2 - Classification performance and challenges
AU - Bouazizi, Mondher
AU - Ohtsuki, Tomoaki
N1 - Publisher Copyright: © 2020 The author(s).
PY - 2019/9
Y1 - 2019/9
N2 - Sentiment analysis refers to the automatic collection, aggregation, and classification of data collected online into different emotion classes. While most of the work related to sentiment analysis of texts focuses on the binary and ternary classification of these data, the task of multi-class classification has received less attention. Multi-class classification has always been a challenging task given the complexity of natural languages and the difficulty of understanding and mathematically "quantifying" how humans express their feelings. In this paper, we study the task of multi-class classification of online posts of Twitter users, and show how far it is possible to go with the classification, and the limitations and difficulties of this task. The proposed approach of multi-class classification achieves an accuracy of 60.2% for 7 different sentiment classes which, compared to an accuracy of 81.3% for binary classification, emphasizes the effect of having multiple classes on the classification performance. Nonetheless, we propose a novel model to represent the different sentiments and show how this model helps to understand how sentiments are related. The model is then used to analyze the challenges that multi-class classification presents and to highlight possible future enhancements to multi-class classification accuracy.
AB - Sentiment analysis refers to the automatic collection, aggregation, and classification of data collected online into different emotion classes. While most of the work related to sentiment analysis of texts focuses on the binary and ternary classification of these data, the task of multi-class classification has received less attention. Multi-class classification has always been a challenging task given the complexity of natural languages and the difficulty of understanding and mathematically "quantifying" how humans express their feelings. In this paper, we study the task of multi-class classification of online posts of Twitter users, and show how far it is possible to go with the classification, and the limitations and difficulties of this task. The proposed approach of multi-class classification achieves an accuracy of 60.2% for 7 different sentiment classes which, compared to an accuracy of 81.3% for binary classification, emphasizes the effect of having multiple classes on the classification performance. Nonetheless, we propose a novel model to represent the different sentiments and show how this model helps to understand how sentiments are related. The model is then used to analyze the challenges that multi-class classification presents and to highlight possible future enhancements to multi-class classification accuracy.
KW - Machine learning
KW - Sentiment analysis
KW - Twitter
UR - http://www.scopus.com/inward/record.url?scp=85080479073&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85080479073&partnerID=8YFLogxK
U2 - 10.26599/BDMA.2019.9020002
DO - 10.26599/BDMA.2019.9020002
M3 - Article
AN - SCOPUS:85080479073
SN - 2096-0654
VL - 2
SP - 181
EP - 194
JO - Big Data Mining and Analytics
JF - Big Data Mining and Analytics
IS - 3
M1 - 8681053
ER -