TY - JOUR
T1 - Improving Arabic Sentiment Analysis Using CNN-Based Architectures and Text Preprocessing
AU - Mhamed, Mustafa
AU - Sutcliffe, Richard
AU - Sun, Xia
AU - Feng, Jun
AU - Almekhlafi, Eiad
AU - Retta, Ephrem Afele
N1 - Publisher Copyright: © 2021 Mustafa Mhamed et al.
PY - 2021
Y1 - 2021
AB - Sentiment analysis is an essential process for many natural language applications. In this paper, we apply two models for Arabic sentiment analysis to the ASTD and ATDFS datasets, in both 2-class and multiclass forms. Model MC1 is a 2-layer CNN with global average pooling, followed by a dense layer. Model MC2 is a 2-layer CNN with max pooling, followed by a BiGRU and a dense layer. On the difficult ASTD 4-class task, we achieve 73.17%, compared to 65.58% reported by Attia et al. (2018). On the easier 2-class task, we achieve 90.06% with MC1, compared to 85.58% reported by Kwaik et al. (2019). We carry out experiments on various data splits to match those used by other researchers. We also pay close attention to Arabic preprocessing and include novel steps not reported in other works. In an ablation study, we investigate the effect of two of these steps in particular: the processing of emoticons and the use of a custom stoplist. On the 4-class task, these steps can make a difference of up to 4.27% and 5.48%, respectively. On the 2-class task, the maximum improvements are 2.95% and 3.87%, respectively.
UR - http://www.scopus.com/inward/record.url?scp=85115768624&partnerID=8YFLogxK
U2 - 10.1155/2021/5538791
DO - 10.1155/2021/5538791
M3 - Article
C2 - 34545281
AN - SCOPUS:85115768624
SN - 1687-5265
VL - 2021
SP - 5538791
JO - Computational Intelligence and Neuroscience
JF - Computational Intelligence and Neuroscience
M1 - 5538791
ER -