TY - JOUR
T1 - Word representation learning based on bidirectional GRUs with drop loss for sentiment classification
AU - Sun, Xia
AU - Gao, Yi
AU - Sutcliffe, Richard
AU - Guo, Shou Xi
AU - Wang, Xin
AU - Feng, Jun
N1 - Publisher Copyright:
© 2013 IEEE.
PY - 2021/7
Y1 - 2021/7
N2 - Sentiment classification is a fundamental task in many natural language processing applications. Neural networks have achieved great success on the sentiment classification task in recent years, since recurrent neural networks and long short-term memory networks can handle sequences of different lengths and capture contextual semantic information. However, the effectiveness of these methods is limited when they are used to extract contextual information from relatively long texts. Therefore, in our model, we apply bidirectional gated recurrent units to capture as much contextual information as possible when learning word representations, which may effectively reduce noise compared with other methods. We also propose a novel loss function, namely drop loss (DL), which makes the model focus on hard examples (examples that are easily misclassified) in order to improve the accuracy of the model. We experiment on four commonly used datasets, and the results show that the proposed method performs well on all four while needing fewer parameters than recent benchmarks such as CoVe, ULMFiT, embeddings from language models, and bidirectional encoder representations from transformers. Furthermore, we demonstrate that the classification performance of existing shallow network models can be significantly improved by using DL. In particular, the accuracy of the CNN+LSTM model improves by 9% on the IMDB-10 dataset.
AB - Sentiment classification is a fundamental task in many natural language processing applications. Neural networks have achieved great success on the sentiment classification task in recent years, since recurrent neural networks and long short-term memory networks can handle sequences of different lengths and capture contextual semantic information. However, the effectiveness of these methods is limited when they are used to extract contextual information from relatively long texts. Therefore, in our model, we apply bidirectional gated recurrent units to capture as much contextual information as possible when learning word representations, which may effectively reduce noise compared with other methods. We also propose a novel loss function, namely drop loss (DL), which makes the model focus on hard examples (examples that are easily misclassified) in order to improve the accuracy of the model. We experiment on four commonly used datasets, and the results show that the proposed method performs well on all four while needing fewer parameters than recent benchmarks such as CoVe, ULMFiT, embeddings from language models, and bidirectional encoder representations from transformers. Furthermore, we demonstrate that the classification performance of existing shallow network models can be significantly improved by using DL. In particular, the accuracy of the CNN+LSTM model improves by 9% on the IMDB-10 dataset.
KW - Loss function
KW - neural networks
KW - sentiment classification
KW - word representation
UR - http://www.scopus.com/inward/record.url?scp=85112145460&partnerID=8YFLogxK
U2 - 10.1109/TSMC.2019.2940097
DO - 10.1109/TSMC.2019.2940097
M3 - Article
AN - SCOPUS:85112145460
SN - 2168-2216
VL - 51
SP - 4532
EP - 4542
JO - IEEE Transactions on Systems, Man, and Cybernetics: Systems
JF - IEEE Transactions on Systems, Man, and Cybernetics: Systems
IS - 7
M1 - 8862934
ER -