TY - GEN
T1 - Combining words and concepts for automatic Arabic text classification
AU - Alahmadi, Alaa
AU - Joorabchi, Arash
AU - Mahdi, Abdulhussain E.
N1 - Publisher Copyright: © Springer International Publishing AG 2018.
PY - 2018
Y1 - 2018
N2 - This paper examines combining words and concepts to represent text for Arabic Automatic Text Classification (ATC), and the impact of this combination on classification accuracy when used with various stemming methods and classifiers. An experimental Arabic ATC system was developed, and the effects of its main components on classification accuracy were assessed. First, variants of the standard Bag-of-Words model with different stemming methods were examined and compared. Arabic Wikipedia and WordNet were then compared as sources of concepts for an effective Bag-of-Concepts representation. Based on this comparison, Wikipedia was used to provide concepts, and different strategies for combining words and concepts, including two newly developed in-house approaches, were evaluated for effective Arabic text representation in terms of their impact on overall classification accuracy. Our experimental results show that text representation is a key element in the performance of Arabic ATC, and that combining words and concepts to represent Arabic text improves classification accuracy compared with using words or concepts alone.
KW - Arabic text classification
KW - Bag of concepts
KW - Bag of words
KW - Text representation models
KW - Wikipedia
KW - WordNet
UR - http://www.scopus.com/inward/record.url?scp=85041094747&partnerID=8YFLogxK
DO - 10.1007/978-3-319-73500-9_8
M3 - Conference contribution
AN - SCOPUS:85041094747
SN - 9783319734996
T3 - Communications in Computer and Information Science
SP - 105
EP - 119
BT - Arabic Language Processing
A2 - Bouzoubaa, Karim
A2 - Hamdani, Abdelfettah
A2 - Lachkar, Abdelmonaime
A2 - Mazroui, Azzedine
A2 - Lekhouaja, Abdelhak
PB - Springer Verlag
T2 - 6th International Conference on Arabic Language Processing, ICALP 2017
Y2 - 11 October 2017 through 12 October 2017
ER -