TY - GEN
T1 - Combining bag-of-words and bag-of-concepts representations for Arabic text classification
AU - Alahmadi, Alaa
AU - Joorabchi, Arash
AU - Mahdi, Abdulhussain E.
PY - 2014
Y1 - 2014
N2 - This paper introduces a set of new text representation approaches for the automatic classification of Arabic textual documents. These approaches combine the well-known Bag-of-Words (BOW) and Bag-of-Concepts (BOC) text representation schemes and use Wikipedia as a knowledge base. The proposed representations are used to generate a vector space model, which in turn is fed into a classifier to categorize a collection of Arabic textual documents. Three different machine-learning-based classifiers are used in this work. The performance of the proposed text representation models is evaluated against a standard BOW scheme and a concept-based scheme, as well as against recently reported similar text representation schemes that augment the standard BOW with the BOC.
KW - Arabic text classification
KW - Natural language processing
KW - Wikipedia
UR - http://www.scopus.com/inward/record.url?scp=84946063526&partnerID=8YFLogxK
U2 - 10.1049/cp.2014.0711
DO - 10.1049/cp.2014.0711
M3 - Conference contribution
AN - SCOPUS:84946063526
SN - 9781849199247
T3 - IET Conference Publications
SP - 343
EP - 348
BT - IET Conference Publications
PB - Institution of Engineering and Technology
T2 - 25th IET Irish Signals and Systems Conference, ISSC 2014 and China-Ireland International Conference on Information and Communications Technologies, CIICT 2014
Y2 - 26 June 2014 through 27 June 2014
ER -