Combining words and concepts for automatic Arabic text classification

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

The paper examines combining words and concepts to represent text for Arabic Automatic Text Classification (ATC), and the impact of this combination on classification accuracy when used with various stemming methods and classifiers. An experimental Arabic ATC system was developed, and the effects of its main components on classification accuracy were assessed. First, variants of the standard Bag-of-Words model with different stemming methods were examined and compared. Arabic Wikipedia and WordNet were then compared as sources of concepts for an effective Bag-of-Concepts representation. Based on this comparison, Wikipedia was used to provide concepts, and different strategies for combining words and concepts, including two new approaches developed in-house, were evaluated for effective Arabic text representation in terms of their impact on overall classification accuracy. Our experimental results show that text representation is a key element in the performance of Arabic ATC, and that representing Arabic text with a combination of words and concepts improves classification accuracy compared with using words or concepts alone.
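The abstract does not describe the two in-house combination approaches, so the following is only a generic sketch of what a combined word-and-concept representation can look like, under stated assumptions: `light_stem` is a hypothetical toy stemmer with a small hand-picked affix list (not one of the stemmers evaluated in the paper), `CONCEPT_MAP` is a hypothetical hand-written dictionary standing in for Wikipedia-derived concepts, and plain feature concatenation is used as the combination strategy. It illustrates why adding concepts can help: two documents that share no stems can still be similar through a shared concept.

```python
import math
from collections import Counter

# Hypothetical light stemmer (an illustrative assumption): strips at most one
# common Arabic prefix and one common suffix, keeping a stem of at least 2 letters.
PREFIXES = ("وال", "بال", "كال", "فال", "ال")
SUFFIXES = ("ات", "ون", "ين", "ها", "ة")


def light_stem(token: str) -> str:
    for p in PREFIXES:
        if token.startswith(p) and len(token) - len(p) >= 2:
            token = token[len(p):]
            break
    for s in SUFFIXES:
        if token.endswith(s) and len(token) - len(s) >= 2:
            token = token[: -len(s)]
            break
    return token


# Hypothetical stem -> concept mapping standing in for Wikipedia-derived concepts.
CONCEPT_MAP = {"مدرس": "Education", "طلاب": "Education"}


def bag_of_words(tokens: list[str]) -> Counter:
    """Term-frequency vector over stemmed tokens (Bag-of-Words)."""
    return Counter(light_stem(t) for t in tokens)


def bag_of_concepts(bow: Counter, concept_map: dict[str, str]) -> Counter:
    """Map each stem to its concept, if any, accumulating the stem's frequency."""
    boc: Counter = Counter()
    for stem, freq in bow.items():
        if stem in concept_map:
            boc[concept_map[stem]] += freq
    return boc


def combine(bow: Counter, boc: Counter) -> Counter:
    """Concatenate both feature spaces; prefixes keep word and concept features distinct."""
    features = Counter({f"w:{k}": v for k, v in bow.items()})
    features.update({f"c:{k}": v for k, v in boc.items()})
    return features


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse frequency vectors."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


if __name__ == "__main__":
    doc1 = ["المدرسة", "الكبيرة"]  # "the big school"
    doc2 = ["الطلاب", "الجدد"]  # "the new students"
    bow1, bow2 = bag_of_words(doc1), bag_of_words(doc2)
    comb1 = combine(bow1, bag_of_concepts(bow1, CONCEPT_MAP))
    comb2 = combine(bow2, bag_of_concepts(bow2, CONCEPT_MAP))
    print("words only:", cosine(bow1, bow2))
    print("words + concepts:", round(cosine(comb1, comb2), 4))
```

Running it prints a words-only similarity of 0.0 and a combined similarity of 0.3333, because the two documents share only the `Education` concept. Concatenation is just one possible strategy; alternatives such as replacing words with concepts where a mapping exists, or weighting the two feature groups differently, would change only `combine`.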

Original language: English
Title of host publication: Arabic Language Processing
Subtitle of host publication: From Theory to Practice - 6th International Conference, ICALP 2017, Proceedings
Editors: Karim Bouzoubaa, Abdelfettah Hamdani, Abdelmonaime Lachkar, Azzedine Mazroui, Abdelhak Lekhouaja
Publisher: Springer Verlag
Pages: 105-119
Number of pages: 15
ISBN (Print): 9783319734996
Publication status: Published - 2018
Event: 6th International Conference on Arabic Language Processing, ICALP 2017 - Fez, Morocco
Duration: 11 Oct 2017 to 12 Oct 2017

Publication series

Name: Communications in Computer and Information Science
Volume: 782
ISSN (Print): 1865-0929

Conference

Conference: 6th International Conference on Arabic Language Processing, ICALP 2017
Country/Territory: Morocco
City: Fez
Period: 11/10/17 to 12/10/17

Keywords

  • Arabic text classification
  • Bag of concepts
  • Bag of words
  • Text representation models
  • Wikipedia
  • WordNet

