Combining bag-of-words and bag-of-concepts representations for Arabic text classification

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

This paper introduces a set of new approaches for text representation for automatic classification of Arabic textual documents. These approaches are based on combining the well-known Bag-of-Words (BOW) and the Bag-of-Concepts (BOC) text representation schemes and utilizing Wikipedia as a knowledge base. The proposed representations are used to generate a vector space model, which in turn is fed into a classifier to categorize a collection of Arabic textual documents. Three different machine learning based classifiers have been utilized in this work. Performance of proposed text representation models is evaluated in comparison to using a standard BOW scheme and a concept-based scheme, as well as recently reported similar text representation schemes that are based on augmenting the standard BOW with the BOC.

Original languageEnglish
Title of host publicationIET Conference Publications
PublisherInstitution of Engineering and Technology
Pages343-348
Number of pages6
EditionCP639
ISBN (Print)9781849199247
DOIs
Publication statusPublished - 2014
Event25th IET Irish Signals and Systems Conference, ISSC 2014 and China-Ireland International Conference on Information and Communications Technologies, CIICT 2014 - Limerick, Ireland
Duration: 26 Jun 201427 Jun 2014

Publication series

NameIET Conference Publications
NumberCP639
Volume2014

Conference

Conference25th IET Irish Signals and Systems Conference, ISSC 2014 and China-Ireland International Conference on Information and Communications Technologies, CIICT 2014
Country/TerritoryIreland
CityLimerick
Period26/06/1427/06/14

Keywords

  • Arabic text classification
  • Natural language processing
  • Wikipedia

Fingerprint

Dive into the research topics of 'Combining bag-of-words and bag-of-concepts representations for Arabic text classification'. Together they form a unique fingerprint.

Cite this