Arabic text classification using bag-of-concepts representation

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

With the exponential growth of Arabic text in digital form, the need for efficient organization, navigation and browsing of large collections of Arabic documents has increased. Text Classification (TC) is one of the important subfields of data mining. The Bag-of-Words (BOW) representation model, which is the traditional way to represent text for TC, takes into account only the frequency of term occurrence within a document. It therefore ignores important semantic relationships between terms and treats synonymous words as unrelated features. To address this problem, this paper describes the application of a Bag-of-Concepts (BOC) text representation model to Arabic text. The proposed model uses the Arabic Wikipedia as a knowledge base for concept detection. The BOC model is used to generate a Vector Space Model, which in turn is fed into a classifier to categorize a collection of Arabic text documents. Two machine-learning classifiers were deployed to evaluate the effectiveness of the proposed model against the traditional BOW model. The results of our experiment show that the proposed BOC model improves on BOW in terms of classification accuracy.
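The contrast between the two representations can be illustrated with a minimal sketch. It is not the paper's implementation: the `CONCEPTS` dictionary is a hypothetical, hand-written stand-in for concept detection against the Arabic Wikipedia, the function names are invented for illustration, and dropping terms that have no concept is an assumption made only to keep the example short.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy mapping from surface terms to concept labels.
# In the paper, concepts are detected using the Arabic Wikipedia;
# this hand-written dictionary is only an illustrative stand-in.
CONCEPTS = {
    "سيارة": "Car",        # "car"
    "عربة": "Car",         # "car / carriage", treated here as a synonym
    "طبيب": "Physician",   # "physician"
    "دكتور": "Physician",  # "doctor", treated here as a synonym
}


def bow_vector(tokens):
    """Bag-of-Words: count each surface term independently."""
    return Counter(tokens)


def boc_vector(tokens, concepts):
    """Bag-of-Concepts: count the concept each term maps to.

    Terms with no known concept are dropped (an assumption of this sketch).
    """
    return Counter(concepts[t] for t in tokens if t in concepts)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Two tiny "documents" that use different words for the same concepts.
doc1 = ["سيارة", "طبيب"]
doc2 = ["عربة", "دكتور"]

bow_sim = cosine(bow_vector(doc1), bow_vector(doc2))
boc_sim = cosine(boc_vector(doc1, CONCEPTS), boc_vector(doc2, CONCEPTS))

print("BOW similarity:", bow_sim)
print("BOC similarity:", boc_sim)
```

Under BOW the two documents share no features, so their similarity is zero; under BOC both map to the same concept counts, so their similarity is close to one. In a full pipeline, vectors like these would form the Vector Space Model passed to a classifier.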

Original language: English
Title of host publication: KDIR 2014 - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval
Editors: Ana Fred, Joaquim Filipe
Publisher: INSTICC Press
Pages: 374-380
Number of pages: 7
ISBN (Electronic): 9789897580482
Publication status: Published - 2014
Event: 6th International Conference on Knowledge Discovery and Information Retrieval, KDIR 2014 - Rome, Italy
Duration: 21 Oct 2014 → 24 Oct 2014

Publication series

Name: KDIR 2014 - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval

Conference

Conference: 6th International Conference on Knowledge Discovery and Information Retrieval, KDIR 2014
Country/Territory: Italy
City: Rome
Period: 21/10/14 → 24/10/14

Keywords

  • Arabic text
  • Automatic text classification
  • Bag-of-concepts
  • Bag-of-words
  • Wikipedia
