A new text representation scheme combining Bag-of-Words and Bag-of-Concepts approaches for automatic text classification

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

This paper introduces a new approach to creating text representations and applies it to standard text classification collections. The approach is based on supplementing the well-known Bag-of-Words (BOW) representational scheme with a concept-based representation that utilises Wikipedia as a knowledge base. The proposed representations are used to generate a Vector Space Model, which in turn is fed into a Support Vector Machine classifier to categorise a collection of textual documents from two publicly available datasets. Experimental results are presented that evaluate the performance of our model in comparison to a standard BOW scheme and a concept-based scheme, as well as recently reported similar text representations that are based on augmenting the standard BOW approach with concept-based representations.
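The idea of supplementing word features with concept features can be illustrated with a minimal sketch. It is not the paper's implementation: `CONCEPT_MAP` is a hypothetical hand-written phrase-to-concept table standing in for a Wikipedia-based lookup, raw counts are used because the abstract does not state a weighting scheme, and the Support Vector Machine step is left out. All function names are illustrative assumptions.

```python
from collections import Counter

# Hypothetical phrase-to-concept table; a stand-in for a Wikipedia-based lookup.
CONCEPT_MAP = {
    "support vector machine": "Support_vector_machine",
    "svm": "Support_vector_machine",
    "text classification": "Document_classification",
    "document categorisation": "Document_classification",
}


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (tok.strip(".,;:!?()\"'") for tok in text.lower().split())
    return [tok for tok in tokens if tok]


def bow_features(tokens):
    """Bag-of-Words: count each token, prefixed so it cannot collide with concepts."""
    return Counter("w:" + tok for tok in tokens)


def boc_features(tokens, concept_map=CONCEPT_MAP, max_len=3):
    """Bag-of-Concepts: count concepts whose phrases (1..max_len tokens) occur."""
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in concept_map:
                counts["c:" + concept_map[phrase]] += 1
    return counts


def combined_features(text):
    """Union of word and concept counts for one document."""
    tokens = tokenize(text)
    return bow_features(tokens) + boc_features(tokens)


def vectorize(docs):
    """Build a shared vocabulary and one count vector per document."""
    feats = [combined_features(doc) for doc in docs]
    vocab = sorted(set().union(*feats))
    return vocab, [[f[v] for v in vocab] for f in feats]


docs = [
    "An SVM for text classification.",
    "Support vector machine based document categorisation",
]
vocab, vectors = vectorize(docs)
```

In this toy example the two documents share no words, yet both receive the same two concept features, which is the kind of overlap a purely word-based representation would miss. The resulting vectors could then be passed to any classifier, such as a Support Vector Machine.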

Original language: English
Title of host publication: 2013 7th IEEE GCC Conference and Exhibition, GCC 2013
Pages: 108-113
Number of pages: 6
Publication status: Published - 2013
Event: 2013 7th IEEE GCC Conference and Exhibition, GCC 2013 - Doha, Qatar
Duration: 17 Nov 2013 → 20 Nov 2013

Keywords

  • Bag-of-Concepts
  • Bag-of-Words
  • Text Classification
  • Wikipedia
