Context-aware text classification system to improve the quality of text: A detailed investigation and techniques

Zeeshan Saleem, Adi Alhudhaif, Kashif Naseer Qureshi, Gwanggil Jeon

Research output: Contribution to journal › Article › peer-review

Abstract

Text classification is one of the most important tasks for extracting information from the Internet and for identifying the best text representation settings. As the volume of data on the World Wide Web increases, the significance of text classification increases as well. Understanding and classifying the digital data available on the Internet requires huge human effort. Text classification assigns a number of text files to different classes. The text available on the Internet is unstructured, which makes it harder to understand and classify for useful purposes. This paper proposes a context-aware text classification system to improve text quality. We use a content-aware recommendation system to extract the data from well-known news databases. Text preprocessing techniques such as tokenization, stemming, and stop-word removal are studied in detail. Furthermore, unigram, bigram, and trigram attributes are tested. Attribute selection methods and their impact on the text classification results are also examined. To carry out a detailed investigation and to save time in the experimentation process, 11 versions of each dataset are created, and different preprocessing techniques are applied to them to understand the impact of each technique on the classification results. The proposed system is compared with an existing approach in terms of accuracy, and the proposed system achieves better performance.
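
The preprocessing steps named in the abstract (tokenization, stop-word removal, and unigram, bigram, and trigram attributes) can be illustrated with a minimal sketch. This is not the authors' implementation: the stop-word list, the function names, and the example sentence are assumptions made for illustration, and stemming is left out because it would normally rely on an external library.

```python
import re

# Hypothetical stop-word list for illustration only; the paper's list is not given here.
STOP_WORDS = {"the", "is", "of", "a", "an", "and", "to", "in"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Example sentence chosen for illustration.
tokens = remove_stop_words(tokenize("The classification of text is a hard task"))
unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
```

Each n-gram list could then serve as one candidate attribute set, so that a classifier's results can be compared across the unigram, bigram, and trigram settings.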

Original language: English
Article number: e6489
Journal: Concurrency and Computation: Practice and Experience
Volume: 35
Issue number: 15
Publication status: Published - 10 Jul 2023
Externally published: Yes

Keywords

  • accuracy
  • algorithm
  • classification
  • context-aware
  • data mining
  • dataset
  • methods
  • computer
