Improving document clustering performance: The use of an automatically generated ontology to augment document representations

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Clustering documents is a common task in a range of information retrieval systems and applications. Many approaches for improving the clustering process have been proposed. One approach is the use of an ontology to better inform the classifier of word context, by expanding the items to be clustered. WordNet is commonly cited as an appropriate source from which to draw the additional terms; however, it may not be sufficient to achieve strong performance. We have two aims in this paper. First, we show that the use of WordNet may lead to suboptimal performance. This problem may be accentuated when a document set has been drawn from comments made in social forums, due to the unstructured nature of online conversations compared to standard document sets. Second, we propose a novel method which involves constructing a bespoke ontology that facilitates better clustering. We present a study of clustering applied to a sample of threads from a social forum and investigate the effectiveness of the application of these methods.
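The general idea of expanding the items to be clustered with terms drawn from an ontology can be illustrated with a minimal sketch. This is not the authors' method: the tiny `TOY_ONTOLOGY` mapping, the helper names `expand` and `cosine`, and the two example comments are all hypothetical, chosen only to show how added terms can make two documents with no shared words look more similar to a clustering algorithm.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy ontology (term -> related terms); not taken from the paper.
TOY_ONTOLOGY = {
    "laptop": ["computer"],
    "notebook": ["computer"],
    "battery": ["power"],
    "charger": ["power"],
}


def expand(tokens, ontology):
    """Return a term-count vector augmented with each token's related terms."""
    counts = Counter(tokens)
    for token in tokens:
        for related in ontology.get(token, []):
            counts[related] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Two made-up forum comments that share no surface terms.
doc_a = "my laptop battery died".split()
doc_b = "notebook charger broke".split()

plain = cosine(Counter(doc_a), Counter(doc_b))
augmented = cosine(expand(doc_a, TOY_ONTOLOGY), expand(doc_b, TOY_ONTOLOGY))

print(f"similarity without expansion: {plain:.3f}")
print(f"similarity with expansion:    {augmented:.3f}")
```

In this sketch the unexpanded vectors have zero overlap, while the expanded vectors share the added terms, so any similarity-based clustering step would be more likely to group the two comments. How well this works in practice depends on whether the ontology covers the vocabulary of the document set, which is the concern the abstract raises about WordNet and forum text.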

Original language: English
Title of host publication: IC3K 2017 - Proceedings of the 9th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management
Editors: Ana Fred, Joaquim Filipe
Publisher: SciTePress
Pages: 215-223
Number of pages: 9
ISBN (Print): 9789897582714
Publication status: Published - 2017
Event: 9th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, IC3K 2017 - Funchal, Madeira, Portugal
Duration: 1 Nov 2017 to 3 Nov 2017

Publication series

Name: IC3K 2017 - Proceedings of the 9th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management
Volume: 1

Conference

Conference: 9th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, IC3K 2017
Country/Territory: Portugal
City: Funchal, Madeira
Period: 1/11/17 to 3/11/17

Keywords

  • Classification
  • Data mining
  • Document clustering
  • Graph theory
  • Word sense disambiguation
  • WordNet
