An unsupervised approach to automatic classification of scientific literature utilizing bibliographic metadata

Research output: Contribution to journal › Article › peer-review

Abstract

This article describes an unsupervised approach for automatic classification of scientific literature archived in digital libraries and repositories according to a standard library classification scheme. The method identifies all the references cited in the document to be classified and, using the subject classification metadata of those references as catalogued in existing conventional libraries, infers the most probable class for the document itself with the help of a weighting mechanism. We demonstrate the application of the proposed method and assess its performance by developing a prototype software system for automatic classification of scientific documents according to the Dewey Decimal Classification scheme. A dataset of 1000 research articles, papers, and reports from a well-known scientific digital library, CiteSeer, was used to evaluate the classification performance of the system. Detailed results of this experiment are presented and discussed.
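
The inference step described in the abstract can be pictured as a weighted vote over the classes of the cited references. The following Python sketch is illustrative only: the abstract does not specify the weighting mechanism, so the function name `infer_ddc_class`, the `ddc` and `weight` fields, the even split of a reference's vote across its catalogued numbers, and the truncation to the three-digit main class are all assumptions, not the authors' method.

```python
from collections import defaultdict


def infer_ddc_class(references):
    """Return (main_class, vote_share) for the highest-scoring DDC class, or None.

    Each reference is a dict with:
      "ddc":    list of DDC numbers found for it in library catalogues (may be empty)
      "weight": hypothetical importance of the reference (defaults to 1.0)
    """
    scores = defaultdict(float)
    for ref in references:
        ddc_numbers = ref.get("ddc", [])
        if not ddc_numbers:
            continue  # reference was not found in any catalogue, so it casts no vote
        # Assumption: one reference's vote is split evenly across its DDC numbers.
        share = ref.get("weight", 1.0) / len(ddc_numbers)
        for number in ddc_numbers:
            main_class = number.split(".")[0]  # "006.31" -> "006"
            scores[main_class] += share
    if not scores:
        return None
    total = sum(scores.values())
    best = max(scores, key=scores.get)
    return best, scores[best] / total


# Made-up example data, not taken from the article's dataset.
example_references = [
    {"ddc": ["006.3"], "weight": 2.0},
    {"ddc": ["006.31", "519.5"], "weight": 1.0},
    {"ddc": [], "weight": 1.0},
    {"ddc": ["025.04"], "weight": 1.0},
]
result = infer_ddc_class(example_references)
```

Here `result` pairs the winning main class with its share of the total vote, which could serve as a rough confidence indicator. A real system would also need the reference extraction and catalogue lookup steps, which this sketch leaves out.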

Original language: English
Pages (from-to): 499-514
Number of pages: 16
Journal: Journal of Information Science
Volume: 37
Issue number: 5
Publication status: Published - Oct 2011

Keywords

  • citation networks
  • Dewey Decimal Classification (DDC)
  • digital library organization
  • Google Book Search (GBS)
  • library classification schemes
  • library Online Public Access Catalogues (OPACs)
  • scientific literature classification
  • WorldCat
