Automatic subject metadata generation for scientific documents using Wikipedia and genetic algorithms

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Topical annotation of documents with keyphrases is a proven method for revealing the subject of scientific and research documents. However, scientific documents that are manually annotated with keyphrases are in the minority. This paper describes a machine learning-based automatic keyphrase annotation method for scientific documents, which utilizes Wikipedia as a thesaurus for candidate selection from documents' content and deploys genetic algorithms to learn a model for ranking and filtering the most probable keyphrases. Reported experimental results show that the performance of our method, evaluated in terms of inter-consistency with human annotators, is on a par with that achieved by humans and outperforms rival supervised methods.
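The two stages named in the abstract can be illustrated with a minimal sketch: matching n-grams against a set of thesaurus titles to pick candidates, then using a simple genetic algorithm to learn feature weights for ranking them. This is not the authors' implementation. The toy title set, the two-element feature vectors, the top-k fitness function, and the names `select_candidates`, `fitness`, and `evolve` are all assumptions made for illustration.

```python
import random
import re

# Hypothetical stand-in for a Wikipedia title index (assumption, not the paper's data).
TOY_TITLES = {"genetic algorithm", "machine learning", "text mining", "digital library"}


def select_candidates(text, titles, max_len=3):
    """Return the word n-grams of `text` (up to max_len words) that match a title."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    found = set()
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            gram = " ".join(words[i:i + n])
            if gram in titles:
                found.add(gram)
    return found


def score(features, weights):
    """Weighted sum of a candidate's feature values."""
    return sum(f * w for f, w in zip(features, weights))


def fitness(weights, examples, k=1):
    """Count gold keyphrases that appear among the top-k ranked candidates."""
    hits = 0
    for candidates, gold in examples:
        ranked = sorted(candidates, key=lambda c: score(candidates[c], weights),
                        reverse=True)
        hits += len(set(ranked[:k]) & gold)
    return hits


def evolve(examples, n_features, pop_size=20, generations=30, seed=0):
    """Toy genetic algorithm: keep the fitter half, refill by crossover and mutation."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda w: fitness(w, examples), reverse=True)
        parents = pop[:pop_size // 2]          # elitist selection, parents kept as-is
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features) if n_features > 1 else 0
            child = a[:cut] + b[cut:]          # one-point crossover (new list)
            j = rng.randrange(n_features)
            child[j] += rng.gauss(0, 0.1)      # small Gaussian mutation on one gene
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda w: fitness(w, examples))


# Illustrative training example: candidate -> [feature_0, feature_1], plus gold set.
examples = [({"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.5, 0.5]}, {"a"})]
best_weights = evolve(examples, n_features=2)
```

In a realistic setting the features would be properties of each candidate phrase and the fitness would measure agreement with human-assigned keyphrases; the abstract does not say which features or fitness measure the authors used, so the ones here are placeholders.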

Original language: English
Title of host publication: Knowledge Engineering and Knowledge Management - 18th International Conference, EKAW 2012, Proceedings
Pages: 32-41
Number of pages: 10
Publication status: Published - 2012
Event: 18th International Conference on Knowledge Engineering and Knowledge Management, EKAW 2012 - Galway City, Ireland
Duration: 8 Oct 2012 to 12 Oct 2012

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7603 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 18th International Conference on Knowledge Engineering and Knowledge Management, EKAW 2012
Country/Territory: Ireland
City: Galway City
Period: 8/10/12 to 12/10/12

Keywords

  • genetic algorithms
  • keyphrase annotation
  • keyphrase indexing
  • scientific digital libraries
  • subject metadata
  • text mining
  • Wikipedia
