TY - GEN
T1 - Automatic subject metadata generation for scientific documents using Wikipedia and genetic algorithms
AU - Joorabchi, Arash
AU - Mahdi, Abdulhussain E.
PY - 2012
Y1 - 2012
AB - Topical annotation of documents with keyphrases is a proven method for revealing the subject of scientific and research documents. However, scientific documents that are manually annotated with keyphrases are in the minority. This paper describes a machine learning-based automatic keyphrase annotation method for scientific documents, which utilizes Wikipedia as a thesaurus for candidate selection from documents' content and deploys genetic algorithms to learn a model for ranking and filtering the most probable keyphrases. Reported experimental results show that the performance of our method, evaluated in terms of inter-consistency with human annotators, is on a par with that achieved by humans and outperforms rival supervised methods.
KW - genetic algorithms
KW - keyphrase annotation
KW - keyphrase indexing
KW - scientific digital libraries
KW - subject metadata
KW - text mining
KW - Wikipedia
UR - http://www.scopus.com/inward/record.url?scp=84867681960&partnerID=8YFLogxK
DO - 10.1007/978-3-642-33876-2_6
M3 - Conference contribution
AN - SCOPUS:84867681960
SN - 9783642338755
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 32
EP - 41
BT - Knowledge Engineering and Knowledge Management - 18th International Conference, EKAW 2012, Proceedings
T2 - 18th International Conference on Knowledge Engineering and Knowledge Management, EKAW 2012
Y2 - 8 October 2012 through 12 October 2012
ER -