TY - JOUR
T1 - A citation-based approach to automatic topical indexing of scientific literature
AU - Mahdi, Abdulhussain E.
AU - Joorabchi, Arash
PY - 2010/12
Y1 - 2010/12
N2 - Topical indexing of documents with keyphrases is a common method used for revealing the subject of scientific and research documents to both human readers and information retrieval tools, such as search engines. However, scientific documents that are manually indexed with keyphrases are still in the minority. This article describes a new unsupervised method for automatic keyphrase extraction from scientific documents which yields a performance on a par with human indexers. The method is based on identifying the references cited in the document to be indexed and using the keyphrases assigned to those references to generate a set of high-likelihood keyphrases for the document. We have evaluated the performance of the proposed method by using it to automatically index a third-party test set of research documents. Reported experimental results show that the performance of our method, measured in terms of consistency with human indexers, is competitive with that achieved by state-of-the-art supervised methods.
AB - Topical indexing of documents with keyphrases is a common method used for revealing the subject of scientific and research documents to both human readers and information retrieval tools, such as search engines. However, scientific documents that are manually indexed with keyphrases are still in the minority. This article describes a new unsupervised method for automatic keyphrase extraction from scientific documents which yields a performance on a par with human indexers. The method is based on identifying the references cited in the document to be indexed and using the keyphrases assigned to those references to generate a set of high-likelihood keyphrases for the document. We have evaluated the performance of the proposed method by using it to automatically index a third-party test set of research documents. Reported experimental results show that the performance of our method, measured in terms of consistency with human indexers, is competitive with that achieved by state-of-the-art supervised methods.
KW - citation-based indexing
KW - data mining
KW - keyphrase extraction
KW - topical indexing
UR - http://www.scopus.com/inward/record.url?scp=78650090016&partnerID=8YFLogxK
U2 - 10.1177/0165551510388080
DO - 10.1177/0165551510388080
M3 - Article
AN - SCOPUS:78650090016
SN - 0165-5515
VL - 36
SP - 798
EP - 811
JO - Journal of Information Science
JF - Journal of Information Science
IS - 6
ER -