TY - JOUR
T1 - Improving the visibility of library resources via mapping library subject headings to Wikipedia articles
AU - Joorabchi, Arash
AU - Mahdi, Abdulhussain E.
N1 - Publisher Copyright: © 2018, Emerald Publishing Limited.
PY - 2018/2/7
Y1 - 2018/2/7
N2 - Purpose: Linking libraries and Wikipedia can significantly improve the quality of services provided by these two major silos of knowledge. Such linkage would enrich the quality of Wikipedia articles and at the same time increase the visibility of library resources. To this end, the purpose of this paper is to describe the design and development of a software system for automatic mapping of FAST subject headings, used to index library materials, to their corresponding articles in Wikipedia. Design/methodology/approach: The proposed system works by first detecting all the candidate Wikipedia concepts (articles) occurring in the titles of the books and other library materials which are indexed with a given FAST subject heading. This is then followed by training and deploying a machine learning (ML) algorithm designed to automatically identify those concepts that correspond to the FAST heading. Specifically, the ML algorithm used is a binary classifier which classifies the candidate concepts into either “corresponding” or “non-corresponding” categories. The classifier is trained to learn the characteristics of those candidates which have the highest probability of belonging to the “corresponding” category based on a set of 14 positional, statistical, and semantic features. Findings: The authors have assessed the performance of the developed system using standard information retrieval measures of precision, recall, and F-score on a data set containing 170 FAST subject headings manually mapped to their corresponding Wikipedia articles. The evaluation results show that the developed system is capable of achieving F-scores as high as 0.65 and 0.99 in the corresponding and non-corresponding categories, respectively. Research limitations/implications: The size of the data set used to evaluate the performance of the system is rather small. However, the authors believe that the developed data set is large enough to demonstrate the feasibility and scalability of the proposed approach. Practical implications: The sheer size of English Wikipedia makes the manual mapping of Wikipedia articles to library subject headings a very labor-intensive and time-consuming task. Therefore, the aim is to reduce the cost of such mapping and integration. Social implications: The proposed mapping paves the way for connecting libraries and Wikipedia as two major silos of knowledge, and enables the bi-directional movement of users between the two. Originality/value: To the best of the authors’ knowledge, the current work is the first attempt at automatic mapping of Wikipedia to a library-controlled vocabulary.
AB - Purpose: Linking libraries and Wikipedia can significantly improve the quality of services provided by these two major silos of knowledge. Such linkage would enrich the quality of Wikipedia articles and at the same time increase the visibility of library resources. To this end, the purpose of this paper is to describe the design and development of a software system for automatic mapping of FAST subject headings, used to index library materials, to their corresponding articles in Wikipedia. Design/methodology/approach: The proposed system works by first detecting all the candidate Wikipedia concepts (articles) occurring in the titles of the books and other library materials which are indexed with a given FAST subject heading. This is then followed by training and deploying a machine learning (ML) algorithm designed to automatically identify those concepts that correspond to the FAST heading. Specifically, the ML algorithm used is a binary classifier which classifies the candidate concepts into either “corresponding” or “non-corresponding” categories. The classifier is trained to learn the characteristics of those candidates which have the highest probability of belonging to the “corresponding” category based on a set of 14 positional, statistical, and semantic features. Findings: The authors have assessed the performance of the developed system using standard information retrieval measures of precision, recall, and F-score on a data set containing 170 FAST subject headings manually mapped to their corresponding Wikipedia articles. The evaluation results show that the developed system is capable of achieving F-scores as high as 0.65 and 0.99 in the corresponding and non-corresponding categories, respectively. Research limitations/implications: The size of the data set used to evaluate the performance of the system is rather small. However, the authors believe that the developed data set is large enough to demonstrate the feasibility and scalability of the proposed approach. Practical implications: The sheer size of English Wikipedia makes the manual mapping of Wikipedia articles to library subject headings a very labor-intensive and time-consuming task. Therefore, the aim is to reduce the cost of such mapping and integration. Social implications: The proposed mapping paves the way for connecting libraries and Wikipedia as two major silos of knowledge, and enables the bi-directional movement of users between the two. Originality/value: To the best of the authors’ knowledge, the current work is the first attempt at automatic mapping of Wikipedia to a library-controlled vocabulary.
KW - Controlled vocabularies
KW - Data integration
KW - FAST subject headings
KW - Library catalogues
KW - Semantic mapping
KW - Wikipedia
UR - http://www.scopus.com/inward/record.url?scp=85038859574&partnerID=8YFLogxK
U2 - 10.1108/LHT-04-2017-0066
DO - 10.1108/LHT-04-2017-0066
M3 - Article
AN - SCOPUS:85038859574
SN - 0737-8831
VL - 36
SP - 57
EP - 74
JO - Library Hi Tech
JF - Library Hi Tech
IS - 1
ER -