TY - GEN
T1 - Extracting Drug-drug Interactions from Biomedical Texts using Knowledge Graph Embeddings and Multi-focal Loss
AU - Jin, Xin
AU - Sun, Xia
AU - Chen, Jiacheng
AU - Sutcliffe, Richard
N1 - Publisher Copyright: © 2022 ACM.
PY - 2022/10/17
Y1 - 2022/10/17
N2 - The field of Drug-drug interaction (DDI) aims to detect descriptions of interactions between drugs from biomedical texts. Currently, researchers have extracted DDIs using pre-trained language models such as BERT, which often misclassify two kinds of DDI types, "Effect" and "Int", on the DDIExtraction 2013 corpus because of highly similar expressions. The use of knowledge graphs can alleviate this problem by incorporating different relationships for each, thus allowing them to be distinguished. Thus, we propose a novel framework to integrate the neural network with a knowledge graph, where the features from these components are complementary. Specifically, we take text features at different levels into account in the neural network part. This is done by firstly obtaining a word-level position feature using PubMedBERT together with a convolutional neural network, secondly, getting a phrase-level key path feature using a dependency parsing tree, thirdly, using PubMedBERT with an attention mechanism to obtain a sentence-level language feature, and finally, fusing these three kinds of representation into a synthesized feature. We also extract a knowledge feature from a drug knowledge graph which takes just a few minutes to construct, then concatenate the synthesized feature with the knowledge feature, feed the result into a multi-layer perceptron and obtain the result by a softmax classifier. In order to achieve a good integration of the synthesized feature and the knowledge feature, we train the model using a novel multifocal loss function, KGE-MFL, which is based on a knowledge graph embedding. Finally, we attain state-of-the-art results on the DDIExtraction 2013 dataset (micro F-score 86.24%) and on the ChemProt dataset (micro F-score 77.75%), which proves our framework to be effective for biomedical relation extraction tasks. In particular, we fill the performance gap (more than 5.57%) between methods that rely on and do not rely on knowledge graph embedding on the DDIExtraction 2013 corpus, when predicting the "Int" type. The implementation code is available at https://github.com/NWU-IPMI/DDIE-KGE-MFL.
AB - The field of Drug-drug interaction (DDI) aims to detect descriptions of interactions between drugs from biomedical texts. Currently, researchers have extracted DDIs using pre-trained language models such as BERT, which often misclassify two kinds of DDI types, "Effect" and "Int", on the DDIExtraction 2013 corpus because of highly similar expressions. The use of knowledge graphs can alleviate this problem by incorporating different relationships for each, thus allowing them to be distinguished. Thus, we propose a novel framework to integrate the neural network with a knowledge graph, where the features from these components are complementary. Specifically, we take text features at different levels into account in the neural network part. This is done by firstly obtaining a word-level position feature using PubMedBERT together with a convolutional neural network, secondly, getting a phrase-level key path feature using a dependency parsing tree, thirdly, using PubMedBERT with an attention mechanism to obtain a sentence-level language feature, and finally, fusing these three kinds of representation into a synthesized feature. We also extract a knowledge feature from a drug knowledge graph which takes just a few minutes to construct, then concatenate the synthesized feature with the knowledge feature, feed the result into a multi-layer perceptron and obtain the result by a softmax classifier. In order to achieve a good integration of the synthesized feature and the knowledge feature, we train the model using a novel multifocal loss function, KGE-MFL, which is based on a knowledge graph embedding. Finally, we attain state-of-the-art results on the DDIExtraction 2013 dataset (micro F-score 86.24%) and on the ChemProt dataset (micro F-score 77.75%), which proves our framework to be effective for biomedical relation extraction tasks. In particular, we fill the performance gap (more than 5.57%) between methods that rely on and do not rely on knowledge graph embedding on the DDIExtraction 2013 corpus, when predicting the "Int" type. The implementation code is available at https://github.com/NWU-IPMI/DDIE-KGE-MFL.
KW - drug-drug interactions
KW - imbalance problem
KW - knowledge graph
KW - pubmedbert
UR - http://www.scopus.com/inward/record.url?scp=85140832159&partnerID=8YFLogxK
U2 - 10.1145/3511808.3557318
DO - 10.1145/3511808.3557318
M3 - Conference contribution
AN - SCOPUS:85140832159
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 884
EP - 893
BT - CIKM 2022 - Proceedings of the 31st ACM International Conference on Information and Knowledge Management
PB - Association for Computing Machinery
T2 - 31st ACM International Conference on Information and Knowledge Management, CIKM 2022
Y2 - 17 October 2022 through 21 October 2022
ER -