Research On Mutation-disease Relation Extraction From Biomedical Literature

Posted on: 2021-04-21
Degree: Master
Type: Thesis
Country: China
Candidate: Y W Song
GTID: 2428330626960368
Subject: Computer Science and Technology
Abstract/Summary:
Mutation information is closely related to some complex diseases and is an important subject of disease research and drug discovery. The biomedical literature is growing rapidly, and mutation information extraction has become a focus of attention as a way to mine the large volume of mutation information it contains. This technology automatically extracts mutation-related information from the unstructured text of biomedical literature and translates it into structured data, which is convenient for research and management. The mutation information extraction research in this thesis covers mutation named entity recognition and mutation-disease relation extraction.

Among current mutation named entity recognition methods, those based on the conditional random field are the mainstream. However, they rely on complex and time-consuming feature engineering. To solve this problem, we propose a method based on a character-level convolutional neural network, called CharCNN-CNN-CRF, for mutation named entity recognition from biomedical literature. In this method, a multi-window convolutional neural network produces the character-level word representation, a multi-layer convolutional neural network encodes the context information, and a conditional random field layer outputs the label sequence. The experimental results show that the proposed method recognizes mutation entities effectively and efficiently with only randomly initialized character vectors as input. CharCNN-CNN-CRF achieves state-of-the-art results on both the tmVar and MutationFinder datasets, with F-scores of 88.34% and 93.57%, respectively.

Most current document-level mutation-disease relation extraction methods are based on classification approaches; they involve complex engineering and cannot extract inter-sentential relations. To solve this problem, we treat document-level mutation-disease relation extraction as a sequence tagging task and propose a neural network-based method called Star-BiLSTM-LAN. Star-BiLSTM-LAN combines the star transformer with the long short-term memory network and can therefore capture both semantic and syntactic information at the document level from different aspects, which allows it to extract both intra-sentential and inter-sentential relations. In addition, a label attention network serves as the decoder; it learns the transition rules between labels and is more efficient than the conditional random field. Star-BiLSTM-LAN was evaluated on the EMU BCa and PCa datasets and achieves state-of-the-art F-scores of 89.20% and 90.43%, respectively.

Based on the above research, we developed a mutation information extraction system using the Browser/Server mode and the Flask framework. The client is a browser that exchanges data with the server through socket network communication, and the server uses Star-BiLSTM-LAN to extract information. The system allows users to enter biomedical text on the home page and submit it to the server, which extracts mutation entities and their related disease entities; the results are then visualized on the display page.
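The character-level word representation mentioned in the abstract (a multi-window convolution over randomly initialized character vectors) can be illustrated with a minimal pure-Python sketch. This is not the thesis implementation: the embedding size, the window sizes (2, 3, 4), the number of filters, the ReLU activation, the zero-padding of short words, and all function names are assumptions chosen for illustration, and the weights are untrained.

```python
import random

def make_char_embeddings(alphabet, dim, rng):
    """Randomly initialised character vectors, one per character (assumed setup)."""
    return {ch: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for ch in alphabet}

def make_filters(window_sizes, num_filters, dim, rng):
    """One bank of filters per window size; each filter spans `size` characters."""
    return {
        size: [[rng.uniform(-0.5, 0.5) for _ in range(size * dim)]
               for _ in range(num_filters)]
        for size in window_sizes
    }

def char_cnn_word_vector(word, embeddings, filters, dim):
    """Multi-window convolution over characters, then max-over-time pooling."""
    unknown = [0.0] * dim  # zero vector for unseen characters and padding
    chars = [embeddings.get(ch, unknown) for ch in word]
    features = []
    for size, bank in filters.items():
        # Zero-pad short words so every window size yields at least one position.
        padded = chars + [unknown] * max(0, size - len(chars))
        windows = [
            [x for vec in padded[i:i + size] for x in vec]
            for i in range(len(padded) - size + 1)
        ]
        for weights in bank:
            activations = [
                max(0.0, sum(w * x for w, x in zip(weights, window)))  # ReLU
                for window in windows
            ]
            features.append(max(activations))  # max-over-time pooling
    return features

rng = random.Random(0)
DIM = 8  # assumed character embedding size
embeddings = make_char_embeddings(
    "abcdefghijklmnopqrstuvwxyz0123456789.>+-_*", DIM, rng)
filters = make_filters(window_sizes=(2, 3, 4), num_filters=4, dim=DIM, rng=rng)

# Hypothetical input token written in a common mutation notation.
vector = char_cnn_word_vector("c.35delG".lower(), embeddings, filters, DIM)
print(len(vector))  # 3 window sizes x 4 filters = 12 features
```

Each window size contributes one pooled value per filter, so the word vector has a fixed length regardless of word length. In a full tagger, such vectors would presumably feed the context encoder and the conditional random field layer named in the abstract.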
Keywords/Search Tags: Mutation, Deep Learning, Named Entity Recognition, Relation Extraction, Sequence Tagging